Formula Tokenizer

Posted on December 5, 2009 by Rob van Gelder

A few years ago, I wrote a Formula Formatter add-in to present a long formula for easier reading. It does it through a process called Tokenizing, which is basically a process of putting the components of a formula into labelled boxes.
That add-in was compatible with Excel 2003 formulas, but Excel 2007 brought us extra formulas which meant my add-in was incomplete.

It was the first time I’d ever developed a tokenizer, so after a while I thought of better ways of doing it.
From my Oracle days, I recalled those diagrams in SQL manuals. Kind of like bubbles with lines leading in and leading out detailing the syntax of a statement. At the time I figured I could reuse this technique to document Excel Formulas. Of course, now that I’ve done a bit of reading, I’ve learned these are called Syntax Diagrams, and that’s exactly what they were intended for… visualising the syntax of formulas, or really, visualising BNF. BNF itself is a way of documenting a programming language’s grammar.

After some weeks of ripping apart formulas, I completed a BNF document describing Excel Formulas.
Probably the very next day, I stumbled across this document on Microsoft’s website: Excel Binary File Format (.xls) Structure Specification. It’s got BNF galore.
I felt like the guy who spent a weekend learning to program his VCR without a manual, only for the manual to turn up the next day.

So, with syntax in hand, I developed a tokenizer, which was quite a lot of fun.
It does Excel 2007 Tables, which was really my goal from the start. I also tidied up the bits around external named references.
It should work with International versions, but I use English only.
Ron de Bruin gave me some advice in this area (thanks Ron!), but still feel there might be problems with errors such as #VALUE!. You’ll find them defined once at the top of the module if you want to play.

Now that I’m finished, I’m sharing… You can download the code here (a zip file at 75 KB)
I’m not really interested in developing a user interface for formula formatter this time around. I’m happy to leave that to anyone else.

I wrote a proof of concept userform. In the image below, I’ve doubleclicked the IF function, and the whole IF, including it’s content, is highlighted.
tokenizer

31 thoughts on “Formula Tokenizer”

Conrad Brits says:

December 8, 2009 at 3:01 pm

Rob, I tried to down load the 2005 AudXL app without success! Is it still available?
Rob van Gelder says:

December 8, 2009 at 4:37 pm

You can find it on my website:
http://vangelder.orcon.net.nz/excel/
Kanwaljit says:

December 9, 2009 at 12:46 am

Rob,
But I found it at the following link. Is there any update after that ?
http://vangelder.orconhosting.net.nz/excel/audxl.html
Conrad Brits says:

December 9, 2009 at 7:58 am

Rob, that is the link that I used but, when clicking through, it changed to http://vangelder.orconhosting.net.nz/excel/ with a Google Trivia link and the rest of the page being blank. Something I’m doing wrong here or a setting in IE on my PC?
The link Kanwaljit has did work though.
Regards.
Rob van Gelder says:

December 9, 2009 at 12:11 pm

Sorry, I’m better at Excel than at making webpages!
Kanwaljit’s link is what you want. That was the last version I made.
Ben Norris says:

April 23, 2010 at 9:17 am

Hi Rob,

I’ve recently been playing around with your original tokeniser (from AudXL) and have found it to be excellently written, fast and very useful. I’ve only just found this article where you introduce the new version. To be honest, I’ve not really used the original code in XL2007 much so hadn’t noticed any issues with it in that respect, but am keen to use the new code as I’ll doubtless be using XL2007/2010 more going forward.

Two quick points:

1. In the original version, the parser incorrectly recognised a cell with the text “Unit(s)” in it (note: not entered as a formula, just as a text label) as a function called Unit. This was pretty easy to fix but I thought you might like to be aware of it. I don’t think the new version has the same problem. It happens with any text with an open bracket after it.

2. In the old version you had a specific token type for ‘Operand Reference Named Range’ which seems to have gone from the new version – both cell references and named ranges are returned as token type ‘Reference’. I quite liked having the ability to distinguish between a reference to a cell and a reference to a named range so I was thinking of adding some code to do this. In your original tokeniser you wrote a function called ‘IsReferenceA1’ to check if a reference was to a cell or to a named range and I was thinking of re-writing this for XL2007 (to take account of the Big Grid). Before I do that I was wondering if you’d already tried that and abandoned it as not being possible, or if there was some other reason you left it out.

Many thanks for offering this great code to the community – it’s been hugely inspiring, educational and useful!

Rgds,

Ben
Chris Greaves says:

May 4, 2010 at 4:00 am

“Now that I’m finished, I’m sharing…”
Thanks Rob.
I started using BN back in the early seventies, but have been away for a while.
I felt a need to write a formula parser, and you popped up in a search.
Your comment above reminds me of the spirit back in the days of 80-column punched cards ….

Here you are: Enjoy a free copy of the indexer!
http://www.Indxr.ca
Rob van Gelder says:

July 4, 2010 at 3:33 am

A bit late in replying, sorry.

Ben: I decided not to go as far as distinguishing between cell references and named references. I don’t think it would be too hard to work out, but there’s the added problem of 2007 versus 2003 names.
In 2003 ABC123 is a name, but in 2007 it’s a cell.

Chris: pleased it’s of use to someone. cheers
Ben Norris says:

February 15, 2011 at 10:42 am

Hi Rob,

I’ve started using the tokeniser code in anger recently and noticed an ‘interesting’ memory issue that I can’t seem to resolve…

If you add a sheet to the Parser.xls file with about 40,000 formulae on it (just to give you something to work with) and run the tokeniser on each cell in turn then the memory used by Excel shoots up dramatically (an increase of over 120MB once all cells are processed). FYI I’m running Excel 2010 and the test code I ran looks like this:

Sub test()

Dim rngT As Range
Dim tknT As Tokens

For Each rngT In Sheet23.UsedRange
If Left(rngT.Formula, 1) = “=” Then
Set tknT = ParseFormula(rngT.Formula)
End If
Set tknT = Nothing
DoEvents
Next rngT

End Sub

As you can see I’m killing the tknT variable after each call of ParseFormula and as such wouldn’t expect the memory to rise as much as it is. The other odd thing is that once the code has completed running, if you click on the ‘Reset’ button in the VBE (looks like a Stop button next to the Play and Pause buttons) Excel seems to release all the memory that it has stored up (it takes a while for it to do this though and Excel hangs whilst doing it).

I can’t seem to see any global variables that are used in the Tokeniser so am really at a loss to understand what’s going on. Any ideas?!

Many thanks,

Ben.
Ben Norris says:

February 16, 2011 at 5:18 am

Just found the following and wonder if it might have something to do with it:

http://www.vbi.org/Items/article.asp?id=106

Will have a look at the tokeniser code and see if it falls into that trap. My gut feel is that it does…

Rgds,

Ben
Ben Norris says:

February 16, 2011 at 6:32 am

Ok, so I’ve found the issue and it is pretty much as described on the link above. The Class ‘Token’ contains a Public member called ‘Parent’ that is of type Class Tokens, thus creating the circularity described in the above article.

If you remove this member (and all references to it elsewhere in the code) then the memory issue is resolved. Note that in my application I have removed all of the code relating to the example form Rob included so the ‘Parent’ member is actually redundant anyway.

I’ve also added code at the end of the main ParseFormula procedure to set all local objects to Nothing, as well as code to do the same upon class_terminate for each of the classes.

Hope someone finds the above useful!

Ben
Rob van Gelder says:

February 16, 2011 at 12:36 pm

Hi Ben.

Thanks for identifying the bug!
That wasn’t very clever of me – naturally it would memory leak.

There is a way around it, as commented by Rob Bruce here.

Cheers
Julius says:

April 16, 2012 at 4:42 pm

Hi Rob, I just started to use AudXL, at the begining I used with Excel 2007 and it works fine, now I am using Excel 2010 and it does not work anymore, some kind of library reference issue; I am not a developer so for me its really hard to change the vba code in order to make things works; so, could you give some tip what to do, o maybe what part I have to change in the add in. I really appreciate.

Sorry for my english
Frank says:

November 6, 2012 at 2:09 am

Hi Rob,

quite good work you put up here.

I have tried to find the BNF for the excel formulas in the Excel Binary File Format (.xls) Structure Specification however i failed to find it :(

Can you maybe give me some advice where to look ?
Sumeet says:

January 10, 2014 at 2:10 am

Hi Rob/Ben,

I have this tokenizer to a good extent to make a step by step formula evaluator for complicated formulas but I am facing a lot of issues
with memeory leak the current level of physical memory is 300,000 k. I have followed what Ben had highlighted in older post on setting local obj to nothing and removed the Public member ‘Parent’.

Any expert suggestions?
Max says:

March 31, 2014 at 3:43 am

Hi Rob,

The link you posted is broken. Could you please repost it? Thanks :)

Max
Dick Kusleika says:

March 31, 2014 at 11:16 am

Max: This one?

http://vangelder.orconhosting.net.nz/files/ExcelFormulas.zip
Tony says:

November 26, 2014 at 9:59 am

Many thanks for this code, I have found it extremely useful for a project I’m working on to adjust function calls between two add-ins that do similar stuff but differently.

I have converted it to C# for my purposes and below is a copy of the C# class I’ve derived from your VBA code.

class XLTokenizer { private enum ParsingState { Expression1, Expression2, LeadingName1, LeadingName2, LeadingName3, LeadingNameE, WhiteSpace, Text1, Text2, Number1, Number2, Number3, Number4, NumberE, Bool, ErrorX, MinusSign, // for ambiguity between unary minus and sign PrefixOperator, ArithmeticOperator, ComparisonOperator1, ComparisonOperator2, ComparisonOperatorE, TextOperator, PostfixOperator, RangeOperator, ReferenceQualifier, ListSeparator, // for ambiguity between function parameter separator and reference union operator ArrayRowSeparator, ArrayColumnSeparator, FunctionX, ParameterSeparator, SubExpression, BracketClose, ArrayOpen, ArrayConstant1, ArrayConstant2, ArrayClose, SquareBracketOpen, Table1, Table2, Table3, Table4, Table5, Table6, TableQ, TableE, R1C1Reference // todo: implement R1C1Reference }
public enum TokenType { Text, Number, Bool, ErrorText, Reference, WhiteSpace, UnaryOperator, ArithmeticOperator, ComparisonOperator, TextOperator, RangeOperator, ReferenceQualifier, ExternalReferenceOperator, PostfixOperator, FunctionOpen, AtFunctionOpen, ParameterSeparator, FunctionClose, SubExpressionOpen, SubExpressionClose, ArrayOpen, ArrayRowSeparator, ArrayColumnSeparator, ArrayClose, TableOpen, TableSection, TableColumn, TableItemSeparator, TableColumnSeparator, TableClose } public class Token { public String TokenText { get; set; } public TokenType TokenType { get; set; } public bool Quoted { get; set; } public int FormulaIndex { get; set; } public Token(String tokenText, TokenType tokenType, bool quoted, int formulaIndex) { TokenText = tokenText; TokenType = tokenType; Quoted = quoted; FormulaIndex = formulaIndex; } } [Serializable()] public class ParseException : System.Exception { public ParseException() : base() { } public ParseException(string message) : base(message) { } public ParseException(string message, System.Exception inner) : base(message, inner) { } protected ParseException(System.Runtime.Serialization.SerializationInfo info, System.Runtime.Serialization.StreamingContext context) { } } private static char cDecimalSeparator = '.'; private static char cListSeparator = ','; private static char cArrayRowSeparator = ';'; private static char cArrayColumnSeparator = ','; private static char cLeftBrace = '{'; private static char cRightBrace = '}'; private static char cLeftBracket = '['; private static char cRightBracket = ']'; private static char cLeftRoundBracket = '('; private static char cRightRoundBracket = ')'; private static char cDoubleQuote = '\"'; private static String strBooleanTrue = "TRUE"; private static String strBooleanFalse = "FALSE"; private static String strErrorRef = "#REF!"; private static String strErrorDiv0 = "#DIV/0!"; private static String strErrorNA = "#N/A"; private static String strErrorName = "#NAME?"; private static String strErrorNull = "#NULL!"; private static String strErrorNum = "#NUM!"; private static String strErrorValue = "#VALUE!"; private static String strErrorGettingData = "#GETTING_DATA"; private static String strLNSeparators = "+-*/^%=<>&:" + cRightRoundBracket + cRightBrace + cListSeparator; private static String strArithmeticOps = "+-*/^"; private static String strComparisonOps = "=<>"; private static String strsBoolean = "\t" + strBooleanTrue + "\t" + strBooleanFalse + "\t"; private static String strsError = "\t" + strErrorRef + "\t" + strErrorDiv0 + "\t" + strErrorNA + "\t" + strErrorName + "\t" + strErrorNull + "\t" + strErrorNum + "\t" + strErrorValue + "\t" + strErrorGettingData + "\t"; public static List testStrs = new List() { "=1\n\n \n 5", "=(A1 ,A5)", "=\"123\"\"456\"\"789\"", "=3.1E-24-2.1E-24", "=[MyBook.xlsx]Sheet1!$A$1", "=Func(,\"abc\",,)", "=IF(P5=1.0,\"NA\",IF(P5=2.0,\"A\",IF(P5=3.0,\"B\",IF(P5=4.0,\"C\",IF(P5=5.0,\"D\",IF(P5=6.0,\"E\",IF(P5=7.0,\"F\",IF(P5=8.0,\"G\"))))))))", "=SUM((A:A A1:B1))", "='MyCOM.AddIn.Function_Name'(,\"abc\",,)", "=IF(\"a\"={\"a\",\"b\";\"c\",#N/A;-1,TRUE}, \"yes\", \"no\") & \" more \"\"test\"\" text\"", "=+ AName- (-+-+-2^6) = {\"A\",\"B\"} + @SUM(R1C1) + (@ERROR.TYPE(#VALUE!) = 2)", "=SUM('Sheet-1'!A1:'Sheet-1'!A10)", "=Table1[#Headers]", "=Table1[[a]:[b]]", "=Table1[ [#Headers], [#Data], [a]:[b] ]", "=IF(((IF(D4<>\"\",((D5-D4)-0.020833333333),0))+(IF(D10<>\"\",((D11-D10)-0.020833333333),0))+(IF(D16<>\"\",((D17-D16)-0.020833333333),0))+(IF(D22<>\"\",((D23-D22)-0.020833333333),0))+(IF(D28<>\"\",((D29-D28)-0.020833333333),0))+(IF(D34<>\"\",((D35-D34)-0.020833333333),0))+(IF(D40<>\"\",((D41-D40)-0.020833333333),0)))*24<=40,((IF(D4<>\"\",((D5-D4)-0.020833333333),0))+(IF(D10<>\"\",((D11-D10)-0.020833333333),0))+(IF(D16<>\"\",((D17-D16)-0.020833333333),0))+(IF(D22<>\"\",((D23-D22)-0.020833333333),0))+(IF(D28<>\"\",((D29-D28)-0.020833333333),0))+(IF(D34<>\"\",((D35-D34)-0.020833333333),0))+(IF(D40<>\"\",((D41-D40)-0.020833333333),0)))*24,40)", "=Table1[[#All],[b'[]]", "=IF(L40>65,AQ26,ROUND((L12*IF(OR(AND(L40>=55, L41>=20),AND(L41>=20,$C$11=\"YES\")),L45,L44))+(L15 *IF(OR(AND(L40>=55,L41>=20),AND(L41>=20,$C$11=\"YES\")), L46,L44)),0))" }; public static void RunTests() { foreach (String formulaStr in testStrs) { List tokens = ParseFormula(formulaStr); if (tokens != null) { String reformedStr = ReformFormula(tokens); if (!reformedStr.Equals(formulaStr)) { throw new ParseException("Failed to parse and reform \"" + formulaStr + "\""); } } } } public static String ReformFormula(List tokens) { String formulaStr = "="; foreach (Token token in tokens) { if (token.TokenType == TokenType.AtFunctionOpen) formulaStr += "@"; if (token.TokenType == TokenType.Text) { formulaStr += cDoubleQuote; foreach (Char c in token.TokenText) { if (c == cDoubleQuote) formulaStr += cDoubleQuote; formulaStr += c; } formulaStr += cDoubleQuote; } else { if (token.Quoted) { if (token.TokenType == TokenType.TableColumn || token.TokenType == TokenType.TableSection) { formulaStr += "[" + token.TokenText + "]"; } else { formulaStr += '''; foreach (Char c in token.TokenText) { if (c == ''') formulaStr += '''; formulaStr += c; } formulaStr += '''; } } else formulaStr += token.TokenText; if (token.TokenType == TokenType.FunctionOpen || token.TokenType == TokenType.AtFunctionOpen) formulaStr += cLeftRoundBracket; else if (token.TokenType == TokenType.TableOpen) formulaStr += cLeftBracket; } } return formulaStr; } public static List ParseFormula(String strFormula) { int lngFormulaLen = strFormula.Length; if (lngFormulaLen <= 1) { throw new ParseException("Formula too short"); } if (!strFormula.StartsWith("=")) { throw new ParseException("Formula doesn't start with '='"); } List tokens = new List(); Stack tokenStack = new Stack(); ParsingState parseState = ParsingState.Expression1; ParsingState parseReturnState = parseState; String currentTokenStr = ""; int tokenIndex = 0; bool continueParse = true; int iChar = 1; bool inQuote = false; while (continueParse) { char c = (iChar < lngFormulaLen) ? strFormula[iChar] : Char.MinValue; switch (parseState) { case ParsingState.Expression1: { if (Char.IsWhiteSpace(c)) { parseReturnState = parseState; parseState = ParsingState.WhiteSpace; tokenIndex = iChar; } else if (c == cDoubleQuote) { parseReturnState = ParsingState.Expression2; parseState = ParsingState.Text1; tokenIndex = iChar; iChar++; } else if (Char.IsDigit(c)) { parseReturnState = ParsingState.Expression2; parseState = ParsingState.Number1; tokenIndex = iChar; } else if (c == '-') { currentTokenStr = Char.ToString(c); parseReturnState = ParsingState.Expression2; parseState = ParsingState.MinusSign; tokenIndex = iChar; iChar++; } else if (c == '+') { parseState = ParsingState.PrefixOperator; tokenIndex = iChar; } else if (c == cLeftRoundBracket) { parseState = ParsingState.SubExpression; } else if (c == cLeftBrace) { parseState = ParsingState.ArrayOpen; } else if (c == ''') { parseState = ParsingState.LeadingName2; inQuote = true; tokenIndex = iChar; iChar++; } else if (c == '#') { parseReturnState = ParsingState.Expression2; parseState = ParsingState.ErrorX; tokenIndex = iChar; } else { parseState = ParsingState.LeadingName1; tokenIndex = iChar; } } break; // -------------------- -------------------- -------------------- case ParsingState.Expression2: { if (c == Char.MinValue) { continueParse = false; } else if (Char.IsWhiteSpace(c)) { parseReturnState = parseState; parseState = ParsingState.WhiteSpace; tokenIndex = iChar; } else if (c == cRightRoundBracket) { parseState = ParsingState.BracketClose; } else if (strArithmeticOps.Contains(c)) { parseState = ParsingState.ArithmeticOperator; tokenIndex = iChar; } else if (strComparisonOps.Contains(c)) { parseState = ParsingState.ComparisonOperator1; tokenIndex = iChar; } else if (c == '&') { parseState = ParsingState.TextOperator; tokenIndex = iChar; } else if (c == '%') { parseState = ParsingState.PostfixOperator; tokenIndex = iChar; } else if (c == ':') { parseState = ParsingState.RangeOperator; tokenIndex = iChar; } else if (c == cListSeparator) { parseState = ParsingState.ListSeparator; tokenIndex = iChar; } else { // Check if whitespace is actually union operator Token objToken = tokens[tokens.Count - 1]; bool isError = true; if (objToken.TokenType == TokenType.WhiteSpace) { int offset = objToken.TokenText.IndexOf(' '); if (offset >= 0) { currentTokenStr = objToken.TokenText.Substring(offset + 1); if (offset == 0) { objToken.TokenType = TokenType.RangeOperator; objToken.TokenText = " "; } else { objToken.TokenText = objToken.TokenText.Substring(0, offset); tokens.Add(new Token(" ", TokenType.RangeOperator, false, objToken.FormulaIndex + offset - 1)); } if (currentTokenStr.Length > 0) { tokens.Add(new Token(currentTokenStr, TokenType.WhiteSpace, inQuote, objToken.FormulaIndex + offset)); currentTokenStr = ""; inQuote = false; } parseState = ParsingState.Expression1; isError = false; } } else if (objToken.TokenType == TokenType.ErrorText && objToken.TokenText.Equals(strErrorRef)) { parseState = ParsingState.Expression1; isError = false; } if (isError) { throw new ParseException("Expected Operator but got " + c + " at position " + iChar); } } } break; // -------------------- -------------------- -------------------- case ParsingState.ArrayConstant1: { if (c == cDoubleQuote) { parseReturnState = ParsingState.ArrayConstant2; parseState = ParsingState.Text1; tokenIndex = iChar; iChar++; } else if (Char.IsDigit(c)) { parseReturnState = ParsingState.ArrayConstant2; parseState = ParsingState.Number1; tokenIndex = iChar; } else if (c == '-') { currentTokenStr = Char.ToString(c); parseReturnState = ParsingState.ArrayConstant2; parseState = ParsingState.Number1; tokenIndex = iChar; iChar++; } else if (c == '#') { parseReturnState = ParsingState.ArrayConstant2; parseState = ParsingState.ErrorX; tokenIndex = iChar; } else { parseReturnState = ParsingState.ArrayConstant2; parseState = ParsingState.Bool; tokenIndex = iChar; } } break; // -------------------- -------------------- -------------------- case ParsingState.ArrayConstant2: { if (c == cRightBrace) { parseState = ParsingState.ArrayClose; } else if (c == cArrayRowSeparator) { parseState = ParsingState.ArrayRowSeparator; tokenIndex = iChar; } else if (c == cArrayColumnSeparator) { parseState = ParsingState.ArrayColumnSeparator; tokenIndex = iChar; } else { throw new ParseException("Expected " + cRightBrace + " " + cArrayRowSeparator + " " + ParsingState.ArrayColumnSeparator + " but got " + c + " at position " + iChar); } } break; // -------------------- -------------------- -------------------- case ParsingState.ArrayOpen: { tokenStack.Push(new Token(Char.ToString(c), TokenType.ArrayOpen, false, iChar)); tokens.Add(tokenStack.Peek()); parseState = ParsingState.ArrayConstant1; iChar++; } break; case ParsingState.ArrayClose: { Token prevToken = tokenStack.Pop(); if (prevToken.TokenType == TokenType.ArrayOpen) { tokens.Add(new Token(Char.ToString(c), TokenType.ArrayClose, false, iChar)); parseState = ParsingState.Expression2; iChar++; } else { throw new ParseException("Encountered " + cRightBrace + " without matching " + cLeftBrace + " at position " + iChar); } } break; case ParsingState.ArrayColumnSeparator: { iChar++; tokens.Add(new Token(Char.ToString(c), TokenType.ArrayColumnSeparator, false, tokenIndex)); parseState = ParsingState.ArrayConstant1; } break; case ParsingState.ArrayRowSeparator: { iChar++; tokens.Add(new Token(Char.ToString(c), TokenType.ArrayRowSeparator, false, tokenIndex)); parseState = ParsingState.ArrayConstant1; } break; // -------------------- -------------------- -------------------- case ParsingState.Bool: { currentTokenStr += c; iChar++; if (strsBoolean.IndexOf("\t" + currentTokenStr) >= 0) { if (strsBoolean.IndexOf("\t" + currentTokenStr + "\t") >= 0) { tokens.Add(new Token(currentTokenStr, TokenType.Bool, inQuote, tokenIndex)); currentTokenStr = ""; inQuote = false; parseState = parseReturnState; } } else { throw new ParseException("Expected Boolean Constant at position " + tokenIndex); } } break; // -------------------- -------------------- -------------------- case ParsingState.WhiteSpace: { if (Char.IsWhiteSpace(c)) { currentTokenStr += c; iChar++; } else { tokens.Add(new Token(currentTokenStr, TokenType.WhiteSpace, inQuote, tokenIndex)); currentTokenStr = ""; inQuote = false; parseState = parseReturnState; } } break; // -------------------- -------------------- -------------------- case ParsingState.SubExpression: { tokenStack.Push(new Token(Char.ToString(c), TokenType.SubExpressionOpen, inQuote, iChar)); tokens.Add(tokenStack.Peek()); parseState = ParsingState.Expression1; iChar++; } break; case ParsingState.BracketClose: { Token prevToken = tokenStack.Pop(); if (prevToken.TokenType == TokenType.FunctionOpen || prevToken.TokenType == TokenType.AtFunctionOpen) { tokens.Add(new Token(Char.ToString(c), TokenType.FunctionClose, false, iChar)); parseState = ParsingState.Expression2; } else if (prevToken.TokenType == TokenType.SubExpressionOpen) { tokens.Add(new Token(Char.ToString(c), TokenType.SubExpressionClose, false, iChar)); parseState = ParsingState.Expression2; } else { throw new ParseException("Encountered " + cRightRoundBracket + " without matching " + cLeftRoundBracket + " at position " + iChar); } iChar++; } break; // -------------------- -------------------- -------------------- // Decide: Leading Name for Function, Table, R1C1Reference, Reference Qualifier or Cell Reference case ParsingState.LeadingName1: { if (currentTokenStr.Length > 0 && c == cLeftRoundBracket) { parseState = ParsingState.FunctionX; } else if (currentTokenStr.Length > 0 && c == cLeftBracket) { parseState = ParsingState.SquareBracketOpen; } else if (currentTokenStr.Length > 0 && c == '!') { parseState = ParsingState.ReferenceQualifier; } else if (c == Char.MinValue || Char.IsWhiteSpace(c) || strLNSeparators.Contains(c)) { parseState = ParsingState.LeadingNameE; } else { currentTokenStr += c; iChar++; } } break; case ParsingState.LeadingName2: { if (c == ''') { parseState = ParsingState.LeadingName3; } else { currentTokenStr += c; } iChar++; } break; case ParsingState.LeadingName3: { if (c == ''') { parseState = ParsingState.LeadingName2; currentTokenStr += c; iChar++; } else { parseState = ParsingState.LeadingName1; } } break; case ParsingState.LeadingNameE: { if (currentTokenStr.Length > 0) { tokens.Add(new Token(currentTokenStr, (currentTokenStr.Equals(strBooleanTrue) || currentTokenStr.Equals(strBooleanFalse)) ? TokenType.Bool : TokenType.Reference, inQuote, tokenIndex)); currentTokenStr = ""; inQuote = false; } parseState = ParsingState.Expression2; } break; // -------------------- -------------------- -------------------- // Function case ParsingState.FunctionX: { if (currentTokenStr.StartsWith("@")) { tokenStack.Push(new Token(currentTokenStr.Substring(1), TokenType.AtFunctionOpen, inQuote, tokenIndex)); } else { tokenStack.Push(new Token(currentTokenStr, TokenType.FunctionOpen, inQuote, tokenIndex)); } tokens.Add(tokenStack.Peek()); currentTokenStr = ""; inQuote = false; parseState = ParsingState.Expression1; iChar++; } break; // -------------------- -------------------- -------------------- // Decide: Table or R1C1 Reference case ParsingState.SquareBracketOpen: { if (currentTokenStr.Equals("R") || currentTokenStr.Equals("C")) { parseState = ParsingState.R1C1Reference; } else { tokenStack.Push(new Token(currentTokenStr, TokenType.TableOpen, inQuote, tokenIndex)); tokens.Add(tokenStack.Peek()); currentTokenStr = ""; inQuote = false; parseState = ParsingState.Table1; iChar++; } } break; // -------------------- -------------------- -------------------- // Table case ParsingState.Table1: { tokenIndex = iChar; if (Char.IsWhiteSpace(c)) { parseReturnState = ParsingState.Table1; parseState = ParsingState.WhiteSpace; } else if (c == cLeftBracket) { inQuote = true; iChar++; parseState = ParsingState.Table6; } else { parseState = ParsingState.Table5; } } break; case ParsingState.Table2: { tokenIndex = iChar; if (Char.IsWhiteSpace(c)) { parseReturnState = ParsingState.Table2; parseState = ParsingState.WhiteSpace; } else if (c == cListSeparator) { parseState = ParsingState.Table3; } else if (c == ':') { parseState = ParsingState.Table4; } else if (c == cRightBracket) { parseState = ParsingState.TableE; } else { parseState = ParsingState.Table5; } } break; case ParsingState.Table3: { iChar++; tokens.Add(new Token(Char.ToString(c), TokenType.TableItemSeparator, inQuote, tokenIndex)); currentTokenStr = ""; inQuote = false; parseState = ParsingState.Table1; } break; case ParsingState.Table4: { iChar++; tokens.Add(new Token(Char.ToString(c), TokenType.TableColumnSeparator, inQuote, tokenIndex)); currentTokenStr = ""; inQuote = false; parseState = ParsingState.Table1; } break; case ParsingState.Table5: { if (c == cRightBracket) { tokens.Add(new Token(currentTokenStr, (currentTokenStr.StartsWith("#")) ? TokenType.TableSection : TokenType.TableColumn, inQuote, tokenIndex)); currentTokenStr = ""; inQuote = false; parseState = ParsingState.TableE; } else if (c == ''') { iChar++; currentTokenStr += c; parseReturnState = ParsingState.Table5; parseState = ParsingState.TableQ; } else { currentTokenStr += c; iChar++; } } break; case ParsingState.Table6: { if (c == cRightBracket) { iChar++; tokens.Add(new Token(currentTokenStr, (currentTokenStr.StartsWith("#")) ? TokenType.TableSection : TokenType.TableColumn, inQuote, tokenIndex)); currentTokenStr = ""; inQuote = false; parseState = ParsingState.Table2; } else if (c == ''') { iChar++; currentTokenStr += c; parseReturnState = ParsingState.Table6; parseState = ParsingState.TableQ; } else { currentTokenStr += c; iChar++; } } break; case ParsingState.TableQ: { currentTokenStr += c; iChar++; parseState = parseReturnState; } break; case ParsingState.TableE: { Token prevToken = tokenStack.Pop(); if (prevToken.TokenType == TokenType.TableOpen) { tokens.Add(new Token(Char.ToString(c), TokenType.TableClose, false, iChar)); parseState = ParsingState.Expression2; iChar++; } else { throw new ParseException("Encountered " + cRightBracket + " without matching " + cLeftBracket + " at position " + iChar); } } break; // -------------------- -------------------- -------------------- // Text case ParsingState.Text1: { if (c == cDoubleQuote) { parseState = ParsingState.Text2; } else { currentTokenStr += c; } iChar++; } break; case ParsingState.Text2: { if (c == cDoubleQuote) { parseState = ParsingState.Text1; currentTokenStr += c; iChar++; } else { tokens.Add(new Token(currentTokenStr, TokenType.Text, inQuote, tokenIndex)); currentTokenStr = ""; inQuote = false; parseState = parseReturnState; } } break; // -------------------- -------------------- -------------------- // Number case ParsingState.Number1: case ParsingState.Number2: { if (Char.IsDigit(c)) { currentTokenStr += c; iChar++; } else if (c == cDecimalSeparator && parseState == ParsingState.Number1) { currentTokenStr += c; parseState = ParsingState.Number2; iChar++; } else if (c == 'E') { currentTokenStr += c; parseState = ParsingState.Number3; iChar++; } else { parseState = ParsingState.NumberE; } } break; case ParsingState.Number3: { if (c == '+' || c == '-') { currentTokenStr += c; parseState = ParsingState.Number4; iChar++; } else { throw new ParseException("Expected + or - at position " + iChar); } } break; case ParsingState.Number4: { if (Char.IsDigit(c)) { currentTokenStr += c; iChar++; } else { parseState = ParsingState.NumberE; } } break; case ParsingState.NumberE: { tokens.Add(new Token(currentTokenStr, TokenType.Number, inQuote, tokenIndex)); currentTokenStr = ""; inQuote = false; parseState = parseReturnState; } break; // -------------------- -------------------- -------------------- // Error case ParsingState.ErrorX: { currentTokenStr += c; iChar++; if (strsError.IndexOf("\t" + currentTokenStr) >= 0) { if (strsError.IndexOf("\t" + currentTokenStr + "\t") >= 0) { tokens.Add(new Token(currentTokenStr, TokenType.ErrorText, inQuote, tokenIndex)); currentTokenStr = ""; inQuote = false; parseState = parseReturnState; } } else { throw new ParseException("Expected Error Constant at position " + tokenIndex); } } break; // -------------------- -------------------- -------------------- case ParsingState.ReferenceQualifier: { tokens.Add(new Token(currentTokenStr, TokenType.ReferenceQualifier, inQuote, tokenIndex)); tokens.Add(new Token(Char.ToString(c), TokenType.ExternalReferenceOperator, false, iChar)); iChar++; currentTokenStr = ""; inQuote = false; parseState = ParsingState.Expression1; } break; case ParsingState.MinusSign: { if (Char.IsDigit(c)) { parseState = ParsingState.Number1; } else { parseState = ParsingState.PrefixOperator; iChar--; } } break; case ParsingState.PrefixOperator: { iChar++; tokens.Add(new Token(Char.ToString(c), TokenType.UnaryOperator, inQuote, tokenIndex)); currentTokenStr = ""; inQuote = false; parseState = ParsingState.Expression1; } break; // -------------------- -------------------- -------------------- // ListSeparator can be either function parameter separator, or a union operator case ParsingState.ListSeparator: { bool isParam = false; if (tokenStack.Count() > 0) { if (tokenStack.Peek().TokenType == TokenType.FunctionOpen || tokenStack.Peek().TokenType == TokenType.AtFunctionOpen) { parseState = ParsingState.ParameterSeparator; isParam = true; } } if (!isParam) { parseState = ParsingState.RangeOperator; } } break; case ParsingState.RangeOperator: { currentTokenStr = Char.ToString(c); iChar++; tokens.Add(new Token(currentTokenStr, TokenType.RangeOperator, inQuote, tokenIndex)); currentTokenStr = ""; inQuote = false; parseState = ParsingState.Expression1; } break; // -------------------- -------------------- -------------------- case ParsingState.ArithmeticOperator: { currentTokenStr = Char.ToString(c); iChar++; tokens.Add(new Token(currentTokenStr, TokenType.ArithmeticOperator, inQuote, tokenIndex)); currentTokenStr = ""; inQuote = false; parseState = ParsingState.Expression1; } break; // -------------------- -------------------- -------------------- case ParsingState.ComparisonOperator1: { currentTokenStr = Char.ToString(c); iChar++; parseState = (c == '<' || c == '>') ? ParsingState.ComparisonOperator2 : ParsingState.ComparisonOperatorE; } break; case ParsingState.ComparisonOperator2: { if (c == '=' || (currentTokenStr.Equals("<") && c == '>')) { currentTokenStr += c; iChar++; } parseState = ParsingState.ComparisonOperatorE; } break; case ParsingState.ComparisonOperatorE: { tokens.Add(new Token(currentTokenStr, TokenType.ComparisonOperator, inQuote, tokenIndex)); currentTokenStr = ""; inQuote = false; parseState = ParsingState.Expression1; } break; // -------------------- -------------------- -------------------- case ParsingState.TextOperator: { currentTokenStr = Char.ToString(c); iChar++; tokens.Add(new Token(currentTokenStr, TokenType.TextOperator, inQuote, tokenIndex)); currentTokenStr = ""; inQuote = false; parseState = ParsingState.Expression1; } break; // -------------------- -------------------- -------------------- case ParsingState.PostfixOperator: { currentTokenStr = Char.ToString(c); iChar++; tokens.Add(new Token(currentTokenStr, TokenType.PostfixOperator, inQuote, tokenIndex)); currentTokenStr = ""; inQuote = false; parseState = ParsingState.Expression2; } break; // -------------------- -------------------- -------------------- case ParsingState.ParameterSeparator: { currentTokenStr = Char.ToString(c); iChar++; tokens.Add(new Token(currentTokenStr, TokenType.ParameterSeparator, inQuote, tokenIndex)); currentTokenStr = ""; inQuote = false; parseState = ParsingState.Expression1; } break; default: break; } } // todo: error if token stack not empty // post processing // 1. Scan and detect !, and join if the 3 tokens prior are CellReference-Colon-CellReference // 2. Scan and detect CellReference-Colon-CellReference, because they should be joined into an AreaReference return tokens; }
}
Tom Scott says:

December 22, 2014 at 2:46 pm

Thanks Rob,

I modified your code to help me parse out some very long formulas. It indenting the nested if statements, replaced local cell references with the text in a header row, and external references with the values from the external sheet.

Doing so made the formulas read almost as if they were plain language and saved me a huge amount of time and frustration.

Thanks again for the code – it was a huge help.
Jack says:

August 1, 2015 at 1:20 pm

Rob, could you repost the link please? It seems to be dead for me. Thanks!
Gareth says:

August 3, 2015 at 8:25 pm

@Jack, if you want a finished editor as opposed to the source-code, you can try the Formula Explorer in FormulaDesk. (Click my name for a link to the website).
Jack says:

August 3, 2015 at 8:30 pm

@Gareth: thanks, but it is the code that I am after.
Dick Kusleika says:

August 4, 2015 at 7:27 am

I know that Chris Greaves (http://www.chrisgreaves.com/Stain/index.htm) was working on some tokenizing. It maybe worth a look.
Jack says:

August 4, 2015 at 8:07 am

Sorry, I should have explained more. I’m working with some code that says it was adapted from Rob’s addin here, and was hoping to get a copy of the original for comparison. Thanks though Dick, I appreciate it.
Jaime Diaz says:

September 20, 2016 at 1:12 pm

For anyone still looking for this code, you can thank your lucky starts. It is available from the Wayback Machine at the following URL:
http://web.archive.org/web/20141226200643/http://vangelder.orconhosting.net.nz/excel/audxl.html

Enjoy, and thanks Rob for the great leg work, just remember to take in to account the memory leaking problem and resolve it accordingly.
Jason says:

September 8, 2017 at 9:51 pm

Tokenizer must be something very useful for me but,
I am located in Shanghai, China and can not access all the link to the Zip file. Is there any way I can get a copy?
Jason says:

June 25, 2020 at 11:03 am

Tony, do you have an updated C# version that you could post? Simply copying and pasting the text here has all kinds of formatting issues. Would be great if you could post a link to a source file that can be downloaded so that the code does not get mangled.

Thanks for your help!
T says:

June 25, 2020 at 1:08 pm

Does anyone have a copy of ExcelFormulas.zip originally linked in this article? The AudXL addin from the website archive is the older version from 2005.

Here’s an attempt to clean up the C# code

class XLTokenizer {
	private enum ParsingState {
		Expression1,
		Expression2,
		LeadingName1,
		LeadingName2,
		LeadingName3,
		LeadingNameE,
		WhiteSpace,
		Text1,
		Text2,
		Number1,
		Number2,
		Number3,
		Number4,
		NumberE,
		Bool,
		ErrorX,
		MinusSign,
		// for ambiguity between unary minus and sign
		PrefixOperator,
		ArithmeticOperator,
		ComparisonOperator1,
		ComparisonOperator2,
		ComparisonOperatorE,
		TextOperator,
		PostfixOperator,
		RangeOperator,
		ReferenceQualifier,
		ListSeparator,
		// for ambiguity between function parameter separator and reference union operator
		ArrayRowSeparator,
		ArrayColumnSeparator,
		FunctionX,
		ParameterSeparator,
		SubExpression,
		BracketClose,
		ArrayOpen,
		ArrayConstant1,
		ArrayConstant2,
		ArrayClose,
		SquareBracketOpen,
		Table1,
		Table2,
		Table3,
		Table4,
		Table5,
		Table6,
		TableQ,
		TableE,
		R1C1Reference // todo: implement R1C1Reference
	}

	public enum TokenType {
		Text,
		Number,
		Bool,
		ErrorText,
		Reference,
		WhiteSpace,
		UnaryOperator,
		ArithmeticOperator,
		ComparisonOperator,
		TextOperator,
		RangeOperator,
		ReferenceQualifier,
		ExternalReferenceOperator,
		PostfixOperator,
		FunctionOpen,
		AtFunctionOpen,
		ParameterSeparator,
		FunctionClose,
		SubExpressionOpen,
		SubExpressionClose,
		ArrayOpen,
		ArrayRowSeparator,
		ArrayColumnSeparator,
		ArrayClose,
		TableOpen,
		TableSection,
		TableColumn,
		TableItemSeparator,
		TableColumnSeparator,
		TableClose
	}

	public class Token {
		public String TokenText {
			get;
			set;
		}
		public TokenType TokenType {
			get;
			set;
		}
		public bool Quoted {
			get;
			set;
		}
		public int FormulaIndex {
			get;
			set;
		}

		public Token(String tokenText, TokenType tokenType, bool quoted, int formulaIndex) {
			TokenText = tokenText;
			TokenType = tokenType;
			Quoted = quoted;
			FormulaIndex = formulaIndex;
		}
	}

	[Serializable()]
	public class ParseException: System.Exception {
		public ParseException() : base() {}
		public ParseException(string message) : base(message) {}
		public ParseException(string message, System.Exception inner) : base(message, inner) {}
		protected ParseException(System.Runtime.Serialization.SerializationInfo info, System.Runtime.Serialization.StreamingContext context) {}
	}

	private static char cDecimalSeparator = '.';
	private static char cListSeparator = ',';
	private static char cArrayRowSeparator = ';';
	private static char cArrayColumnSeparator = ',';
	private static char cLeftBrace = '{';
	private static char cRightBrace = '}';
	private static char cLeftBracket = '[';
	private static char cRightBracket = ']';
	private static char cLeftRoundBracket = '(';
	private static char cRightRoundBracket = ')';
	private static char cDoubleQuote = '\"';

	private static String strBooleanTrue = "TRUE";
	private static String strBooleanFalse = "FALSE";

	private static String strErrorRef = "#REF!";
	private static String strErrorDiv0 = "#DIV/0!";
	private static String strErrorNA = "#N/A";
	private static String strErrorName = "#NAME?";
	private static String strErrorNull = "#NULL!";
	private static String strErrorNum = "#NUM!";
	private static String strErrorValue = "#VALUE!";
	private static String strErrorGettingData = "#GETTING_DATA";

	private static String strLNSeparators = "+-*/^%=<>&:" + cRightRoundBracket + cRightBrace + cListSeparator;
	private static String strArithmeticOps = "+-*/^";
	private static String strComparisonOps = "=<>";

	private static String strsBoolean = "\t" + strBooleanTrue + "\t" + strBooleanFalse + "\t";
	private static String strsError = "\t" + strErrorRef + "\t" + strErrorDiv0 + "\t" + strErrorNA + "\t" + strErrorName + "\t" + strErrorNull + "\t" + strErrorNum + "\t" + strErrorValue + "\t" + strErrorGettingData + "\t";

	public static List testStrs = new List() {
		"=1\n\n \n 5",
		"=(A1 ,A5)",
		"=\"123\"\"456\"\"789\"",
		"=3.1E-24-2.1E-24",
		"=[MyBook.xlsx]Sheet1!$A$1",
		"=Func(,\"abc\",,)",
		"=IF(P5=1.0,\"NA\",IF(P5=2.0,\"A\",IF(P5=3.0,\"B\",IF(P5=4.0,\"C\",IF(P5=5.0,\"D\",IF(P5=6.0,\"E\",IF(P5=7.0,\"F\",IF(P5=8.0,\"G\"))))))))",
		"=SUM((A:A A1:B1))",
		"='MyCOM.AddIn.Function_Name'(,\"abc\",,)",
		"=IF(\"a\"={\"a\",\"b\";\"c\",#N/A;-1,TRUE}, \"yes\", \"no\") & \" more \"\"test\"\" text\"",
		"=+ AName- (-+-+-2^6) = {\"A\",\"B\"} + @SUM(R1C1) + (@ERROR.TYPE(#VALUE!) = 2)",
		"=SUM('Sheet-1'!A1:'Sheet-1'!A10)",
		"=Table1[#Headers]",
		"=Table1[[a]:[b]]",
		"=Table1[ [#Headers], [#Data], [a]:[b] ]",
		"=IF(((IF(D4<>\"\",((D5-D4)-0.020833333333),0))+(IF(D10<>\"\",((D11-D10)-0.020833333333),0))+(IF(D16<>\"\",((D17-D16)-0.020833333333),0))+(IF(D22<>\"\",((D23-D22)-0.020833333333),0))+(IF(D28<>\"\",((D29-D28)-0.020833333333),0))+(IF(D34<>\"\",((D35-D34)-0.020833333333),0))+(IF(D40<>\"\",((D41-D40)-0.020833333333),0)))*24< =40,((IF(D4<>\"\",((D5-D4)-0.020833333333),0))+(IF(D10<>\"\",((D11-D10)-0.020833333333),0))+(IF(D16<>\"\",((D17-D16)-0.020833333333),0))+(IF(D22<>\"\",((D23-D22)-0.020833333333),0))+(IF(D28<>\"\",((D29-D28)-0.020833333333),0))+(IF(D34<>\"\",((D35-D34)-0.020833333333),0))+(IF(D40<>\"\",((D41-D40)-0.020833333333),0)))*24,40)",
		"=Table1[[#All],[b'[]]",
		"=IF(L40>65,AQ26,ROUND((L12*IF(OR(AND(L40>=55, L41>=20),AND(L41>=20,$C$11=\"YES\")),L45,L44))+(L15 *IF(OR(AND(L40>=55,L41>=20),AND(L41>=20,$C$11=\"YES\")), L46,L44)),0))"
	};

	public static void RunTests() {
		foreach(String formulaStr in testStrs) {
			List tokens = ParseFormula(formulaStr);
			if (tokens != null) {
				String reformedStr = ReformFormula(tokens);
				if (!reformedStr.Equals(formulaStr)) {
					throw new ParseException("Failed to parse and reform \"" + formulaStr + "\"");
				}
			}
		}
	}

	public static String ReformFormula(List tokens) {
		String formulaStr = "=";
		foreach(Token token in tokens) {
			if (token.TokenType == TokenType.AtFunctionOpen) formulaStr += "@";

			if (token.TokenType == TokenType.Text) {
				formulaStr += cDoubleQuote;
				foreach(Char c in token.TokenText) {
					if (c == cDoubleQuote) formulaStr += cDoubleQuote;
					formulaStr += c;
				}
				formulaStr += cDoubleQuote;
			}
			else {
				if (token.Quoted) {
					if (token.TokenType == TokenType.TableColumn || token.TokenType == TokenType.TableSection) {
						formulaStr += "[" + token.TokenText + "]";
					}
					else {
						formulaStr += ''';
foreach (Char c in token.TokenText)
{
if (c == ''')
						formulaStr += ''';
formulaStr += c;
}
formulaStr += ''';
					}
				}
				else formulaStr += token.TokenText;

				if (token.TokenType == TokenType.FunctionOpen || token.TokenType == TokenType.AtFunctionOpen) formulaStr += cLeftRoundBracket;
				else if (token.TokenType == TokenType.TableOpen) formulaStr += cLeftBracket;
			}

		}
		return formulaStr;
	}

	public static List ParseFormula(String strFormula) {
		int lngFormulaLen = strFormula.Length;
		if (lngFormulaLen < =1) {
			throw new ParseException("Formula too short");
		}
		if (!strFormula.StartsWith("=")) {
			throw new ParseException("Formula doesn't start with '='");
		}
		List tokens = new List();
		Stack tokenStack = new Stack();

		ParsingState parseState = ParsingState.Expression1;
		ParsingState parseReturnState = parseState;

		String currentTokenStr = "";
		int tokenIndex = 0;
		bool continueParse = true;
		int iChar = 1;
		bool inQuote = false;
		while (continueParse) {

			char c = (iChar < lngFormulaLen) ? strFormula[iChar] : Char.MinValue;
			switch (parseState) {
			case ParsingState.Expression1:
				{
					if (Char.IsWhiteSpace(c)) {
						parseReturnState = parseState;
						parseState = ParsingState.WhiteSpace;
						tokenIndex = iChar;
					} else if (c == cDoubleQuote) {
						parseReturnState = ParsingState.Expression2;
						parseState = ParsingState.Text1;
						tokenIndex = iChar;
						iChar++;
					} else if (Char.IsDigit(c)) {
						parseReturnState = ParsingState.Expression2;
						parseState = ParsingState.Number1;
						tokenIndex = iChar;
					} else if (c == '-') {
						currentTokenStr = Char.ToString(c);
						parseReturnState = ParsingState.Expression2;
						parseState = ParsingState.MinusSign;
						tokenIndex = iChar;
						iChar++;
					} else if (c == '+') {
						parseState = ParsingState.PrefixOperator;
						tokenIndex = iChar;
					} else if (c == cLeftRoundBracket) {
						parseState = ParsingState.SubExpression;
					} else if (c == cLeftBrace) {
						parseState = ParsingState.ArrayOpen;
					} else if (c == ''') { parseState = ParsingState.LeadingName2; inQuote = true; tokenIndex = iChar; iChar++; } else if (c == '#') { parseReturnState = ParsingState.Expression2; parseState = ParsingState.ErrorX; tokenIndex = iChar; } else { parseState = ParsingState.LeadingName1; tokenIndex = iChar; } } break; // -------------------- -------------------- -------------------- case ParsingState.Expression2: { if (c == Char.MinValue) { continueParse = false; } else if (Char.IsWhiteSpace(c)) { parseReturnState = parseState; parseState = ParsingState.WhiteSpace; tokenIndex = iChar; } else if (c == cRightRoundBracket) { parseState = ParsingState.BracketClose; } else if (strArithmeticOps.Contains(c)) { parseState = ParsingState.ArithmeticOperator; tokenIndex = iChar; } else if (strComparisonOps.Contains(c)) { parseState = ParsingState.ComparisonOperator1; tokenIndex = iChar; } else if (c == ' & ') { parseState = ParsingState.TextOperator; tokenIndex = iChar; } else if (c == ' % ') { parseState = ParsingState.PostfixOperator; tokenIndex = iChar; } else if (c == ': ') { parseState = ParsingState.RangeOperator; tokenIndex = iChar; } else if (c == cListSeparator) { parseState = ParsingState.ListSeparator; tokenIndex = iChar; } else { // Check if whitespace is actually union operator Token objToken = tokens[tokens.Count - 1]; bool isError = true; if (objToken.TokenType == TokenType.WhiteSpace) { int offset = objToken.TokenText.IndexOf(''); if (offset >= 0)
{
currentTokenStr = objToken.TokenText.Substring(offset + 1);
if (offset == 0)
{
objToken.TokenType = TokenType.RangeOperator;
objToken.TokenText = " ";
}
else
{
objToken.TokenText = objToken.TokenText.Substring(0, offset);
tokens.Add(new Token(" ", TokenType.RangeOperator, false, objToken.FormulaIndex + offset - 1));
}

if (currentTokenStr.Length > 0)
{
tokens.Add(new Token(currentTokenStr, TokenType.WhiteSpace, inQuote, objToken.FormulaIndex + offset));
currentTokenStr = "";
inQuote = false;
}

parseState = ParsingState.Expression1;
isError = false;
}
}
else if (objToken.TokenType == TokenType.ErrorText && objToken.TokenText.Equals(strErrorRef))
{
parseState = ParsingState.Expression1;
isError = false;
}
if (isError)
{
throw new ParseException("Expected Operator but got " + c + " at position " + iChar);
}
}
}
break;

// -------------------- -------------------- --------------------

case ParsingState.ArrayConstant1:
{
if (c == cDoubleQuote)
{
parseReturnState = ParsingState.ArrayConstant2;
parseState = ParsingState.Text1;
tokenIndex = iChar;
iChar++;
}
else if (Char.IsDigit(c))
{
parseReturnState = ParsingState.ArrayConstant2;
parseState = ParsingState.Number1;
tokenIndex = iChar;
}
else if (c == ' - ')
{
currentTokenStr = Char.ToString(c);
parseReturnState = ParsingState.ArrayConstant2;
parseState = ParsingState.Number1;
tokenIndex = iChar;
iChar++;
}
else if (c == '#')
{
parseReturnState = ParsingState.ArrayConstant2;
parseState = ParsingState.ErrorX;
tokenIndex = iChar;
}
else
{
parseReturnState = ParsingState.ArrayConstant2;
parseState = ParsingState.Bool;
tokenIndex = iChar;
}
}
break;

// -------------------- -------------------- --------------------

case ParsingState.ArrayConstant2:
{
if (c == cRightBrace)
{
parseState = ParsingState.ArrayClose;
}
else if (c == cArrayRowSeparator)
{
parseState = ParsingState.ArrayRowSeparator;
tokenIndex = iChar;
}
else if (c == cArrayColumnSeparator)
{
parseState = ParsingState.ArrayColumnSeparator;
tokenIndex = iChar;
}
else
{
throw new ParseException("Expected " + cRightBrace + " " + cArrayRowSeparator + " " + ParsingState.ArrayColumnSeparator + " but got " + c + " at position " + iChar);
}
}
break;

// -------------------- -------------------- --------------------

case ParsingState.ArrayOpen:
{
tokenStack.Push(new Token(Char.ToString(c), TokenType.ArrayOpen, false, iChar));
tokens.Add(tokenStack.Peek());
parseState = ParsingState.ArrayConstant1;
iChar++;
}
break;

case ParsingState.ArrayClose:
{
Token prevToken = tokenStack.Pop();
if (prevToken.TokenType == TokenType.ArrayOpen)
{
tokens.Add(new Token(Char.ToString(c), TokenType.ArrayClose, false, iChar));
parseState = ParsingState.Expression2;
iChar++;
}
else
{
throw new ParseException("Encountered " + cRightBrace + " without matching " + cLeftBrace + " at position " + iChar);
}
}
break;

case ParsingState.ArrayColumnSeparator:
{
iChar++;
tokens.Add(new Token(Char.ToString(c), TokenType.ArrayColumnSeparator, false, tokenIndex));
parseState = ParsingState.ArrayConstant1;
}
break;

case ParsingState.ArrayRowSeparator:
{
iChar++;
tokens.Add(new Token(Char.ToString(c), TokenType.ArrayRowSeparator, false, tokenIndex));
parseState = ParsingState.ArrayConstant1;
}
break;

// -------------------- -------------------- --------------------

case ParsingState.Bool:
{
currentTokenStr += c;
iChar++;

if (strsBoolean.IndexOf("\t" + currentTokenStr) >= 0)
{
if (strsBoolean.IndexOf("\t" + currentTokenStr + "\t") >= 0)
{
tokens.Add(new Token(currentTokenStr, TokenType.Bool, inQuote, tokenIndex));
currentTokenStr = "";
inQuote = false;
parseState = parseReturnState;
}
}
else
{
throw new ParseException("Expected Boolean Constant at position " + tokenIndex);
}
}
break;

// -------------------- -------------------- --------------------

case ParsingState.WhiteSpace:
{
if (Char.IsWhiteSpace(c))
{
currentTokenStr += c;
iChar++;
}
else
{
tokens.Add(new Token(currentTokenStr, TokenType.WhiteSpace, inQuote, tokenIndex));
currentTokenStr = "";
inQuote = false;
parseState = parseReturnState;
}
}
break;

// -------------------- -------------------- --------------------

case ParsingState.SubExpression:
{
tokenStack.Push(new Token(Char.ToString(c), TokenType.SubExpressionOpen, inQuote, iChar));
tokens.Add(tokenStack.Peek());
parseState = ParsingState.Expression1;
iChar++;
}
break;

case ParsingState.BracketClose:
{
Token prevToken = tokenStack.Pop();
if (prevToken.TokenType == TokenType.FunctionOpen || prevToken.TokenType == TokenType.AtFunctionOpen)
{
tokens.Add(new Token(Char.ToString(c), TokenType.FunctionClose, false, iChar));
parseState = ParsingState.Expression2;
}
else if (prevToken.TokenType == TokenType.SubExpressionOpen)
{
tokens.Add(new Token(Char.ToString(c), TokenType.SubExpressionClose, false, iChar));
parseState = ParsingState.Expression2;
}
else
{
throw new ParseException("Encountered " + cRightRoundBracket + " without matching " + cLeftRoundBracket + " at position " + iChar);
}
iChar++;
}
break;

// -------------------- -------------------- --------------------
// Decide: Leading Name for Function, Table, R1C1Reference, Reference Qualifier or Cell Reference

case ParsingState.LeadingName1:
{
if (currentTokenStr.Length > 0 && c == cLeftRoundBracket)
{
parseState = ParsingState.FunctionX;
}
else if (currentTokenStr.Length > 0 && c == cLeftBracket)
{
parseState = ParsingState.SquareBracketOpen;
}
else if (currentTokenStr.Length > 0 && c == ' ! ')
{
parseState = ParsingState.ReferenceQualifier;
}
else if (c == Char.MinValue || Char.IsWhiteSpace(c) || strLNSeparators.Contains(c))
{
parseState = ParsingState.LeadingNameE;
}
else
{
currentTokenStr += c;
iChar++;
}
}
break;

case ParsingState.LeadingName2:
{
if (c == ''') {
						parseState = ParsingState.LeadingName3;
					}
					else {
						currentTokenStr += c;
					}
					iChar++;
				}
				break;

			case ParsingState.LeadingName3:
				{
					if (c == ''')
{
parseState = ParsingState.LeadingName2;
currentTokenStr += c;
iChar++;
}
else
{
parseState = ParsingState.LeadingName1;
}
}
break;

case ParsingState.LeadingNameE:
{
if (currentTokenStr.Length > 0)
{
tokens.Add(new Token(currentTokenStr, (currentTokenStr.Equals(strBooleanTrue) || currentTokenStr.Equals(strBooleanFalse)) ? TokenType.Bool : TokenType.Reference, inQuote, tokenIndex));
currentTokenStr = "";
inQuote = false;
}
parseState = ParsingState.Expression2;
}
break;

// -------------------- -------------------- --------------------
// Function

case ParsingState.FunctionX:
{
if (currentTokenStr.StartsWith("@"))
{
tokenStack.Push(new Token(currentTokenStr.Substring(1), TokenType.AtFunctionOpen, inQuote, tokenIndex));
}
else
{
tokenStack.Push(new Token(currentTokenStr, TokenType.FunctionOpen, inQuote, tokenIndex));
}
tokens.Add(tokenStack.Peek());
currentTokenStr = "";
inQuote = false;
parseState = ParsingState.Expression1;
iChar++;
}
break;

// -------------------- -------------------- --------------------
// Decide: Table or R1C1 Reference

case ParsingState.SquareBracketOpen:
{
if (currentTokenStr.Equals("R") || currentTokenStr.Equals("C"))
{
parseState = ParsingState.R1C1Reference;
}
else
{
tokenStack.Push(new Token(currentTokenStr, TokenType.TableOpen, inQuote, tokenIndex));
tokens.Add(tokenStack.Peek());
currentTokenStr = "";
inQuote = false;
parseState = ParsingState.Table1;
iChar++;
}
}
break;

// -------------------- -------------------- --------------------
// Table

case ParsingState.Table1:
{
tokenIndex = iChar;
if (Char.IsWhiteSpace(c))
{
parseReturnState = ParsingState.Table1;
parseState = ParsingState.WhiteSpace;
}
else if (c == cLeftBracket)
{
inQuote = true;
iChar++;
parseState = ParsingState.Table6;
}
else
{
parseState = ParsingState.Table5;
}
}
break;

case ParsingState.Table2:
{
tokenIndex = iChar;
if (Char.IsWhiteSpace(c))
{
parseReturnState = ParsingState.Table2;
parseState = ParsingState.WhiteSpace;
}
else if (c == cListSeparator)
{
parseState = ParsingState.Table3;
}
else if (c == ': ')
{
parseState = ParsingState.Table4;
}
else if (c == cRightBracket)
{
parseState = ParsingState.TableE;
}
else
{
parseState = ParsingState.Table5;
}
}
break;

case ParsingState.Table3:
{
iChar++;
tokens.Add(new Token(Char.ToString(c), TokenType.TableItemSeparator, inQuote, tokenIndex));
currentTokenStr = "";
inQuote = false;
parseState = ParsingState.Table1;
}
break;

case ParsingState.Table4:
{
iChar++;
tokens.Add(new Token(Char.ToString(c), TokenType.TableColumnSeparator, inQuote, tokenIndex));
currentTokenStr = "";
inQuote = false;
parseState = ParsingState.Table1;
}
break;

case ParsingState.Table5:
{
if (c == cRightBracket)
{
tokens.Add(new Token(currentTokenStr, (currentTokenStr.StartsWith("#")) ? TokenType.TableSection : TokenType.TableColumn, inQuote, tokenIndex));
currentTokenStr = "";
inQuote = false;
parseState = ParsingState.TableE;
}
else if (c == ''') {
						iChar++;
						currentTokenStr += c;
						parseReturnState = ParsingState.Table5;
						parseState = ParsingState.TableQ;
					}
					else {
						currentTokenStr += c;
						iChar++;
					}
				}
				break;

			case ParsingState.Table6:
				{
					if (c == cRightBracket) {
						iChar++;
						tokens.Add(new Token(currentTokenStr, (currentTokenStr.StartsWith("#")) ? TokenType.TableSection: TokenType.TableColumn, inQuote, tokenIndex));
						currentTokenStr = "";
						inQuote = false;
						parseState = ParsingState.Table2;
					}
					else if (c == ''')
{
iChar++;
currentTokenStr += c;
parseReturnState = ParsingState.Table6;
parseState = ParsingState.TableQ;
}
else
{
currentTokenStr += c;
iChar++;
}
}
break;

case ParsingState.TableQ:
{
currentTokenStr += c;
iChar++;
parseState = parseReturnState;
}
break;

case ParsingState.TableE:
{
Token prevToken = tokenStack.Pop();
if (prevToken.TokenType == TokenType.TableOpen)
{
tokens.Add(new Token(Char.ToString(c), TokenType.TableClose, false, iChar));
parseState = ParsingState.Expression2;
iChar++;
}
else
{
throw new ParseException("Encountered " + cRightBracket + " without matching " + cLeftBracket + " at position " + iChar);
}
}
break;

// -------------------- -------------------- --------------------
// Text

case ParsingState.Text1:
{
if (c == cDoubleQuote)
{
parseState = ParsingState.Text2;
}
else
{
currentTokenStr += c;
}
iChar++;
}
break;

case ParsingState.Text2:
{
if (c == cDoubleQuote)
{
parseState = ParsingState.Text1;
currentTokenStr += c;
iChar++;
}
else
{
tokens.Add(new Token(currentTokenStr, TokenType.Text, inQuote, tokenIndex));
currentTokenStr = "";
inQuote = false;
parseState = parseReturnState;
}
}
break;

// -------------------- -------------------- --------------------
// Number

case ParsingState.Number1:
case ParsingState.Number2:
{
if (Char.IsDigit(c))
{
currentTokenStr += c;
iChar++;
}
else if (c == cDecimalSeparator && parseState == ParsingState.Number1)
{
currentTokenStr += c;
parseState = ParsingState.Number2;
iChar++;
}
else if (c == '
					E ')
{
currentTokenStr += c;
parseState = ParsingState.Number3;
iChar++;
}
else
{
parseState = ParsingState.NumberE;
}
}
break;

case ParsingState.Number3:
{
if (c == ' + ' || c == ' - ')
{
currentTokenStr += c;
parseState = ParsingState.Number4;
iChar++;
}
else
{
throw new ParseException("Expected + or - at position " + iChar);
}
}
break;

case ParsingState.Number4:
{
if (Char.IsDigit(c))
{
currentTokenStr += c;
iChar++;
}
else
{
parseState = ParsingState.NumberE;
}
}
break;

case ParsingState.NumberE:
{
tokens.Add(new Token(currentTokenStr, TokenType.Number, inQuote, tokenIndex));
currentTokenStr = "";
inQuote = false;
parseState = parseReturnState;
}
break;

// -------------------- -------------------- --------------------
// Error

case ParsingState.ErrorX:
{
currentTokenStr += c;
iChar++;

if (strsError.IndexOf("\t" + currentTokenStr) >= 0)
{
if (strsError.IndexOf("\t" + currentTokenStr + "\t") >= 0)
{
tokens.Add(new Token(currentTokenStr, TokenType.ErrorText, inQuote, tokenIndex));
currentTokenStr = "";
inQuote = false;
parseState = parseReturnState;
}
}
else
{
throw new ParseException("Expected Error Constant at position " + tokenIndex);
}
}
break;

// -------------------- -------------------- --------------------

case ParsingState.ReferenceQualifier:
{
tokens.Add(new Token(currentTokenStr, TokenType.ReferenceQualifier, inQuote, tokenIndex));
tokens.Add(new Token(Char.ToString(c), TokenType.ExternalReferenceOperator, false, iChar));
iChar++;
currentTokenStr = "";
inQuote = false;
parseState = ParsingState.Expression1;
}
break;

case ParsingState.MinusSign:
{
if (Char.IsDigit(c))
{
parseState = ParsingState.Number1;
}
else
{
parseState = ParsingState.PrefixOperator;
iChar--;
}
}
break;

case ParsingState.PrefixOperator:
{
iChar++;
tokens.Add(new Token(Char.ToString(c), TokenType.UnaryOperator, inQuote, tokenIndex));
currentTokenStr = "";
inQuote = false;
parseState = ParsingState.Expression1;
}
break;

// -------------------- -------------------- --------------------
// ListSeparator can be either function parameter separator, or a union operator

case ParsingState.ListSeparator:
{
bool isParam = false;
if (tokenStack.Count() > 0)
{
if (tokenStack.Peek().TokenType == TokenType.FunctionOpen || tokenStack.Peek().TokenType == TokenType.AtFunctionOpen)
{
parseState = ParsingState.ParameterSeparator;
isParam = true;
}
}
if (!isParam)
{
parseState = ParsingState.RangeOperator;
}
}
break;

case ParsingState.RangeOperator:
{
currentTokenStr = Char.ToString(c);
iChar++;
tokens.Add(new Token(currentTokenStr, TokenType.RangeOperator, inQuote, tokenIndex));
currentTokenStr = "";
inQuote = false;
parseState = ParsingState.Expression1;
}
break;

// -------------------- -------------------- --------------------

case ParsingState.ArithmeticOperator:
{
currentTokenStr = Char.ToString(c);
iChar++;
tokens.Add(new Token(currentTokenStr, TokenType.ArithmeticOperator, inQuote, tokenIndex));
currentTokenStr = "";
inQuote = false;
parseState = ParsingState.Expression1;
}
break;

// -------------------- -------------------- --------------------

case ParsingState.ComparisonOperator1:
{
currentTokenStr = Char.ToString(c);
iChar++;
parseState = (c == ' < ' || c == ' > ') ? ParsingState.ComparisonOperator2 : ParsingState.ComparisonOperatorE;
}
break;

case ParsingState.ComparisonOperator2:
{
if (c == ' = ' || (currentTokenStr.Equals("< ") && c == ' > '))
{
currentTokenStr += c;
iChar++;
}
parseState = ParsingState.ComparisonOperatorE;
}
break;

case ParsingState.ComparisonOperatorE:
{
tokens.Add(new Token(currentTokenStr, TokenType.ComparisonOperator, inQuote, tokenIndex));
currentTokenStr = "";
inQuote = false;
parseState = ParsingState.Expression1;
}
break;

// -------------------- -------------------- --------------------

case ParsingState.TextOperator:
{
currentTokenStr = Char.ToString(c);
iChar++;
tokens.Add(new Token(currentTokenStr, TokenType.TextOperator, inQuote, tokenIndex));
currentTokenStr = "";
inQuote = false;
parseState = ParsingState.Expression1;
}
break;

// -------------------- -------------------- --------------------

case ParsingState.PostfixOperator:
{
currentTokenStr = Char.ToString(c);
iChar++;
tokens.Add(new Token(currentTokenStr, TokenType.PostfixOperator, inQuote, tokenIndex));
currentTokenStr = "";
inQuote = false;
parseState = ParsingState.Expression2;
}
break;

// -------------------- -------------------- --------------------

case ParsingState.ParameterSeparator:
{
currentTokenStr = Char.ToString(c);
iChar++;
tokens.Add(new Token(currentTokenStr, TokenType.ParameterSeparator, inQuote, tokenIndex));
currentTokenStr = "";
inQuote = false;
parseState = ParsingState.Expression1;
}
break;

default:
break;
}

}

// todo: error if token stack not empty

// post processing
// 1. Scan and detect !, and join if the 3 tokens prior are CellReference-Colon-CellReference
// 2. Scan and detect CellReference-Colon-CellReference, because they should be joined into an AreaReference

return tokens;
}

}'

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

139

140

141

142

143

144

145

146

147

148

149

150

151

152

153

154

155

156

157

158

159

160

161

162

163

164

165

166

167

168

169

170

171

172

173

174

175

176

177

178

179

180

181

182

183

184

185

186

187

188

189

190

191

192

193

194

195

196

197

198

199

200

201

202

203

204

205

206

207

208

209

210

211

212

213

214

215

216

217

218

219

220

221

222

223

224

225

226

227

228

229

230

231

232

233

234

235

236

237

238

239

240

241

242

243

244

245

246

247

248

249

250

251

252

253

254

255

256

257

258

259

260

261

262

263

264

265

266

267

268

269

270

271

272

273

274

275

276

277

278

279

280

281

282

283

284

285

286

287

288

289

290

291

292

293

294

295

296

297

298

299

300

301

302

303

304

305

306

307

308

309

310

311

312

313

314

315

316

317

318

319

320

321

322

323

324

325

326

327

328

329

330

331

332

333

334

335

336

337

338

339

340

341

342

343

344

345

346

347

348

349

350

351

352

353

354

355

356

357

358

359

360

361

362

363

364

365

366

367

368

369

370

371

372

373

374

375

376

377

378

379

380

381

382

383

384

385

386

387

388

389

390

391

392

393

394

395

396

397

398

399

400

401

402

403

404

405

406

407

408

409

410

411

412

413

414

415

416

417

418

419

420

421

422

423

424

425

426

427

428

429

430

431

432

433

434

435

436

437

438

439

440

441

442

443

444

445

446

447

448

449

450

451

452

453

454

455

456

457

458

459

460

461

462

463

464

465

466

467

468

469

470

471

472

473

474

475

476

477

478

479

480

481

482

483

484

485

486

487

488

489

490

491

492

493

494

495

496

497

498

499

500

501

502

503

504

505

506

507

508

509

510

511

512

513

514

515

516

517

518

519

520

521

522

523

524

525

526

527

528

529

530

531

532

533

534

535

536

537

538

539

540

541

542

543

544

545

546

547

548

549

550

551

552

553

554

555

556

557

558

559

560

561

562

563

564

565

566

567

568

569

570

571

572

573

574

575

576

577

578

579

580

581

582

583

584

585

586

587

588

589

590

591

592

593

594

595

596

597

598

599

600

601

602

603

604

605

606

607

608

609

610

611

612

613

614

615

616

617

618

619

620

621

622

623

624

625

626

627

628

629

630

631

632

633

634

635

636

637

638

639

640

641

642

643

644

645

646

647

648

649

650

651

652

653

654

655

656

657

658

659

660

661

662

663

664

665

666

667

668

669

670

671

672

673

674

675

676

677

678

679

680

681

682

683

684

685

686

687

688

689

690

691

692

693

694

695

696

697

698

699

700

701

702

703

704

705

706

707

708

709

710

711

712

713

714

715

716

717

718

719

720

721

722

723

724

725

726

727

728

729

730

731

732

733

734

735

736

737

738

739

740

741

742

743

744

745

746

747

748

749

750

751

752

753

754

755

756

757

758

759

760

761

762

763

764

765

766

767

768

769

770

771

772

773

774

775

776

777

778

779

780

781

782

783

784

785

786

787

788

789

790

791

792

793

794

795

796

797

798

799

800

801

802

803

804

805

806

807

808

809

810

811

812

813

814

815

816

817

818

819

820

821

822

823

824

825

826

827

828

829

830

831

832

833

834

835

836

837

838

839

840

841

842

843

844

845

846

847

848

849

850

851

852

853

854

855

856

857

858

859

860

861

862

863

864

865

866

867

868

869

870

871

872

873

874

875

876

877

878

879

880

881

882

883

884

885

886

887

888

889

890

891

892

893

894

895

896

897

898

899

900

901

902

903

904

905

906

907

908

909

910

911

912

913

914

915

916

917

918

919

920

921

922

923

924

925

926

927

928

929

930

931

932

933

934

935

936

937

938

939

940

941

942

943

944

945

946

947

948

949

950

951

952

953

954

955

956

957

958

959

960

961

962

963

964

965

966

967

968

969

970

971

972

973

974

975

976

977

978

979

980

981

982

983

984

985

986

987

988

989

990

991

992

993

994

995

996

997

998

999

1000

1001

1002

1003

1004

1005

1006

1007

1008

1009

1010

1011

1012

1013

1014

1015

1016

1017

1018

1019

1020

1021

1022

1023

1024

1025

1026

1027

1028

1029

1030

1031

1032

1033

1034

1035

1036

1037

1038

1039

1040

1041

1042

class XLTokenizer {

private enum ParsingState {

Expression1,

Expression2,

LeadingName1,

LeadingName2,

LeadingName3,

LeadingNameE,

WhiteSpace,

Text1,

Text2,

Number1,

Number2,

Number3,

Number4,

NumberE,

Bool,

ErrorX,

MinusSign,

// for ambiguity between unary minus and sign

PrefixOperator,

ArithmeticOperator,

ComparisonOperator1,

ComparisonOperator2,

ComparisonOperatorE,

TextOperator,

PostfixOperator,

RangeOperator,

ReferenceQualifier,

ListSeparator,

// for ambiguity between function parameter separator and reference union operator

ArrayRowSeparator,

ArrayColumnSeparator,

FunctionX,

ParameterSeparator,

SubExpression,

BracketClose,

ArrayOpen,

ArrayConstant1,

ArrayConstant2,

ArrayClose,

SquareBracketOpen,

Table1,

Table2,

Table3,

Table4,

Table5,

Table6,

TableQ,

TableE,

R1C1Reference // todo: implement R1C1Reference

}

public enum TokenType {

Text,

Number,

Bool,

ErrorText,

Reference,

WhiteSpace,

UnaryOperator,

ArithmeticOperator,

ComparisonOperator,

TextOperator,

RangeOperator,

ReferenceQualifier,

ExternalReferenceOperator,

PostfixOperator,

FunctionOpen,

AtFunctionOpen,

ParameterSeparator,

FunctionClose,

SubExpressionOpen,

SubExpressionClose,

ArrayOpen,

ArrayRowSeparator,

ArrayColumnSeparator,

ArrayClose,

TableOpen,

TableSection,

TableColumn,

TableItemSeparator,

TableColumnSeparator,

TableClose

}

public class Token {

public String TokenText {

get;

set;

}

public TokenType TokenType {

get;

set;

}

public bool Quoted {

get;

set;

}

public int FormulaIndex {

get;

set;

}

public Token(String tokenText, TokenType tokenType, bool quoted, int formulaIndex) {

TokenText = tokenText;

TokenType = tokenType;

Quoted = quoted;

FormulaIndex = formulaIndex;

}

[Serializable()]

public class ParseException: System.Exception {

public ParseException() : base() {}

public ParseException(string message) : base(message) {}

public ParseException(string message, System.Exception inner) : base(message, inner) {}

protected ParseException(System.Runtime.Serialization.SerializationInfo info, System.Runtime.Serialization.StreamingContext context) {}

}

private static char cDecimalSeparator = '.';

private static char cListSeparator = ',';

private static char cArrayRowSeparator = ';';

private static char cArrayColumnSeparator = ',';

private static char cLeftBrace = '{';

private static char cRightBrace = '}';

private static char cLeftBracket = '[';

private static char cRightBracket = ']';

private static char cLeftRoundBracket = '(';

private static char cRightRoundBracket = ')';

private static char cDoubleQuote = '\"';

private static String strBooleanTrue = "TRUE";

private static String strBooleanFalse = "FALSE";

private static String strErrorRef = "#REF!";

private static String strErrorDiv0 = "#DIV/0!";

private static String strErrorNA = "#N/A";

private static String strErrorName = "#NAME?";

private static String strErrorNull = "#NULL!";

private static String strErrorNum = "#NUM!";

private static String strErrorValue = "#VALUE!";

private static String strErrorGettingData = "#GETTING_DATA";

private static String strLNSeparators = "+-*/^%=<>&:" + cRightRoundBracket + cRightBrace + cListSeparator;

private static String strArithmeticOps = "+-*/^";

private static String strComparisonOps = "=<>";

private static String strsBoolean = "\t" + strBooleanTrue + "\t" + strBooleanFalse + "\t";

private static String strsError = "\t" + strErrorRef + "\t" + strErrorDiv0 + "\t" + strErrorNA + "\t" + strErrorName + "\t" + strErrorNull + "\t" + strErrorNum + "\t" + strErrorValue + "\t" + strErrorGettingData + "\t";

public static List testStrs = new List() {

"=1\n\n \n 5",

"=(A1 ,A5)",

"=\"123\"\"456\"\"789\"",

"=3.1E-24-2.1E-24",

"=[MyBook.xlsx]Sheet1!$A$1",

"=Func(,\"abc\",,)",

"=IF(P5=1.0,\"NA\",IF(P5=2.0,\"A\",IF(P5=3.0,\"B\",IF(P5=4.0,\"C\",IF(P5=5.0,\"D\",IF(P5=6.0,\"E\",IF(P5=7.0,\"F\",IF(P5=8.0,\"G\"))))))))",

"=SUM((A:A A1:B1))",

"='MyCOM.AddIn.Function_Name'(,\"abc\",,)",

"=IF(\"a\"={\"a\",\"b\";\"c\",#N/A;-1,TRUE}, \"yes\", \"no\") & \" more \"\"test\"\" text\"",

"=+ AName- (-+-+-2^6) = {\"A\",\"B\"} + @SUM(R1C1) + (@ERROR.TYPE(#VALUE!) = 2)",

"=SUM('Sheet-1'!A1:'Sheet-1'!A10)",

"=Table1[#Headers]",

"=Table1[[a]:[b]]",

"=Table1[ [#Headers], [#Data], [a]:[b] ]",

"=IF(((IF(D4<>\"\",((D5-D4)-0.020833333333),0))+(IF(D10<>\"\",((D11-D10)-0.020833333333),0))+(IF(D16<>\"\",((D17-D16)-0.020833333333),0))+(IF(D22<>\"\",((D23-D22)-0.020833333333),0))+(IF(D28<>\"\",((D29-D28)-0.020833333333),0))+(IF(D34<>\"\",((D35-D34)-0.020833333333),0))+(IF(D40<>\"\",((D41-D40)-0.020833333333),0)))*24< =40,((IF(D4<>\"\",((D5-D4)-0.020833333333),0))+(IF(D10<>\"\",((D11-D10)-0.020833333333),0))+(IF(D16<>\"\",((D17-D16)-0.020833333333),0))+(IF(D22<>\"\",((D23-D22)-0.020833333333),0))+(IF(D28<>\"\",((D29-D28)-0.020833333333),0))+(IF(D34<>\"\",((D35-D34)-0.020833333333),0))+(IF(D40<>\"\",((D41-D40)-0.020833333333),0)))*24,40)",

"=Table1[[#All],[b'[]]",

"=IF(L40>65,AQ26,ROUND((L12*IF(OR(AND(L40>=55, L41>=20),AND(L41>=20,$C$11=\"YES\")),L45,L44))+(L15 *IF(OR(AND(L40>=55,L41>=20),AND(L41>=20,$C$11=\"YES\")), L46,L44)),0))"

};

public static void RunTests() {

foreach(String formulaStr in testStrs) {

List tokens = ParseFormula(formulaStr);

if (tokens != null) {

String reformedStr = ReformFormula(tokens);

if (!reformedStr.Equals(formulaStr)) {

throw new ParseException("Failed to parse and reform \"" + formulaStr + "\"");

}

public static String ReformFormula(List tokens) {

String formulaStr = "=";

foreach(Token token in tokens) {

if (token.TokenType == TokenType.AtFunctionOpen) formulaStr += "@";

if (token.TokenType == TokenType.Text) {

formulaStr += cDoubleQuote;

foreach(Char c in token.TokenText) {

if (c == cDoubleQuote) formulaStr += cDoubleQuote;

formulaStr += c;

}

formulaStr += cDoubleQuote;

}

else {

if (token.Quoted) {

if (token.TokenType == TokenType.TableColumn || token.TokenType == TokenType.TableSection) {

formulaStr += "[" + token.TokenText + "]";

}

else {

formulaStr += ''';

foreach (Char c in token.TokenText)

{

if (c == ''')

formulaStr += ''';

formulaStr += c;

}

formulaStr += ''';

}

else formulaStr += token.TokenText;

if (token.TokenType == TokenType.FunctionOpen || token.TokenType == TokenType.AtFunctionOpen) formulaStr += cLeftRoundBracket;

else if (token.TokenType == TokenType.TableOpen) formulaStr += cLeftBracket;

}

return formulaStr;

}

public static List ParseFormula(String strFormula) {

int lngFormulaLen = strFormula.Length;

if (lngFormulaLen < =1) {

throw new ParseException("Formula too short");

}

if (!strFormula.StartsWith("=")) {

throw new ParseException("Formula doesn't start with '='");

}

List tokens = new List();

Stack tokenStack = new Stack();

ParsingState parseState = ParsingState.Expression1;

ParsingState parseReturnState = parseState;

String currentTokenStr = "";

int tokenIndex = 0;

bool continueParse = true;

int iChar = 1;

bool inQuote = false;

while (continueParse) {

char c = (iChar < lngFormulaLen) ? strFormula[iChar] : Char.MinValue;

switch (parseState) {

case ParsingState.Expression1:

{

if (Char.IsWhiteSpace(c)) {

parseReturnState = parseState;

parseState = ParsingState.WhiteSpace;

tokenIndex = iChar;

} else if (c == cDoubleQuote) {

parseReturnState = ParsingState.Expression2;

parseState = ParsingState.Text1;

tokenIndex = iChar;

iChar++;

} else if (Char.IsDigit(c)) {

parseReturnState = ParsingState.Expression2;

parseState = ParsingState.Number1;

tokenIndex = iChar;

} else if (c == '-') {

currentTokenStr = Char.ToString(c);

parseReturnState = ParsingState.Expression2;

parseState = ParsingState.MinusSign;

tokenIndex = iChar;

iChar++;

} else if (c == '+') {

parseState = ParsingState.PrefixOperator;

tokenIndex = iChar;

} else if (c == cLeftRoundBracket) {

parseState = ParsingState.SubExpression;

} else if (c == cLeftBrace) {

parseState = ParsingState.ArrayOpen;

} else if (c == ''') { parseState = ParsingState.LeadingName2; inQuote = true; tokenIndex = iChar; iChar++; } else if (c == '#') { parseReturnState = ParsingState.Expression2; parseState = ParsingState.ErrorX; tokenIndex = iChar; } else { parseState = ParsingState.LeadingName1; tokenIndex = iChar; } } break; // -------------------- -------------------- -------------------- case ParsingState.Expression2: { if (c == Char.MinValue) { continueParse = false; } else if (Char.IsWhiteSpace(c)) { parseReturnState = parseState; parseState = ParsingState.WhiteSpace; tokenIndex = iChar; } else if (c == cRightRoundBracket) { parseState = ParsingState.BracketClose; } else if (strArithmeticOps.Contains(c)) { parseState = ParsingState.ArithmeticOperator; tokenIndex = iChar; } else if (strComparisonOps.Contains(c)) { parseState = ParsingState.ComparisonOperator1; tokenIndex = iChar; } else if (c == ' & ') { parseState = ParsingState.TextOperator; tokenIndex = iChar; } else if (c == ' % ') { parseState = ParsingState.PostfixOperator; tokenIndex = iChar; } else if (c == ': ') { parseState = ParsingState.RangeOperator; tokenIndex = iChar; } else if (c == cListSeparator) { parseState = ParsingState.ListSeparator; tokenIndex = iChar; } else { // Check if whitespace is actually union operator Token objToken = tokens[tokens.Count - 1]; bool isError = true; if (objToken.TokenType == TokenType.WhiteSpace) { int offset = objToken.TokenText.IndexOf(''); if (offset >= 0)

{

currentTokenStr = objToken.TokenText.Substring(offset + 1);

if (offset == 0)

{

objToken.TokenType = TokenType.RangeOperator;

objToken.TokenText = " ";

}

else

{

objToken.TokenText = objToken.TokenText.Substring(0, offset);

tokens.Add(new Token(" ", TokenType.RangeOperator, false, objToken.FormulaIndex + offset - 1));

}

if (currentTokenStr.Length > 0)

{

tokens.Add(new Token(currentTokenStr, TokenType.WhiteSpace, inQuote, objToken.FormulaIndex + offset));

currentTokenStr = "";

inQuote = false;

}

parseState = ParsingState.Expression1;

isError = false;

}

else if (objToken.TokenType == TokenType.ErrorText && objToken.TokenText.Equals(strErrorRef))

{

parseState = ParsingState.Expression1;

isError = false;

}

if (isError)

{

throw new ParseException("Expected Operator but got " + c + " at position " + iChar);

}

break;

// -------------------- -------------------- --------------------

case ParsingState.ArrayConstant1:

{

if (c == cDoubleQuote)

{

parseReturnState = ParsingState.ArrayConstant2;

parseState = ParsingState.Text1;

tokenIndex = iChar;

iChar++;

}

else if (Char.IsDigit(c))

{

parseReturnState = ParsingState.ArrayConstant2;

parseState = ParsingState.Number1;

tokenIndex = iChar;

}

else if (c == ' - ')

{

currentTokenStr = Char.ToString(c);

parseReturnState = ParsingState.ArrayConstant2;

parseState = ParsingState.Number1;

tokenIndex = iChar;

iChar++;

}

else if (c == '#')

{

parseReturnState = ParsingState.ArrayConstant2;

parseState = ParsingState.ErrorX;

tokenIndex = iChar;

}

else

{

parseReturnState = ParsingState.ArrayConstant2;

parseState = ParsingState.Bool;

tokenIndex = iChar;

}

break;

// -------------------- -------------------- --------------------

case ParsingState.ArrayConstant2:

{

if (c == cRightBrace)

{

parseState = ParsingState.ArrayClose;

}

else if (c == cArrayRowSeparator)

{

parseState = ParsingState.ArrayRowSeparator;

tokenIndex = iChar;

}

else if (c == cArrayColumnSeparator)

{

parseState = ParsingState.ArrayColumnSeparator;

tokenIndex = iChar;

}

else

{

throw new ParseException("Expected " + cRightBrace + " " + cArrayRowSeparator + " " + ParsingState.ArrayColumnSeparator + " but got " + c + " at position " + iChar);

}

break;

// -------------------- -------------------- --------------------

case ParsingState.ArrayOpen:

{

tokenStack.Push(new Token(Char.ToString(c), TokenType.ArrayOpen, false, iChar));

tokens.Add(tokenStack.Peek());

parseState = ParsingState.ArrayConstant1;

iChar++;

}

break;

case ParsingState.ArrayClose:

{

Token prevToken = tokenStack.Pop();

if (prevToken.TokenType == TokenType.ArrayOpen)

{

tokens.Add(new Token(Char.ToString(c), TokenType.ArrayClose, false, iChar));

parseState = ParsingState.Expression2;

iChar++;

}

else

{

throw new ParseException("Encountered " + cRightBrace + " without matching " + cLeftBrace + " at position " + iChar);

}

break;

case ParsingState.ArrayColumnSeparator:

{

iChar++;

tokens.Add(new Token(Char.ToString(c), TokenType.ArrayColumnSeparator, false, tokenIndex));

parseState = ParsingState.ArrayConstant1;

}

break;

case ParsingState.ArrayRowSeparator:

{

iChar++;

tokens.Add(new Token(Char.ToString(c), TokenType.ArrayRowSeparator, false, tokenIndex));

parseState = ParsingState.ArrayConstant1;

}

break;

// -------------------- -------------------- --------------------

case ParsingState.Bool:

{

currentTokenStr += c;

iChar++;

if (strsBoolean.IndexOf("\t" + currentTokenStr) >= 0)

{

if (strsBoolean.IndexOf("\t" + currentTokenStr + "\t") >= 0)

{

tokens.Add(new Token(currentTokenStr, TokenType.Bool, inQuote, tokenIndex));

currentTokenStr = "";

inQuote = false;

parseState = parseReturnState;

}

else

{

throw new ParseException("Expected Boolean Constant at position " + tokenIndex);

}

break;

// -------------------- -------------------- --------------------

case ParsingState.WhiteSpace:

{

if (Char.IsWhiteSpace(c))

{

currentTokenStr += c;

iChar++;

}

else

{

tokens.Add(new Token(currentTokenStr, TokenType.WhiteSpace, inQuote, tokenIndex));

currentTokenStr = "";

inQuote = false;

parseState = parseReturnState;

}

break;

// -------------------- -------------------- --------------------

case ParsingState.SubExpression:

{

tokenStack.Push(new Token(Char.ToString(c), TokenType.SubExpressionOpen, inQuote, iChar));

tokens.Add(tokenStack.Peek());

parseState = ParsingState.Expression1;

iChar++;

}

break;

case ParsingState.BracketClose:

{

Token prevToken = tokenStack.Pop();

if (prevToken.TokenType == TokenType.FunctionOpen || prevToken.TokenType == TokenType.AtFunctionOpen)

{

tokens.Add(new Token(Char.ToString(c), TokenType.FunctionClose, false, iChar));

parseState = ParsingState.Expression2;

}

else if (prevToken.TokenType == TokenType.SubExpressionOpen)

{

tokens.Add(new Token(Char.ToString(c), TokenType.SubExpressionClose, false, iChar));

parseState = ParsingState.Expression2;

}

else

{

throw new ParseException("Encountered " + cRightRoundBracket + " without matching " + cLeftRoundBracket + " at position " + iChar);

}

iChar++;

}

break;

// -------------------- -------------------- --------------------

// Decide: Leading Name for Function, Table, R1C1Reference, Reference Qualifier or Cell Reference

case ParsingState.LeadingName1:

{

if (currentTokenStr.Length > 0 && c == cLeftRoundBracket)

{

parseState = ParsingState.FunctionX;

}

else if (currentTokenStr.Length > 0 && c == cLeftBracket)

{

parseState = ParsingState.SquareBracketOpen;

}

else if (currentTokenStr.Length > 0 && c == ' ! ')

{

parseState = ParsingState.ReferenceQualifier;

}

else if (c == Char.MinValue || Char.IsWhiteSpace(c) || strLNSeparators.Contains(c))

{

parseState = ParsingState.LeadingNameE;

}

else

{

currentTokenStr += c;

iChar++;

}

break;

case ParsingState.LeadingName2:

{

if (c == ''') {

parseState = ParsingState.LeadingName3;

}

else {

currentTokenStr += c;

}

iChar++;

}

break;

case ParsingState.LeadingName3:

{

if (c == ''')

{

parseState = ParsingState.LeadingName2;

currentTokenStr += c;

iChar++;

}

else

{

parseState = ParsingState.LeadingName1;

}

break;

case ParsingState.LeadingNameE:

{

if (currentTokenStr.Length > 0)

{

tokens.Add(new Token(currentTokenStr, (currentTokenStr.Equals(strBooleanTrue) || currentTokenStr.Equals(strBooleanFalse)) ? TokenType.Bool : TokenType.Reference, inQuote, tokenIndex));

currentTokenStr = "";

inQuote = false;

}

parseState = ParsingState.Expression2;

}

break;

// -------------------- -------------------- --------------------

// Function

case ParsingState.FunctionX:

{

if (currentTokenStr.StartsWith("@"))

{

tokenStack.Push(new Token(currentTokenStr.Substring(1), TokenType.AtFunctionOpen, inQuote, tokenIndex));

}

else

{

tokenStack.Push(new Token(currentTokenStr, TokenType.FunctionOpen, inQuote, tokenIndex));

}

tokens.Add(tokenStack.Peek());

currentTokenStr = "";

inQuote = false;

parseState = ParsingState.Expression1;

iChar++;

}

break;

// -------------------- -------------------- --------------------

// Decide: Table or R1C1 Reference

case ParsingState.SquareBracketOpen:

{

if (currentTokenStr.Equals("R") || currentTokenStr.Equals("C"))

{

parseState = ParsingState.R1C1Reference;

}

else

{

tokenStack.Push(new Token(currentTokenStr, TokenType.TableOpen, inQuote, tokenIndex));

tokens.Add(tokenStack.Peek());

currentTokenStr = "";

inQuote = false;

parseState = ParsingState.Table1;

iChar++;

}

break;

// -------------------- -------------------- --------------------

// Table

case ParsingState.Table1:

{

tokenIndex = iChar;

if (Char.IsWhiteSpace(c))

{

parseReturnState = ParsingState.Table1;

parseState = ParsingState.WhiteSpace;

}

else if (c == cLeftBracket)

{

inQuote = true;

iChar++;

parseState = ParsingState.Table6;

}

else

{

parseState = ParsingState.Table5;

}

break;

case ParsingState.Table2:

{

tokenIndex = iChar;

if (Char.IsWhiteSpace(c))

{

parseReturnState = ParsingState.Table2;

parseState = ParsingState.WhiteSpace;

}

else if (c == cListSeparator)

{

parseState = ParsingState.Table3;

}

else if (c == ': ')

{

parseState = ParsingState.Table4;

}

else if (c == cRightBracket)

{

parseState = ParsingState.TableE;

}

else

{

parseState = ParsingState.Table5;

}

break;

case ParsingState.Table3:

{

iChar++;

tokens.Add(new Token(Char.ToString(c), TokenType.TableItemSeparator, inQuote, tokenIndex));

currentTokenStr = "";

inQuote = false;

parseState = ParsingState.Table1;

}

break;

case ParsingState.Table4:

{

iChar++;

tokens.Add(new Token(Char.ToString(c), TokenType.TableColumnSeparator, inQuote, tokenIndex));

currentTokenStr = "";

inQuote = false;

parseState = ParsingState.Table1;

}

break;

case ParsingState.Table5:

{

if (c == cRightBracket)

{

tokens.Add(new Token(currentTokenStr, (currentTokenStr.StartsWith("#")) ? TokenType.TableSection : TokenType.TableColumn, inQuote, tokenIndex));

currentTokenStr = "";

inQuote = false;

parseState = ParsingState.TableE;

}

else if (c == ''') {

iChar++;

currentTokenStr += c;

parseReturnState = ParsingState.Table5;

parseState = ParsingState.TableQ;

}

else {

currentTokenStr += c;

iChar++;

}

break;

case ParsingState.Table6:

{

if (c == cRightBracket) {

iChar++;

tokens.Add(new Token(currentTokenStr, (currentTokenStr.StartsWith("#")) ? TokenType.TableSection: TokenType.TableColumn, inQuote, tokenIndex));

currentTokenStr = "";

inQuote = false;

parseState = ParsingState.Table2;

}

else if (c == ''')

{

iChar++;

currentTokenStr += c;

parseReturnState = ParsingState.Table6;

parseState = ParsingState.TableQ;

}

else

{

currentTokenStr += c;

iChar++;

}

break;

case ParsingState.TableQ:

{

currentTokenStr += c;

iChar++;

parseState = parseReturnState;

}

break;

case ParsingState.TableE:

{

Token prevToken = tokenStack.Pop();

if (prevToken.TokenType == TokenType.TableOpen)

{

tokens.Add(new Token(Char.ToString(c), TokenType.TableClose, false, iChar));

parseState = ParsingState.Expression2;

iChar++;

}

else

{

throw new ParseException("Encountered " + cRightBracket + " without matching " + cLeftBracket + " at position " + iChar);

}

break;

// -------------------- -------------------- --------------------

// Text

case ParsingState.Text1:

{

if (c == cDoubleQuote)

{

parseState = ParsingState.Text2;

}

else

{

currentTokenStr += c;

}

iChar++;

}

break;

case ParsingState.Text2:

{

if (c == cDoubleQuote)

{

parseState = ParsingState.Text1;

currentTokenStr += c;

iChar++;

}

else

{

tokens.Add(new Token(currentTokenStr, TokenType.Text, inQuote, tokenIndex));

currentTokenStr = "";

inQuote = false;

parseState = parseReturnState;

}

break;

// -------------------- -------------------- --------------------

// Number

case ParsingState.Number1:

case ParsingState.Number2:

{

if (Char.IsDigit(c))

{

currentTokenStr += c;

iChar++;

}

else if (c == cDecimalSeparator && parseState == ParsingState.Number1)

{

currentTokenStr += c;

parseState = ParsingState.Number2;

iChar++;

}

else if (c == '

E ')

{

currentTokenStr += c;

parseState = ParsingState.Number3;

iChar++;

}

else

{

parseState = ParsingState.NumberE;

}

break;

case ParsingState.Number3:

{

if (c == ' + ' || c == ' - ')

{

currentTokenStr += c;

parseState = ParsingState.Number4;

iChar++;

}

else

{

throw new ParseException("Expected + or - at position " + iChar);

}

break;

case ParsingState.Number4:

{

if (Char.IsDigit(c))

{

currentTokenStr += c;

iChar++;

}

else

{

parseState = ParsingState.NumberE;

}

break;

case ParsingState.NumberE:

{

tokens.Add(new Token(currentTokenStr, TokenType.Number, inQuote, tokenIndex));

currentTokenStr = "";

inQuote = false;

parseState = parseReturnState;

}

break;

// -------------------- -------------------- --------------------

// Error

case ParsingState.ErrorX:

{

currentTokenStr += c;

iChar++;

if (strsError.IndexOf("\t" + currentTokenStr) >= 0)

{

if (strsError.IndexOf("\t" + currentTokenStr + "\t") >= 0)

{

tokens.Add(new Token(currentTokenStr, TokenType.ErrorText, inQuote, tokenIndex));

currentTokenStr = "";

inQuote = false;

parseState = parseReturnState;

}

else

{

throw new ParseException("Expected Error Constant at position " + tokenIndex);

}

break;

// -------------------- -------------------- --------------------

case ParsingState.ReferenceQualifier:

{

tokens.Add(new Token(currentTokenStr, TokenType.ReferenceQualifier, inQuote, tokenIndex));

tokens.Add(new Token(Char.ToString(c), TokenType.ExternalReferenceOperator, false, iChar));

iChar++;

currentTokenStr = "";

inQuote = false;

parseState = ParsingState.Expression1;

}

break;

case ParsingState.MinusSign:

{

if (Char.IsDigit(c))

{

parseState = ParsingState.Number1;

}

else

{

parseState = ParsingState.PrefixOperator;

iChar--;

}

break;

case ParsingState.PrefixOperator:

{

iChar++;

tokens.Add(new Token(Char.ToString(c), TokenType.UnaryOperator, inQuote, tokenIndex));

currentTokenStr = "";

inQuote = false;

parseState = ParsingState.Expression1;

}

break;

// -------------------- -------------------- --------------------

// ListSeparator can be either function parameter separator, or a union operator

case ParsingState.ListSeparator:

{

bool isParam = false;

if (tokenStack.Count() > 0)

{

if (tokenStack.Peek().TokenType == TokenType.FunctionOpen || tokenStack.Peek().TokenType == TokenType.AtFunctionOpen)

{

parseState = ParsingState.ParameterSeparator;

isParam = true;

}

if (!isParam)

{

parseState = ParsingState.RangeOperator;

}

break;

case ParsingState.RangeOperator:

{

currentTokenStr = Char.ToString(c);

iChar++;

tokens.Add(new Token(currentTokenStr, TokenType.RangeOperator, inQuote, tokenIndex));

currentTokenStr = "";

inQuote = false;

parseState = ParsingState.Expression1;

}

break;

// -------------------- -------------------- --------------------

case ParsingState.ArithmeticOperator:

{

currentTokenStr = Char.ToString(c);

iChar++;

tokens.Add(new Token(currentTokenStr, TokenType.ArithmeticOperator, inQuote, tokenIndex));

currentTokenStr = "";

inQuote = false;

parseState = ParsingState.Expression1;

}

break;

// -------------------- -------------------- --------------------

case ParsingState.ComparisonOperator1:

{

currentTokenStr = Char.ToString(c);

iChar++;

parseState = (c == ' < ' || c == ' > ') ? ParsingState.ComparisonOperator2 : ParsingState.ComparisonOperatorE;

}

break;

case ParsingState.ComparisonOperator2:

{

if (c == ' = ' || (currentTokenStr.Equals("< ") && c == ' > '))

{

currentTokenStr += c;

iChar++;

}

parseState = ParsingState.ComparisonOperatorE;

}

break;

case ParsingState.ComparisonOperatorE:

{

tokens.Add(new Token(currentTokenStr, TokenType.ComparisonOperator, inQuote, tokenIndex));

currentTokenStr = "";

inQuote = false;

parseState = ParsingState.Expression1;

}

break;

// -------------------- -------------------- --------------------

case ParsingState.TextOperator:

{

currentTokenStr = Char.ToString(c);

iChar++;

tokens.Add(new Token(currentTokenStr, TokenType.TextOperator, inQuote, tokenIndex));

currentTokenStr = "";

inQuote = false;

parseState = ParsingState.Expression1;

}

break;

// -------------------- -------------------- --------------------

case ParsingState.PostfixOperator:

{

currentTokenStr = Char.ToString(c);

iChar++;

tokens.Add(new Token(currentTokenStr, TokenType.PostfixOperator, inQuote, tokenIndex));

currentTokenStr = "";

inQuote = false;

parseState = ParsingState.Expression2;

}

break;

// -------------------- -------------------- --------------------

case ParsingState.ParameterSeparator:

{

currentTokenStr = Char.ToString(c);

iChar++;

tokens.Add(new Token(currentTokenStr, TokenType.ParameterSeparator, inQuote, tokenIndex));

currentTokenStr = "";

inQuote = false;

parseState = ParsingState.Expression1;

}

break;

default:

break;

}

// todo: error if token stack not empty

// post processing

// 1. Scan and detect !, and join if the 3 tokens prior are CellReference-Colon-CellReference

// 2. Scan and detect CellReference-Colon-CellReference, because they should be joined into an AreaReference

return tokens;

}

Peter says:

December 20, 2020 at 6:37 am

@Dick Kusleika: is this code working for you? I cannot seem to get it to work here. Could you maybe doublecheck or share it in another fileformat?

Can anybody share the original files? This link is dead :http://vangelder.orcon.net.nz/files/ExcelFormulas.zip

Thanks alot!
Dick Kusleika says:

December 21, 2020 at 7:42 am

I don’t have a copy of the ExcelFormatter file to see if it works. I have the original AudXL file that I downloaded who knows when. I’m sure it isn’t the latest version, but since Rob shut down his website we get what we get. You might try archive.org, but I haven’t had much luck there. Here’s AudXL if it’s of any interest.

http://dailydoseofexcel.com/excel/AudXL.zip