Pattern Matching

I’ve just recently had a task where I had to do table lookups by a unique Code.
It was a pretty special sort of Code though. The first character has special meaning and the last two characters have special meaning.

I had the option of creating a complicated mapping table, but then I recalled my unix days… Regular Expressions!
Regular Expressions are great. They provide syntax for advanced string pattern matching.

In this example I want to find the Code which starts with a digit and ends with the letter J followed by the number 9.
The text in cell A3 is the pattern which matches what I’m looking for. In this case, that pattern matches the Code 9SGJ9.
I could have used ^(S|8)YRUP$ to match 8YRUP (or SYRUP if it were in the list of Codes).
The pattern matching syntax is sophisticated. The documentation is really worth a look.

The formula in cell B3 is:
=INDEX(B6:B15, RegMatch($A$6:$A$15, $A$3))

RegMatch is a User Defined Function in VBA for handling the pattern matching.
Regular Expression functions are already provided by Microsoft’s VBScript library so I just wrote a wrapper around it and called that RegMatch.

You’ll need to reference “Microsoft VBScript Regular Expressions”.
On my machine it’s version 5.5, but a later version is available for download from Microsoft’s website.

Dim reg As New RegExp
Public Function RegMatch(Source As Range, Pattern As String, Optional IgnoreCase As Boolean = True, Optional MultiLine As Boolean = True) As Long
    Dim rng As Range, i As Long, j As Long
    reg.IgnoreCase = IgnoreCase
    reg.MultiLine = MultiLine
    reg.Pattern = Pattern
    i = 0: j = 0
    For Each rng In Source
        i = i + 1
        If reg.test(rng.Value) Then
            j = i
            Exit For
        End If
    RegMatch = j
End Function

As I’m writing this post I’ve been looking on the Internet for other examples.
A great newsgroup post by Harlan Grove offers similar wrappers around the scripting library.

The documentation for VBScript Regular Expressions is available from Microsoft’s website. Personally, I prefer the offline version.

Posted in Uncategorized

46 thoughts on “Pattern Matching

  1. Very nice, Rob. I didn’t know about that. I’ve tried to do similar things using VBA’s Like operator, but this is definitely a better choice.

  2. Hi Rob,

    I’ve been mucking about with RegExp for a while and found it useful for a variety of Excel tasks from rearranging peoples names, replacing specific absolute cell references with relative references, to extacting the sheet names from a multi sheet RefEdit box range.

    As I had a number of requests for some examples, I pulled together a small file showing different RegExp methods (Test, Execute & Replace) and uses for RegExp with Excel

    Hopefully this link at VBAX is accessible

    there is a discussion thread on this example @
    but I think you need to be a VBAX member to see it



  3. LOL :)

    Dennis, that’s the first time I’ve been browsing through here for a month or more, and we end up posting a minute apart.

    Thanks for the recommendation.

    I hope all is well with you and your family. All is well here with our newborn but I’m travelling a lot for work at the momement



  4. Hi.

    Some great links here – thanks.

    I too have a newborn – well, 7 months… some of the reason I’ve not been posting recently ;)

    In the above example I used it for Code lookups.

    My other use for RegExp in this particular task was for massaging the output of DOS program.
    The DOS program creates a fixed column width report, divided into sections – with Headers, Footers, Subnotes etc…
    I used Regular expressions to detect and split sections. Then detected headers and footers for removal. Removed lines with a blank code (subnotes).
    Removed empty lines.
    I was finally left with only data in fixed width column ready for transfer.
    In all, about 10 lines of code (each calling a generic RegReplace function)

    I dread to think what the non-RegEx equivalent would have been.

  5. “I dread to think what the non-RegEx equivalent would have been.”

    I wrote the non-RegEx version of this a few years ago. I was parsing mainframe output in an antique QA application, for which we fortunately had a terminal emulator with a VBA object model (the in-house VB programmer couldn’t figure out how to work it, and wouldn’t listen to a mere engineer–me–explain it to him). I’d get the output, then line by line search for recognized phrases, then go to the branch of the code that parsed that particular line or section. It would break every second time I ran it, because it would encounter yet another different abbreviation in the output. RegEx would have helped immensely.

  6. This is a great bit of code for RegExp functions that actually work!!
    Wish i’d seen this a long time ago when i first started my involvment with excel (after a long time working on unix).

  7. Here is an example from Microsoft’s site

    Function TestRegExp(myPattern As String, _
                        myString As String) As String
    Dim objRegExp           As RegExp
    Dim objMatch            As Match
    Dim colMatches          As MatchCollection
    Dim RetStr              As String
    ‘   Create a regular expression object.
       Set objRegExp = New RegExp

        With objRegExp
        ‘   Set the pattern by using the Pattern property.
           .Pattern = myPattern
        ‘   Set Case Insensitivity.
           .IgnoreCase = True
        ‘   Set global applicability.
           objRegExp.Global = True
        ‘   Test whether the String can be compared.
           If (.Test(myString) = True) Then
            ‘   Get the matches.
               Set colMatches = .Execute(myString)     ‘   Execute search.
                For Each objMatch In colMatches         ‘   Iterate Matches collection.
                   With objMatch
                        RetStr = RetStr & “Match found at position “
                        RetStr = RetStr & .FirstIndex & “. Match Value is ‘”
                        RetStr = RetStr & .Value & “‘.” & vbCrLf
                    End With
                 RetStr = “String Matching Failed”
            End If
        End With
       TestRegExp = RetStr
    End Function

  8. That did’nt work well. I think you can read it anyway. I can never remember what the tags are for code on this site. ‘[]’ ?

  9. Another solution is to use iserror(match(“*” & yourSearchCriteria & “*”,yourCell,0)) as a formula.

    If it returns true your criteria is not found.

  10. Hi everyone

    Can someone give me some more pointers on how to make the “RegExp” Object available to me in Excel?

    I went to the Microsoft website, and installed “Windows Script 5.7?. Is that the right thing? If it is, what else do I have to do to make my VBA macros recognize this object? I tried to initialize both “VBScript_RegExp_57.RegExp” and just “RegExp”, but it does not recognize either.

    I’m not at all familiar with VBScript so I might be completely on the wrong track. But browsing the Microsoft website didn’t help me much…

    Thanks in advance.

  11. All:

    I have following question on the parsing and using regular expressions.

    The following information need to be parsed and provide the results as follows:

    123 Technology, Inc.
    Access Info, LLC
    First systems, llc
    Level 5, Ltd
    Level 3, LLC

    Expected Output:
    123 Technology
    Access Info
    First systems
    Level 5
    Level 3

    How do I get the results in Excel using regexp or existing formulas.

    Thank You in advance.
    Ramesh V.

  12. Ramesh, the formula that you need is:


    (assuming the input text is at cell A1)

  13. As a first time poster but long time reader of DDoE, many thanks to all who post wonderful solutions and ideas.

    Here is a problem that I have been grappling with and hope someone can provide me with some ideas that will lead me towards a solution.

    I have a model that processes real-time data and outputs to matrix table with either a “1” or “0.” The matrix table consists of eight letters across the top (“A” to “H”) and ten rows down (1 to 10), i.e. a table with 80 cells. As the model runs, it populates these 80 cells with either a “1” or “0.” Now I concatenate all these 1’s and 0’s into a string that might look like something like “B7C3E8.” So as you can imagine this would imply a “1” in the “B” column, row 7, a “1” in the “C” column, row 3, and a “1” in the “E” column, row 8. The rest of the cells would have “0’s.”

    I have a library of patterns that I need to see if they are contained in the model output string. So for example, I may have a pattern called “A3A7B3E9” and need to see if this exists within the model output string which might look like “A2A3A7B1B3B6C1C8D7D8E2E9F1H3H5.” In this specific example, there is a pattern match because “A3”, “A7”, “B3”, and “E9” are contained within the model output string.

    I have been trying to use RegMatch function, but so far I have had limited success. I can get a match in some cases, but the example above will not be picked up by the function as a match. Is what I’m trying to do possible with the RegMatch function or is there a better approach?

    I should note that I am currently matching patterns by having two identical tables, one that is the model output and the other that is the desired pattern and then comparing the two as arrays. This actually works quite well, but as my library of search patterns has grown, I now need to loop through my library of patterns and identify the matches in a more streamlined fashion. Any help or ideas would be much appreciated.

    Best regards,

  14. @Bill Ghauri

    Are your patterns always ordered in the same way… all the A’s are listed first in numerical order, then the B’s are listed next in numerical order, followed by the C’s, D’s, etc. each numerical order (as your example shows)? If yes, and if the pattern is also listed in with this ordering, then you use this function which returns True if the pattern exists and False otherwise…

    Function IsPatternMatch(ModelOutputString As String, _
                          Pattern As String) As Variant
      Dim X As Long, WildcardPattern As String
      For X = 1 To Len(Pattern) Step 2
        WildcardPattern = WildcardPattern & “*” & Mid(Pattern, X, 2)
      IsPatternMatch = ModelOutputString Like WildcardPattern & “*”
    End Function

    And, using your posted pattern and model output string, you would use it like this…

    Sub Test()
      If IsPatternMatch(“A2A3A7B1B3B6C1C8D7D8E2E9F1H3H5”, “A3A7B3E9”) Then
        MsgBox “Yes, the pattern matches.”
        MsgBox “No, the pattern does not match.”
      End If
    End Sub
  15. Rick,

    Yes, the patterns are always ordered the same way… in alphabetical and numerical order. Your function works like a champ! It is exactly what I needed. Thank you so much.

    Best regards,
    Bill Ghauri

  16. @Bill Ghauri

    Sorry Bill, but there is a small problem with the function I posted… it will not work correctly if the Pattern contains any cells from the tenth row. Here is an amended function which should correct that problem…

    Function IsPatternMatch(ByVal ModelOutputString As String, _
                          ByVal Pattern As String) As Variant
      Dim X As Long, WildcardPattern As String
      Pattern = Replace(Pattern, “10”, “X”)
      ModelOutputString = Replace(ModelOutputString, “10”, “X”)
      For X = 1 To Len(Pattern) Step 2
        WildcardPattern = WildcardPattern & “*” & Mid(Pattern, X, 2)
      IsPatternMatch = ModelOutputString Like WildcardPattern & “*”
    End Function
  17. My first thought is that this would be a very good situation in which to use Longre’s MOREFUNC.XLL add-in. Doing so, this problem could be reduced to formulas like


    This could be supplemented with some initial validity checking on string and pattern.


    which returns -3 if both string and pattern are invalid, -2 if only pattern is invalid, -1 if only string is invalid, 0 if valid pattern isn’t found within string, and 1 if valid pattern is found within string.

    OTOH, if add-ins aren’t allowed but VBA is, more general is better than more particular. Rick’s revised UDF works given the specs, but if the specs change, e.g., range expands to row 11 or further and/or to column X or further, it fails. Myself, I’d use a state machine and have the udf perform validity checking. Something like

    Function matchpat(pat As String, str As String) As Long
      ‘first Const for case insensitive, second for case sensitive/only upper case valid
     Const LETTERS As String = “[A-Za-z]”
      ‘Const LETTERS As String = “[A-Z]”

      Dim k As Long, st As Long, ch As String * 1, apat As String

      For k = 1 To Len(pat)
        ch = Mid$(pat, k, 1)

        If st = 0 And ch Like LETTERS Then
          apat = apat & ch
        ElseIf st = 0 And ch Like “#” Then
          apat = apat & ch
          st = 1
        ElseIf st = 1 And ch Like “#” Then
          apat = apat & ch
        ElseIf st = 1 And ch Like LETTERS Then
          apat = apat & “*” & ch
          st = 0
          st = 0
          Exit For
        End If
      Next k

      matchpat = IIf(st = 1, 0, -2)
      st = 0

      For k = 1 To Len(str)
        ch = Mid$(str, k, 1)

        If st = 0 And ch Like LETTERS Then
          ‘nothing to do
       ElseIf st = 0 And ch Like “#” Then
          st = 1
        ElseIf st = 1 And ch Like “#” Then
          ‘nothing to do
       ElseIf st = 1 And ch Like LETTERS Then
          st = 0
          st = 0
          Exit For
        End If

      Next k

      matchpat = matchpat + IIf(st = 1, 0, -1)

      If matchpat = 0 Then matchpat = Abs(str Like “*” & apat & “*”)

    End Function

    Quibble: argument ordering should be comparable to Excel’s built-in functions, e.g.,






    rather than


  18. @fzz and Bill Ghauri

    Responding to the comments made by fzz, this is how I would write the general case pattern search modeled after Bill’s setup; it should work for any address range in XL2003 or XL2007/2010 and beyond; that is, as long as the ordering is as Bill has defined it, it should work for any number of letters followed by any number of digits for each “address” in the pattern and model output strings.

    Function IsPatternMatch(ByVal ModelOutput As String, _
                            ByVal Pattern As String) As Boolean
      Dim X As Long
        X = X + 1
        If X <= Len(Pattern) Then
          If Mid(Pattern, X, 2) Like “#[a-zA-Z]” Then
            Pattern = Replace(Pattern, Mid(Pattern, X, 2), Mid(Pattern, _
                        X, 1) & “-“ & “*” & “-“ & Mid(Pattern, X + 1, 1))
          End If
        End If
        If X <= Len(ModelOutput) Then
          If Mid(ModelOutput, X, 2) Like “#[a-zA-Z]” Then
            ModelOutput = Replace(ModelOutput, Mid(ModelOutput, X, 2), _
                                  Mid(ModelOutput, X, 1) & “-*-“ & _
                                  Mid(ModelOutput, X + 1, 1))
          End If
          Pattern = “*-“ & Pattern & “-*”
          ModelOutput = “*-“ & ModelOutput & “-*”
          Exit Do
        End If
      IsPatternMatch = ModelOutput Like Pattern
    End Function
  19. @fzz

    Your function does not work quite right. It will return True for this function call…

    matchpat(“A3A7B1234E10?, “A2A3A79B1234B3B6C1C8D7D8E10E9F123H3H5AB1234CC1ABC123?)

    Using your argument names, there is an A7 in the ‘pat’ argument, but no A7 in the ‘str’ argument (it looks like is matched the first two characters from the A79 address).

  20. Rick and fzz,

    Thank you both for the solutions. Rick, in my initial testing, I did not notice the 10th row limitation at first, but I did later. Your third version of the function works very well. Since my actual model currently contains 80 rows (but it could easily be over 100 soon), I like the flexibility of your function to accommodate an expanded table range. Btw, to get the model output string in the proper order, I have a function that concatenates the columns first and then concatenates the column “totals” (i.e. top to bottom then left to right). This ensures the model output string is always in the proper order. Again, many thanks.

    Best regards,
    Bill Ghauri

  21. @Rick- then better still to use MOREFUNC.XLL regular expressions.

    =REGEX.COUNT(REGEX.SUBSTITUTE(string,”(d)([A-Z])”,”[1] [2]”),

    As for udfs without regular expressions, you’re right: delimiters between tokens are needed, but why

    … & “-” & “*” & “-” & …

    rather than

    … & “-*-” & …

    in the first Replace call?

  22. @fzz…

    ==> then better still to use MOREFUNC.XLL regular expressions.

    I still like rolling my own code… and the function I came up with is relatively short… plus, in my opinion, will be easier to read next week than the regular expression pattern will be. [grin]

    ==> but why … & “-” & “*” & “-” & … rather than
    ==> … & “-*-” & … in the first Replace call?

    I originally had Chr$(1) instead of the dashes; then I did a Replace on the Chr$(1) and replaced them with “-“; then I noticed I could replace the “-“&”*”&”-” with “-*-“, which I did by hand… and, of course, I missed one of them.

  23. Also, someone recently posted that MOREFUNC can fail in Excel 2007. Haven’t tried it myself, have never used MOREFUNC, but if that is correct it validates my decision.

  24. I’m running into problems where Excel complains about “RegExp” being an undefined user data type.

    Not sure what is going wrong, but I’m running Excel 2003 on Windows 7 64-bit, which is probably not a common setup. :(

  25. Thank you for a very useful example. Although I see it was written in 2005, I still hope somebody can answer my question.

    If I understand this code correctly, as soon as it find a match it will exit the loop. But what if there were several matches, and I want them all summed up. How should the loop be written then?

    Example, I want all instances of New York and the number to be summed up:

    New York 5
    New Year 6
    New York 10

    The result should be 15. The code example in this article will give me the number 5 and exit the loop. Correct?

    Thanks in advance.

  26. A quick helper function. You can adjust to fit:

    Public Function RegSum(Source As Range, Pattern As String, SumRange As Range, Optional IgnoreCase As Boolean = True, Optional MultiLine As Boolean = True) As Double
        Dim rng As Range, i As Long, j As Long, dbl As Double

        reg.IgnoreCase = IgnoreCase
        reg.MultiLine = MultiLine
        reg.Pattern = Pattern

        dbl = 0
        For i = 1 To Source.Rows.Count
            For j = 1 To Source.Columns.Count
                If reg.test(Source(i, j).Value) Then dbl = dbl + SumRange(i, j).Value
        RegSum = dbl
    End Function

    A3 = ^New York$
    A6:A8 = New York, New Year, New York
    B6:B8 = 5, 6, 10

    Function call
    =RegSum(A6:A8, A3, B6:B8)

  27. Thank you very much, this was really helpful.

    The code is looping through a list of URLs that are generated from Google Analytics using arrays.
    The list is ordered from highest to lowest pageview number. Since I don’t know how many URLs the list will contain, some of the cells will show #N/A instead of he URL and pageview number.

    URL Pageview
    /url-1 10
    /url-2 7
    #N/A #N/A

    This made the code not working, so I added some On Error Resume Next to fix the problem. The code is below, and it is working, but if you see anything that should be different, please let me know.

    Dim reg As New RegExp

    Public Function RegSum(Source As Range, Pattern As String, SumRange As Range, Optional IgnoreCase As Boolean = True, Optional MultiLine As Boolean = True) As Double
        Dim rng As Range, i As Long, j As Long, dbl As Double

        With reg
            .IgnoreCase = IgnoreCase
            .MultiLine = MultiLine
            .Pattern = Pattern
        End With

        On Error Resume Next
        dbl = 0
        For i = 1 To Source.Rows.Count
            If Err.Number = 0 Then
                For j = 1 To Source.Columns.Count
                    If reg.Test(Source(i, j).Value) Then dbl = dbl + SumRange(i, j).Value
            End If
        RegSum = dbl
    End Function

    Again, thank you very much for your help.

  28. OY!!

    I’m having a really hard time doing a simple thing…. I just need to check whether the length of a number in Cell A is less or equal to 13 digits and then have Cell B’s value be one of 2 things …
    PS. I haven’t used any excel since school

  29. Sir,

    Private Sub Form_Load()
    Dim dbMasterRecords As Database
    Dim rsmas_rec As Recordset
    Set dbMasterRecords = OpenDatabase(“c:\Program Files\MasterRecords.mdb”)
    Data1.RecordSource = “select * from mas_rec”
    Label6.Caption = Data1.Recordset.RecordCount
    End Sub
    Private Sub CLOSE_Click()
    Unload FIND
    End Sub
    Private Sub FILENO_Click()
    Dim c As String
    c = InputBox(“ENTER FILE NO”, “FIND”)
    Data1.RecordSource = (“select * from mas_rec where rc_no='” + c + “‘”)
    If Data1.Recordset.RecordCount = 0 Then
    Label5.Caption = Str(Data1.Recordset.RecordCount)
    End If
    End Sub

    Private Sub SECTIONNAME_Click()
    Dim d As String
    d = InputBox(“ENTER SECTION NAME”, “FIND”)
    Data1.RecordSource = “select * from mas_rec where sec_name='” + d + “‘”
    If Data1.Recordset.RecordCount = 0 Then
    Label5.Caption = Str(Data1.Recordset.RecordCount)
    End If
    End Sub
    Private Sub PRINTVIEW_Click()
    Dim DBGrid(1) As DataReport2
    Dim FILENO_Command() As Boolean
    Set DBGrid(1) = DataReport2(“select * from mas_rec where rc_no”)
    If DBGrid(1) = True Then
    DBGrid(1) = DataReport2
    Unload FIND
    Show DataReport2
    End If
    End Sub

  30. Hi All,

    I first learnt about regular expressions when using them in Perl and checking for things like e-mail addresses were entered correctly – very powerful things I have to say.

    As for Excel, I was looking for a tool that would strip any illegal characters for the cells on my sheet so I used something like this:

    Dim RE As Object
    Dim rng As Range
    Set RE = CreateObject("VBScript.RegExp")
    With RE
    .Global = True
    ' Place the WANTED characters between [^ and ]
    .Pattern = "[^a-zA-Z0-9\s\.\\\/-]"
    For Each rng In ActiveSheet.UsedRange.SpecialCells(xlCellTypeConstants)
    rng.value = .Replace(rng.value, "")
    Next rng
    End With

    Set RE = Nothing

    Seems to do the trick for me :o)



  31. Hi All,
    I’m newbie to excel programming. Finding difficulty to achieve a pattern matching output as below.

    Input Output
    1234 ABCD
    1123 AABC
    1112 AAAB
    1111 AAAA
    1212 ABAB
    1221 ABBA
    How do I get this output in excel.

  32. Sub pmatch()

    Dim vaInput As Variant
    Dim i As Long
    Dim j As Long
    Dim sOutput As String

    vaInput = Array(1234, 1123, 1112, 1111, 1212, 1221)

    For i = LBound(vaInput) To UBound(vaInput)
    sOutput = ""
    For j = 1 To Len(vaInput(i))
    sOutput = sOutput & Chr$(64 + Mid(vaInput(i), j, 1))
    Next j
    Debug.Print sOutput
    Next i

    End Sub

  33. Dear Dick Kusleika,

    Thank you so much.

    The number’s not necessariy those I shown above. It be any number that falls in 0000 to 9999. So vaInput I need to keep all these starting from “0000”

    Kind Regards,

  34. Hi
    I am also new to regex and was wondering if the line starting with “if application.sheets(i)…” could be redone using regex. the sheets named “vacant”, Vacant_1″ etc could extend to “Vacant_10”

    how would I work with all sheets named Vacant_something and Audit. Hope I am clear in my explanation of what I need.

    Sub ListWorkSheetNames()
    Dim xWs As Worksheet
    Set xWs = Application.ActiveSheet
    For i = 1 To Application.Sheets.Count
    If Application.Sheets(i).Name = "Vacant" Or Application.Sheets(i).Name = "_" Or Application.Sheets(i).Name = "Audit" Or Application.Sheets(i).Name = "Vacant_1" Or Application.Sheets(i).Name = "Vacant_2" Then
        GoTo nextiteration
        xWs.Range("A" & i + 2) = Application.Sheets(i).Name
    End If
    Next i
    End Sub


  35. You don’t need regex for that.

    If Application.Sheets(i).Name Like "Vacant_*" Or Application.Sheets(i).Name = "Audit" Then

  36. Ta thank you very much. I have another question ….

    I need to find a cell address to input into a formula based on a date input via input box. The cell address will be in another worksheet. I dont know the cell address but i do know the data in that cell, I need the cell address

    any help would be good


Posting code? Use <pre> tags for VBA and <code> tags for inline.

Leave a Reply

Your email address will not be published.