I’ve just recently had a task where I had to do table lookups by a unique Code.
It was a pretty special sort of Code though. The first character has special meaning and the last two characters have special meaning.
I had the option of creating a complicated mapping table, but then I recalled my unix days… Regular Expressions!
Regular Expressions are great. They provide syntax for advanced string pattern matching.
In this example I want to find the Code which starts with a digit and ends with the letter J followed by the number 9.
The text in cell A3 is the pattern which matches what I’m looking for. In this case, that pattern matches the Code 9SGJ9.
I could have used ^(S|8)YRUP$ to match 8YRUP (or SYRUP if it were in the list of Codes).
The pattern matching syntax is sophisticated. The documentation is really worth a look.
The formula in cell B3 is:
=INDEX(B6:B15, RegMatch($A$6:$A$15, $A$3))
RegMatch is a User Defined Function in VBA for handling the pattern matching.
Regular Expression functions are already provided by Microsoft’s VBScript library so I just wrote a wrapper around it and called that RegMatch.
You’ll need to reference “Microsoft VBScript Regular Expressions”.
On my machine it’s version 5.5, but a later version is available for download from Microsoft’s website.
Public Function RegMatch(Source As Range, Pattern As String, Optional IgnoreCase As Boolean = True, Optional MultiLine As Boolean = True) As Long
Dim rng As Range, i As Long, j As Long
reg.IgnoreCase = IgnoreCase
reg.MultiLine = MultiLine
reg.Pattern = Pattern
i = 0: j = 0
For Each rng In Source
i = i + 1
If reg.test(rng.Value) Then
j = i
Exit For
End If
Next
RegMatch = j
End Function
As I’m writing this post I’ve been looking on the Internet for other examples.
A great newsgroup post by Harlan Grove offers similar wrappers around the scripting library.
The documentation for VBScript Regular Expressions is available from Microsoft’s website. Personally, I prefer the offline version.
That’s cool, Rob. Here’s another link
http://www.tmehta.com/regexp/
which I can only assume is Tushar Mehta.
Very nice, Rob. I didn’t know about that. I’ve tried to do similar things using VBA’s Like operator, but this is definitely a better choice.
Hi Rob,
I’ve been mucking about with RegExp for a while and found it useful for a variety of Excel tasks from rearranging peoples names, replacing specific absolute cell references with relative references, to extacting the sheet names from a multi sheet RefEdit box range.
As I had a number of requests for some examples, I pulled together a small file showing different RegExp methods (Test, Execute & Replace) and uses for RegExp with Excel
Hopefully this link at VBAX is accessible
http://www.vbaexpress.com/kb/getarticle.php?kb_id=68
there is a discussion thread on this example @
http://www.vbaexpress.com/forum/showthread.php?t=226
but I think you need to be a VBAX member to see it
Cheers
Dave
I agree :)
Here is a link to a workbook that contain 7 examples for regular expression from my friend Dave (aka brettdj):
http://www.vbaexpress.com/kb/getarticle.php?kb_id=68
And if You want to test expression or learn more then this free tool – The Regulator – may be of interest:
http://regex.osherove.com/
Kind regards,
Dennis
LOL :)
Dennis, that’s the first time I’ve been browsing through here for a month or more, and we end up posting a minute apart.
Thanks for the recommendation.
I hope all is well with you and your family. All is well here with our newborn but I’m travelling a lot for work at the momement
Cheers
Dave
:)
I will always associate Regular Expression with You ;)
Kind regards,
Dennis
Hi.
Some great links here – thanks.
I too have a newborn – well, 7 months… some of the reason I’ve not been posting recently ;)
In the above example I used it for Code lookups.
My other use for RegExp in this particular task was for massaging the output of DOS program.
The DOS program creates a fixed column width report, divided into sections – with Headers, Footers, Subnotes etc…
I used Regular expressions to detect and split sections. Then detected headers and footers for removal. Removed lines with a blank code (subnotes).
Removed empty lines.
I was finally left with only data in fixed width column ready for transfer.
In all, about 10 lines of code (each calling a generic RegReplace function)
I dread to think what the non-RegEx equivalent would have been.
“I dread to think what the non-RegEx equivalent would have been.”
I wrote the non-RegEx version of this a few years ago. I was parsing mainframe output in an antique QA application, for which we fortunately had a terminal emulator with a VBA object model (the in-house VB programmer couldn’t figure out how to work it, and wouldn’t listen to a mere engineermeexplain it to him). I’d get the output, then line by line search for recognized phrases, then go to the branch of the code that parsed that particular line or section. It would break every second time I ran it, because it would encounter yet another different abbreviation in the output. RegEx would have helped immensely.
This is a great bit of code for RegExp functions that actually work!!
Wish i’d seen this a long time ago when i first started my involvment with excel (after a long time working on unix).
Here is an example from Microsoft’s site
myString As String) As String
Dim objRegExp As RegExp
Dim objMatch As Match
Dim colMatches As MatchCollection
Dim RetStr As String
‘ Create a regular expression object.
Set objRegExp = New RegExp
With objRegExp
‘ Set the pattern by using the Pattern property.
.Pattern = myPattern
‘ Set Case Insensitivity.
.IgnoreCase = True
‘ Set global applicability.
objRegExp.Global = True
‘ Test whether the String can be compared.
If (.Test(myString) = True) Then
‘ Get the matches.
Set colMatches = .Execute(myString) ‘ Execute search.
For Each objMatch In colMatches ‘ Iterate Matches collection.
With objMatch
RetStr = RetStr & “Match found at position “
RetStr = RetStr & .FirstIndex & “. Match Value is ‘”
RetStr = RetStr & .Value & “‘.” & vbCrLf
End With
Next
Else
RetStr = “String Matching Failed”
End If
End With
TestRegExp = RetStr
End Function
That did’nt work well. I think you can read it anyway. I can never remember what the tags are for code on this site. ‘[]’ ?
doco: There’s a paragraph above the comments to remind you. It’s [ vb ] without the spaces.
Another solution is to use iserror(match(“*” & yourSearchCriteria & “*”,yourCell,0)) as a formula.
If it returns true your criteria is not found.
Hi everyone
Can someone give me some more pointers on how to make the “RegExp” Object available to me in Excel?
I went to the Microsoft website, and installed “Windows Script 5.7?. Is that the right thing? If it is, what else do I have to do to make my VBA macros recognize this object? I tried to initialize both “VBScript_RegExp_57.RegExp” and just “RegExp”, but it does not recognize either.
I’m not at all familiar with VBScript so I might be completely on the wrong track. But browsing the Microsoft website didn’t help me much…
Thanks in advance.
Emma
Emma: From VBA’s menu, Tools > References. Tick “Microsoft VBScript Regular Expressions”
[…] As Boolean = True, Optional MultiLine As Boolean = True) As Boolean ‘ Modified from http://www.dailydoseofexcel.com/archives/2005/08/13/pattern-matching/ ¬ ¬ ¬ Dim reg As New RegExp ¬ ¬ ¬ ¬ reg.IgnoreCase = IgnoreCase ¬ ¬ ¬ reg.MultiLine = MultiLine ¬ […]
All:
I have following question on the parsing and using regular expressions.
The following information need to be parsed and provide the results as follows:
Input:
123 Technology, Inc.
Access Info, LLC
First systems, llc
Level 5, Ltd
Level 3, LLC
Expected Output:
123 Technology
Access Info
First systems
Level 5
Level 3
How do I get the results in Excel using regexp or existing formulas.
Thank You in advance.
Ramesh V.
Ramesh, the formula that you need is:
=LEFT(A1,FIND(“,”,A1)-1)
(assuming the input text is at cell A1)
As a first time poster but long time reader of DDoE, many thanks to all who post wonderful solutions and ideas.
Here is a problem that I have been grappling with and hope someone can provide me with some ideas that will lead me towards a solution.
I have a model that processes real-time data and outputs to matrix table with either a Äú1Äù or Äú0.Äù The matrix table consists of eight letters across the top (ÄúAÄù to ÄúHÄù) and ten rows down (1 to 10), i.e. a table with 80 cells. As the model runs, it populates these 80 cells with either a Äú1Äù or Äú0.Äù Now I concatenate all these 1Äôs and 0Äôs into a string that might look like something like ÄúB7C3E8.Äù So as you can imagine this would imply a Äú1Äù in the ÄúBÄù column, row 7, a Äú1Äù in the ÄúCÄù column, row 3, and a Äú1Äù in the ÄúEÄù column, row 8. The rest of the cells would have Äú0Äôs.Äù
I have a library of patterns that I need to see if they are contained in the model output string. So for example, I may have a pattern called ÄúA3A7B3E9Äù and need to see if this exists within the model output string which might look like ÄúA2A3A7B1B3B6C1C8D7D8E2E9F1H3H5.Äù In this specific example, there is a pattern match because ÄúA3Äù, ÄúA7Äù, ÄúB3Äù, and ÄúE9Äù are contained within the model output string.
I have been trying to use RegMatch function, but so far I have had limited success. I can get a match in some cases, but the example above will not be picked up by the function as a match. Is what IÄôm trying to do possible with the RegMatch function or is there a better approach?
I should note that I am currently matching patterns by having two identical tables, one that is the model output and the other that is the desired pattern and then comparing the two as arrays. This actually works quite well, but as my library of search patterns has grown, I now need to loop through my library of patterns and identify the matches in a more streamlined fashion. Any help or ideas would be much appreciated.
Best regards,
Bill
@Bill Ghauri
Are your patterns always ordered in the same way… all the A’s are listed first in numerical order, then the B’s are listed next in numerical order, followed by the C’s, D’s, etc. each numerical order (as your example shows)? If yes, and if the pattern is also listed in with this ordering, then you use this function which returns True if the pattern exists and False otherwise…
Pattern As String) As Variant
Dim X As Long, WildcardPattern As String
For X = 1 To Len(Pattern) Step 2
WildcardPattern = WildcardPattern & “*” & Mid(Pattern, X, 2)
Next
IsPatternMatch = ModelOutputString Like WildcardPattern & “*”
End Function
And, using your posted pattern and model output string, you would use it like this…
If IsPatternMatch(“A2A3A7B1B3B6C1C8D7D8E2E9F1H3H5”, “A3A7B3E9”) Then
MsgBox “Yes, the pattern matches.”
Else
MsgBox “No, the pattern does not match.”
End If
End Sub
Rick,
Yes, the patterns are always ordered the same way… in alphabetical and numerical order. Your function works like a champ! It is exactly what I needed. Thank you so much.
Best regards,
Bill Ghauri
@Bill Ghauri
Sorry Bill, but there is a small problem with the function I posted… it will not work correctly if the Pattern contains any cells from the tenth row. Here is an amended function which should correct that problem…
ByVal Pattern As String) As Variant
Dim X As Long, WildcardPattern As String
Pattern = Replace(Pattern, “10”, “X”)
ModelOutputString = Replace(ModelOutputString, “10”, “X”)
For X = 1 To Len(Pattern) Step 2
WildcardPattern = WildcardPattern & “*” & Mid(Pattern, X, 2)
Next
IsPatternMatch = ModelOutputString Like WildcardPattern & “*”
End Function
My first thought is that this would be a very good situation in which to use Longre’s MOREFUNC.XLL add-in. Doing so, this problem could be reduced to formulas like
=REGEX.COUNT(string,REGEX.SUBSTITUTE(“.*”&pattern,”([A-Z]+d+)”,”[1].*”))
This could be supplemented with some initial validity checking on string and pattern.
=CHOOSE(1+REGEX.COMP(string,”^([A-Z]+d+)+$”,1)+2*REGEX.COMP(pattern,”^([A-Z]+d+)+$”,1),
-3,-2,-1,REGEX.COUNT(string,REGEX.SUBSTITUTE(“.*”&pattern,”([A-Z]+d+)”,”[1].*”)),-1)
which returns -3 if both string and pattern are invalid, -2 if only pattern is invalid, -1 if only string is invalid, 0 if valid pattern isn’t found within string, and 1 if valid pattern is found within string.
OTOH, if add-ins aren’t allowed but VBA is, more general is better than more particular. Rick’s revised UDF works given the specs, but if the specs change, e.g., range expands to row 11 or further and/or to column X or further, it fails. Myself, I’d use a state machine and have the udf perform validity checking. Something like
‘first Const for case insensitive, second for case sensitive/only upper case valid
Const LETTERS As String = “[A-Za-z]”
‘Const LETTERS As String = “[A-Z]”
Dim k As Long, st As Long, ch As String * 1, apat As String
For k = 1 To Len(pat)
ch = Mid$(pat, k, 1)
If st = 0 And ch Like LETTERS Then
apat = apat & ch
ElseIf st = 0 And ch Like “#” Then
apat = apat & ch
st = 1
ElseIf st = 1 And ch Like “#” Then
apat = apat & ch
ElseIf st = 1 And ch Like LETTERS Then
apat = apat & “*” & ch
st = 0
Else
st = 0
Exit For
End If
Next k
matchpat = IIf(st = 1, 0, -2)
st = 0
For k = 1 To Len(str)
ch = Mid$(str, k, 1)
If st = 0 And ch Like LETTERS Then
‘nothing to do
ElseIf st = 0 And ch Like “#” Then
st = 1
ElseIf st = 1 And ch Like “#” Then
‘nothing to do
ElseIf st = 1 And ch Like LETTERS Then
st = 0
Else
st = 0
Exit For
End If
Next k
matchpat = matchpat + IIf(st = 1, 0, -1)
If matchpat = 0 Then matchpat = Abs(str Like “*” & apat & “*”)
End Function
Quibble: argument ordering should be comparable to Excel’s built-in functions, e.g.,
FIND(look_FOR,look_WITHIN[,…])
SEARCH(look_FOR,look_WITHIN[,…])
MATCH(look_FOR,look_WITHIN[,…])
so
matchpat(look_FOR,look_WITHIN)
rather than
IsPatternMatch(look_WITHIN,look_FOR)
@fzz and Bill Ghauri
Responding to the comments made by fzz, this is how I would write the general case pattern search modeled after Bill’s setup; it should work for any address range in XL2003 or XL2007/2010 and beyond; that is, as long as the ordering is as Bill has defined it, it should work for any number of letters followed by any number of digits for each “address” in the pattern and model output strings.
ByVal Pattern As String) As Boolean
Dim X As Long
Do
X = X + 1
If X <= Len(Pattern) Then
If Mid(Pattern, X, 2) Like “#[a-zA-Z]” Then
Pattern = Replace(Pattern, Mid(Pattern, X, 2), Mid(Pattern, _
X, 1) & “-“ & “*” & “-“ & Mid(Pattern, X + 1, 1))
End If
End If
If X <= Len(ModelOutput) Then
If Mid(ModelOutput, X, 2) Like “#[a-zA-Z]” Then
ModelOutput = Replace(ModelOutput, Mid(ModelOutput, X, 2), _
Mid(ModelOutput, X, 1) & “-*-“ & _
Mid(ModelOutput, X + 1, 1))
End If
Else
Pattern = “*-“ & Pattern & “-*”
ModelOutput = “*-“ & ModelOutput & “-*”
Exit Do
End If
Loop
IsPatternMatch = ModelOutput Like Pattern
End Function
@fzz
Your function does not work quite right. It will return True for this function call…
matchpat(“A3A7B1234E10?, “A2A3A79B1234B3B6C1C8D7D8E10E9F123H3H5AB1234CC1ABC123?)
Using your argument names, there is an A7 in the ‘pat’ argument, but no A7 in the ‘str’ argument (it looks like is matched the first two characters from the A79 address).
Rick and fzz,
Thank you both for the solutions. Rick, in my initial testing, I did not notice the 10th row limitation at first, but I did later. Your third version of the function works very well. Since my actual model currently contains 80 rows (but it could easily be over 100 soon), I like the flexibility of your function to accommodate an expanded table range. Btw, to get the model output string in the proper order, I have a function that concatenates the columns first and then concatenates the column “totals” (i.e. top to bottom then left to right). This ensures the model output string is always in the proper order. Again, many thanks.
Best regards,
Bill Ghauri
@Rick- then better still to use MOREFUNC.XLL regular expressions.
=REGEX.COUNT(REGEX.SUBSTITUTE(string,”(d)([A-Z])”,”[1] [2]”),
“^.*”®EX.SUBSTITUTE(pattern,”([A-Z]+d+)”,”[1].*”)&”$”)
As for udfs without regular expressions, you’re right: delimiters between tokens are needed, but why
… & “-” & “*” & “-” & …
rather than
… & “-*-” & …
in the first Replace call?
@fzz…
==> then better still to use MOREFUNC.XLL regular expressions.
I still like rolling my own code… and the function I came up with is relatively short… plus, in my opinion, will be easier to read next week than the regular expression pattern will be. [grin]
==> but why Ķ & Äú-Äù & Äú*Äù & Äú-Äù & Ķ rather than
==> Ķ & Äú-*-Äù & Ķ in the first Replace call?
I originally had Chr$(1) instead of the dashes; then I did a Replace on the Chr$(1) and replaced them with “-“; then I noticed I could replace the “-“&”*”&”-” with “-*-“, which I did by hand… and, of course, I missed one of them.
Also, someone recently posted that MOREFUNC can fail in Excel 2007. Haven’t tried it myself, have never used MOREFUNC, but if that is correct it validates my decision.
I’m running into problems where Excel complains about “RegExp” being an undefined user data type.
Not sure what is going wrong, but I’m running Excel 2003 on Windows 7 64-bit, which is probably not a common setup. :(
Excel 2002, rather. Even less common… :(
Thank you for a very useful example. Although I see it was written in 2005, I still hope somebody can answer my question.
If I understand this code correctly, as soon as it find a match it will exit the loop. But what if there were several matches, and I want them all summed up. How should the loop be written then?
Example, I want all instances of New York and the number to be summed up:
Value
New York 5
New Year 6
New York 10
The result should be 15. The code example in this article will give me the number 5 and exit the loop. Correct?
Thanks in advance.
A quick helper function. You can adjust to fit:
Dim rng As Range, i As Long, j As Long, dbl As Double
reg.IgnoreCase = IgnoreCase
reg.MultiLine = MultiLine
reg.Pattern = Pattern
dbl = 0
For i = 1 To Source.Rows.Count
For j = 1 To Source.Columns.Count
If reg.test(Source(i, j).Value) Then dbl = dbl + SumRange(i, j).Value
Next
Next
RegSum = dbl
End Function
A3 = ^New York$
A6:A8 = New York, New Year, New York
B6:B8 = 5, 6, 10
Function call
=RegSum(A6:A8, A3, B6:B8)
Thank you very much, this was really helpful.
The code is looping through a list of URLs that are generated from Google Analytics using arrays.
The list is ordered from highest to lowest pageview number. Since I don’t know how many URLs the list will contain, some of the cells will show #N/A instead of he URL and pageview number.
Example:
URL Pageview
/url-1 10
/url-2 7
#N/A #N/A
This made the code not working, so I added some On Error Resume Next to fix the problem. The code is below, and it is working, but if you see anything that should be different, please let me know.
Public Function RegSum(Source As Range, Pattern As String, SumRange As Range, Optional IgnoreCase As Boolean = True, Optional MultiLine As Boolean = True) As Double
Dim rng As Range, i As Long, j As Long, dbl As Double
With reg
.IgnoreCase = IgnoreCase
.MultiLine = MultiLine
.Pattern = Pattern
End With
On Error Resume Next
dbl = 0
For i = 1 To Source.Rows.Count
If Err.Number = 0 Then
For j = 1 To Source.Columns.Count
If reg.Test(Source(i, j).Value) Then dbl = dbl + SumRange(i, j).Value
Next
End If
Next
RegSum = dbl
End Function
Again, thank you very much for your help.
[…] can get it to work in 2003 (We’ve got 2000, really keeping up on the features here). Take a look at Daily Dose of Excel ª Blog Archive ª Pattern Matching which seems to have a pretty easy user defined function for pattern matching. Wolffy, […]
OY!!
I’m having a really hard time doing a simple thing…. I just need to check whether the length of a number in Cell A is less or equal to 13 digits and then have Cell B’s value be one of 2 things …
PS. I haven’t used any excel since school
The documentation link in the article no longer works.
I found it here: http://msdn.microsoft.com/en-us/library/6wzad2b2(VS.85).aspx
Sir,
I USE THE DBGRID AS SHOWN BELOW. NOW I WANT TO THE SAME IN DATAREPORT.
KINDLY HELP ME WHAT IS THE CODE TO GET THE INFORMATION SHOWN IN DBGRID IN THE DATAREPORT. I USE THE FOLLOWING CODE FOR THE FORMS.
Private Sub Form_Load()
Dim dbMasterRecords As Database
Dim rsmas_rec As Recordset
Set dbMasterRecords = OpenDatabase(“c:\Program Files\MasterRecords.mdb”)
Data1.RecordSource = “select * from mas_rec”
Data1.REFRESH
Data1.Recordset.MoveLast
Label6.Caption = Data1.Recordset.RecordCount
End Sub
——————————————————————————————
Private Sub CLOSE_Click()
Unload FIND
End Sub
——————————————————————————————
Private Sub FILENO_Click()
Dim c As String
c = InputBox(“ENTER FILE NO”, “FIND”)
Data1.RecordSource = (“select * from mas_rec where rc_no='” + c + “‘”)
Data1.REFRESH
If Data1.Recordset.RecordCount = 0 Then
MsgBox (“NO RECORDS FOUND”)
Else
Data1.Recordset.MoveLast
Label5.Caption = Str(Data1.Recordset.RecordCount)
End If
End Sub
——————————————————————————————
Private Sub SECTIONNAME_Click()
Dim d As String
d = InputBox(“ENTER SECTION NAME”, “FIND”)
Data1.RecordSource = “select * from mas_rec where sec_name='” + d + “‘”
Data1.REFRESH
If Data1.Recordset.RecordCount = 0 Then
MsgBox (“NO RECORDS FOUND”)
Else
Data1.Recordset.MoveLast
Label5.Caption = Str(Data1.Recordset.RecordCount)
End If
End Sub
——————————————————————————————
Private Sub PRINTVIEW_Click()
Dim DBGrid(1) As DataReport2
Dim FILENO_Command() As Boolean
Set DBGrid(1) = DataReport2(“select * from mas_rec where rc_no”)
If DBGrid(1) = True Then
DBGrid(1) = DataReport2
Unload FIND
Show DataReport2
Else
DataReport1.Show
End If
End Sub
Hi All,
I first learnt about regular expressions when using them in Perl and checking for things like e-mail addresses were entered correctly – very powerful things I have to say.
As for Excel, I was looking for a tool that would strip any illegal characters for the cells on my sheet so I used something like this:
Dim RE As Object
Dim rng As Range
Set RE = CreateObject("VBScript.RegExp")
With RE
.Global = True
' Place the WANTED characters between [^ and ]
.Pattern = "[^a-zA-Z0-9\s\.\\\/-]"
For Each rng In ActiveSheet.UsedRange.SpecialCells(xlCellTypeConstants)
rng.value = .Replace(rng.value, "")
Next rng
End With
Set RE = Nothing
Seems to do the trick for me :o)
Regards,
Ken
Hi All,
I’m newbie to excel programming. Finding difficulty to achieve a pattern matching output as below.
Input Output
1234 ABCD
1123 AABC
1112 AAAB
1111 AAAA
1212 ABAB
1221 ABBA
etc…
How do I get this output in excel.
Sub pmatch()
Dim vaInput As Variant
Dim i As Long
Dim j As Long
Dim sOutput As String
vaInput = Array(1234, 1123, 1112, 1111, 1212, 1221)
For i = LBound(vaInput) To UBound(vaInput)
sOutput = ""
For j = 1 To Len(vaInput(i))
sOutput = sOutput & Chr$(64 + Mid(vaInput(i), j, 1))
Next j
Debug.Print sOutput
Next i
End Sub
Dear Dick Kusleika,
Thank you so much.
The number’s not necessariy those I shown above. It be any number that falls in 0000 to 9999. So vaInput I need to keep all these starting from “0000”
Kind Regards,
Raj
Hi
I am also new to regex and was wondering if the line starting with “if application.sheets(i)…” could be redone using regex. the sheets named “vacant”, Vacant_1″ etc could extend to “Vacant_10”
how would I work with all sheets named Vacant_something and Audit. Hope I am clear in my explanation of what I need.
Regards
Shaun
You don’t need regex for that.
If Application.Sheets(i).Name Like "Vacant_*" Or Application.Sheets(i).Name = "Audit" Then
for each sh in sheets
if instr("VacanAudit",left(sh.name,5)) Then
next
Ta thank you very much. I have another question ….
I need to find a cell address to input into a formula based on a date input via input box. The cell address will be in another worksheet. I dont know the cell address but i do know the data in that cell, I need the cell address
any help would be good
ta
Shaun