Dynamically extracting the nth numerical element without delimiters

ExcelXOR has a great post on using formulas to extract all numbers from a string where:

  • The string in question consists of a mixture of numbers, letters and special characters
  • The numbers may appear anywhere within that string
  • Decimals within the string are to be returned as such
  • The desired result is to have all numbers returned to separate cells

That’s a tall order with formulas. Here’s what ExcelXOR came up with:


=0+MID(“α”&s&”α0″,1+SMALL(IF(MMULT(0+(ABS(51.5-CODE(MID(SUBSTITUTE(“α”&s&”α0″,”/”,”α”),ROW(INDIRECT(“1:”&LEN(“α”&s&”α0″)-1))
+{0,1},1)))>6)*{2,1},{1;1})=2,ROW(INDIRECT(“1:”&LEN(“α”&s&”α0″)-1))),e),SUM(SMALL(IF(ISNUMBER(MATCH(MMULT(0+(ABS(51.5-CODE(
MID(SUBSTITUTE(“α”&s&”α0″,”/”,”α”),ROW(INDIRECT(“1:”&LEN(“α”&s&”α0″)-1))+{0,1},1)))>6)*{2,1},{1;1}),{1,2},0)),ROW(INDIRECT(
“1:”&LEN(“α”&s&”α0″)-1))),2*e+{-1,0})*{-1,1}))

…where s is the string you want to break apart, and e the element you want returned.

That fatboy runs to 415 characters. Which is a heck of a lot less than my first effort:

=SUM(IFERROR(--REPT(MID(s&"|",ROW(OFFSET($A$1,,,LEN(s)))*COLUMN(OFFSET($A$1,,,,LEN(s)))^0,COLUMN(OFFSET($A$1,,,,LEN(s)))*ROW(
OFFSET($A$1,,,LEN(s)))^0),(IF((ISNUMBER(-MID(s&"|",ROW(OFFSET($A$1,,,LEN(s)))*COLUMN(OFFSET($A$1,,,,LEN(s)))^0,COLUMN(OFFSET(
$A$1,,,,LEN(s)))*ROW(OFFSET($A$1,,,LEN(s)))^0))*NOT(ISNUMBER(-MID("|"&s&"|",ROW(OFFSET($A$1,,,LEN(s)))*COLUMN(OFFSET($A$1,,,,
LEN(s)))^0,COLUMN(OFFSET($A$1,,,,LEN(s)))*ROW(OFFSET($A$1,,,LEN(s)))^0)))*NOT(ISNUMBER(-MID(s&"|",1+ROW(OFFSET($A$1,,,LEN(s)))
*COLUMN(OFFSET($A$1,,,,LEN(s)))^0,COLUMN(OFFSET($A$1,,,,LEN(s)))*ROW(OFFSET($A$1,,,LEN(s)))^0))))>0,ROW(OFFSET($A$1,,,LEN(s)))
*COLUMN(OFFSET($A$1,,,,LEN(s)))^0) = SMALL(IF((ISNUMBER(-MID(s&"|",ROW(OFFSET($A$1,,,LEN(s)))*COLUMN(OFFSET($A$1,,,,LEN(s)))^0,
COLUMN(OFFSET($A$1,,,,LEN(s)))*ROW(OFFSET($A$1,,,LEN(s)))^0))*NOT(ISNUMBER(-MID("|"&s&"|",ROW(OFFSET($A$1,,,LEN(s)))*COLUMN(
OFFSET($A$1,,,,LEN(s)))^0,COLUMN(OFFSET($A$1,,,,LEN(s)))*ROW(OFFSET($A$1,,,LEN(s)))^0)))*NOT(ISNUMBER(-MID(s&"|",1+ROW(OFFSET
($A$1,,,LEN(s)))*COLUMN(OFFSET($A$1,,,,LEN(s)))^0,COLUMN(OFFSET($A$1,,,,LEN(s)))*ROW(OFFSET($A$1,,,LEN(s)))^0))))>0,ROW(OFFSET
($A$1,,,LEN(s)))*COLUMN(OFFSET($A$1,,,,LEN(s)))^0),e))),0))

…although there is some pretty nifty stuff going on in there, that I’ll bore you with at a later date…including a MID that breaks a string down into ALL possible slices of text:
=MID(s&"|",ROW(OFFSET($A$1,,,LEN(s)))*COLUMN(OFFSET($A$1,,,,LEN(s)))^0,COLUMN(OFFSET($A$1,,,,LEN(s)))*ROW(OFFSET($A$1,,,LEN(s)))^0)

I couldn’t rest with just that. Literally. It was too heavy. So I had another crack. The result: Here’s a generic ExtractNumbers formula I just put together. And by ‘just’ I mean I only just managed it, and it took an entire weekend, I ignored the kids, and forgot to bathe. (That last one is pretty much a given, and I can’t really blame Excel).


=MID(s,SMALL( IF( ISERROR( -MID( TEXT( MID("||"&s,ROW( OFFSET( $A$1,,,LEN( s))),2),"|"),2,1)),FALSE,ROW( OFFSET( $A$1,,,LEN( s)))),e)
-1,SUM( SMALL(IF( ISERROR( -MID( TEXT( MID("|"&s&"|",ROW( OFFSET( $A$1,,,LEN( s)+1)),2),"|"),{1,2},1)),FALSE,ROW( OFFSET( $A$1,,,LEN(
s)+1))),{1,2}+(e-1)*2)*{-1,1}))

…again, where:
S = the string you want to break apart
E = the number element you want to return

It will handle numbers with decimal places provided there is a digit to the left of the decimal place e.g. SomeText5.745 and NOT SomeText.745, and it’s a much more svelte 277 characters in length. Isn’t she a beauty? A lot of the inspiration for the approach came from Excel Ninja Sajan, over at the awesome Formula Challenges section of Chandoo’s forum.

In that incarnation, you can use it to extract just one element of a specific number:
 
Extract1
 
 

Or if you prefer, you can use this version:
=MID($A28,SMALL(IF(ISERROR(-MID(TEXT(MID(“||”&$A28,ROW(OFFSET($A$1,,,LEN($A28))),2),”|”),2,1)),FALSE,ROW(OFFSET($A$1,,,LEN
($A28)))),COLUMNS($B28:B28))-1,SUM(SMALL(IF(ISERROR(-MID(TEXT(MID(“|”&$A28&”|”,ROW(OFFSET($A$1,,,LEN($A28)+1)),2),”|”),{1,2},1)),
FALSE,ROW(OFFSET($A$1,,,LEN($A28)+1))),{1,2}+(COLUMNS($B28:B28)-1)*2)*{-1,1}))

…you can use it to extract numbers into separate columns, where $A28 holds the string to be split, and B28 holds the first column that you want to extract a number to. Like so:
 
Extract2
 
 

I don’t know how either of these perform against a UDF, let alone each other. Anyone got a lean, mean, UDF-based splitting machine that we can test it against?

Here’s a sample file:
ExtractNumber_20141120

—Edit 21 Nov 2014—
It turns out that my above formula fails for a few specific special character & number combinations. Here’s a table, showing in which cases Excel will still treat a number as a number when you pad it out with a non number. For completeness I do three tests:

  • Special character before the number
  • Special character after the number
  • Special character on either side of the number

 
special character matches

—Edit 8 Dec 2014—
Crikey…after chaining myself to the computer since the last update, I finally managed to cobble together a formula that will extract all numbers in practically any shape or form that the local version of Excel deems as a number. That is, given a string like this:
Jeff Weir Age: 43 DOB: 25/4/71 Salary: $100,000 StartTime: 8:30
…It returns this
43
25/4/71
100000
8:30

It ain’t pretty. If you like UDFs, then you’ll agree with my pal Gareth Hayter that this is pure formulabation. Unless you like formulas, in which case it’s orgasmic. (You sick, sick analyst, you. Don’t let your boss catch you playing with it in the office…)


=MID( String,SMALL( IF( ( FREQUENCY( ISNUMBER( -MID( String,ROW( OFFSET( $A$1,,,LEN( String)))*COLUMN( OFFSET( $A$1,,,,LEN( String)))^0,COLUMN( OFFSET( $A$1,,,,LEN( String)))*ROW( OFFSET( $A$1,,,LEN( String)))^0))*( COLUMN( OFFSET( $A$1,,,,LEN( String)))+ROW( OFFSET( $A$1,,,LEN( String)))-1),ROW( OFFSET( $A$1,,,LEN( String)+1))-1)=1)*ISNUMBER( -MID( "|"&String&"|",ROW( OFFSET( $A$1,,,LEN( String)+2)),1))=1,ROW( OFFSET( $A$1,,,LEN( String)+2))-1,FALSE),ROW( OFFSET( $A$1,,,LEN( String)))),SMALL( IF( ( FREQUENCY( ISNUMBER( -MID( String&"|",ROW( OFFSET( $A$1,,,LEN( String)))*COLUMN( OFFSET( $A$1,,,,LEN( String)))^0,COLUMN( OFFSET( $A$1,,,,LEN( String)))*ROW( OFFSET( $A$1,,,LEN( String)))^0))*( LEN( String)+1-ROW( OFFSET( $A$1,,,LEN( String)))),MOD( LEN( String)+2-ROW( OFFSET( $A$1,,,LEN( String)+1)),LEN( String)+1))=1)*ISNUMBER( -MID( "|"&String,ROW( OFFSET( $A$1,,,LEN( String)+2)),1))=1,ROW( OFFSET( $A$1,,,LEN( String)+2))-1,FALSE),ROW( OFFSET( $A$1,,,LEN( String))))-SMALL( IF( ( FREQUENCY( ISNUMBER( -MID( String,ROW( OFFSET( $A$1,,,LEN( String)))*COLUMN( OFFSET( $A$1,,,,LEN( String)))^0,COLUMN( OFFSET( $A$1,,,,LEN( String)))*ROW( OFFSET( $A$1,,,LEN( String)))^0))*( COLUMN( OFFSET( $A$1,,,,LEN( String)))+ROW( OFFSET( $A$1,,,LEN( String)))-1),ROW( OFFSET( $A$1,,,LEN( String)+1))-1)=1)*ISNUMBER( -MID( "|"&String&"|",ROW( OFFSET( $A$1,,,LEN( String)+2)),1))=1,ROW( OFFSET( $A$1,,,LEN( String)+2))-1,FALSE),ROW( OFFSET( $A$1,,,LEN( String))))+1)

Here’ a file showing how this bit of formulabation comes together: Dynamic Split Numbers_20141208 v6

36 thoughts on “Dynamically extracting the nth numerical element without delimiters

  1. Looks like a great effort, Jeff!

    Just wondering if you can tweak it to return the numbers at the very end as well? Also, it doesn’t seem to handle all cases with special characters, for example the string:

    ;9>B=0AD</E<D@5:C7A548

    returns, consecutively: 9 0 5:C 7 #NUM!

    (P.S. Thanks a lot for all the kind words and links on your wonderful site! I recall spending whole afternoons pouring through all your articles, along with David Hager's E-Letter Archive series and several others, all of which led to the wonder I now see in Excel formulas. So I'd just like to say many, many thanks!)

  2. Hi XOR. Nice to meet you, and thanks for the kind words. I thought it was returning numbers on the end, as per the screenshot of the table above. Although now I see it trips up on your sample string:

    ';9>B=0AD
    I'm not sure why. I'll take a look, see if I can tweak it.
    
    Hey, where the heck did you come across that INDEX deferencing syntax I talked about in my previous post? I've not seen that anywhere. Inspired. I discovered something similar independently using some combination of OFFSET(OFFSET but can't recall the specifics just now.
  3. This UDF:

    Function ExtractNumerics(Source)
    Dim sIn As String
    Dim sOut() As String
    Dim lCols As Long
    Dim lStart As Long
    Dim lLen As Long
    Dim sPart As String
    Select Case TypeName(Source)
    Case "Range"
    sIn = CStr(Source.Value) & "| "
    Case "String"
    sIn = Source & "| "
    End Select
    ReDim sOut(1 To 1, 1 To 1)
    For lStart = 1 To Len(sIn)
    For lLen = 1 To Len(sIn) - lStart
    sPart = Mid(sIn, lStart, lLen)
    If IsNumeric(sPart) Then
    If Not IsNumeric(Mid(sIn, lStart, lLen + 1)) Then
    lCols = lCols + 1
    ReDim Preserve sOut(1 To 1, 1 To lCols)
    sOut(1, lCols) = CDbl(sPart)
    lStart = lStart + lLen
    Exit For
    End If
    Else
    Exit For
    End If
    Next
    Next
    ExtractNumerics = sOut
    End Function

    is about twice as fast as the formula in the block of cells starting at cell C35.

  4. c00: the string to parse
    y : the nth numerical item

    Function F_snb(c00, y) As String
    c01 = "| " & c00

    For j = Len(c01) To 2 Step -1
    If IsNumeric(Mid(c01, j, 1)) + IsNumeric(Mid(c01, j - 1, 1)) = -1 Then c01 = Left(c01, j - 1) & " " & IIf(IsNumeric(Mid(c01, j, 1)), "", "|") & Mid(c01, j)
    Next

    sn = Filter(Split(c01), "|", False)
    If y < UBound(sn) + 1 Then F_snb = sn(y) End Function

  5. @XOR LX. Have worked out why my formula is failing, and in that specific case it’s because Excel considers : to be a number when at the end of a number, as it denotes a time. I’ve added a table to the end of my post detailing a few special cases. This raises an interesting thought: perhaps your formula needs to be changed to return all these special cases as numbers. Then again, perhaps it already does…I haven’t checked yet…still to busy fixing mine!

  6. @Charles: Yeah, I saw your post on the GROUPS function. I haven’t had the time to delve at all into the FastExecl functions despite purchasing it probably over a year ago now. Largely because I’m concentrating on writing materials that will be available in the public domain. So for instance, in my book I’ll be giving FastExcel a mention, but I’ll also be providing cheaper – albeit slower – alternatives to some common issues that users have, such as this one.

    I think it’s staggering what you’ve done with FastExcel. But I can also see that many users will turn to FastExcel functions when their workbook starts going slow, when they should firsttake a look at their workbook layout, and then see if FastExcel functions are even needed for their particular situation. Because using FastExcel functions seriously limits the audience for the spreadsheet.

    In fact, Narayan put it quite well in this comment at http://forum.chandoo.org/threads/excel-can-not-handle-it.20333/#post-122661 :

    There are two aspects to workbook heaviness ; one where the workbook is a well-optimized one , but is burdened with the tasks that it has been developed to handle ; the other where the workbook has grown organically over time the way a tree develops roots viz. haphazardly and in all directions.

    A workbook is supposed to be planned well , not just for immediate needs , but anticipating what might come in the future. I leave it to you to decide into which category your workbook fits.
    However , once a workbook has assumed such dimensions as yours has , the way to optimization and speed is uncertain.

    No doubt the profiler you’ve built into FastExecl will probably do the op at that post just as much good as the FastExcel functions. This isn’t a critique of FastExcel in any way…in fact the best thing that could happen is for MS to pay you enough money so that you pass over your IP to them, and they then give it to all of us as part of the Excel bundle.

  7. @Jeff Weir

    Yes – the issue you’re seeing is the result of you performing tests on strings of length 2.

    When testing for numericalness within strings, anything other than checking on a purely character-by-character process is a risky business, and certainly not the most rigorous approach.

    A similar issue occurs with the (sadly oft-recommended) solution for extracting consecutive numbers from a string using LOOKUP, e.g. for consecutive numbers from the start of a string :

    =-LOOKUP(1,-LEFT(A1,ROW(INDIRECT(“1:”&LEN(A1)))))

    which works in 99.9% of cases, though, since it necessarily relies upon checking strings of lengths greater than one, there will always be the risk (albeit small) of encountering one of a few alphanumeric strings which will cause incorrect results, namely any appropriate mixing of letters and numbers which also happens to qualify as a valid Excel date string within the version being used, or as a valid form of scientific notation.

    See here for more:

    http://excelxor.com/2014/09/14/extracting-numbers-from-a-string-1-consecutive-numbers-at-start/

    Perhaps you can tweak it somehow, though in general it might be best to re-work it so that, as my formula does, it operates on a strictly character-by-character basis, even if this means an increase in formula length.

    Regards

  8. I completely rely on Jeff’s speed testing skills ..

    Meanwhile:

    Function F_snb(c00, y) As String
    sn = Split(StrConv(c00, 64), Chr(0))

    For j = 0 To UBound(sn)
    If Not IsNumeric(sn(j)) Then
    sn(j) = ""
    If j > 0 Then If IsNumeric(sn(j - 1)) Then sn(j - 1) = sn(j - 1) & "~"
    End If
    Next
    sn = Split(Join(sn, ""), "~")

    If y < = UBound(sn) Then F_snb = sn(y - 1) End Function

  9. @ Jeff,

    Could you please replace my latest contribution by:

    Function F_snb(c00, y) As String
    On Error Resume Next

    For j = 1 To Len(c00)
    If Not IsNumeric(Mid(c00, j, 1)) Then Mid(c00, j, 1) = " "
    Next

    F_snb = Split(Application.Trim(c00))(y - 1)
    End Function

  10. Even a 1-liner does the trick:

    Function F_snb(c00, y) As String
    On Error Resume Next
    F_snb = Split(Application.Trim(Join(Evaluate("transpose(If(isnumber(-mid(""" & c00 & """,row(offset(A1,,," & Len(c00) & ")),1)),mid(""" & c00 & """,row(offset(A1,,," & Len(c00) & ")),1),"" ""))"), "")))(y - 1)
    End Function

  11. @XOR LX: One possiblity for restricting numeric interpretations is to append something to the string, for instance:

    =-LOOKUP(1,-(LEFT(A1,ROW(INDIRECT("1:"&LEN(A1))))&"**0"))

    will only allow for decimals (not dates, times, fractions or exponentials). Another possibility that only allow for integer values is to append instead &” 0/1″.

    @jeff: Clever use of TEXT for identifying numeric pairs, i will make a note of that and then likely forget it. But I really think it’s hard to do any better than Excelxor’s solution which appears to be quite robust and clearly a lot of thought has gone into it. Testing the solutions with the following string “1.2.3″ shows up some differences:

    jeff: 1.2.3
    xor lx: #NUM!
    jkp: {1.2,3}
    snb: {1,2,3}

    There’s a good case to be made for throwing where there is ambiguity, also it looks like snb’s may need adapting to accept decimals.

  12. @Lori

    Ingenious. How does Excel interpret e.g. the string “123**0” as 123?

    On a side note, did anyone know that two (not one, or three) consecutive asterisks, appropriately-placed, will perform the equivalent role of the letter “E” (or “e”) in generating scientific notation?

    For example, typing both:

    “1**3”

    and

    “1E3”

    will result in a (numerical) cell value of 1000.

    Curiouser and curiouser!

    Cheers

  13. Using “e0” allows for misinterpretation eg in your example “123jun”. “E+0” might be ok, but the “E” might also be interpreted differently in different locales, so i thought “**0” was safer.

  14. i don’t know where the ** syntax comes from, perhaps it’s there for Lotus 123 compatibility? i haven’t seen it documented either but i’m fairly certain i came across it in exactly these circumstances where a text string had some numbers and asterisks that were misinterpreted in the data cleansing process. The cause took some time to track down, and in the end i came up with the idea of using the same syntax to guard against it.

    Of course it depends on your data, but i would definitely not trust the numeric parser on a random string for reasons like this and the naive formulas posted in forums should not be advocated. Even if the formula were 99.9% effective, on data of over a thousand rows you’d expect it to be unreliable. I’d think it was less than that, for instance simple strings such as “1 a” and “1 p” get converted to 0.041666667 (1am) and 0.541666667 (1pm).

  15. Longre’s MOREFUNC.XLL can still be found on the web. The REGEX.MID function it provides makes this fairly easy. The regular expression to find anything which could be interpreted as a number (without E+/- scaling) is [-+]?(\d+(\.\d+)?|\.\d+).

  16. I had a UDF using regular expressions that extracted a number from anywhere in a text string:
    http://newtonexcelbach.wordpress.com/2014/04/12/extracting-numbers-with-regular-expressions/
    The original code came from Winston Snyder at:
    http://dataprose.org/2013/12/scrub-your-data-expressively/

    Prompted by this thread I have now modified it to extract all the separate values in a text string and return all of them as an array, or any specified value. It also allows any character to be specified as the decimal separator (default .) and any character to be the thousands separator, which is ignored (default ,); so for “a 12,345.678 or 876,543.12, 1” it returns 12345.678 876543.12 1.

    Here is the code


    Function ExtractNums(NumStrings As Variant, Optional Position As Long = 0, Optional MaxNum As Long = 10, _
    Optional DecString As String = ".", Optional IgnoreString As String = ",") As Variant

    'Based on code by Winston Snyder: http://dataprose.org/2013/12/scrub-your-data-expressively/
    'Extract one or more numbers from anywhere in an input string, UDF Version

    Dim RE As Object, Match As Object, Matches As Object, strNumber As String, i As Long, j As Long, k As Long
    Dim numrows As Long, Row As Long, NumString As String, OutA() As Variant, NumMatches As Long
    Dim NumChar As Long, PrevMatch As Boolean, Digit As String, REPatt As String

    'Initialize variables
    Set RE = CreateObject("VBScript.RegExp")

    If TypeName(NumStrings) = "Range" Then NumStrings = NumStrings.Value2
    If TypeName(NumStrings) = "String" Then
    numrows = 1
    Else
    numrows = UBound(NumStrings)
    End If

    ReDim OutA(1 To numrows, 1 To MaxNum)
    On Error Resume Next
    For Row = 1 To numrows
    If IsArray(NumStrings) = True Then
    NumString = NumStrings(Row, 1)
    Else
    NumString = NumStrings
    End If

    'Criteria for Regular Expression
    REPatt = "[0-9" & DecString & "]"
    With RE
    .Pattern = REPatt '"[0-9.]"
    .Global = True
    Set Matches = .Execute(NumString)
    End With

    'Loop Matches collection and compare with each character of original string to build strings of all numbers in the sample string
    strNumber = ""
    NumMatches = Matches.Count
    NumChar = Len(NumString)
    j = 0
    k = 1
    PrevMatch = False
    For i = 1 To NumChar
    Digit = Matches(j)
    If Mid(NumString, i, 1) <> IgnoreString Then
    If Digit = Mid(NumString, i, 1) Then
    strNumber = strNumber + Matches(j)
    j = j + 1
    PrevMatch = True
    ElseIf PrevMatch = True Then
    If Position = 0 Then
    OutA(Row, k) = CDbl(strNumber)
    ElseIf Position = k Then
    OutA(Row, 1) = CDbl(strNumber)
    Exit For
    End If
    strNumber = ""
    k = k + 1
    PrevMatch = False
    End If
    End If
    Next i
    If PrevMatch = True Then
    If Position = 0 Then
    OutA(Row, k) = CDbl(strNumber)
    ElseIf Position = k Then
    OutA(Row, 1) = CDbl(strNumber)
    End If
    End If
    Next Row
    ExtractNums = OutA

    'Tidy up
    'Destroy objects
    Set Matches = Nothing
    Set RE = Nothing
    End Function

  17. Revised code that will:
    – Recognise numbers starting with + or –
    – Recognise numbers in E notation (with upper or lower case E)
    – Return #NA if the requested number index does not exist
    – Not treat E or e as part of a number unless it is preceded by a digit


    Function ExtractNums(NumStrings As Variant, Optional Position As Long = 0, Optional MaxNum As Long = 10, _
    Optional DecString As String = ".", Optional IgnoreString As String = ",") As Variant

    'Based on code by Winston Snyder: http://dataprose.org/2013/12/scrub-your-data-expressively/
    'Extract one or more numbers from anywhere in an input string, UDF Version

    Dim RE As Object, Match As Object, Matches As Object, strNumber As String, i As Long, j As Long, k As Long
    Dim numrows As Long, Row As Long, NumString As String, MatchString As String, OutA() As Variant, NumMatches As Long
    Dim NumChar As Long, PrevMatch As Boolean, Digit As String, REPatt As String
    Const ExpString As String = "E", pmString As String = "+-"

    'Initialize variables
    Set RE = CreateObject("VBScript.RegExp")

    If TypeName(NumStrings) = "Range" Then NumStrings = NumStrings.Value2
    If TypeName(NumStrings) = "String" Then
    numrows = 1
    Else
    numrows = UBound(NumStrings)
    End If

    ReDim OutA(1 To numrows, 1 To MaxNum)
    On Error Resume Next
    For Row = 1 To numrows
    If IsArray(NumStrings) = True Then
    NumString = NumStrings(Row, 1)
    Else
    NumString = NumStrings
    End If

    'Criteria for Regular Expression
    REPatt = "[0-9" & DecString & ExpString & LCase(ExpString) & pmString & "]"
    With RE
    .Pattern = REPatt
    .Global = True
    Set Matches = .Execute(NumString)
    End With

    'Loop Matches collection and compare with each character of original string to build strings of all numbers in the sample string
    strNumber = ""
    NumMatches = Matches.Count
    NumChar = Len(NumString)
    j = 0
    k = 1
    PrevMatch = False
    For i = 1 To NumChar
    MatchString = Mid(NumString, i, 1)
    If PrevMatch = False And UCase(MatchString) = ExpString Then
    j = j + 1
    ElseIf MatchString <> IgnoreString Then
    Digit = Matches(j)
    If Digit = MatchString Then
    strNumber = strNumber + Matches(j)
    j = j + 1
    PrevMatch = True
    ElseIf PrevMatch = True Then
    If Position = 0 Then
    OutA(Row, k) = CDbl(strNumber)
    ElseIf Position = k Then
    OutA(Row, 1) = CDbl(strNumber)
    Exit For
    End If
    strNumber = ""
    k = k + 1
    PrevMatch = False
    End If
    End If
    Next i
    If PrevMatch = True Then
    If Position = 0 Then
    OutA(Row, k) = CDbl(strNumber)
    ElseIf k = Position Then
    OutA(Row, 1) = CDbl(strNumber)
    End If
    End If
    If k < Position Then OutA(Row, 1) = CVErr(xlErrNA) Next Row ExtractNums = OutA 'Tidy up 'Destroy objects Set Matches = Nothing Set RE = Nothing End Function

  18. Crikey…after chaining myself to the computer since the last update, I finally managed to cobble together a formula that will extract all numbers in practically any shape or form that the local version of Excel deems as a number. That is, given a string like this:
    Jeff Weir Age: 43 DOB: 25/4/71 Salary: $100,000 StartTime: 8:30
    …It returns this
    43
    25/4/71
    100000
    8:30

    It ain’t pretty. If you like UDFs, then you’ll agree with my pal Gareth Hayter that this is pure formulabation. Unless you like formulas, in which case it’s orgasmic. (You sick, sick analyst, you. Don’t let your boss catch you playing with it in the office…)

    Link to sample file in the original post above.

  19. In this parat of the world your details would have to be written as:

    43 DOB: 25-04-1971 Salary: €100.000,- StartTime: 8.30

    Can you please adapt your formula to encompass these ‘numbers’ too ?

  20. Hi snb. You’re in luck: in my sample file you’l note I pre-clean the string with a formula like this:
    =SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(B2,” “,”|”),”$”,””),”,”,””)

    I think all you need to do is change it to this:
    =SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(B2,” “,”|”),”€”,””),”.”,””)

    …although further tweaks might be necessary, which I’ll leave to you.

    Note that this ‘pre-cleaning’ could just as easily be performed within the main formula itself. I just thought it was simpler to keep this outside the main formula…

  21. snb: I wouldn’t use it either. Hence my “formualbation” comment. Let me be clear with my intent here: to see if I could do something for the sake of it. To quote a fellow New Zealander, I did it to see if I could “Knock the bastard off”. (Aside: That quote comes from Edmond Hillary, who climbed Everest “because it was there”.) To me, formula challenges are a bit like that: a pointless waste of time really, apart from the challenge.

    That said, here’s where I feel duty bound to point out that at least my unnecessary overcomplicated approach actually handles this “simple task”. The original task was to split out numbers including decimals. The revised task of that last formula I posted was to extract things that Excel recognizes as numbers be they integers, decimals, dates, or times. Unfortunately your UDFs do neither.

    I’ll be posting my own UDF in due course. In fact, the main reason for me doing this is to illustrate in my book that a UDF can often be a much better approach than a complicated formula. And a well-optimized UDF can often thrash the pants off a poorly optimized one.

    That said, I live in the world of SharePoint Server, where VBA is not an option. So I have to rely on formulas to do things all the time that would otherwise be much better suited to a UDF.

  22. You don’t need to convince me:

    Sub M_snb()
    c00 = "Jeff Weir Age: 43 DOB: 25/4/71 Salary: $100,000 StartTime: 8:30"

    For j = 1 To Len(c00)
    If InStr("0123456789/,:", Mid(c00, j, 1)) = 0 Then Mid(c00, j, 1) = " "
    Next

    sn = Split(Application.Trim(c00))
    For j = 0 To UBound(sn)
    If Val(sn(j)) = 0 Then sn(j) = ""
    Next

    MsgBox Join(Split(Application.Trim(Join(sn))), vbLf)
    End Sub

    Isn't it time to emigrate to non Sharepoint country ?

  23. There’s not too many jobs around where I can work from home in my Pyjamas. This SharePoint gig is one of them.

    Your sub is getting there, but doesn’t handle decimal points. While we’re at it, let’s throw in some scientific notation such as 1E3 or 1**3 just to give you a UDF Everest to climb. I’ll keep adding edge cases until a) we reach the summit or b) you tell me to climb it myself.

Leave a Reply

Your email address will not be published. Required fields are marked *