ExcelXOR has a great post on using formulas to extract all numbers from a string where:

- The string in question consists of a mixture of numbers, letters and special characters
- The numbers may appear anywhere within that string
- Decimals within the string are to be returned as such
- The desired result is to have all numbers returned to separate cells

That’s a tall order with formulas. Here’s what ExcelXOR came up with:

=0+MID(“α”&s&”α0″,1+SMALL(IF(MMULT(0+(ABS(51.5-CODE(MID(SUBSTITUTE(“α”&s&”α0″,”/”,”α”),ROW(INDIRECT(“1:”&LEN(“α”&s&”α0″)-1))

+{0,1},1)))>6)*{2,1},{1;1})=2,ROW(INDIRECT(“1:”&LEN(“α”&s&”α0″)-1))),e),SUM(SMALL(IF(ISNUMBER(MATCH(MMULT(0+(ABS(51.5-CODE(

MID(SUBSTITUTE(“α”&s&”α0″,”/”,”α”),ROW(INDIRECT(“1:”&LEN(“α”&s&”α0″)-1))+{0,1},1)))>6)*{2,1},{1;1}),{1,2},0)),ROW(INDIRECT(

“1:”&LEN(“α”&s&”α0″)-1))),2*e+{-1,0})*{-1,1}))

…where s is the string you want to break apart, and e the element you want returned.

That fatboy runs to 415 characters. Which is a *heck* of a lot less than *my* first effort:

=SUM(IFERROR(--REPT(MID(s&"|",ROW(OFFSET($A$1,,,LEN(s)))*COLUMN(OFFSET($A$1,,,,LEN(s)))^0,COLUMN(OFFSET($A$1,,,,LEN(s)))*ROW(

OFFSET($A$1,,,LEN(s)))^0),(IF((ISNUMBER(-MID(s&"|",ROW(OFFSET($A$1,,,LEN(s)))*COLUMN(OFFSET($A$1,,,,LEN(s)))^0,COLUMN(OFFSET(

$A$1,,,,LEN(s)))*ROW(OFFSET($A$1,,,LEN(s)))^0))*NOT(ISNUMBER(-MID("|"&s&"|",ROW(OFFSET($A$1,,,LEN(s)))*COLUMN(OFFSET($A$1,,,,

LEN(s)))^0,COLUMN(OFFSET($A$1,,,,LEN(s)))*ROW(OFFSET($A$1,,,LEN(s)))^0)))*NOT(ISNUMBER(-MID(s&"|",1+ROW(OFFSET($A$1,,,LEN(s)))

*COLUMN(OFFSET($A$1,,,,LEN(s)))^0,COLUMN(OFFSET($A$1,,,,LEN(s)))*ROW(OFFSET($A$1,,,LEN(s)))^0))))>0,ROW(OFFSET($A$1,,,LEN(s)))

*COLUMN(OFFSET($A$1,,,,LEN(s)))^0) = SMALL(IF((ISNUMBER(-MID(s&"|",ROW(OFFSET($A$1,,,LEN(s)))*COLUMN(OFFSET($A$1,,,,LEN(s)))^0,

COLUMN(OFFSET($A$1,,,,LEN(s)))*ROW(OFFSET($A$1,,,LEN(s)))^0))*NOT(ISNUMBER(-MID("|"&s&"|",ROW(OFFSET($A$1,,,LEN(s)))*COLUMN(

OFFSET($A$1,,,,LEN(s)))^0,COLUMN(OFFSET($A$1,,,,LEN(s)))*ROW(OFFSET($A$1,,,LEN(s)))^0)))*NOT(ISNUMBER(-MID(s&"|",1+ROW(OFFSET

($A$1,,,LEN(s)))*COLUMN(OFFSET($A$1,,,,LEN(s)))^0,COLUMN(OFFSET($A$1,,,,LEN(s)))*ROW(OFFSET($A$1,,,LEN(s)))^0))))>0,ROW(OFFSET

($A$1,,,LEN(s)))*COLUMN(OFFSET($A$1,,,,LEN(s)))^0),e))),0))

…although there is some pretty nifty stuff going on in there, that I’ll bore you with at a later date…including a MID that breaks a string down into ALL possible slices of text:

`=MID(s&"|",ROW(OFFSET($A$1,,,LEN(s)))*COLUMN(OFFSET($A$1,,,,LEN(s)))^0,COLUMN(OFFSET($A$1,,,,LEN(s)))*ROW(OFFSET($A$1,,,LEN(s)))^0)`

I couldn’t rest with just that. Literally. It was too heavy. So I had another crack. The result: Here’s a generic ExtractNumbers formula I just put together. And by ‘just’ I mean I only just managed it, and it took an entire weekend, I ignored the kids, and forgot to bathe. (That last one is pretty much a given, and I can’t really blame Excel).

=MID(s,SMALL( IF( ISERROR( -MID( TEXT( MID("||"&s,ROW( OFFSET( $A$1,,,LEN( s))),2),"|"),2,1)),FALSE,ROW( OFFSET( $A$1,,,LEN( s)))),e)

-1,SUM( SMALL(IF( ISERROR( -MID( TEXT( MID("|"&s&"|",ROW( OFFSET( $A$1,,,LEN( s)+1)),2),"|"),{1,2},1)),FALSE,ROW( OFFSET( $A$1,,,LEN(

s)+1))),{1,2}+(e-1)*2)*{-1,1}))

…again, where:

S = the string you want to break apart

E = the number element you want to return

It will handle numbers with decimal places provided there is a digit to the left of the decimal place e.g. SomeText5.745 and NOT SomeText.745, and it’s a much more svelte 277 characters in length. Isn’t she a beauty? A lot of the inspiration for the approach came from Excel Ninja Sajan, over at the awesome Formula Challenges section of Chandoo’s forum.

In that incarnation, you can use it to extract just one element of a specific number:

Or if you prefer, you can use this version:

`=MID($A28,SMALL(IF(ISERROR(-MID(TEXT(MID(“||”&$A28,ROW(OFFSET($A$1,,,LEN($A28))),2),”|”),2,1)),FALSE,ROW(OFFSET($A$1,,,LEN`

($A28)))),COLUMNS($B28:B28))-1,SUM(SMALL(IF(ISERROR(-MID(TEXT(MID(“|”&$A28&”|”,ROW(OFFSET($A$1,,,LEN($A28)+1)),2),”|”),{1,2},1)),

FALSE,ROW(OFFSET($A$1,,,LEN($A28)+1))),{1,2}+(COLUMNS($B28:B28)-1)*2)*{-1,1}))

…you can use it to extract numbers into separate columns, where $A28 holds the string to be split, and B28 holds the first column that you want to extract a number to. Like so:

I don’t know how either of these perform against a UDF, let alone each other. Anyone got a lean, mean, UDF-based splitting machine that we can test it against?

Here’s a sample file:

ExtractNumber_20141120

—Edit 21 Nov 2014—

It turns out that my above formula fails for a few specific special character & number combinations. Here’s a table, showing in which cases Excel will still treat a number as a number when you pad it out with a non number. For completeness I do three tests:

- Special character before the number
- Special character after the number
- Special character on either side of the number

—Edit 8 Dec 2014—

Crikey…after chaining myself to the computer since the last update, I finally managed to cobble together a formula that will extract all numbers in practically any shape or form that the local version of Excel deems as a number. That is, given a string like this:

Jeff Weir Age: 43 DOB: 25/4/71 Salary: $100,000 StartTime: 8:30

…It returns this

43

25/4/71

100000

8:30

It ain’t pretty. If you like UDFs, then you’ll agree with my pal Gareth Hayter that this is *pure formulabation*. Unless you like formulas, in which case it’s orgasmic. (You sick, sick analyst, you. Don’t let your boss catch you playing with it in the office…)

=MID( String,SMALL( IF( ( FREQUENCY( ISNUMBER( -MID( String,ROW( OFFSET( $A$1,,,LEN( String)))*COLUMN( OFFSET( $A$1,,,,LEN( String)))^0,COLUMN( OFFSET( $A$1,,,,LEN( String)))*ROW( OFFSET( $A$1,,,LEN( String)))^0))*( COLUMN( OFFSET( $A$1,,,,LEN( String)))+ROW( OFFSET( $A$1,,,LEN( String)))-1),ROW( OFFSET( $A$1,,,LEN( String)+1))-1)=1)*ISNUMBER( -MID( "|"&String&"|",ROW( OFFSET( $A$1,,,LEN( String)+2)),1))=1,ROW( OFFSET( $A$1,,,LEN( String)+2))-1,FALSE),ROW( OFFSET( $A$1,,,LEN( String)))),SMALL( IF( ( FREQUENCY( ISNUMBER( -MID( String&"|",ROW( OFFSET( $A$1,,,LEN( String)))*COLUMN( OFFSET( $A$1,,,,LEN( String)))^0,COLUMN( OFFSET( $A$1,,,,LEN( String)))*ROW( OFFSET( $A$1,,,LEN( String)))^0))*( LEN( String)+1-ROW( OFFSET( $A$1,,,LEN( String)))),MOD( LEN( String)+2-ROW( OFFSET( $A$1,,,LEN( String)+1)),LEN( String)+1))=1)*ISNUMBER( -MID( "|"&String,ROW( OFFSET( $A$1,,,LEN( String)+2)),1))=1,ROW( OFFSET( $A$1,,,LEN( String)+2))-1,FALSE),ROW( OFFSET( $A$1,,,LEN( String))))-SMALL( IF( ( FREQUENCY( ISNUMBER( -MID( String,ROW( OFFSET( $A$1,,,LEN( String)))*COLUMN( OFFSET( $A$1,,,,LEN( String)))^0,COLUMN( OFFSET( $A$1,,,,LEN( String)))*ROW( OFFSET( $A$1,,,LEN( String)))^0))*( COLUMN( OFFSET( $A$1,,,,LEN( String)))+ROW( OFFSET( $A$1,,,LEN( String)))-1),ROW( OFFSET( $A$1,,,LEN( String)+1))-1)=1)*ISNUMBER( -MID( "|"&String&"|",ROW( OFFSET( $A$1,,,LEN( String)+2)),1))=1,ROW( OFFSET( $A$1,,,LEN( String)+2))-1,FALSE),ROW( OFFSET( $A$1,,,LEN( String))))+1)

Here’ a file showing how this bit of formulabation comes together: Dynamic Split Numbers_20141208 v6

Looks like a great effort, Jeff!

Just wondering if you can tweak it to return the numbers at the very end as well? Also, it doesn’t seem to handle all cases with special characters, for example the string:

;9>B=0AD</E<D@5:C7A548

returns, consecutively: 9 0 5:C 7 #NUM!

(P.S. Thanks a lot for all the kind words and links on your wonderful site! I recall spending whole afternoons pouring through all your articles, along with David Hager's E-Letter Archive series and several others, all of which led to the wonder I now see in Excel formulas. So I'd just like to say many, many thanks!)

Hi XOR. Nice to meet you, and thanks for the kind words. I thought it was returning numbers on the end, as per the screenshot of the table above. Although now I see it trips up on your sample string:

Try the GROUPS function in FastExcel V3

See http://fastexcel.wordpress.com/2014/09/30/extracting-digits-from-text-using-formulas-and-designing-a-missing-excel-function-groups/

Its designed to find and extract groups of characters (which can be digits) from strings.

Looks about 4 times faster than your formulas and handles the special characters etc

This UDF:

`Function ExtractNumerics(Source)`

Dim sIn As String

Dim sOut() As String

Dim lCols As Long

Dim lStart As Long

Dim lLen As Long

Dim sPart As String

Select Case TypeName(Source)

Case "Range"

sIn = CStr(Source.Value) & "| "

Case "String"

sIn = Source & "| "

End Select

ReDim sOut(1 To 1, 1 To 1)

For lStart = 1 To Len(sIn)

For lLen = 1 To Len(sIn) - lStart

sPart = Mid(sIn, lStart, lLen)

If IsNumeric(sPart) Then

If Not IsNumeric(Mid(sIn, lStart, lLen + 1)) Then

lCols = lCols + 1

ReDim Preserve sOut(1 To 1, 1 To lCols)

sOut(1, lCols) = CDbl(sPart)

lStart = lStart + lLen

Exit For

End If

Else

Exit For

End If

Next

Next

ExtractNumerics = sOut

End Function

`is about twice as fast as the formula in the block of cells starting at cell C35.`

c00: the string to parse

y : the nth numerical item

`Function F_snb(c00, y) As String`

c01 = "| " & c00

For j = Len(c01) To 2 Step -1

If IsNumeric(Mid(c01, j, 1)) + IsNumeric(Mid(c01, j - 1, 1)) = -1 Then c01 = Left(c01, j - 1) & " " & IIf(IsNumeric(Mid(c01, j, 1)), "", "|") & Mid(c01, j)

Next

`sn = Filter(Split(c01), "|", False)`

If y < UBound(sn) + 1 Then F_snb = sn(y) End Function

@snb: so how does yours compare to the Excel formulas regarding calc speed?

@XOR LX. Have worked out why my formula is failing, and in that specific case it’s because Excel considers : to be a number when at the end of a number, as it denotes a time. I’ve added a table to the end of my post detailing a few special cases. This raises an interesting thought: perhaps your formula needs to be changed to return all these special cases as numbers. Then again, perhaps it already does…I haven’t checked yet…still to busy fixing mine!

@Charles: Yeah, I saw your post on the GROUPS function. I haven’t had the time to delve at all into the FastExecl functions despite purchasing it probably over a year ago now. Largely because I’m concentrating on writing materials that will be available in the public domain. So for instance, in my book I’ll be giving FastExcel a mention, but I’ll also be providing cheaper – albeit slower – alternatives to some common issues that users have, such as this one.

I think it’s staggering what you’ve done with FastExcel. But I can also see that many users will turn to FastExcel functions when their workbook starts going slow, when they should

firsttake a look at their workbook layout, andthensee if FastExcel functions are even needed for their particular situation. Because using FastExcel functions seriously limits the audience for the spreadsheet.In fact, Narayan put it quite well in this comment at http://forum.chandoo.org/threads/excel-can-not-handle-it.20333/#post-122661 :

No doubt the profiler you’ve built into FastExecl will probably do the op at that post just as much good as the FastExcel functions. This isn’t a critique of FastExcel in any way…in fact the best thing that could happen is for MS to pay you enough money so that you pass over your IP to them, and they then give it to all of us as part of the Excel bundle.

@Jeff Weir

Yes – the issue you’re seeing is the result of you performing tests on strings of length 2.

When testing for numericalness within strings, anything other than checking on a purely character-by-character process is a risky business, and certainly not the most rigorous approach.

A similar issue occurs with the (sadly oft-recommended) solution for extracting consecutive numbers from a string using LOOKUP, e.g. for consecutive numbers from the start of a string :

=-LOOKUP(1,-LEFT(A1,ROW(INDIRECT(“1:”&LEN(A1)))))

which works in 99.9% of cases, though, since it necessarily relies upon checking strings of lengths greater than one, there will always be the risk (albeit small) of encountering one of a few alphanumeric strings which will cause incorrect results, namely any appropriate mixing of letters and numbers which also happens to qualify as a valid Excel date string within the version being used, or as a valid form of scientific notation.

See here for more:

http://excelxor.com/2014/09/14/extracting-numbers-from-a-string-1-consecutive-numbers-at-start/

Perhaps you can tweak it somehow, though in general it might be best to re-work it so that, as my formula does, it operates on a strictly character-by-character basis, even if this means an increase in formula length.

Regards

Sorry! Forgot Code tags!!

I completely rely on Jeff’s speed testing skills ..

Meanwhile:

`Function F_snb(c00, y) As String`

sn = Split(StrConv(c00, 64), Chr(0))

For j = 0 To UBound(sn)

If Not IsNumeric(sn(j)) Then

sn(j) = ""

If j > 0 Then If IsNumeric(sn(j - 1)) Then sn(j - 1) = sn(j - 1) & "~"

End If

Next

sn = Split(Join(sn, ""), "~")

`If y <= UBound(sn) Then F_snb = sn(y - 1) End Function`

@ Jeff,

Could you please replace my latest contribution by:

`Function F_snb(c00, y) As String`

On Error Resume Next

For j = 1 To Len(c00)

If Not IsNumeric(Mid(c00, j, 1)) Then Mid(c00, j, 1) = " "

Next

`F_snb = Split(Application.Trim(c00))(y - 1)`

End Function

Even a 1-liner does the trick:

`Function F_snb(c00, y) As String`

On Error Resume Next

F_snb = Split(Application.Trim(Join(Evaluate("transpose(If(isnumber(-mid(""" & c00 & """,row(offset(A1,,," & Len(c00) & ")),1)),mid(""" & c00 & """,row(offset(A1,,," & Len(c00) & ")),1),"" ""))"), "")))(y - 1)

End Function

@XOR LX: One possiblity for restricting numeric interpretations is to append something to the string, for instance:

will only allow for decimals (not dates, times, fractions or exponentials). Another possibility that only allow for integer values is to append instead &” 0/1″.

@jeff: Clever use of TEXT for identifying numeric pairs, i will make a note of that and then likely forget it. But I really think it’s hard to do any better than Excelxor’s solution which appears to be quite robust and clearly a lot of thought has gone into it. Testing the solutions with the following string “1.2.3″ shows up some differences:

jeff: 1.2.3

xor lx: #NUM!

jkp: {1.2,3}

snb: {1,2,3}

There’s a good case to be made for throwing where there is ambiguity, also it looks like snb’s may need adapting to accept decimals.

@Lori

Ingenious. How does Excel interpret e.g. the string “123**0” as 123?

On a side note, did anyone know that two (not one, or three) consecutive asterisks, appropriately-placed, will perform the equivalent role of the letter “E” (or “e”) in generating scientific notation?

For example, typing both:

“1**3”

and

“1E3”

will result in a (numerical) cell value of 1000.

Curiouser and curiouser!

Cheers

Ah. Stupidity extreme. I believe I answered my own question!

Apologies. More coffee needed!

Using “e0” allows for misinterpretation eg in your example “123jun”. “E+0” might be ok, but the “E” might also be interpreted differently in different locales, so i thought “**0” was safer.

So it is a known feature, this syntax? Never seen it documented, myself.

Wonderful in any case!

Not known to me. Until now.

@snb How is a 201 character ‘one-liner’ helpful?

i don’t know where the ** syntax comes from, perhaps it’s there for Lotus 123 compatibility? i haven’t seen it documented either but i’m fairly certain i came across it in exactly these circumstances where a text string had some numbers and asterisks that were misinterpreted in the data cleansing process. The cause took some time to track down, and in the end i came up with the idea of using the same syntax to guard against it.

Of course it depends on your data, but i would definitely not trust the numeric parser on a random string for reasons like this and the naive formulas posted in forums should not be advocated. Even if the formula were 99.9% effective, on data of over a thousand rows you’d expect it to be unreliable. I’d think it was less than that, for instance simple strings such as “1 a” and “1 p” get converted to 0.041666667 (1am) and 0.541666667 (1pm).

Longre’s MOREFUNC.XLL can still be found on the web. The REGEX.MID function it provides makes this fairly easy. The regular expression to find anything which could be interpreted as a number (without E+/- scaling) is

[-+]?(\d+(\.\d+)?|\.\d+).I had a UDF using regular expressions that extracted a number from anywhere in a text string:

http://newtonexcelbach.wordpress.com/2014/04/12/extracting-numbers-with-regular-expressions/

The original code came from Winston Snyder at:

http://dataprose.org/2013/12/scrub-your-data-expressively/

Prompted by this thread I have now modified it to extract all the separate values in a text string and return all of them as an array, or any specified value. It also allows any character to be specified as the decimal separator (default .) and any character to be the thousands separator, which is ignored (default ,); so for “a 12,345.678 or 876,543.12, 1” it returns 12345.678 876543.12 1.

Here is the code

Function ExtractNums(NumStrings As Variant, Optional Position As Long = 0, Optional MaxNum As Long = 10, _

Optional DecString As String = ".", Optional IgnoreString As String = ",") As Variant

'Based on code by Winston Snyder: http://dataprose.org/2013/12/scrub-your-data-expressively/

'Extract one or more numbers from anywhere in an input string, UDF Version

Dim RE As Object, Match As Object, Matches As Object, strNumber As String, i As Long, j As Long, k As Long

Dim numrows As Long, Row As Long, NumString As String, OutA() As Variant, NumMatches As Long

Dim NumChar As Long, PrevMatch As Boolean, Digit As String, REPatt As String

'Initialize variables

Set RE = CreateObject("VBScript.RegExp")

If TypeName(NumStrings) = "Range" Then NumStrings = NumStrings.Value2

If TypeName(NumStrings) = "String" Then

numrows = 1

Else

numrows = UBound(NumStrings)

End If

ReDim OutA(1 To numrows, 1 To MaxNum)

On Error Resume Next

For Row = 1 To numrows

If IsArray(NumStrings) = True Then

NumString = NumStrings(Row, 1)

Else

NumString = NumStrings

End If

'Criteria for Regular Expression

REPatt = "[0-9" & DecString & "]"

With RE

.Pattern = REPatt '"[0-9.]"

.Global = True

Set Matches = .Execute(NumString)

End With

'Loop Matches collection and compare with each character of original string to build strings of all numbers in the sample string

strNumber = ""

NumMatches = Matches.Count

NumChar = Len(NumString)

j = 0

k = 1

PrevMatch = False

For i = 1 To NumChar

Digit = Matches(j)

If Mid(NumString, i, 1) <> IgnoreString Then

If Digit = Mid(NumString, i, 1) Then

strNumber = strNumber + Matches(j)

j = j + 1

PrevMatch = True

ElseIf PrevMatch = True Then

If Position = 0 Then

OutA(Row, k) = CDbl(strNumber)

ElseIf Position = k Then

OutA(Row, 1) = CDbl(strNumber)

Exit For

End If

strNumber = ""

k = k + 1

PrevMatch = False

End If

End If

Next i

If PrevMatch = True Then

If Position = 0 Then

OutA(Row, k) = CDbl(strNumber)

ElseIf Position = k Then

OutA(Row, 1) = CDbl(strNumber)

End If

End If

Next Row

ExtractNums = OutA

`'Tidy up`

'Destroy objects

Set Matches = Nothing

Set RE = Nothing

End Function

Revised code that will:

– Recognise numbers starting with + or –

– Recognise numbers in E notation (with upper or lower case E)

– Return #NA if the requested number index does not exist

– Not treat E or e as part of a number unless it is preceded by a digit

Function ExtractNums(NumStrings As Variant, Optional Position As Long = 0, Optional MaxNum As Long = 10, _

Optional DecString As String = ".", Optional IgnoreString As String = ",") As Variant

'Based on code by Winston Snyder: http://dataprose.org/2013/12/scrub-your-data-expressively/

'Extract one or more numbers from anywhere in an input string, UDF Version

Dim RE As Object, Match As Object, Matches As Object, strNumber As String, i As Long, j As Long, k As Long

Dim numrows As Long, Row As Long, NumString As String, MatchString As String, OutA() As Variant, NumMatches As Long

Dim NumChar As Long, PrevMatch As Boolean, Digit As String, REPatt As String

Const ExpString As String = "E", pmString As String = "+-"

'Initialize variables

Set RE = CreateObject("VBScript.RegExp")

If TypeName(NumStrings) = "Range" Then NumStrings = NumStrings.Value2

If TypeName(NumStrings) = "String" Then

numrows = 1

Else

numrows = UBound(NumStrings)

End If

ReDim OutA(1 To numrows, 1 To MaxNum)

On Error Resume Next

For Row = 1 To numrows

If IsArray(NumStrings) = True Then

NumString = NumStrings(Row, 1)

Else

NumString = NumStrings

End If

'Criteria for Regular Expression

REPatt = "[0-9" & DecString & ExpString & LCase(ExpString) & pmString & "]"

With RE

.Pattern = REPatt

.Global = True

Set Matches = .Execute(NumString)

End With

`'Loop Matches collection and compare with each character of original string to build strings of all numbers in the sample string`

strNumber = ""

NumMatches = Matches.Count

NumChar = Len(NumString)

j = 0

k = 1

PrevMatch = False

For i = 1 To NumChar

MatchString = Mid(NumString, i, 1)

If PrevMatch = False And UCase(MatchString) = ExpString Then

j = j + 1

ElseIf MatchString <> IgnoreString Then

Digit = Matches(j)

If Digit = MatchString Then

strNumber = strNumber + Matches(j)

j = j + 1

PrevMatch = True

ElseIf PrevMatch = True Then

If Position = 0 Then

OutA(Row, k) = CDbl(strNumber)

ElseIf Position = k Then

OutA(Row, 1) = CDbl(strNumber)

Exit For

End If

strNumber = ""

k = k + 1

PrevMatch = False

End If

End If

Next i

If PrevMatch = True Then

If Position = 0 Then

OutA(Row, k) = CDbl(strNumber)

ElseIf k = Position Then

OutA(Row, 1) = CDbl(strNumber)

End If

End If

If k < Position Then OutA(Row, 1) = CVErr(xlErrNA) Next Row ExtractNums = OutA 'Tidy up 'Destroy objects Set Matches = Nothing Set RE = Nothing End Function

Thanks all for the code…I’ll try to find some time to do some speed testing soon.

Crikey…after chaining myself to the computer since the last update, I finally managed to cobble together a formula that will extract all numbers in practically any shape or form that the local version of Excel deems as a number. That is, given a string like this:

Jeff Weir Age: 43 DOB: 25/4/71 Salary: $100,000 StartTime: 8:30

…It returns this

43

25/4/71

100000

8:30

It ain’t pretty. If you like UDFs, then you’ll agree with my pal Gareth Hayter that this is

pure formulabation. Unless you like formulas, in which case it’s orgasmic. (You sick, sick analyst, you. Don’t let your boss catch you playing with it in the office…)Link to sample file in the original post above.

In this parat of the world your details would have to be written as:

43 DOB: 25-04-1971 Salary: €100.000,- StartTime: 8.30

Can you please adapt your formula to encompass these ‘numbers’ too ?

Hi snb. You’re in luck: in my sample file you’l note I pre-clean the string with a formula like this:

=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(B2,” “,”|”),”$”,””),”,”,””)

I

thinkall you need to do is change it to this:=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(B2,” “,”|”),”€”,””),”.”,””)

…although further tweaks might be necessary, which I’ll leave to you.

Note that this ‘pre-cleaning’ could just as easily be performed within the main formula itself. I just thought it was simpler to keep this outside the main formula…

I don’t think I will either use or tweak an unnecessary overcomplicated approach to a simple task. ;)

snb: I wouldn’t use it either. Hence my “formualbation” comment. Let me be clear with my intent here: to see if I could do something for the sake of it. To quote a fellow New Zealander, I did it to see if I could “Knock the bastard off”. (Aside: That quote comes from Edmond Hillary, who climbed Everest “because it was there”.) To me, formula challenges are a bit like that: a pointless waste of time really, apart from the challenge.

That said, here’s where I feel duty bound to point out that at least my unnecessary overcomplicated approach actually handles this “simple task”. The original task was to split out numbers including decimals. The revised task of that last formula I posted was to extract things that Excel recognizes as numbers be they integers, decimals, dates, or times. Unfortunately your UDFs do neither.

I’ll be posting my own UDF in due course. In fact, the main reason for me doing this is to illustrate in my book that a UDF can often be a much better approach than a complicated formula. And a well-optimized UDF can often thrash the pants off a poorly optimized one.

That said, I live in the world of SharePoint Server, where VBA is not an option. So I have to rely on formulas to do things all the time that would otherwise be much better suited to a UDF.

You don’t need to convince me:

`Sub M_snb()`

c00 = "Jeff Weir Age: 43 DOB: 25/4/71 Salary: $100,000 StartTime: 8:30"

For j = 1 To Len(c00)

If InStr("0123456789/,:", Mid(c00, j, 1)) = 0 Then Mid(c00, j, 1) = " "

Next

sn = Split(Application.Trim(c00))

For j = 0 To UBound(sn)

If Val(sn(j)) = 0 Then sn(j) = ""

Next

MsgBox Join(Split(Application.Trim(Join(sn))), vbLf)

End Sub

`Isn't it time to emigrate to non Sharepoint country ?`

There’s not too many jobs around where I can work from home in my Pyjamas. This SharePoint gig is one of them.

Your sub is getting there, but doesn’t handle decimal points. While we’re at it, let’s throw in some scientific notation such as 1E3 or 1**3 just to give you a UDF Everest to climb. I’ll keep adding edge cases until a) we reach the summit or b) you tell me to climb it myself.

Here’s some more edge cases that my UDF handles: percents, times with am or pm in them, dates like 1 January 2010.

Nice !