String manipulation with worksheet formulas – Part 1

Deutsche Version / German version

A very common task is to extract a specific element (sub-string) from a longer string values. This first part shows some functions to extract the numeric part of a string while the next part will concentrate on getting the nth substring.

So let’s start with an assumption: The string to be processed does not contain 2 or more numeric parts but only one. e.g.

“This is part 123 of 234 parts”

is not allowed. Though

“This is part 123 of all parts”

is a valid string for the following functions.

First we try to evaluate the starting and ending positions of the numeric part. But before we can start using the formulas we have to define a name (Menu ‘Insert – Name – Define’) to make them more readable:

  • Name: seq
  • RefersTo Formula: =ROW(INDIRECT(“1:1024”))

We start with evaluating the beginning and ending position of our numeric part. For our example

“This is part 123 of all parts”

we should get 14 and 16 as result. Formulas to achieve this result are:

  1. Getting the starting position of the numeric part:
    • Array formula: =MIN(IF(ISNUMBER(-MID(A1,seq,1)),seq))
    • Non-array formula: LOOKUP(2,1/MID(A1,1024-seq,1),1024-seq)
  2. Getting the last position of the numeric part:
    • Array formula: =MAX(IF(ISNUMBER(-MID(A1,seq,1)),seq))
    • Non-array formula: =LOOKUP(2,1/(ISNUMBER(-MID(A1,seq,1))),seq)

To extract the numeric part all we have to do is combine the above functions together with the MID function:

  • Array formula:
    =IF(MIN(IF(ISNUMBER(-MID(A1,seq,1)),seq)),–MID(A1,MIN(IF(ISNUMBER(-MID(A1,seq,1)),seq)),MAX(IF(ISNUMBER(-MID(A1,seq,1)),seq))-MIN(IF(ISNUMBER(-MID(A1,seq,1)),seq))+1),””)
  • Non-array formula:
    =IF(ISNUMBER(LOOKUP(2,1/MID(A1,seq,1),seq)),MID(A1,LOOKUP(2,1/MID(A1,1024-seq,1),1024-seq),LOOKUP(2,1/MID(A1,seq,1),seq)-LOOKUP(2,1/MID(A1,1024-seq,1),1024-seq)+1),””)

I have to admit these formulas do not look simple and of course you can achieve the same goal with some VBA code quite easily. But in case you can’t use VBA this provides you with a non-macro alternative.
If performance is an issue you may consider using the following array formula (nearly twice as fast than the other two formula combinations):

=–MID(A1,MATCH(TRUE,ISNUMBER(-MID(A1,seq,1)),0),ROWS(seq)-SUM(–ISERROR(-MID(A1,seq,1))))

The above formulas can be simplified if you can make some additional assumptions about your string value:

  1. Numeric part is always at the beginning of your string:
    • Array formula:
      =IF(MIN(IF(ISNUMBER(-MID(A1,seq,1)),seq)),–LEFT(A1,MAX(IF(ISNUMBER(-MID(A1,seq,1)),seq))),””)
    • Non-array formula:
      =IF(ISNUMBER(LOOKUP(2,1/(ISNUMBER(-MID(A1,seq,1))),seq)),–LEFT(A1,LOOKUP(2,1/(ISNUMBER(-MID(A1,seq,1))),seq)),””)
    • Or a very nice non-array alternative (I think first posted by Aladin Akyurek):
      =–LEFT(A1,SUMPRODUCT((LEN(A1)-LEN(SUBSTITUTE(A1,{“.”,0,1,2,3,4,5,6,7,8,9},””)))))
  2. Numeric part is always at the end of your string:
    • Array formula:
      =IF(MIN(IF(ISNUMBER(–MID(A1,seq,1)),seq)),–MID(A1,MIN(IF(ISNUMBER(-MID(A1,seq,1)),seq)),20),””)
    • Non-array formula:
      =–MID(A1, LOOKUP(2,1/MID(A1,1024-seq,1),1024-seq),20)

This is only a compilation of some common used worksheet formulas for extracting numbers from strings. You may explore these functions in your own spreadsheet.
But keep in mind: For complicated string processing you may switch to VBA (together with using regular expressions).

Frank Kabel

Pages: 1 2

Posted in Uncategorized

7 thoughts on “String manipulation with worksheet formulas – Part 1

  1. Frank-

    Well…wrapping my mind around that one was educational!! The lookup version of the final extraction formula is missing a “–” before the MID function beginning the TRUE argument, a great way to convert numbers-as-text to real numbers. It’s there in the array form. I also didn’t know that IF treated any non-zero number as TRUE. Thanks and keep thetext manipulations coming!!!!

    Brett

  2. Hi Brett
    thanks for your comment (still working on the second part though).
    Could you clarify which formula is missing a minus (probably just too late on my side too see it…)

    Frank

  3. Hey Frank –

    I have a perfect application for your string splitter, if only I could understand it (and get it to work).

    I need to parse a concatenated name and address field. If I break the string at the first number, it is probably the end of the name and the beginning of the house number.

    Maybe you’ll have mercy and pass along a few tips.

    1) What does the “-” preceding the MID function do? Is this some sort of Excel supported autodecrement?

    2) What is the significance of the array and non-array versions? I hate to admit I’m completely lost, but since I am trying to parse a string, how did we get into arrays? Does your code use array processing to step thru the string, perhaps?

    Thanks – Tom

  4. Building on your example the following returns the text in B2 upto the last full stop (period). Great for truncating strings without splitting sentances!

    =LEFT(B2,FIND(MID(B2,FIND(“^^”,SUBSTITUTE(“.”&B2,”.”,”^^”,LEN(“.”&B2)-LEN(SUBSTITUTE(“.”&B2,”.”,””))))-1+1,4024),B2)-1)

    Thanks Frank

  5. How can create a expecific string of a cell? Example I’m working in a table with a column of 17 characters always. Right now I’m using a complex formula or a formating condicionating to have the result. But I heard we can format the cell to limit this to only 17.

    thanks

    pd. excuse my english

  6. How can we convert the string into array using VBA Code? for example: one string is there in Cell D7, D7=Shreeram. I want to split the characters using array. what is the respective VBA Code for it?

  7. Function StringToArray(sString As String) As Variant
       
        Dim aLetters() As String
        Dim i As Long
       
        ReDim aLetters(1 To Len(sString))
       
        For i = 1 To Len(sString)
            aLetters(i) = Mid(sString, i, 1)
        Next i
       
        StringToArray = aLetters
       
    End Function


Posting code? Use <pre> tags for VBA and <code> tags for inline.

Leave a Reply

Your email address will not be published.