String manipulation with worksheet formulas – Part 2

Deutsche Version / German version

Now continuing the first part of string manipulations with formulas the following will describe how to extract the nth element of a larger string. This technique is for example useful in the following cases:

  • Extract first, middle and last name
  • Get the last or first word of a sentence (words separated by spaces)

As an assumption for the following I’d assume that words in a larger string are separated by spaces. e.g. “This is a longer text of words separated by spaces”
So now let’s start extracting the individual elements of this string

1. Getting the first element
Getting the first word is rather simple. Just combine the functions LEFT and FIND (and add some error checking):

=IF(LEN(A1)=0,””,IF(ISNUMBER(FIND(” “,A1)),LEFT(A1,FIND(” “,A1)-1),A1))

This formula searches for the first occurence of a space character and returns all characters to the right of this space. If no space character is found the complete string is returned. For our example this formula would return “This”.

2. Getting the last element
Now it’s getting a little bit more complicated as we don’t know in advance how many elements our string contains and Excel does not provide a FIND function which searches from the right (as you have in VBA with the method InStrRev). But is is still possible to achieve with formulas alone. We start with getting the position of the last delimiter (the last space character):

  1. Using SUBSTITUTE and FIND:
    =FIND(“^^”,SUBSTITUTE(” “&A1,” “,”^^”,LEN(” “&A1)-LEN(SUBSTITUTE(” “&A1,” “,””))))-1
  2. Using LOOKUP:
    =LOOKUP(2,1/(MID(” “&A1,seq,1)=” “),seq)-1
    where seq is again a defined name with the formula: =ROW(INDIRECT(“1:1024”))

Personally I prefer the second alternative as it is shorter (and for me easier to remember). But both formulas should return the same result. For our example theywill both return: 44
Now the only thing you have to do use this formula part within a MID formula (and don’t forget to add 1 to the position to skip the space):

  1. Using SUBSTITUTE and FIND:
    =MID(A1,FIND(“^^”,SUBSTITUTE(” “&A1,” “,”^^”,LEN(” “&A1)-LEN(SUBSTITUTE(” “&A1,” “,””))))-1+1,1024)
  2. Using LOOKUP:
    =MID(A1,LOOKUP(2,1/(MID(” “&A1,seq,1)=” “),seq)-1+1,1024)

Both formulas would return the string: “spaces”

Getting the nth element
Adapting the approach of SUBSTITUTE and FIND for getting the last element you could also get the position for the nth and n+1 element within a string. Using this you get a kind of ‘monster’ formula:

=IF(OR(LEN(A1)=0,ISERROR(FIND(“^^”,SUBSTITUTE(” “&A1,” “,”^^”,$B$1)))),””, MID(A1,FIND(“^^”,SUBSTITUTE(” “&A1,” “,”^^”,$B$1)),IF(ISERROR(FIND(“^^”,SUBSTITUTE(
A1,” “,”^^”,$B$1))),1024,FIND(“^^”,SUBSTITUTE(A1,” “,”^^”,$B$1)))-FIND(“^^”,SUBSTITUTE(
” “&A1,” “,”^^”,$B$1))))

where B1 stores the desired element number n.

And as stated in the first part of this article this could of course be achieved much easier using VBA but maybe it wouldn’t be that much fun …

Frank

Pages: 1 2

Posted in Uncategorized

47 thoughts on “String manipulation with worksheet formulas – Part 2

  1. Hello Mr. Kabel,

    Both formulas for extracting the last element seem to include the last space also. I think that removing the “-1? or using TRIM will prevent this. (Apologies if I am mistaken of course :-))

    This seems to work also.

    =RIGHT(A1,LEN(A1)-FIND(“^^”,SUBSTITUTE(A1,” “,”^^”,LEN(A1)-LEN(SUBSTITUTE(A1,” “,””)))))

    Anyway, I really appreciate your tips.

  2. Hi Andrew
    thanks for your comments and you’re right. Messed this up during the translation. I’ll correct the posting accordingly.

    Frank

  3. For the 1st bit, would this be better?

    =IF(LEN(A1),LEFT(A1,FIND(” “,A1&” “)-1),””)

    Assuming TRIM is not needed.

  4. =LEFT(A1,FIND(” “,A1&” “)-1)

    is even shorter and seems to work also when A1 is empty.

  5. Hi,

    I know this is off topic, but this page in your blog is the nearest to a discussion of my (excel) problem that I can find.

    In Excel, worksheet functions, I need to extracting a substring from a longer string which I can handle, except for the case when the substring does not exist. In that case Ecxel returns #VALUE!. eg find ‘d’ in ‘abcefg’ fails

    Is there any way to avoid/override that and put in a ‘not found’ or similar?

    Any help would be appreciated.

    Malcolm

  6. Malcolm,

    Try something like

    =IF(ISERROR(FIND(“d”,”abcefg”)),”Not Found”,FIND(“d”,”abcefg”))

    Dianne

  7. Hi,

    I’m having a similar problem trying to extract the last part of a string from a users profile path that has been imported into excel from AD.

    The path reads \serverprofilesfirstname.lastname

    I need to extract the firstname.lastname. Can anyone help?

    Thanks,

    Glen.

  8. Glen,

    You could use a user-defined function like this (taking advantage of the InStrRev VBA function):

    Function FullName(strToSearch As String) As String
    If InStrRev(strToSearch, “”) > 0 Then
    FullName = Mid(strToSearch, _
    InStrRev(strToSearch, “”) + 1)
    End If
    End Function

    Which raises a question for me that others might be able to answer.

    When writing a function that I’ll be using on a worksheet as =MyFunction(A1), I normally would write the UDF to take a range argument:

    Function MyFunction (rngMyArg as Range) as String

    then convert rngMyArg.Value into a string to be handled.

    Is there any advantage/disadvantage/difference in writing the UDF:

    Function MyFunction (strMyArg as String) as String

    as I’ve done in the above FullName function?

    I tested it (in a limited way) and it seems to return the same string values.

    Best practice?

  9. I have a large database (10 000 items)with address information. I would like to seperate a substring (country and institution) withinin a large string that does not have a uniform format. For example:

    Department of Obstetrics and Gynaecology, Sahlgrenska University Hospital, Gothenburg, Sweden. elisabeth.jangsten@vgregion

    or:

    MRC Social and Public Health Sciences Unit, University of Glasgow, Glasgow, UK

    or:

    Halmstad University, Halmstad, Sweden. karen.odberg-pettersson@phs.ki.se

    Could you think of a formula to:
    1. create a seperate variable/string for the country (Sweden, UK)(maybe after first seperating the email adres, “@” ?)
    2. make a seperate variable/string for the insstitutions “Sahlgrenska University Hospital”, “University of Glasgow”, “Halmstad University”. So whenever somewhere the string “university” is can excel seperate the whole string that is around “University” delimited by “, ”

    I would be very gratefull with your help!

    Thank you very much!!

  10. floor –

    Here’s a quick UDF that returns the country. It may show you the way for the University:

    Function Country(Phrase As String) As String
       Dim stuff() As String
       stuff = Split(Phrase, “, “, -1, vbTextCompare)
       Country = stuff(UBound(stuff))
       If InStr(1, Country, “@”, vbTextCompare) Then
          Country = Left(Country, InStr(1, Country, “.”, vbTextCompare) – 1)
       End If
    End Function

    …mrt

  11. You stated:

    [Italic]=IF(LEN(A1)=0,””,IF(ISNUMBER(FIND(” “,A1)),LEFT(A1,FIND(” “,A1)-1),A1))

    This formula searches for the first occurence of a space character and returns all characters to the [Bold]RIGHT[[/bold] of this space.[/italic]

    I suppose you meant to the LEFT.

    second remark:
    instead of using formulas I’d prefer a UDF, using the split function in VBA, assuming words are being separated by spaces.
    This creates an array with all the separate words.

    sq=split([A1])
    firstword=sq(0)
    secondword=sq(1)
    lastword=sq(ubound(sq))
  12. Michael!

    Thank you for your help! I really have no experience with this and do not quite understand the formula BUT have been able to make a seperate column with the country in it using your UDF! It is not working for all cells because there is no uniform format I guess, still it is a great help! So thank you.

    If you could help me with a UDF that seperates the University I would be very gratefull again. Sorry I really have no experience at all with this, so if you could help me I would appreciate it very much!

    What I need is a UDF that whenever in a text string the word university appeares it gives the whole string around university delimited by commas. Ideally in addition if the word istitute appears it should also give me the string around institute delimited by comas.

    For istance:

    Department of Pathology and Microbiology, The Aga Khan University Hospital, Karachi, Pakistan. bushra.moiz@aku.edu

    should return: The Aga Khan University Hospital

    Department of Biological Sciences and Bioengineering, Indian Institute of Technology, Kanpur 208016, India

    in this case it should return:
    Indian Institute of Technology

    Section of Genetics, Department of Zoology, Aligarh Muslim University, Uttar Pradesh, India

    should again return: Aligarh Muslim University

    Thanks,

    Floor

  13. A more universal UDF
    Function nthword(tekst As String, n As Integer, separator As String)
    sq = Split(tekst, IIf(separator = "", " ", separator))
    nthword = sq(UBound(sq))
    If n

  14. Function nthword(tekst As String, n As Integer, separator As String)
      sq = Split(tekst, IIf(separator = “”, ” “, separator))
      nthword = sq(UBound(sq))
      If n &#060 UBound(sq) + 1 Then nthword = sq(n – 1)
    End Function

    for instance:
    tekst = “Department of Pathology and Microbiology, The Aga Khan University Hospital, Karachi, Pakistan”
    n = 2
    separator = “, “
    nthword= “The Aga Khan University Hospital”

    NB. the separator can be any character or group of characters

  15. Hans-

    Thank you for your answer!

    I copy pasted the UDF as you have written into VBA excel, and then wrote =nthword(A1). This does not work so I am probably doing something wrong? Sorry I am a real beginner in this.

  16. I’d prefer (it’s necessary) for your example
    =nthword(A1;2;”, “)

  17. Hans thank you very much, but.. the formula is still not working.. What am I doing wrong? Can anyone help?

  18. read in my code
    &#060 : that must be the sign for “greater-than”
    unfortunately I don’t know the method tot put those characters on this board. The link, linked to ‘escape’ doesn’t inform me either.

  19. I think you are referring to &#060

    I have changed that to > and then put in the formula in VB.

    In excel sheet I wrote
    =nthword(A1;2;”, “)

    Still it is not recognizing the formula. Does it work at your side?

    Thanks Hans, I really appreciate your helping me!

  20. Excel/Menu Bar/Insert/Function/Userdefined.
    Probably you put the UDF in a module in another workbook.(personal.xls or something like that)
    Sorry, I made a mistake it should be:
    If n

  21. I made the mistake that it should be the sign for ‘less-than’.
    How did you manage tot put the ‘greater than’ sign in your post ?

  22. Hi Hans,

    The formula is working now! It is not completely what though because it is always taking the second string delimited by commas, wheras I would like to have a function that searches for the word University and takes the string were the word university is in… (because for instance in my third example as you can see the University is in the third substring: Section of Genetics, Department of Zoology, Aligarh Muslim University, Uttar Pradesh, India) This is my problem, there is no universal display in the address infromation, it is not always on the same place/ in the same order…

    For the > or

  23. look at the left lower corner of your key bord. there is a key for it on my keyboard next to shift.. hope I can help you too!

  24. if you only want the part with University in it:

    Function wordfilter(tekst As String, filt As String, separator As String)
      wordfilter = join(filter(Split(tekst, IIf(separator = “”, ” “, separator)),filt),“”)
    End Function

    =wordfilter(A1;”University”;”, “)

  25. Hi Floor –

    My UDF assumes that the country is found after the last comma-space combination, per your three examples, and if email address exists, it follows after a period-space combination after the country. As you saw, it won’t work in other cases. A different version is below. Here’s the UDF for the Institution. It should work as long as they are all universities. If not, Country2 should show you the way.

    Function Institute(Phrase As String) As String
       Dim stuff() As String
       Dim i       As Integer
       stuff = Split(Phrase, “, “, -1, vbTextCompare)
       For i = LBound(stuff) To UBound(stuff)
          If InStr(1, stuff(i), “University”, vbTextCompare) Then
             Institute = stuff(i)
             Exit For
          Else
             Institute = vbNullString
          End If
       Next i

    End Function

    Here’s a new UDF for the Country that will require you to add all possible countries per this example, but should catch the format anomalies:

    Function Country2(Phrase As String) As String
       Const NumCountries = 4   ‘Change to suit
      Dim stuff() As String
       Dim i       As Integer
       Dim j       As Integer
       Dim Countries(NumCountries) As String

       Countries(0) = “UK”
       Countries(1) = “Sweden”
       Countries(2) = “France”
       Countries(3) = “Germany”
       Countries(4) = “Norway”   ‘Stay in synch with NumCountries, zero-based indexing

       stuff = Split(Phrase, “, “, -1, vbTextCompare)

       For j = 0 To NumCountries
          For i = LBound(stuff) To UBound(stuff)
             If InStr(1, stuff(i), Countries(j), vbTextCompare) Then
                Country2 = stuff(i)
                Exit For
             Else
                Country2 = vbNullString
             End If
          Next i
          If Country2  vbNullString Then Exit For
       Next j

       If InStr(1, Country2, “@”, vbTextCompare) Then
          Country2 = Left(Country2, InStr(1, Country2, “.”, vbTextCompare) – 1)
       End If

       If Country2 = vbNullString Then Country2 = “Not Recognized”

    End Function

    Good luck
    Michael

  26. The line after “Next i” should be

    If Country2 “not equal” vbNullString etc.

  27. Thank you both Hans and Michael!
    I am so glad you are willing to help me!

    Michael, if I change If Country2 “not equal” vbNullString etc. I get a syntax error… If you have a suggestion, please let me know again.. thanks so much!

    Floor

  28. Hi Floor –

    I’m sorry, by “not equal” I meant the “less than” sign followed by the “greater than” sign in VBA. I was too lazy to look up the escape characters, and hoped you’d catch my meaning. Here’s an attempt at what I meant.

    If Country2 &#060 &#062 vbNullString Then Exit For

    …Michael

  29. Michael, yes the formula works now..

    I get a lot of not recognized. I tried to add more coutries by just typing for instance countries (5) = “India” but I guess I need to change more underneath that as well, because that is not working.
    Also for some reason the formula already picks up strings like scotland, Holland, Department of health, because of it’s position?

    Anyway, If you don’t feel like helping anymore I understand, you have already put a lot of effort in this.

    I can always go with your first country UDF as that works better for me now…

    If you do feel like it, could you let me know what I need to add in addition to all the countries that I want to add?

    Thanks, and thanks again for all your help so far, it really is great because I was about to copy past everything manually..

    Floor

  30. Michael,

    Your function ‘institute’ is a re-invention of the function ‘filter’ that is part of VBA.

  31. Hans –

    Thank you. I’ve rolled my own that way for so long, I never looked for a built in function that did the same thing.

    Floor –

    Now that you’ve got it ;-), here’s an improved version that would have made your life easier. Sorry to be so long getting back to you.

    Function Country3(Phrase As String) As String
       Dim Stuff() As String
       Dim Countries As String
       Dim Country() As String
       Dim i       As Integer
       Dim j       As Integer

       Countries = “UK, Sweden, France, Germany, Norway”   ‘Add Countries separated by comma-space

       Country = Split(Countries, “, “, -1, vbTextCompare)
       Stuff = Split(Phrase, “, “, -1, vbTextCompare)

       For j = LBound(Country) To UBound(Country)
          For i = LBound(Stuff) To UBound(Stuff)
             If InStr(1, Stuff(i), Country(j), vbTextCompare) Then
                Country3 = Stuff(i)
                Exit For
             Else
                Country3 = vbNullString
             End If
          Next i
          If Country3 #&060; #&062; vbNullString Then Exit For
       Next j

       If InStr(1, Country3, “@”, vbTextCompare) Then
          Country3 = Left(Country3, InStr(1, Country3, “.”, vbTextCompare) – 1)
       End If

       If Country3 = vbNullString Then Country3 = “Not Recognized”

    End Function

    And here’s a version of Institute() that handles Institutes ;-)

    Function Institute(Phrase As String) As String
       Dim Stuff() As String
       Dim i       As Integer
       Stuff = Split(Phrase, “, “, -1, vbTextCompare)
       For i = LBound(Stuff) To UBound(Stuff)
          If InStr(1, Stuff(i), “University”, vbTextCompare) Then
             Institute = Stuff(i)
             Exit For
          ElseIf InStr(1, Stuff(i), “Institute”, vbTextCompare) Then
             Institute = Stuff(i)
             Exit For
          Else
             Institute = vbNullString
          End If
       Next i

    End Function

    Glad we’re there.
    …Michael

  32. Michaal
    to shrink and speed up your code:

    Function Institute(Phrase As String) As String
      If Instr(Phrase,“University”)>0 Then
        c0=“University”
      ElseIf Instr(Phrase,“Institute”)>0 Then
        c0=“Institute”
      End If
      If c0 [#&060; #&062;] “” Then Institute=join(filter(split(phrase,“, “),c0),“”)
    End Function
  33. Hi Hans –

    Thank you. I can see that it does by taking a larger view. Got to think about that last part. I’m learning something here ;-)

    For the first part, no need to check if InStr() is greater than zero. InStr() returns zero if not found, giving False to the If. Life would be easier if formula FIND() or SEARCH() did that instead of returning errors.

    …Michael

    …Michael

  34. Michael,

    Your remarks led me to the one-liner below:

    Function Institute(Phrase As String) As String
      Institute=join(filter(split(phrase,“, “),IIf(InStr(phrase, “University”), “University”, IIf(InStr(phrase, “Institute”), “Institute”, “”))),“”)
    End Function
  35. Hi Hans –

    Ooo…I like that. Now we’re approaching Obfuscated C ;-) Well, maybe not.

    I didn’t get the need for Join() until I found that you can’t get at something like Filter(0)

    Thanks
    Michael (aka mrt)

  36. Hi,

    i have two sheets

    sheet 1 contains

    column-1-list of names
    column-2-count

    sheet 2 contains

    column-1-date
    column-2-shift 1
    column-2-shift 2
    column-3-shift 3
    column-4-shift 4

    now there might be any number of names in each column in sheet 2
    (i.e there will be many people in different shifts whose names are seperated by a delimiter ” , “)

    i want a user defined sode which must perform the following operation.

    if the name of an associate is found in sheet 2 in any shift for that particular date,then the count corresponding to his name should increase by one.

    Thanks in advance

  37. hi,
    the very first formula is workable for my file but only change i want is there is dash(-) in some strings instead of space so can u modify this formula to read it space &/or dash? & revert?

    =IF(LEN(A1)=0,””,IF(ISNUMBER(FIND(” “,A1)),LEFT(A1,FIND(” “,A1)-1),A1))

  38. adding to above i also need help regarding the following :
    1.need to extract first 3 letters of 1st word & 2 letters of 2nd word of the same string if there is space betwenn 2 words
    2.if there is dash (-) between two words instead of space then need to extract only 1st 5 letters of the same string

    can anybody assist me on this?

  39. This formula will extract the first word defined by a space or a dash

    =IF(LEN(A1)=0,””,IF(ISNUMBER(FIND(“^^”,SUBSTITUTE(SUBSTITUTE(A1,” “,”^^”),”-“,”^^”))),LEFT(SUBSTITUTE(SUBSTITUTE(A1,” “,”^^”),”-“,”^^”),FIND(“^^”,SUBSTITUTE(SUBSTITUTE(A1,” “,”^^”),”-“,”^^”))-1),A1))
  40. With your text in A1, in B1

    =SUBSTITUTE(SUBSTITUTE(A1,” “,”^^”),”-“,”^^”)

    and in C1

    =IF(ISNUMBER(FIND(“^^”,B1)),IF(MID(A1,FIND(“^^”,B1),1)=”-“,LEFT(LEFT(A1,FIND(“-“,A1)-1),5),LEFT(LEFT(A1,FIND(” “,A1)-1),3)&MID(A1,FIND(” “,A1)+1,2)),A1)

    will extract the first five letters of the first word if the first and second words are separated by a hyphen and will extract the first three letters of the first word and first two letters of the second if they are separated by a space.

  41. @Dick,

    Your last setup will fail for something like this in A1…

    One-Two Three

    I believe this single formula will do work instead though…

    =LEFT(SUBSTITUTE(IF(FIND(“-“,A1&”-“)<find (” “,A1&” “),SUBSTITUTE(A1,”-“,””),REPLACE(A1,4,FIND(” “,A1)-3,””)),” “,””),5)


Posting code? Use <pre> tags for VBA and <code> tags for inline.

Leave a Reply

Your email address will not be published.