# String manipulation with worksheet formulas – Part 2

Deutsche Version / German version

Now continuing the first part of string manipulations with formulas the following will describe how to extract the nth element of a larger string. This technique is for example useful in the following cases:

• Extract first, middle and last name
• Get the last or first word of a sentence (words separated by spaces)

As an assumption for the following I’d assume that words in a larger string are separated by spaces. e.g. “This is a longer text of words separated by spaces”
So now let’s start extracting the individual elements of this string

1. Getting the first element
Getting the first word is rather simple. Just combine the functions LEFT and FIND (and add some error checking):

=IF(LEN(A1)=0,””,IF(ISNUMBER(FIND(” “,A1)),LEFT(A1,FIND(” “,A1)-1),A1))

This formula searches for the first occurence of a space character and returns all characters to the right of this space. If no space character is found the complete string is returned. For our example this formula would return “This”.

2. Getting the last element
Now it’s getting a little bit more complicated as we don’t know in advance how many elements our string contains and Excel does not provide a FIND function which searches from the right (as you have in VBA with the method InStrRev). But is is still possible to achieve with formulas alone. We start with getting the position of the last delimiter (the last space character):

1. Using SUBSTITUTE and FIND:
=FIND(“^^”,SUBSTITUTE(” “&A1,” “,”^^”,LEN(” “&A1)-LEN(SUBSTITUTE(” “&A1,” “,””))))-1
2. Using LOOKUP:
=LOOKUP(2,1/(MID(” “&A1,seq,1)=” “),seq)-1
where seq is again a defined name with the formula: =ROW(INDIRECT(“1:1024”))

Personally I prefer the second alternative as it is shorter (and for me easier to remember). But both formulas should return the same result. For our example theywill both return: 44
Now the only thing you have to do use this formula part within a MID formula (and don’t forget to add 1 to the position to skip the space):

1. Using SUBSTITUTE and FIND:
=MID(A1,FIND(“^^”,SUBSTITUTE(” “&A1,” “,”^^”,LEN(” “&A1)-LEN(SUBSTITUTE(” “&A1,” “,””))))-1+1,1024)
2. Using LOOKUP:
=MID(A1,LOOKUP(2,1/(MID(” “&A1,seq,1)=” “),seq)-1+1,1024)

Both formulas would return the string: “spaces”

Getting the nth element
Adapting the approach of SUBSTITUTE and FIND for getting the last element you could also get the position for the nth and n+1 element within a string. Using this you get a kind of ‘monster’ formula:

=IF(OR(LEN(A1)=0,ISERROR(FIND(“^^”,SUBSTITUTE(” “&A1,” “,”^^”,\$B\$1)))),””, MID(A1,FIND(“^^”,SUBSTITUTE(” “&A1,” “,”^^”,\$B\$1)),IF(ISERROR(FIND(“^^”,SUBSTITUTE(
A1,” “,”^^”,\$B\$1))),1024,FIND(“^^”,SUBSTITUTE(A1,” “,”^^”,\$B\$1)))-FIND(“^^”,SUBSTITUTE(
” “&A1,” “,”^^”,\$B\$1))))

where B1 stores the desired element number n.

And as stated in the first part of this article this could of course be achieved much easier using VBA but maybe it wouldn’t be that much fun …

Frank

Pages: 1 2

Posted in Uncategorized

## 47 thoughts on “String manipulation with worksheet formulas – Part 2”

1. Andrew says:

Hello Mr. Kabel,

Both formulas for extracting the last element seem to include the last space also. I think that removing the “-1? or using TRIM will prevent this. (Apologies if I am mistaken of course :-))

This seems to work also.

=RIGHT(A1,LEN(A1)-FIND(“^^”,SUBSTITUTE(A1,” “,”^^”,LEN(A1)-LEN(SUBSTITUTE(A1,” “,””)))))

Anyway, I really appreciate your tips.

2. Frank Kabel says:

Hi Andrew
thanks for your comments and you’re right. Messed this up during the translation. I’ll correct the posting accordingly.

Frank

3. Jon Edmonds says:

For the 1st bit, would this be better?

=IF(LEN(A1),LEFT(A1,FIND(” “,A1&” “)-1),””)

Assuming TRIM is not needed.

4. Frank Kabel says:

Hi Jon
think your idea/solution is better (and at least shorter)

Frank

5. Patrick Rabau says:

=LEFT(A1,FIND(” “,A1&” “)-1)

is even shorter and seems to work also when A1 is empty.

6. Malcolm Morris says:

Hi,

I know this is off topic, but this page in your blog is the nearest to a discussion of my (excel) problem that I can find.

In Excel, worksheet functions, I need to extracting a substring from a longer string which I can handle, except for the case when the substring does not exist. In that case Ecxel returns #VALUE!. eg find ‘d’ in ‘abcefg’ fails

Is there any way to avoid/override that and put in a ‘not found’ or similar?

Any help would be appreciated.

Malcolm

7. Dianne Butterworth says:

Malcolm,

Try something like

Dianne

8. ross says:

use 2 nested if’s?

9. Glen says:

Hi,

I’m having a similar problem trying to extract the last part of a string from a users profile path that has been imported into excel from AD.

The path reads \serverprofilesfirstname.lastname

I need to extract the firstname.lastname. Can anyone help?

Thanks,

Glen.

10. Dianne Butterworth says:

Glen,

You could use a user-defined function like this (taking advantage of the InStrRev VBA function):

Function FullName(strToSearch As String) As String
If InStrRev(strToSearch, “”) > 0 Then
FullName = Mid(strToSearch, _
InStrRev(strToSearch, “”) + 1)
End If
End Function

Which raises a question for me that others might be able to answer.

When writing a function that I’ll be using on a worksheet as =MyFunction(A1), I normally would write the UDF to take a range argument:

Function MyFunction (rngMyArg as Range) as String

then convert rngMyArg.Value into a string to be handled.

Is there any advantage/disadvantage/difference in writing the UDF:

Function MyFunction (strMyArg as String) as String

as I’ve done in the above FullName function?

I tested it (in a limited way) and it seems to return the same string values.

Best practice?

11. floor says:

I have a large database (10 000 items)with address information. I would like to seperate a substring (country and institution) withinin a large string that does not have a uniform format. For example:

Department of Obstetrics and Gynaecology, Sahlgrenska University Hospital, Gothenburg, Sweden. elisabeth.jangsten@vgregion

or:

MRC Social and Public Health Sciences Unit, University of Glasgow, Glasgow, UK

or:

Could you think of a formula to:
1. create a seperate variable/string for the country (Sweden, UK)(maybe after first seperating the email adres, “@” ?)
2. make a seperate variable/string for the insstitutions “Sahlgrenska University Hospital”, “University of Glasgow”, “Halmstad University”. So whenever somewhere the string “university” is can excel seperate the whole string that is around “University” delimited by “, ”

I would be very gratefull with your help!

Thank you very much!!

12. Michael says:

floor –

Here’s a quick UDF that returns the country. It may show you the way for the University:

Function Country(Phrase As String) As String
Dim stuff() As String
stuff = Split(Phrase, “, “, -1, vbTextCompare)
Country = stuff(UBound(stuff))
If InStr(1, Country, “@”, vbTextCompare) Then
Country = Left(Country, InStr(1, Country, “.”, vbTextCompare) – 1)
End If
End Function

…mrt

13. Hans Schraven says:

You stated:

[Italic]=IF(LEN(A1)=0,””,IF(ISNUMBER(FIND(” “,A1)),LEFT(A1,FIND(” “,A1)-1),A1))

This formula searches for the first occurence of a space character and returns all characters to the [Bold]RIGHT[[/bold] of this space.[/italic]

I suppose you meant to the LEFT.

second remark:
instead of using formulas I’d prefer a UDF, using the split function in VBA, assuming words are being separated by spaces.
This creates an array with all the separate words.

sq=split([A1])
firstword=sq(0)
secondword=sq(1)
lastword=sq(ubound(sq))
14. floor says:

Michael!

Thank you for your help! I really have no experience with this and do not quite understand the formula BUT have been able to make a seperate column with the country in it using your UDF! It is not working for all cells because there is no uniform format I guess, still it is a great help! So thank you.

If you could help me with a UDF that seperates the University I would be very gratefull again. Sorry I really have no experience at all with this, so if you could help me I would appreciate it very much!

What I need is a UDF that whenever in a text string the word university appeares it gives the whole string around university delimited by commas. Ideally in addition if the word istitute appears it should also give me the string around institute delimited by comas.

For istance:

Department of Pathology and Microbiology, The Aga Khan University Hospital, Karachi, Pakistan. bushra.moiz@aku.edu

should return: The Aga Khan University Hospital

Department of Biological Sciences and Bioengineering, Indian Institute of Technology, Kanpur 208016, India

in this case it should return:
Indian Institute of Technology

Section of Genetics, Department of Zoology, Aligarh Muslim University, Uttar Pradesh, India

should again return: Aligarh Muslim University

Thanks,

Floor

15. Hans Schraven says:

A more universal UDF
```Function nthword(tekst As String, n As Integer, separator As String) sq = Split(tekst, IIf(separator = "", " ", separator)) nthword = sq(UBound(sq)) If n```

16. Hans Schraven says:
Function nthword(tekst As String, n As Integer, separator As String)
sq = Split(tekst, IIf(separator = “”, ” “, separator))
nthword = sq(UBound(sq))
If n &amp;#060 UBound(sq) + 1 Then nthword = sq(n – 1)
End Function

for instance:
tekst = “Department of Pathology and Microbiology, The Aga Khan University Hospital, Karachi, Pakistan”
n = 2
separator = “, “
nthword= “The Aga Khan University Hospital”

NB. the separator can be any character or group of characters

17. floor says:

Hans-

I copy pasted the UDF as you have written into VBA excel, and then wrote =nthword(A1). This does not work so I am probably doing something wrong? Sorry I am a real beginner in this.

18. Hans Schraven says:

I’d prefer (it’s necessary) for your example
=nthword(A1;2;”, “)

19. floor says:

Hans thank you very much, but.. the formula is still not working.. What am I doing wrong? Can anyone help?

20. Hans Schraven says:

read in my code
&#060 : that must be the sign for “greater-than”
unfortunately I don’t know the method tot put those characters on this board. The link, linked to ‘escape’ doesn’t inform me either.

21. floor says:

I think you are referring to &#060

I have changed that to > and then put in the formula in VB.

In excel sheet I wrote
=nthword(A1;2;”, “)

Still it is not recognizing the formula. Does it work at your side?

Thanks Hans, I really appreciate your helping me!

22. Hans Schraven says:

Probably you put the UDF in a module in another workbook.(personal.xls or something like that)
Sorry, I made a mistake it should be:
If n

23. Hans Schraven says:

I made the mistake that it should be the sign for ‘less-than’.
How did you manage tot put the ‘greater than’ sign in your post ?

24. floor says:

Hi Hans,

The formula is working now! It is not completely what though because it is always taking the second string delimited by commas, wheras I would like to have a function that searches for the word University and takes the string were the word university is in… (because for instance in my third example as you can see the University is in the third substring: Section of Genetics, Department of Zoology, Aligarh Muslim University, Uttar Pradesh, India) This is my problem, there is no universal display in the address infromation, it is not always on the same place/ in the same order…

For the > or

25. floor says:

For the > or

26. floor says:

look at the left lower corner of your key bord. there is a key for it on my keyboard next to shift.. hope I can help you too!

27. Hans Schraven says:

if you only want the part with University in it:

Function wordfilter(tekst As String, filt As String, separator As String)
wordfilter = join(filter(Split(tekst, IIf(separator = “”, ” “, separator)),filt),“”)
End Function

=wordfilter(A1;”University”;”, “)

28. Michael says:

Hi Floor –

My UDF assumes that the country is found after the last comma-space combination, per your three examples, and if email address exists, it follows after a period-space combination after the country. As you saw, it won’t work in other cases. A different version is below. Here’s the UDF for the Institution. It should work as long as they are all universities. If not, Country2 should show you the way.

Function Institute(Phrase As String) As String
Dim stuff() As String
Dim i       As Integer
stuff = Split(Phrase, “, “, -1, vbTextCompare)
For i = LBound(stuff) To UBound(stuff)
If InStr(1, stuff(i), “University”, vbTextCompare) Then
Institute = stuff(i)
Exit For
Else
Institute = vbNullString
End If
Next i

End Function

Here’s a new UDF for the Country that will require you to add all possible countries per this example, but should catch the format anomalies:

Function Country2(Phrase As String) As String
Const NumCountries = 4   ‘Change to suit
Dim stuff() As String
Dim i       As Integer
Dim j       As Integer
Dim Countries(NumCountries) As String

Countries(0) = “UK”
Countries(1) = “Sweden”
Countries(2) = “France”
Countries(3) = “Germany”
Countries(4) = “Norway”   ‘Stay in synch with NumCountries, zero-based indexing

stuff = Split(Phrase, “, “, -1, vbTextCompare)

For j = 0 To NumCountries
For i = LBound(stuff) To UBound(stuff)
If InStr(1, stuff(i), Countries(j), vbTextCompare) Then
Country2 = stuff(i)
Exit For
Else
Country2 = vbNullString
End If
Next i
If Country2  vbNullString Then Exit For
Next j

If InStr(1, Country2, “@”, vbTextCompare) Then
Country2 = Left(Country2, InStr(1, Country2, “.”, vbTextCompare) – 1)
End If

If Country2 = vbNullString Then Country2 = “Not Recognized”

End Function

Good luck
Michael

29. Michael says:

The line after “Next i” should be

If Country2 “not equal” vbNullString etc.

30. floor says:

Thank you both Hans and Michael!
I am so glad you are willing to help me!

Michael, if I change If Country2 “not equal” vbNullString etc. I get a syntax error… If you have a suggestion, please let me know again.. thanks so much!

Floor

31. Michael says:

Hi Floor –

I’m sorry, by “not equal” I meant the “less than” sign followed by the “greater than” sign in VBA. I was too lazy to look up the escape characters, and hoped you’d catch my meaning. Here’s an attempt at what I meant.

If Country2 &#060 &#062 vbNullString Then Exit For

…Michael

32. floor says:

Michael, yes the formula works now..

I get a lot of not recognized. I tried to add more coutries by just typing for instance countries (5) = “India” but I guess I need to change more underneath that as well, because that is not working.
Also for some reason the formula already picks up strings like scotland, Holland, Department of health, because of it’s position?

Anyway, If you don’t feel like helping anymore I understand, you have already put a lot of effort in this.

I can always go with your first country UDF as that works better for me now…

If you do feel like it, could you let me know what I need to add in addition to all the countries that I want to add?

Thanks, and thanks again for all your help so far, it really is great because I was about to copy past everything manually..

Floor

33. floor says:

I have figured it out.. Added more countries.. so thanks.. no need anymore!

:)

34. Hans Schraven says:

Michael,

Your function ‘institute’ is a re-invention of the function ‘filter’ that is part of VBA.

35. mrt says:

Hans –

Thank you. I’ve rolled my own that way for so long, I never looked for a built in function that did the same thing.

Floor –

Now that you’ve got it ;-), here’s an improved version that would have made your life easier. Sorry to be so long getting back to you.

Function Country3(Phrase As String) As String
Dim Stuff() As String
Dim Countries As String
Dim Country() As String
Dim i       As Integer
Dim j       As Integer

Countries = “UK, Sweden, France, Germany, Norway”   ‘Add Countries separated by comma-space

Country = Split(Countries, “, “, -1, vbTextCompare)
Stuff = Split(Phrase, “, “, -1, vbTextCompare)

For j = LBound(Country) To UBound(Country)
For i = LBound(Stuff) To UBound(Stuff)
If InStr(1, Stuff(i), Country(j), vbTextCompare) Then
Country3 = Stuff(i)
Exit For
Else
Country3 = vbNullString
End If
Next i
If Country3 #&amp;060; #&amp;062; vbNullString Then Exit For
Next j

If InStr(1, Country3, “@”, vbTextCompare) Then
Country3 = Left(Country3, InStr(1, Country3, “.”, vbTextCompare) – 1)
End If

If Country3 = vbNullString Then Country3 = “Not Recognized”

End Function

And here’s a version of Institute() that handles Institutes ;-)

Function Institute(Phrase As String) As String
Dim Stuff() As String
Dim i       As Integer
Stuff = Split(Phrase, “, “, -1, vbTextCompare)
For i = LBound(Stuff) To UBound(Stuff)
If InStr(1, Stuff(i), “University”, vbTextCompare) Then
Institute = Stuff(i)
Exit For
ElseIf InStr(1, Stuff(i), “Institute”, vbTextCompare) Then
Institute = Stuff(i)
Exit For
Else
Institute = vbNullString
End If
Next i

End Function

…Michael

36. Hans Schraven says:

Michaal
to shrink and speed up your code:

Function Institute(Phrase As String) As String
If Instr(Phrase,“University”)&gt;0 Then
c0=“University”
ElseIf Instr(Phrase,“Institute”)&gt;0 Then
c0=“Institute”
End If
If c0 [#&amp;060; #&amp;062;] “” Then Institute=join(filter(split(phrase,“, “),c0),“”)
End Function
37. mrt says:

Hi Hans –

Thank you. I can see that it does by taking a larger view. Got to think about that last part. I’m learning something here ;-)

For the first part, no need to check if InStr() is greater than zero. InStr() returns zero if not found, giving False to the If. Life would be easier if formula FIND() or SEARCH() did that instead of returning errors.

…Michael

…Michael

38. Hans Schraven says:

Michael,

Your remarks led me to the one-liner below:

Function Institute(Phrase As String) As String
Institute=join(filter(split(phrase,“, “),IIf(InStr(phrase, “University”), “University”, IIf(InStr(phrase, “Institute”), “Institute”, “”))),“”)
End Function
39. mrt says:

Hi Hans –

Ooo…I like that. Now we’re approaching Obfuscated C ;-) Well, maybe not.

I didn’t get the need for Join() until I found that you can’t get at something like Filter(0)

Thanks
Michael (aka mrt)

40. Akhil says:

Hi,

i have two sheets

sheet 1 contains

column-1-list of names
column-2-count

sheet 2 contains

column-1-date
column-2-shift 1
column-2-shift 2
column-3-shift 3
column-4-shift 4

now there might be any number of names in each column in sheet 2
(i.e there will be many people in different shifts whose names are seperated by a delimiter ” , “)

i want a user defined sode which must perform the following operation.

if the name of an associate is found in sheet 2 in any shift for that particular date,then the count corresponding to his name should increase by one.

41. KRANTI says:

hi,
the very first formula is workable for my file but only change i want is there is dash(-) in some strings instead of space so can u modify this formula to read it space &/or dash? & revert?

=IF(LEN(A1)=0,””,IF(ISNUMBER(FIND(” “,A1)),LEFT(A1,FIND(” “,A1)-1),A1))

42. KRANTI says:

adding to above i also need help regarding the following :
1.need to extract first 3 letters of 1st word & 2 letters of 2nd word of the same string if there is space betwenn 2 words
2.if there is dash (-) between two words instead of space then need to extract only 1st 5 letters of the same string

can anybody assist me on this?

43. Dick Kusleika says:

This formula will extract the first word defined by a space or a dash

=IF(LEN(A1)=0,””,IF(ISNUMBER(FIND(“^^”,SUBSTITUTE(SUBSTITUTE(A1,” “,”^^”),”-“,”^^”))),LEFT(SUBSTITUTE(SUBSTITUTE(A1,” “,”^^”),”-“,”^^”),FIND(“^^”,SUBSTITUTE(SUBSTITUTE(A1,” “,”^^”),”-“,”^^”))-1),A1))
44. Dick Kusleika says:

With your text in A1, in B1

=SUBSTITUTE(SUBSTITUTE(A1,” “,”^^”),”-“,”^^”)

and in C1

=IF(ISNUMBER(FIND(“^^”,B1)),IF(MID(A1,FIND(“^^”,B1),1)=”-“,LEFT(LEFT(A1,FIND(“-“,A1)-1),5),LEFT(LEFT(A1,FIND(” “,A1)-1),3)&MID(A1,FIND(” “,A1)+1,2)),A1)

will extract the first five letters of the first word if the first and second words are separated by a hyphen and will extract the first three letters of the first word and first two letters of the second if they are separated by a space.

45. Rick Rothstein (MVP - Excel) says:

@Dick,

Your last setup will fail for something like this in A1…

One-Two Three

I believe this single formula will do work instead though…

=LEFT(SUBSTITUTE(IF(FIND(“-“,A1&”-“)<find (” “,A1&” “),SUBSTITUTE(A1,”-“,””),REPLACE(A1,4,FIND(” “,A1)-3,””)),” “,””),5)
46. Dick Kusleika says:

Rick: Thanks. I assumed they would only want up to the first five letters. Possibly presumptuous, I admit.

Posting code? Use <pre> tags for VBA and <code> tags for inline.