Deutsche Version / German version

Now continuing the first part of string manipulations with formulas the following will describe how to extract the n^{th} element of a larger string. This technique is for example useful in the following cases:

- Extract first, middle and last name
- Get the last or first word of a sentence (words separated by spaces)

As an assumption for the following I’d assume that words in a larger string are separated by spaces. e.g. “This is a longer text of words separated by spaces”

So now let’s start extracting the individual elements of this string

**1. Getting the first element**

Getting the first word is rather simple. Just combine the functions LEFT and FIND (and add some error checking):

=IF(LEN(A1)=0,””,IF(ISNUMBER(FIND(” “,A1)),LEFT(A1,FIND(” “,A1)-1),A1))

This formula searches for the first occurence of a space character and returns all characters to the right of this space. If no space character is found the complete string is returned. For our example this formula would return “This”.

**2. Getting the last element**

Now it’s getting a little bit more complicated as we don’t know in advance how many elements our string contains and Excel does not provide a FIND function which searches from the right (as you have in VBA with the method InStrRev). But is is still possible to achieve with formulas alone. We start with getting the position of the last delimiter (the last space character):

- Using SUBSTITUTE and FIND:

=FIND(“^^”,SUBSTITUTE(” “&A1,” “,”^^”,LEN(” “&A1)-LEN(SUBSTITUTE(” “&A1,” “,””))))-1 - Using LOOKUP:

=LOOKUP(2,1/(MID(” “&A1,seq,1)=” “),seq)-1

where seq is again a defined name with the formula: =ROW(INDIRECT(“1:1024”))

Personally I prefer the second alternative as it is shorter (and for me easier to remember). But both formulas should return the same result. For our example theywill both return: 44

Now the only thing you have to do use this formula part within a MID formula (and don’t forget to add 1 to the position to skip the space):

- Using SUBSTITUTE and FIND:

=MID(A1,FIND(“^^”,SUBSTITUTE(” “&A1,” “,”^^”,LEN(” “&A1)-LEN(SUBSTITUTE(” “&A1,” “,””))))-1+1,1024) - Using LOOKUP:

=MID(A1,LOOKUP(2,1/(MID(” “&A1,seq,1)=” “),seq)-1+1,1024)

Both formulas would return the string: “spaces”

**Getting the n ^{th} element**

Adapting the approach of SUBSTITUTE and FIND for getting the last element you could also get the position for the n

^{th}and n+1 element within a string. Using this you get a kind of ‘monster’ formula:

=IF(OR(LEN(A1)=0,ISERROR(FIND(“^^”,SUBSTITUTE(” “&A1,” “,”^^”,$B$1)))),””, MID(A1,FIND(“^^”,SUBSTITUTE(” “&A1,” “,”^^”,$B$1)),IF(ISERROR(FIND(“^^”,SUBSTITUTE(

A1,” “,”^^”,$B$1))),1024,FIND(“^^”,SUBSTITUTE(A1,” “,”^^”,$B$1)))-FIND(“^^”,SUBSTITUTE(

” “&A1,” “,”^^”,$B$1))))

where B1 stores the desired element number n.

And as stated in the first part of this article this could of course be achieved much easier using VBA but maybe it wouldn’t be that much fun …

Frank

**Pages:** 1 2

Hello Mr. Kabel,

Both formulas for extracting the last element seem to include the last space also. I think that removing the “-1? or using TRIM will prevent this. (Apologies if I am mistaken of course :-))

This seems to work also.

=RIGHT(A1,LEN(A1)-FIND(“^^”,SUBSTITUTE(A1,” “,”^^”,LEN(A1)-LEN(SUBSTITUTE(A1,” “,””)))))

Anyway, I really appreciate your tips.

Hi Andrew

thanks for your comments and you’re right. Messed this up during the translation. I’ll correct the posting accordingly.

Frank

For the 1st bit, would this be better?

=IF(LEN(A1),LEFT(A1,FIND(” “,A1&” “)-1),””)

Assuming TRIM is not needed.

Hi Jon

think your idea/solution is better (and at least shorter)

Frank

=LEFT(A1,FIND(” “,A1&” “)-1)

is even shorter and seems to work also when A1 is empty.

Hi,

I know this is off topic, but this page in your blog is the nearest to a discussion of my (excel) problem that I can find.

In Excel, worksheet functions, I need to extracting a substring from a longer string which I can handle, except for the case when the substring does not exist. In that case Ecxel returns #VALUE!. eg find ‘d’ in ‘abcefg’ fails

Is there any way to avoid/override that and put in a ‘not found’ or similar?

Any help would be appreciated.

Malcolm

Malcolm,

Try something like

=IF(ISERROR(FIND(“d”,”abcefg”)),”Not Found”,FIND(“d”,”abcefg”))

Dianne

use 2 nested if’s?

Hi,

I’m having a similar problem trying to extract the last part of a string from a users profile path that has been imported into excel from AD.

The path reads \serverprofilesfirstname.lastname

I need to extract the firstname.lastname. Can anyone help?

Thanks,

Glen.

Glen,

You could use a user-defined function like this (taking advantage of the InStrRev VBA function):

Function FullName(strToSearch As String) As String

If InStrRev(strToSearch, “”) > 0 Then

FullName = Mid(strToSearch, _

InStrRev(strToSearch, “”) + 1)

End If

End Function

Which raises a question for me that others might be able to answer.

When writing a function that I’ll be using on a worksheet as =MyFunction(A1), I normally would write the UDF to take a range argument:

Function MyFunction (rngMyArg as Range) as String

then convert rngMyArg.Value into a string to be handled.

Is there any advantage/disadvantage/difference in writing the UDF:

Function MyFunction (strMyArg as String) as String

as I’ve done in the above FullName function?

I tested it (in a limited way) and it seems to return the same string values.

Best practice?

I have a large database (10 000 items)with address information. I would like to seperate a substring (country and institution) withinin a large string that does not have a uniform format. For example:

Department of Obstetrics and Gynaecology, Sahlgrenska University Hospital, Gothenburg, Sweden. elisabeth.jangsten@vgregion

or:

MRC Social and Public Health Sciences Unit, University of Glasgow, Glasgow, UK

or:

Halmstad University, Halmstad, Sweden. karen.odberg-pettersson@phs.ki.se

Could you think of a formula to:

1. create a seperate variable/string for the country (Sweden, UK)(maybe after first seperating the email adres, “@” ?)

2. make a seperate variable/string for the insstitutions “Sahlgrenska University Hospital”, “University of Glasgow”, “Halmstad University”. So whenever somewhere the string “university” is can excel seperate the whole string that is around “University” delimited by “, ”

I would be very gratefull with your help!

Thank you very much!!

floor –

Here’s a quick UDF that returns the country. It may show you the way for the University:

Dim stuff() As String

stuff = Split(Phrase, “, “, -1, vbTextCompare)

Country = stuff(UBound(stuff))

If InStr(1, Country, “@”, vbTextCompare) Then

Country = Left(Country, InStr(1, Country, “.”, vbTextCompare) – 1)

End If

End Function

…mrt

You stated:

[Italic]=IF(LEN(A1)=0,””,IF(ISNUMBER(FIND(” “,A1)),LEFT(A1,FIND(” “,A1)-1),A1))

This formula searches for the first occurence of a space character and returns all characters to the [Bold]RIGHT[[/bold] of this space.[/italic]

I suppose you meant to the LEFT.

second remark:

instead of using formulas I’d prefer a UDF, using the split function in VBA, assuming words are being separated by spaces.

This creates an array with all the separate words.

firstword=sq(0)

secondword=sq(1)

lastword=sq(ubound(sq))

Michael!

Thank you for your help! I really have no experience with this and do not quite understand the formula BUT have been able to make a seperate column with the country in it using your UDF! It is not working for all cells because there is no uniform format I guess, still it is a great help! So thank you.

If you could help me with a UDF that seperates the University I would be very gratefull again. Sorry I really have no experience at all with this, so if you could help me I would appreciate it very much!

What I need is a UDF that whenever in a text string the word university appeares it gives the whole string around university delimited by commas. Ideally in addition if the word istitute appears it should also give me the string around institute delimited by comas.

For istance:

Department of Pathology and Microbiology, The Aga Khan University Hospital, Karachi, Pakistan. bushra.moiz@aku.edu

should return: The Aga Khan University Hospital

Department of Biological Sciences and Bioengineering, Indian Institute of Technology, Kanpur 208016, India

in this case it should return:

Indian Institute of Technology

Section of Genetics, Department of Zoology, Aligarh Muslim University, Uttar Pradesh, India

should again return: Aligarh Muslim University

Thanks,

Floor

A more universal UDF

`Function nthword(tekst As String, n As Integer, separator As String)`

sq = Split(tekst, IIf(separator = "", " ", separator))

nthword = sq(UBound(sq))

If n

sq = Split(tekst, IIf(separator = “”, ” “, separator))

nthword = sq(UBound(sq))

If n &#060 UBound(sq) + 1 Then nthword = sq(n – 1)

End Function

for instance:

tekst = “Department of Pathology and Microbiology, The Aga Khan University Hospital, Karachi, Pakistan”

n = 2

separator = “, “

nthword= “The Aga Khan University Hospital”

NB. the separator can be any character or group of characters

Hans-

Thank you for your answer!

I copy pasted the UDF as you have written into VBA excel, and then wrote =nthword(A1). This does not work so I am probably doing something wrong? Sorry I am a real beginner in this.

I’d prefer (it’s necessary) for your example

=nthword(A1;2;”, “)

Hans thank you very much, but.. the formula is still not working.. What am I doing wrong? Can anyone help?

read in my code

< : that must be the sign for “greater-than”

unfortunately I don’t know the method tot put those characters on this board. The link, linked to ‘escape’ doesn’t inform me either.

I think you are referring to <

I have changed that to > and then put in the formula in VB.

In excel sheet I wrote

=nthword(A1;2;”, “)

Still it is not recognizing the formula. Does it work at your side?

Thanks Hans, I really appreciate your helping me!

Excel/Menu Bar/Insert/Function/Userdefined.

Probably you put the UDF in a module in another workbook.(personal.xls or something like that)

Sorry, I made a mistake it should be:

If n

I made the mistake that it should be the sign for ‘less-than’.

How did you manage tot put the ‘greater than’ sign in your post ?

Hi Hans,

The formula is working now! It is not completely what though because it is always taking the second string delimited by commas, wheras I would like to have a function that searches for the word University and takes the string were the word university is in… (because for instance in my third example as you can see the University is in the third substring: Section of Genetics, Department of Zoology, Aligarh Muslim University, Uttar Pradesh, India) This is my problem, there is no universal display in the address infromation, it is not always on the same place/ in the same order…

For the > or

For the > or

look at the left lower corner of your key bord. there is a key for it on my keyboard next to shift.. hope I can help you too!

if you only want the part with University in it:

wordfilter = join(filter(Split(tekst, IIf(separator = “”, ” “, separator)),filt),“”)

End Function

=wordfilter(A1;”University”;”, “)

Hi Floor –

My UDF assumes that the country is found after the last comma-space combination, per your three examples, and if email address exists, it follows after a period-space combination after the country. As you saw, it won’t work in other cases. A different version is below. Here’s the UDF for the Institution. It should work as long as they are all universities. If not, Country2 should show you the way.

Dim stuff() As String

Dim i As Integer

stuff = Split(Phrase, “, “, -1, vbTextCompare)

For i = LBound(stuff) To UBound(stuff)

If InStr(1, stuff(i), “University”, vbTextCompare) Then

Institute = stuff(i)

Exit For

Else

Institute = vbNullString

End If

Next i

End Function

Here’s a new UDF for the Country that will require you to add all possible countries per this example, but should catch the format anomalies:

Const NumCountries = 4 ‘Change to suit

Dim stuff() As String

Dim i As Integer

Dim j As Integer

Dim Countries(NumCountries) As String

Countries(0) = “UK”

Countries(1) = “Sweden”

Countries(2) = “France”

Countries(3) = “Germany”

Countries(4) = “Norway” ‘Stay in synch with NumCountries, zero-based indexing

stuff = Split(Phrase, “, “, -1, vbTextCompare)

For j = 0 To NumCountries

For i = LBound(stuff) To UBound(stuff)

If InStr(1, stuff(i), Countries(j), vbTextCompare) Then

Country2 = stuff(i)

Exit For

Else

Country2 = vbNullString

End If

Next i

If Country2 vbNullString Then Exit For

Next j

If InStr(1, Country2, “@”, vbTextCompare) Then

Country2 = Left(Country2, InStr(1, Country2, “.”, vbTextCompare) – 1)

End If

If Country2 = vbNullString Then Country2 = “Not Recognized”

End Function

Good luck

Michael

The line after “Next i” should be

If Country2 “not equal” vbNullString etc.

Thank you both Hans and Michael!

I am so glad you are willing to help me!

Michael, if I change If Country2 “not equal” vbNullString etc. I get a syntax error… If you have a suggestion, please let me know again.. thanks so much!

Floor

Hi Floor –

I’m sorry, by “not equal” I meant the “less than” sign followed by the “greater than” sign in VBA. I was too lazy to look up the escape characters, and hoped you’d catch my meaning. Here’s an attempt at what I meant.

If Country2 < > vbNullString Then Exit For

…Michael

Michael, yes the formula works now..

I get a lot of not recognized. I tried to add more coutries by just typing for instance countries (5) = “India” but I guess I need to change more underneath that as well, because that is not working.

Also for some reason the formula already picks up strings like scotland, Holland, Department of health, because of it’s position?

Anyway, If you don’t feel like helping anymore I understand, you have already put a lot of effort in this.

I can always go with your first country UDF as that works better for me now…

If you do feel like it, could you let me know what I need to add in addition to all the countries that I want to add?

Thanks, and thanks again for all your help so far, it really is great because I was about to copy past everything manually..

Floor

I have figured it out.. Added more countries.. so thanks.. no need anymore!

:)

Michael,

Your function ‘institute’ is a re-invention of the function ‘filter’ that is part of VBA.

Hans –

Thank you. I’ve rolled my own that way for so long, I never looked for a built in function that did the same thing.

Floor –

Now that you’ve got it ;-), here’s an improved version that would have made your life easier. Sorry to be so long getting back to you.

Dim Stuff() As String

Dim Countries As String

Dim Country() As String

Dim i As Integer

Dim j As Integer

Countries = “UK, Sweden, France, Germany, Norway” ‘Add Countries separated by comma-space

Country = Split(Countries, “, “, -1, vbTextCompare)

Stuff = Split(Phrase, “, “, -1, vbTextCompare)

For j = LBound(Country) To UBound(Country)

For i = LBound(Stuff) To UBound(Stuff)

If InStr(1, Stuff(i), Country(j), vbTextCompare) Then

Country3 = Stuff(i)

Exit For

Else

Country3 = vbNullString

End If

Next i

If Country3 #&060; #&062; vbNullString Then Exit For

Next j

If InStr(1, Country3, “@”, vbTextCompare) Then

Country3 = Left(Country3, InStr(1, Country3, “.”, vbTextCompare) – 1)

End If

If Country3 = vbNullString Then Country3 = “Not Recognized”

End Function

And here’s a version of Institute() that handles Institutes ;-)

Dim Stuff() As String

Dim i As Integer

Stuff = Split(Phrase, “, “, -1, vbTextCompare)

For i = LBound(Stuff) To UBound(Stuff)

If InStr(1, Stuff(i), “University”, vbTextCompare) Then

Institute = Stuff(i)

Exit For

ElseIf InStr(1, Stuff(i), “Institute”, vbTextCompare) Then

Institute = Stuff(i)

Exit For

Else

Institute = vbNullString

End If

Next i

End Function

Glad we’re there.

…Michael

Michaal

to shrink and speed up your code:

If Instr(Phrase,“University”)>0 Then

c0=“University”

ElseIf Instr(Phrase,“Institute”)>0 Then

c0=“Institute”

End If

If c0 [#&060; #&062;] “” Then Institute=join(filter(split(phrase,“, “),c0),“”)

End Function

Hi Hans –

Thank you. I can see that it does by taking a larger view. Got to think about that last part. I’m learning something here ;-)

For the first part, no need to check if InStr() is greater than zero. InStr() returns zero if not found, giving False to the If. Life would be easier if formula FIND() or SEARCH() did that instead of returning errors.

…Michael

…Michael

Michael,

Your remarks led me to the one-liner below:

Institute=join(filter(split(phrase,“, “),IIf(InStr(phrase, “University”), “University”, IIf(InStr(phrase, “Institute”), “Institute”, “”))),“”)

End Function

Hi Hans –

Ooo…I like that. Now we’re approaching Obfuscated C ;-) Well, maybe not.

I didn’t get the need for Join() until I found that you can’t get at something like Filter(0)

Thanks

Michael (aka mrt)

Hi,

i have two sheets

sheet 1 contains

column-1-list of names

column-2-count

sheet 2 contains

column-1-date

column-2-shift 1

column-2-shift 2

column-3-shift 3

column-4-shift 4

now there might be any number of names in each column in sheet 2

(i.e there will be many people in different shifts whose names are seperated by a delimiter ” , “)

i want a user defined sode which must perform the following operation.

if the name of an associate is found in sheet 2 in any shift for that particular date,then the count corresponding to his name should increase by one.

Thanks in advance

[…] O tópico original dessa fórmula se encontra em:Â http://www.dailydoseofexcel.com/archives/2004/12/15/string-manipulation-with-worksheet-formulas-part... […]

hi,

the very first formula is workable for my file but only change i want is there is dash(-) in some strings instead of space so can u modify this formula to read it space &/or dash? & revert?

=IF(LEN(A1)=0,””,IF(ISNUMBER(FIND(” “,A1)),LEFT(A1,FIND(” “,A1)-1),A1))

adding to above i also need help regarding the following :

1.need to extract first 3 letters of 1st word & 2 letters of 2nd word of the same string if there is space betwenn 2 words

2.if there is dash (-) between two words instead of space then need to extract only 1st 5 letters of the same string

can anybody assist me on this?

This formula will extract the first word defined by a space or a dash

With your text in A1, in B1

and in C1

will extract the first five letters of the first word if the first and second words are separated by a hyphen and will extract the first three letters of the first word and first two letters of the second if they are separated by a space.

@Dick,

Your last setup will fail for something like this in A1…

One-Two Three

I believe this single formula will do work instead though…

Rick: Thanks. I assumed they would only want up to the first five letters. Possibly presumptuous, I admit.