OR: The problem with VLOOKUP. Wikipedia gives us a table of birthstones, and I think just because it’s Wikipedia, there’s a trailing space after every entry. Pasted into a spreadsheet, the table looks like this:
A | B | C | D | E | |
---|---|---|---|---|---|
1 | Month | Traditional Birthstone(s) | Modern Birthstone(s) | Mystical Birthstone | Ayurvedic Birthstone |
2 | January | Garnet | Garnet | Emerald | Garnet |
3 | February | Amethyst | Amethyst | Bloodstone | Amethyst |
4 | March | Bloodstone, Jasper | Aquamarine | Jade | Bloodstone |
5 | April | Diamond, Sapphire | Diamond | Opal | Diamond |
6 | May | Emerald, Agate | Emerald | Sapphire | Agate |
7 | June | Alexandrite, Emerald | Moonstone, Pearl | Moonstone | Pearl |
8 | July | Ruby, Onyx | Ruby | Ruby | Ruby |
9 | August | Sardonyx, Carnelian | Peridot | Diamond | Sapphire |
10 | September | Sapphire, Peridot | Sapphire | Agate | Moonstone |
11 | October | Tourmaline, Aquamarine | Opal, Tourmaline | Jasper | Opal |
12 | November | Citrine, Topaz | Topaz, Citrine | Pearl | Topaz |
13 | December | Zircon, Ruby | Turquoise, Blue Topaz | Onyx | Ruby |
(Only there for completeness, the mystical stones have Tibetan origin, and the Ayurvedic stones have Indian sub-continent origin.) If you want to know the modern birthstone for April, you might use the VLOOKUP() function, probably as follows:
- =VLOOKUP(“April”,A1:E13,3,FALSE)
“April” for the month, A1:E13 as the array to search, 3 as the column from which to return, and FALSE because Months are not in an alphabetic sort and you want an exact match. If you do, you’ll get a #N/A error, indicating no match. That’s because of Wiki’s trailing space after April in A:A. Lets define a name: Birthstones =Sheet!$A$1:$E$13. Proper syntax could be:
- =VLOOKUP(“April “,A1:E13,3,FALSE)
- =VLOOKUP(“April “,Birthstones,3,FALSE)
- =VLOOKUP(“April”&”*”,A1:E13,3,FALSE)
- =VLOOKUP(“April”&”*”,Birthstones,3,FALSE)
All Return “Diamond “. Note the added wildcard asterisk in the last examples. You can use wildcards in FALSE VLOOKUPs, and you can do it front and back: =VLOOKUP(“*”&”April”&”*”,Birthstones,3,FALSE). But what if you want to know which month has the modern birthstone of Opal? You can’t use VLOOKUP() at all, because it only looks to the right, and the first column is the one searched. When I first encountered this limitation, my fix would be to add a column F:F to the right edge equal to A:A, and look right. It works, but there is a much better way. Lets define some more names:
- Month =Sheet1!$A$1:$A$13
- Traditional =Sheet1!$B$1:$B$13
- Modern =Sheet1!$C$1:$C$13
- Headings =Sheet1!$A$1:$E$1
The MATCH() function searches a 1*n or a n*1 array and returns the 1-based number where the match is found. It takes a third argument which gives the match type. When the match type argument is 0 (zero), MATCH() looks for exact matches, and can take wildcards, functioning akin to VLOOKUP() when set FALSE. Looking for Opal in the modern birthstones could be:
- =MATCH(“*”&”Opal”&”*”,C1:C13,0)
- =MATCH(“*”&”Opal”&”*”,Modern,0)
The INDEX() function specifies an array and specifies how far down to go within which column of the array, and doesn’t care left or right. Remembering that MATCH() returns a number, the syntax could be:
- =INDEX(A1:E13,11,1) Opal being the 11th entry in C1:C13, and we thus want the 11th item from Column 1 (A1:A13)
- =INDEX(A1:E13,MATCH(“*”&”Opal”&”*”,Modern,0),1)
- =INDEX(Birthstones,MATCH(“*”&”Opal”&”*”,Modern,0),1)
All done knowing the Month is in Column 1. But you don’t really need to know even that:
- =INDEX(Birthstones,MATCH(“*”&”Opal”&”*”,Modern,0),MATCH(“*”&Month&”*”,Headings,0))
- =INDEX(Month,MATCH(“*”&”Opal”&”*”,Modern,0)) with no need to specify column.
Will return “October ” no matter where the Month column is. INDEX() and MATCH() work so powerfully together that I rarely use VLOOKUP() at all. Left or Right doesn’t matter.
…mrt
nice. also, indirect and match can be even more powerful than index and match, letting you define arrays dynamically.
You got lucky with the use of wildcards. {grin} If December were listed before November, you would not be able to match November’s Topaz.
With (an implicit) M-N relationship between months and birthstones, I’d be inclined to go with a SQL query (set up easily enough with MS Query).
First off, that trailing “space” is actually the character with ASCII code 160. As for formulas, what about doing something like this…
=INDEX(A1:A13,SUMPRODUCT(ROW(Modern)*ISNUMBER(SEARCH(“*opal*”,Modern))))
With respect to my previous posting, that formula should look more like this…
=IF(COUNTIF(Modern,”*opal*”),INDEX(A1:A13,SUMPRODUCT(ROW(Modern)*ISNUMBER(SEARCH(“*opal*”,Modern)))),””)
in order to stop a false result from being returned in case the word being looked up does not occur within the column.
Rick –
Not in this case. It’s Ascii 32. I triple checked. For reasons lost to antiquity I thought you had to concatenate the wildcard to the root. Obviously from your formula, I thought wrong. Thank you. And thanks for showing another way
Tushar –
What my dad used to say: ’tis better to be lucky than good. :roll: But you’re right. I can’t figure out how to do that without knowing/using that Topaz has a partner.
…mrt
>> Rick –
>>
>> Not in this case. It’s Ascii 32. I triple checked. For reasons lost to antiquity I thought
>> you had to concatenate the wildcard to the root. Obviously from your formula, I thought
>> wrong. Thank you. And thanks for showing another way
@Michael,
The reason I thought it was an ASCII 160 character was because I copy/pasted your chart from the webpage directly into XL2007 and, now that I think of it, that is probably where the ASCII 160 character came from… your webpage that I copied the data from, not the original data that you downloaded. And, of course, you are welcome for my “another way” method… as I know you are aware, there is almost alway two or more ways to accomplish any goal in Excel.
@Rick…when I copy and paste this content into excel, the whole data set gets stuck into one cell and I can’t find a worthy delimiter to use text-to-columns. What’s your secret?
I even tried using a web query to suck it off both this page as well as wikipedia, but no joy.
It annoys me to see lookups like this in the wild.
The root problem is a trailing space. I would insist on cleaning the data before building the lookups.
I still think your formulas are pretty cool.
Rick –
Above is Ascii 160, on Wiki is Ascii 32. I made the change and don’t seem to have a good reason any more ;-). It’s still wild, though.
Jeff –
Not to speak for Rick, but for me, it was Paste Special/Text from the Wiki table.
…mrt
Ughhh… mislead by the title. I got excited thinking that you had a solution for a vlookup that looked to the left of the matching column (imagine for a second that the month column was in Row F, and you had to find the Modern stone for April…).
vlookup matches on the left most column, and pulls the corresponding value from the columns to the right.
A “left looking” vlookup would work the other way around!
In the past, I’ve simply inserted a new column “A” and used a simple “=F2? or similar copied down the rows to “re-index the list. However, I prefer not to have to tweak the raw data every time.
Any ideas? Perhaps a good topic for the future.
– Peter
Peter –
If I understand, that’s what this is all about. I do have a solution that looks to the left of where the match is found. In fact, I mentioned that I would formerly insert an F:F equal to A:A so I could continue to use VLOOKUPs facing right.
The INDEX() and MATCH() functions don’t care which side you want to return the data from.
…mrt
@michael Thanks for that. Why the heck paste special would work when the more intuitive route of text to columns would fail is puzzling. How the average user is supposed to know something like that is completely beyond me.
1. Index/Match Can also work with Horizontal tables and can lookup or down (Hlookup can only look down)
2. Match Can find the closest match above/below(1/-1)
3. One can use a single match function in a column and a single entered index function or multiple indexes to speed up lookups
I just can’t fathom ignoring the benefits of data cleansing. Unless the benefits from leaving leading and trailing spaces (ASCII 32 or 160 decimal) in-place outweigh the benefits from removing those spaces, which seems very unlikely, a little effort spent removing leading and trailing spaces would be a much more useful topic than working around such cruft. All it takes is a simple macro to go through all cells in the selected range and trim their contents.
If there’s some benefit to leaving leading/trailing whitespace, it may be more efficient to replace the 2nd argument range reference to match with an expression like TRIM(SUBSTITUTE(range,CHAR(160),” “)) in order to avoid needing to use Excel’s overly * wildcard before and after the true search term.
On the separate topic of generalized lookups, INDEX(.,MATCH(.)) has been discussed frequently and at length in newsgroups and other forums for more than two decades. Not much new here.
I’d note that this is one area OpenOffice Calc is more capable than Excel. Calc provides an option for using regular expressions in its (V|H)LOOKUP, MATCH and SEARCH functions. Using regular expressions in the 1st argument to MATCH would allow simpler formulas for distinguishing, e.g., Topaz from Blue Topaz.
Calc: =MATCH(“(.+, *)?Topaz(,.+)?”;Modern;0)
This functionality could be included in Excel with udfs which make use of VBScript regular expression objects.
Excel may be pretty good for working with numbers, but it leaves a lot to be desired for working with text. Generalized lookups are text matching operations, and thus there are much better tools than what native Excel provides. It’s useful to know where Excel is weakest and to explore alternatives to native Excel functionality in those situations.
I encountered the VLOOKUP limitation many years ago, and as a (then) recent venturer into the world of formulas. It’s on the natural path of progression for anyone with an interest in Excel.
Tushar lightly toasted me in a forum for mentioning a freeware spreadsheet I got off a cover disk that didn’t have a problem with ‘left looking lookups’, which was why I was so surprised that Excel did when I came to produce an identical spreadsheet.
Are we saying that Excel 2010 still has this? I hesitate to call it a limitation, bug, design fault, whatever. So there must be a good reason for it being the way it is. Any suggestions?
Now that trailing spaces is a topic; they can cause ugly errors on filtering.
Apologies to Tushar, it was someone else doing the toasting that day.
I remember one head of HR coming to learn Excel. She had this data of her employees and every time we started to extract the number of managers in her organization, the countif function would show 470 and she insisted there were 512! After asking her where she got the data from I just used the ‘trim’ function and voila the result was 512! Trailing, leading or multiple spaces during data entry are a cause of great frustration. Therefore, it is fundamental that people make proper data entries in an Excel worksheet.
Of course, your solution to the encountered problem is lovely. Thank you!