Quick post. I was writing up the Double-VLOOKUP trick I learnt from Charles Williams for the book.
Standard VLOOKUP on unsorted data:
=VLOOKUP([@ID],Table2,2,FALSE)
Double VLOOKUP trick on sorted data:
=IF(VLOOKUP([@ID],Table2,1,TRUE)=[@ID],VLOOKUP([@ID],Table2,2,TRUE),NA())
The point of the double VLOOKUP trick is this:
- Standard VLOOKUPS on unsorted data are slow, because your VLOOKUP has to look at each item in turn until it finds a match. So on average, it looks at – and discounts – half the things in your lookup list before it finds that match.
- Binary searches are lightning fast. Because your data is sorted, they can start half way through the lookup list, and check if the item at that point is bigger or smaller than what they’re looking for. Meaning they can ditch half the list immediately, then look halfway through the remainder. And over and over, ditching half the list each time until they either find the item they are looking for, or rule out all items.
- VLOOKUP and MATCH will quite happily do a binary search for you. But for reasons known only to Microsoft, they offer an exciting plot-twist: if they don’t find what you’re looking for in the list, they return the closest match they can find to it, which happens to be the only thing left in the list when they’d divided it in half enough times. Only they don’t tell you its not an exact match. Phooey.
- Charles’ brilliant trick is to do two lightning-fast Binary searches – the first one simply looks for the closest match to your input term among the Lookup terms. If it’s an exact match with what you fed it, you know your lookup term is in the list. So now that you KNOW it’s there for sure, you kick off a second approximate match VLOOKUP, which will grab the corresponding value you want from the lookup table. Go read his blog for the specifics.
I’ve known about this for a while, but it’s only as I’m writing this up for the book that I’ve gained an appeciation of just how much faster Binary Searches – and the Double VLOOKUP trick that gets around Microsoft’s crap implementation of them – are, compared to unsorted/linear VLOOKUPs.
See for yourself:
That says that:
- At one extreme, if your lookup table has 10,000 things in it, the double VLOOKUP trick on sorted data is 28 times faster than the standard VLOOKUP on unsorted data
- At the other, if your lookup table has 1,000,000 things in it, the double VLOOKUP trick on sorted data is 3,600 times faster than the standard VLOOKUP on unsorted data
Kinda puts the ‘hassle’ of sorting your lookup table ascending into perspective, don’t it!