VLOOKUP: V is for Volatile.

Well THAT sure got your attention.

I was doing a bit of research for the book (the only bit, mind), and I came across this great old thread on VLOOKUP vs INDEX/MATCH.

There’s a few false starts in terms of test methodology until Jon von der Hayden puts things back on the right track about halfway through the thread.

Conclusion is that INDEX/MATCH is probably way to go, but there’s not a great deal in it.

But I think there’s another important point in the INDEX/MATCH combo’s favor that has been overlooked in this thread: and that’s this bit in bold:

=VLOOKUP(lookup_value,table_array,col_index_num,range_lookup)

The fact that a VLOOKUP functions points at an entire table of data – even though it only ever actually makes use of two columns of it (the key column on the far left, and the nth column given by the col_index_num argument) – means that any time you change any cell in that table, your VLOOKUP is going to recalculate. And then so are ALL formulas downstream of those VLOOKUPS. It’s exactly as if VLOOKUP is a volatile function, where changes to that that Table is concerned.

Whereas with INDEX/MATCH, your functions will ONLY recalculate if you change something in the specific columns that the INDEX and MATCH combo references.

Given that, I’ll take INDEX/MATCH any time.

One thing still puzzles me hugely about VLOOKUP though: the recalculation time incurred from updating the lookup table seems much much longer than the recalculation time incurred from changing an input parameter for the VLOOKUP itself. And the same goes with INDEX/MATCH.

Take the situation where you’ve got a VLOOKUP and Table that looks like this:
 
vlookup issue
 
 

The KEY column is just the numbers 2 through 1,048,576 sorted in random order. And the Value column is just the row number. I won’t post a sample file because it runs to about 20MB, but it’s easy to set up yourself in about 20 seconds.

Changing the orange input cell triggers the VLOOKUP, which gets the right answer pretty quick. I haven’t timed it, but we’re talking ‘blink of an eye’… even when I tell it to look for a number that I happen to know is right at the bottom. Really fast.

But if I type anything in that middle column in the Table – which also causes our VLOOKUP to recalc – Excel acts really really sluggish. Maybe a second of delay in some cases, plus it makes my screen flash.

Which is weird, because it’s the same one VLOOKUP that gets triggered in both cases.

Why would that be?

—UPDATE—
If you haven’t already, go and read Charles Williams’ awesome article Developing Faster Lookups – Part 1 – Using Excel’s functions efficiently – updated

Look for the bit “Large dependency ranges trigger unnecessary Lookups”.

Read it, and don’t weep.

23 thoughts on “VLOOKUP: V is for Volatile.

  1. Does not behave like your weird behavior for me: takes 23 milliseconds for the linear search in manual mode and in automatic mode its correspondingly fast.

  2. Huh? I thought, VLOOKUP actually *was* volatile?

    Anyway, the reason of the calculation time difference seems straightforward to me:
    Case 1: Look for the new value in the index already built of key values.
    Case 2: Rebuild the entire index.

    I’ve always been told, building indices costs a lot of time, but is worth it, because they radically speed up things later.
    Could it be that obvious?

  3. The other thing about INDEX/MATCH is you can MATCH once, INDEX many times. A colleague had a slow Excel table because he was bringing back multiple columns of the top 30 hits from the same 750,000 row table with the same criteria, setting VLOOKUP to find the top 30 over and over every time.

    I created a MATCH column to search once, then set the other columns to do INDEX based on the results of that match, for a tenfold reduction of refresh time.

  4. Wow, Jeff, I didn’t think there was anything left to learn on this topic, but that range-recalc issue is a new one to me. Between that and your Ctrl-Shift-A trick you’re really earning your keep! Great stuff.

    From an un-expert point of view it makes sense that the calculation engine would be smart enough to recognize that an input parameter change would require much less work than a reconfiguration of the lookup table.

  5. Charles: I’m not sure from your comment whether you catch my drift correctly.

    The issue was recalc time through changing the lookup_value argument of the vlookup vs recalc time incurred via changing the table_array argument. For me, the times are wildly asymmetric.

  6. So I just tried this out on my wife’s Tablet. The effect is the same: Excel takes much much longer to recalc when I change a value in the table_array argument than when I change a value in the lookup_value argument range.

    Even on a dead simple setup.

    Can some other folk try this out and post back?

    Don’t even bother with the random numbers/row numbers setup above. Just do this:

    1. Select the entire column C, and enter the number 1 into the whole column with ctrl + enter
    2. Select C2, and put the number 2 in it. Then Select C1:C2 and fill down the series, so you get 1,2,3,…1048576
    3. Put the number 1 in A1
    4. Put the formula =VLOOKUP(A1,C:D,1,0) into B1

    Now, change the value in A1 a few times, and see if you notice any delay. Then select D1 and enter some numbers a few times in that column. Any slower?

  7. I don’t get any significant difference in timing either in Manual or automatic mode: I upped the scale to 1000 Vlookups but still can’t see any difference. You must be doing something different to me, but I followed your instructions exactly.

    The semi-volatile behavior (I sometimes call it fan-out) occurs with anything that can reference a large range, such as SUMPRODUCT and array formulas. Its one of the reasons why the FastExcel LOOKUP functions (MEMLOOKUP, AVLOOKUP2 etc) are faster than the standard Excel lookup functions.

  8. That’s weird, Charles. I wonder what the difference between my PCs and yours are, other than the fact that mine are upside down given I’m down under.

    Hopefully some other readers will post back their findings.

    Fan-out: that’s a great term.

  9. Charles: I tried it on an older laptop running 2010 and I don’t get any issues. I only get the issues on the two devices I have that run 2013.

    What flavor of Excel did you try it on?

  10. Excel 2013 user here, and I don’t get any difference in timing either, sorry!
    I’ve tried both of the lookup examples.

  11. Excel 2013 and Windows 8.1.

    I get no noticeable delay when changing the lookup value or changing anything in the table. I tried all 3 columns, also with and without making the lookup range into a table.

    Also tried various options under Options-advanced, but nothing made any noticeable difference.

    Version is 15.0.4701.1002

  12. Jeff, good point about Index/Match reducing dependencies. I’m sure many others know this already, but one other benefit of Index/Match vs Vlookup i didn’t see mentioned before is you can easily check which cell is being referenced. To do this just highlight the entire formula in the formula bar:

    =INDEX(F:F,MATCH(A2,D:D,0))

    and press F5 followed by enter to take you straight to the cell, then esc to cancel.

    This is because Index/Match returns a cell reference whereas Vlookup returns the value itself. (While I’d known of F9 to return the values of a highlighted formula, i only just noticed you can press F5 to return a cell reference from the formula bar as well.)

  13. That workarond’s from another Jeff. We’re everywhere. You mean the INDIRECT one? Definitely not me. I’m a DIRECT guy, not and INDIRECT lover.

  14. It was this comment from another Jeff who bears an uncanny resemblance: http://dailydoseofexcel.com/archives/2015/03/08/welcome-to-excel-not-quite-finished-week/#comment-743304.

    My understanding is that table references need to include a table name so the Goto command can figure out the context. If a reference is local (i.e. in the same table) you just get something like @[Column1] within a formula which confuses things. Instead you can try Table1[@[Column1]:[Column1]] which doesn’t lose the table name. (Or else use INDIRECT() but I’m not a lover of that either.)

    The issue looks to stem from the underlying evaluate method – i can see a related problem was raised here: http://dailydoseofexcel.com/archives/2014/11/24/evaluateevaluate/ – from one of those many other Jeffs out there. I’m now imagining the movie poster “Being Jeff Weir” :)

  15. Ah. That Jeff. I am Jeffinately related to him.

    The problem with the full qualification is that it doesn’t stick around on Table References that reference part of the current Table i.e. you can’t turn this:
    =INDEX(Data[Surname], [@Row])
    …into a fully qualified reference:
    =INDEX(Data[Surname], Table1[[@Row]])
    …because Excel just takes it out again:
    =INDEX(Data[Surname], [@Row])

  16. indeed, which is why I ws suggesting to use something like:

    =INDEX(Data[Surname],Table1[@[row]:[row]])

    Anyway this is getting OT and a little tedious – let’s move on…

  17. @ Charles:

    You must be doing something different to me, but I followed your instructions exactly.

    It looks like an issue with Tables/ListObjects to me. In those instructions above, I forgot to mention that you need to convert C:C into a Table/Listobject before the problem rears its ugly head. And I only have this issue on versions later than 2010.

    If I try with with C:C as a normal range, I get minimal delays when rapidly entering data into C:C by typing some gibberish and pushing enter as quick as I can. If I then convert C:C to a Table/Listobject, I get really weird delays. When I change it back to a range again, I’m back to minimal delays for updating data.

    On an another sample file with a few more formulas in it, I converted the lookup array into a Table/ListObject and manually entered new data into it at the rate of one change per second. While the first change gets executed straight away by Excel, there’s then a 6 second delay between me entering subsequent data and Excel showing that new data in the grid. And although I pushed Enter 20 times at a rate of once per second, only 15 numbers got updated. Excel simply missed 5 of those attempted changes, because it was too busy doing something. God knows what.

    If I throw a few more VLOOKUPS into the sheet, then Excel won’t even update half the time…it just goes into ‘Not Responding’.

    I’d be interested to see if anyone else can replicate this, and on what versions it is a problem. Happy to send a sample file out to any volunteers.

  18. And it gets even weirder: Even if I then delete all the lookup formulas that point at that Lookup array, Excel still goes incredibly slowly when I rapidly change the data in the old Lookup array. But when I change that array into a range again, Excel updates pretty much as fast as I can throw data at it.

    So here’s my hypothesis: Something in the calculation engine gets screwed up when formulas are pointing at an old-school range, and you subsequently change that range to a ListObject/Table. And so the recalc delay I’ve been witnessing isn’t caused so much by the one VLOOKUP formula recalculating every time I make a change to the table, but rather by the 4 million cells in the table being scanned by Excel whenever one of them changes, to check if anything has changed.

    This is also quite possibly related to the issue Charles blogged about at https://fastexcel.wordpress.com/2012/01/30/excel-2010-tableslistobject-slow-update-and-how-to-bypass/

Leave a Reply

Your email address will not be published. Required fields are marked *