It’s fun to argue about whether VLOOKUP or INDEX/MATCH is better, but to me that’s missing the point: they are both bad.
So I decided to design and build some better ones.
VLOOKUP INDEX/MATCH problems
Here are some of the more-frequently mentioned VLOOKUP INDEX/MATCH problems
- Slow exact match (linear search)
- Approximate sorted match is the wrong default 99.9% of the time and gives the wrong answer without warning
- Cannot do exact match on sorted data (well they can but only if they ignore sorted!)
- Numeric VLOOKUP answer column easy to break
- No built-in error handling for exact match
- VLOOKUP very inflexible
- INDEX/MATCH more flexible but still limited
- …
MEMLOOKUP/MEMMATCH – easier and faster alternatives to VLOOKUP/MATCH
MEMLOOKUP ( Lookup_Value, Lookup_Array, Result_Col, Sort_Type, MemType_Name, Vertical_Horizontal )
The syntax is designed to make it easy to convert a VLOOKUP to MEMLOOKUP, but there are differences!
- Defaults to Exact Match on both unsorted and unsorted data
- Use either column labels or numbers
- Fast exact match on both unsorted and sorted data
- Automatic optimisation of multiple lookups within the same row
So you want more flexibility? Try the AVLOOKUP/AMATCH family of functions
It’s always tempting to cram in more function (scope creep is universal), but if the result is too many parameters then it’s a mistake. So instead there is a whole family of these lookup functions that build on the MEMLOOKUP/MEMMATCH technology to provide the ultimate in flexibility and power whilst remaining efficient.
- Lookup using any column
- Lookup using more than one column without slow concatenation
- Lookup the first, last, Nth or all results on both sorted and unsorted data
- Lookup both rows and columns (2-dimensional lookup is built-in)
- Built-in error handling for exact match
- Return multiple answer columns
- Case-sensitive lookup option
- Regex match option
Try them out for yourself
These functions are included in the 90 or so additional Excel functions built into FastExcel V3.
You can download the trial version from here.
If you like them then ask Microsoft to add them to the next version of Excel!
I would be delighted to tell the Excel team how I built these functions and the algorithms they use.
By the way they are written as C++ multi-threaded functions in an XLL addin for maximum performance.
Ha! I’m still asking Microsoft to add a dynamic CONCATENATE to the last version of Excel.
No doubt these are wonderful functions, Charles, and I congratulate you on them.
For the more common and simpler lookup type tasks, what are your thoughts with respect to somehow having SQL type functionality readily available in Excel?
Pete: SQL is slow compared to array manipulation, as evidenced by an earlier post of mine at http://dailydoseofexcel.com/archives/2013/11/21/unpivot-shootout/
OK, and I suspect Charles’ functions will be faster too. Maybe the speed consideration alone would kill an SQL type approach. I’m asking from a different perspective, though. What about the idea?
Have simple (?) database query-like alternatives to having more specialised functions & loading up worksheets with, for example, sometimes tens of thousands of formulas for lookup type results across many fields.
A lot is asked of Excel. Tasks that might be best in a database are done in Excel: I’m thinking to encourage a more database like approach. SQL is handy for databases and would be a good skill for Excel users who handle increasingly large datasets. Rather then creating more specialised/arcane formulas.
I have a plan at some stage to put together an app that accepts SQL commands but actually translates them into array manipulations, to address this.
@Pete,
The D functions borrowed some of Zloof’s QBE (Query By Example) grid, but don’t really fit well with the rest of Excel. I am not sure how well SQL fits with Excel formulas either.
The closest I have got is with the FILTER.IFS function, but it extends SUMIFS type of syntax in a more general way rather than using SQL syntax.
At the high-end (BI – large data) we seem to be going towards the PowerPivot (DAX) and PowerQuery (M) approach of having additional languages rather than extending Excel formula language.
Simon Peyton-Jones’s approach is also interesting
http://research.microsoft.com/en-us/um/people/simonpj/Papers/excel/index.htm
Agreed re CONCATENATE. It’s one of the few areas of formula-work which is really restrictive: the possibilities which having an array-processing, concatenating worksheet function would bring are many and varied.
I would extend this wish to include a general re-working of all string functions to be able to operate over arrays of values.
So that means, for example, an array-processing SUBSTITUTE (replacing the current, absurdly-lengthy “nested” set-up). Ditto for REPLACE.
And (assuming they haven’t done so in 2013, which I haven’t yet seen), Microsoft, please, please, please allow us to increase the size of that paltry Evaluate Formula box! Some of us rather depend on it, y’ know!
Cheers
@ Charles, Thank you very much for your insightful comments and view of the direction for large data. The (2003!) paper from Simon Peyton-Jones was interesting: there is obviously a whole field of intelligent research into the subject. A snippet in there (page 3) was no real surprise, “Our interview with an auditor at a major accounting firm revealed that he considers the problems of error proneness to be quite severe: “I will remind you that in 6 years work, checking literally hundreds of business-critical models, … my team have never failed to find errors.”
@ XOR LX, improved worksheet function string concatenation functionality would be welcome. Though for me, extending array processing is going the wrong way. I see inefficient, complex, non-intuitive, non-scalable and plain messy array formula constructs offered in online Excel forums to do what can be simple by other means. People learn these array formula methods but fail to learn alternative good practices for setting up data, and fail to learn other solutions. Where array formulas can solve the requirement they seem to be popular but then I see them extended and overloaded. I guess either they should be welcomed & developed, or just excised (from Excel) totally. As users handle larger datasets the issue exacerbates.
Pete: Amen, re I see inefficient, complex, non-intuitive, non-scalable and plain messy array formula constructs offered in online Excel forums to do what can be simple by other means. People learn these array formula methods but fail to learn alternative good practices for setting up data, and fail to learn other solutions.
I see people on forums asking for/advocating SUMPRODUCTS over thousands of rows and multiple conditions, when a mere PivotTable would not only do it with ease but offer instant flexibility, long term understandibity, and no overhead between PivotCache refreshes.