VLOOKUP & INDEX/MATCH Are Both Badly Designed Functions: Here Are Some Better Ones

It’s fun to argue about whether VLOOKUP or INDEX/MATCH is better, but to me that’s missing the point: they are both bad.

So I decided to design and build some better ones.

VLOOKUP INDEX/MATCH problems

Here are some of the more-frequently mentioned VLOOKUP INDEX/MATCH problems

  • Slow exact match (linear search)
  • Approximate sorted match is the wrong default 99.9% of the time and gives the wrong answer without warning
  • Cannot do exact match on sorted data (well they can but only if they ignore sorted!)
  • Numeric VLOOKUP answer column easy to break
  • No built-in error handling for exact match
  • VLOOKUP very inflexible
  • INDEX/MATCH more flexible but still limited
  •  …

MEMLOOKUP/MEMMATCH – easier and faster alternatives to VLOOKUP/MATCH

MEMLOOKUP ( Lookup_Value, Lookup_Array, Result_Col, Sort_Type, MemType_Name, Vertical_Horizontal )

The syntax is designed to make it easy to convert a VLOOKUP to MEMLOOKUP, but there are differences!

  • Defaults to Exact Match on both unsorted and unsorted data
  • Use either column labels or numbers
  • Fast exact match on both unsorted and sorted data
  • Automatic optimisation of multiple lookups within the same row

So you want more flexibility? Try the AVLOOKUP/AMATCH family of functions

It’s always tempting to cram in more function (scope creep is universal), but if the result is too many parameters then it’s a mistake. So instead there is a whole family of these lookup functions that build on the MEMLOOKUP/MEMMATCH technology to provide the ultimate in flexibility and power whilst remaining efficient.

  • Lookup using any column
  • Lookup using more than one column without slow concatenation
  • Lookup the first, last, Nth or all results on both sorted and unsorted data
  • Lookup both rows and columns (2-dimensional lookup is built-in)
  • Built-in error handling for exact match
  • Return multiple answer columns
  • Case-sensitive lookup option
  • Regex match option

Lookups1

Try them out for yourself

These functions are included in the 90 or so additional Excel functions built into FastExcel V3.
You can download the trial version from here.

Download FastExcel V3

If you like them then ask Microsoft to add them to the next version of Excel!

I would be delighted to tell the Excel team how I built these functions and the algorithms they use.

By the way they are written as C++ multi-threaded functions in an XLL addin for maximum performance.

 

 

Lunch Pairing Performance

A little history:
Lunch Pairing Objects
Populating Lunch Pairings
Lunch Pairing Constraints

To round out the Lunch Pairings series, I want to write about some of the design decisions and performance measurements.

Design

I’m not totally happy with the end result. This always happens when I have combinations and constraints. Combinations are fine. Constraints are fine. But combinations that must be constrained always make me loop more than I like. It just seems like there should be a better way. Maybe someday I’ll be smart enough to figure it out.

I started with a collection of contacts and I removed a contact from the collection when I added it to a lunch. I like this method of avoiding duplicates – make a list and remove the selected item from the list. The problem with this method is that it only serves one of my constraints – no lunch per contact per month. It doesn’t consider duplicates over two or ten months.

Another method I like to use when working with combinations is to make a list of every possible combination and choose randomly from the list. This works great with non-combination lists. There’s a lot of overhead with creating every possible combination and, when I had finished coding it, I realized that it’s a stupid idea. Rather than create every combination then test it, I could test it as I create it, which is what I ended up doing.

I’m still not supremely confident that the current method will procedure good results, particularly as the history of lunches grows. Time will tell, I guess.

Performance

My first iteration with the final design took 282 seconds to run. I never optimize code for speed until the end. Make it work, make it pretty, make it fast. In that order. Well, it was time to optimize and 282 seconds was not an acceptable run time.

Well it’s pretty obvious where to start. My AttendeeList property returned a delimited string of sorted contact names. Calling that 3.5 million times was costly, to say the least. Once a lunch was create, its contacts did not change. There was no need for me to keep calculating that property every time. I made the property a write-once, read-many property.

Public Property Get AttendeeList() As String

Dim aList() As String
Dim i As Long, j As Long
Dim sTemp As String
Dim sReturn As String
Dim lCnt As Long

If Len(msAttendeeList) = 0 Then
ReDim aList(1 To Me.Attendees.Count)
For i = 1 To Me.Attendees.Count
lCnt = lCnt + 1
aList(lCnt) = Me.Attendees.Contact(i).FullName
Next i

For i = LBound(aList) To UBound(aList) - 1
For j = i To UBound(aList)
If aList(i) > aList(j) Then
sTemp = aList(i)
aList(i) = aList(j)
aList(j) = sTemp
End If
Next j
Next i

For i = LBound(aList) To UBound(aList)
sReturn = sReturn & aList(i) & "|"
Next i

sReturn = Left$(sReturn, Len(sReturn) - 1)
msAttendeeList = sReturn
End If

AttendeeList = msAttendeeList

End Property

The first thing the property does is check for the existence of msAttendeeList, the module level variable. If it’s there, it simply returns it. If it’s not there, then it hasn’t been calculated yet. The string is created and stored so future calls don’t have to calculate. This got my run time down to 60 seconds.

The next lowest hanging fruit was checking to see if there are any matches. Here’s what that property looks like now – I cleaned it up for the previous posts.

Public Property Get AttendeeMatch(clsLunch As CLunch, lMatchMax As Long) As Boolean

Dim vaMe As Variant
Dim vaLunch As Variant
Dim i As Long, j As Long
Dim lCnt As Long
Dim clsMeAtt As CContact
Dim clsLunchAtt As CContact

' For Each clsMeAtt In Me.Attendees
' For Each clsLunchAtt In clsLunch.Attendees
' If clsMeAtt.ContactID = clsLunchAtt.ContactID Then
' lCnt = lCnt + 1
' If lCnt >= lMatchMax Then Exit For
' End If
' Next clsLunchAtt
' If lCnt >= lMatchMax Then Exit For
' Next clsMeAtt

vaMe = Split(Me.AttendeeList, "|")
vaLunch = Split(clsLunch.AttendeeList, "|")

For i = LBound(vaMe) To UBound(vaMe)
For j = LBound(vaLunch) To UBound(vaLunch)
If vaMe(i) = vaLunch(j) Then
lCnt = lCnt + 1
End If
Next j
Next i

' For i = LBound(vaMe) To UBound(vaMe)
' If InStr(1, clsLunch.AttendeeList, vaMe(i)) > 0 Then
' lCnt = lCnt + 1
' End If
' Next i

AttendeeMatch = lCnt >= lMatchMax

End Property

I used the Split function on both lunch’s attendee lists and roll through the array looking for matches. First, I tried to use Instr instead of split (that’s the lower commented out area).

Nope, that’s worse. Next I try going to a For Each loop (the upper commented area).

Oh, that’s way worse. Back to my original Split way and my 60 second run time. Then I figured it out. Sometimes, often times, I get so hung up on the Total column that ignore the Count column. Once I worked on lowering the count, everything got a lot faster. How to lower the count of the match calls? Find matches sooner rather than later. Instead of For Each looping through the lunches, I looped through in reverse. I was far more likely to have a match on a lunch from the current month because the constraints are tighter. So by checking the most current lunches first, I reduce the overall number of calls. Here’s the IsRepeat property that loops through the lunches backward.

Public Property Get IsRepeat() As Boolean

Dim clsLunch As CLunch
Dim bReturn As Boolean
Dim i As Long

For i = gclsLunches.Count To 1 Step -1
Set clsLunch = gclsLunches.Lunch(i)

If clsLunch.LunchDate = Me.LunchDate Then
If clsLunch.AttendeeMatch(Me, 1) Then
bReturn = True
Exit For
End If
End If

If clsLunch.IsWithin(Me.LunchDate, 2) Then
If clsLunch.AttendeeMatch(Me, 2) Then
bReturn = True
Exit For
End If
End If

If clsLunch.IsWithin(Me.LunchDate, 10) Then
If clsLunch.AttendeeMatch(Me, 3) Then
bReturn = True
Exit For
End If
End If
Next i

IsRepeat = bReturn

End Property

That got me down to a manageable 13 seconds. That I can live with.

I need to get the code to PerfMon because there are two things I don’t like about it. First, it doesn’t respect my indented Exit Sub lines. Second, I would rather if it kept a history of output (append rather than output to a text file) and allowed me to comment on them. I’ve done that here by copying the results to the right and writing a comment before re-importing the sheet.

You can download LunchPairings.zip, which contains the workbook with the code and the workbook with the PerfMon results.