I have a list of objects. Oh, I don’t know, let’s say it’s the top 5 grossing movies from 1988.

I want to rank these movies by how much I like them. To do this, I want to compare only two movies at a time until they’re in the proper order. Any comparison sort should do the trick and I like QuickSort. With five items, I really don’t need to compare two at a time as I can easily do the comparison in my head. But, as usual, there’s more to the story. I want it to work if the list is 100 items. That’s not so easy to do in your head. It’s also not so easy to do two at a time because that’s a lot of comparing.

Another element to this is that I want to crowdsource it. I don’t want to make all the comparisons myself. Rather, I want to do a few comparisons and have other people do a few comparisons. In the end, I would have enough comparisons to see the groups preference. This is where QuickSort really falls down. It’s no good for only doing partial comparisons or for having more than one judgment in the mix. But it’s a worthy exercise, so I did it.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 |
Sub RankQuickSort() Dim aItems() As Variant Dim rList As Range Dim rCell As Range Dim i As Long Set rList = wshList.Range("B1", wshList.Range("B1").End(xlDown)) ReDim aItems(1 To rList.Cells.Count) For Each rCell In rList i = i + 1 aItems(i) = rCell.Value Next rCell QuickSort aItems, LBound(aItems), UBound(aItems) For i = LBound(aItems) To UBound(aItems) Debug.Print aItems(i) Next i End Sub Public Sub QuickSort(ByRef vArray As Variant, lLow As Long, lHigh As Long) Dim sPivot As String Dim sSwap As String Dim lTmpLow As Long Dim lTmpHigh As Long lTmpLow = lLow lTmpHigh = lHigh sPivot = vArray((lLow + lHigh) \ 2) Do While lTmpLow <= lTmpHigh Do While (FirstIsLess(vArray(lTmpLow), sPivot) And lTmpLow < lHigh) lTmpLow = lTmpLow + 1 Loop Do While (FirstIsLess(sPivot, vArray(lTmpHigh)) And lTmpHigh > lLow) lTmpHigh = lTmpHigh - 1 Loop If lTmpLow < lTmpHigh Then sSwap = vArray(lTmpLow) vArray(lTmpLow) = vArray(lTmpHigh) vArray(lTmpHigh) = sSwap End If If lTmpLow <= lTmpHigh Then lTmpLow = lTmpLow + 1 lTmpHigh = lTmpHigh - 1 End If Loop If lLow < lTmpHigh Then QuickSort vArray, lLow, lTmpHigh If lTmpLow < lHigh Then QuickSort vArray, lTmpLow, lHigh End Sub Public Function FirstIsLess(sFirst, sLast) As Boolean Dim lResp As Long If sFirst = sLast Then lResp = vbNo Else lResp = MsgBox(sFirst & " is better than " & sLast, vbYesNo) End If FirstIsLess = (lResp = vbYes) End Function |

RankQuickSort is the starting procedure where I fill the array and kick off the recursive QuickSort procedure. QuickSort is a recursive procedure that sorts the array. The difference between this procedure and a normal QuickSort procedure is that the comparison is made by a human. Instead of code like

1 |
sPivot < vArray(lTmpHigh) |

, the FirstIsLess procedure is called that presents a message box for the user to decide which is less and which is more.

That works well because I’m the only one judging and there are only a small number of judgments to make. As the list grows and I want to distribute the work to more people, I don’t think QuickSort is the answer. Even with one person, there is a possibility for inconsistency. I could say that *Coming to America* is better than *Twins*, *Twins* is better than *Big*, and *Big* is better than *Coming to America*. I need a method to avoid that.

I thought a “voting” system would work. Instead of swapping one list item for the other like in a comparison sort, I would simply award points to the better one and sort on the points. As it turned out, and as I’ll explain in the next post, it takes far too many comparisons for that to work reliably.

You can download VoteRank.zip

Hmmm… did somebody just see “The Social Network”?

What you invented, Dick, is a technique commonly known as Pairwise Comparisons.

http://en.wikipedia.org/wiki/Pairwise_comparison

Taking the example of movies, have you seen http://www.flickchart.com ? It reranks after each pairwise comparison – when you rate a film above one that’s higher in your current list then the preferred one is promoted to be one place above the other. That’s all it appears to do (or appeared to when I looked at it) and I hate it.

Actually, I hated it so much that I seriously contemplated writing an alternative that would apply something like Elo ratings. (http://en.wikipedia.org/wiki/Elo_rating_system). Then I woke up and got sensible.

No sorting is possible if Coming to America > Twins, Twins > Big, Big > Coming to America. The set wouldn’t be well-ordered.

If you have a lot of values, A..Z with A > B, B > C and C > A, then if the first pivot value were B, values A and C would be partitioned into different subsets. A and C will NEVER be compared to the other, and the values other than B in the same subset as either A or C will NEVER be compared to C or A, respectively.

Set partitioning sort algorithms like quicksort will never have a problem with lack of transitive pairwise ordering because any pair of values will be compared at most once, and many pairs of values will never be compared. The final ordering depends on the exact sequence of pairwise comparisons, which depends on the original ordering and the process used to select pivot values.

If you award points and quicksort on points, then the EXPECTED number of comparisons is N log(N), but pathologies in the data (original ordering and pivot value selection) could result in N^2 comparisons. However, no commonly used sort algorithm will ever need more than N^2 comparisons. Not all that much for a computer to handle if N = 100 or even N = 200.

What if you just had people compare between two items at a time like when you go to the eye doctor? A or B?, B or C? Ect.

This way you would only need to know which is better between two items (a much easier decision for people to make). Decision making would also be sped up and you could compile more user comparisons.

From here it would be simple enough to rank the overall scores (i.e. how many people chose A when presented verses how many people chose C wehn presented).

The only issue might be when items tie.

Whatever it’s called, you’ve clearly chosen the best movie.

What if you build a sorted list, one movie at a time, using a binary search to find the correct position of each successive movie? Similar to QuickSort’s divide-and-conquer approach, you keep paring away half of the list until you find where it goes.

As an example, let’s say you’ve got 9 movies ranked so far, and you want to find out where Movie10 goes. Take the middle of the list, Movie5, and compare it to Movie10. Let’s say Movie10 is better. Pick the middle of Movies 1-4 and compare. Let’s say Movie10 is worse than Movie3. The middle of 3 and 5 is 4. Now we just need to compare Moviie10 to Movie4. If it’s better, it gets the new 4-spot. If it’s worse, it gets the new 5-spot. A collection object might work best for this, as it allows insertion before or after items.

The nice thing about this is that to add a new item to the list, you’ll only need around Log2(N) comparisons, where N is the number of items in the current list. For example, if your list is 50 items long, it would take about 6 comparisons to find where it goes. To build a whole list of 50 items, you’d need about 237 total comparisons.

Does this post come with subtitles?