Pairwise Comparison

Thanks to J-Walk for the terminology “Pairwise Comparison”. Had I known it was called that I could have saved a lot of wasted Googles.

In Subjective Sorting, I used a QuickSort algorithm and human input to order five movies from 1988. It worked because 1) I was the only one providing input, 2) my input was consistent, and 3) the list was reasonably short. If I want more people to provide the input and do it asynchronously, it doesn’t work. If my list is 100 entries long, it becomes impractical and prone to inconsistency.

What I want to do is present two movies and have the user pick one.

The winner gets 1 point and the loser gets zero. With enough voting, I should be able to sort on the points and get a properly sorted list. I could send this spreadsheet to someone else, have them vote a number of times, then forward it on to someone else. By the time it gets back to me, the list should be a close approximation of the group’s preferences. But how many votes does it take?

I guessed that it would take 5x the number of items in the list. For my five item list, 25 votes would be necessary to have enough of a sample. That was a pure guess and it turned out to be wrong. The second time I ran it, two movies received zero votes. One is the worst movie on the list and should have received zero votes. The other simply went against better movies because the movies are shown randomly.

Another problem with this method is that the same combination of movies is sometimes shown in close proximity. If I just saw Coming to America on the left and Big on the right, I don’t need to see that same thing reversed. These problems stem from the fact that I’m randomly showing movies.

I start with a random number next to each movie

Then use this code to show two random movies

It’s pretty simple. Recalculate to change the random numbers, then pick the 1st and 2nd largest numbers.

What really needs to happen is that I need to create an array of all the possible combinations. For selecting two of five movies,

returns 10 matchups. I could just vote ten times and have my list. That doesn’t scale, though, as picking two of 120 items results in 7,140 combinations. Nevertheless, it seems clear that starting with all possible combinations is the best way. Then I could sort them randomly and present them in whatever order they’re in. That would prevent duplicates and would be more efficient. It’s not stateless, though, which is a downside. I have to keep my list in memory as opposed to just plucking a couple of random items from the list.

Let’s see how that works. First, I fill a module level array with all the combinations plus a random number for sorting.

Now my ShowMatchup picks the next combination in the list and shows it

The ranking is the same except that instead of stopping at 25 (5*5) votes, I stop at the end of the list.

What’s the point of all this? It all started when I was listening to a radio show talking about college football polls. In NCAA D1A college football, there are 120 teams and various polls that rank the top 25. One of the polls uses sports writers to vote. Another poll uses college coaches, or more likely an administrative assistant in the coaches office. Both polls have their problems, but the single biggest problem in my opinion is that humans have a desire to be consistent. If I rank Texas #1 in a week and they lose, they won’t fall as far as if I had ranked them #15. I thought they were the best and now I don’t want to admit that I was that wrong. Even if I could avoid that pitfall, there’s simply too many combinations for one person to consider.

I thought that if I could present two teams to enough people, a decent poll would emerge. Then I started ranking movies from my high school days and, well, you all know about falling down the Excel rabbit hole.

You can download which contains the old random code or

You can download which contains the combinations code.

Leave a Reply

Your email address will not be published. Required fields are marked *