Finding the 2nd, 3rd, 4th … MATCH()

Excel’s MATCH() function will return the relative location of the first occurrence that satisfies the match criterion (the lookup_value) within the specified array (the lookup_array).  Most often, this is what you want.  For instance, if you have duplicate entries, returning the first match is no different than returning the last match.  However, if you have unique entries with the duplicated criterion, how do you return all the unique values?

For example, you have a list of the 122 major professional sports teams (32 NFL, and 30 MLB, NBA, and NHL) sorted alphabetically in Column D.  In Column E you have code that returns the length of the team name, i.e. =LEN(D1).  Your data might look like this:

D E
1 Anaheim Ducks 13
2 Arizona Cardinals 17
3 Arizona Diamondbacks 20
4 Atlanta Braves 14
5 Atlanta Falcons 15
6 Atlanta Hawks 13
7 Baltimore Orioles 17
8 Baltimore Ravens 16
9 Boston Bruins 13
10 Boston Celtics 14

For no good reason we want to create a descending sort on name length. You can do it from the Sort selection, but we also want to keep the alpha-sort. In F1 you put =LARGE($E$1:$E$122,ROW()) and fill down. The ROW() function will cause the return of the first largest, then the second largest etc. Then you want to know how far down Column E those descending numbers fall, so that you can Index the name list that amount. In G1 put the code =MATCH(F1,$E$1:$E$122,0) and fill down. You’ll see from the data that the longest name is 29 letters, and it’s in ROW(52).

D E F G
1 Anaheim Ducks 13 29 52
2 Arizona Cardinals 17 22 30
3 Arizona Diamondbacks 20 22 30
4 Atlanta Braves 14 22 30
5 Atlanta Falcons 15 21 42
6 Atlanta Hawks 13 21 42
7 Baltimore Orioles 17 21 42
8 Baltimore Ravens 16 20 3
9 Boston Bruins 13 20 3
10 Boston Celtics 14 20 3

You can begin to see the problem: There are three teams with a 22-letter name, three with a 21-letter name, and at least 3 (in fact 13) with a 20-letter name. If in H1 we put =INDEX(D:D,G1) and filldown, we get:

D E F G H
1 Anaheim Ducks 13 29 52 Los Angeles Angels of Anaheim
2 Arizona Cardinals 17 22 30 Columbus Blue Jackets
3 Arizona Diamondbacks 20 22 30 Columbus Blue Jackets
4 Atlanta Braves 14 22 30 Columbus Blue Jackets
5 Atlanta Falcons 15 21 42 Golden State Warriors
6 Atlanta Hawks 13 21 42 Golden State Warriors
7 Baltimore Orioles 17 21 42 Golden State Warriors
8 Baltimore Ravens 16 20 3 Arizona Diamondbacks
9 Boston Bruins 13 20 3 Arizona Diamondbacks
10 Boston Celtics 14 20 3 Arizona Diamondbacks

This is exactly what we wanted to avoid. So, using Col(H) now as a helper column, in H1 put =G1. In H2 put

  • =IF(G2<>G1,G2,H1+MATCH(F2,INDIRECT(“E”&(H1+1)&”:E122″),0))

and fill down. If the value did not repeat, just copy it over; and if it didn’t, we’re making a “sliding range” that starts one row after the start of a repeated pair, ends at the end of the data, and “slides” down Column E. The sliding range is made by the INDIRECT() function. It builds the range one row down (H1+1) from the first of a repeating match. We add H1 to that result, and we have our index down Column D. This may be better seen in H3, adjacent to G3, where the first repeat starts.

  • =IF(G3<>G2,G3,H2+MATCH(F3,INDIRECT(“E”&(H2+1)&”:E122″),0))

The IF() is false. We are looking for the relative location of the lookup_value (22) in a new lookup_range (E31:E122). It’s 33 rows in. We add the original 30 to that, and the index is now 63. A similar thing happens in H4. H5 is G5 copied over. In Column I, I1 =INDEX(D:D,H1), filled down. The resulting table:

D E F G H I
1 Anaheim Ducks 13 29 52 52 Los Angeles Angels of Anaheim
2 Arizona Cardinals 17 22 30 30 Columbus Blue Jackets
3 Arizona Diamondbacks 20 22 30 63 Minnesota Timberwolves
4 Atlanta Braves 14 22 30 95 Portland Trail Blazers
5 Atlanta Falcons 15 21 42 42 Golden State Warriors
6 Atlanta Hawks 13 21 42 83 Oklahoma City Thunder
7 Baltimore Orioles 17 21 42 89 Philadelphia Phillies
8 Baltimore Ravens 16 20 3 3 Arizona Diamondbacks
9 Boston Bruins 13 20 3 49 Jacksonville Jaguars
10 Boston Celtics 14 20 3 53 Los Angeles Clippers

The names are now sorted by length, by alphabet. INDIRECT() is volatile, so there is a penalty for this. For extra credit, and no peeking, what’s the shortest name?

…mrt
©¿©¬

34 thoughts on “Finding the 2nd, 3rd, 4th … MATCH()

  1. Hello and thanks for the article.

    For what it’s worth I think it could be simpler to :
    – create a first helper column, for instance column F to make each length unique with this formula : F1 =E1-ROW()/1000
    (I’m subtracting and not adding to keep alphabetical order)
    – create a second helper column to rank unique length : G1 =RANK(F1,F:F)
    – then retrieve largest length in column H : H1 =INDEX(E:E,MATCH(ROW(),G:G,0))
    – and retrieve team name in column I : I1 =INDEX(D:D,MATCH(ROW(),G:G,0))

  2. One more option,

    E1=LEN(D1)

    F1=RANK(E1,$E$1:$E$122)+COUNTIF($E$1:E1,E1)-1

    G1=INDEX($D$1:$D$122,MATCH(ROWS($G$1:G1),$F$1:$F$122,0))

    Regards

  3. Very nice solution. I don’t like having to refer to previous rows that could have different formulas than the other cells in the column. I like the use of the ROWS function as it works this way even if the data is resorted. I had tried something similar but just used the value of F1 as the row value for the index in G1 but it didn’t work when the data was resorted.

  4. Without using helper columns, you could use the following array formula to return the team names in alphabetical order, starting with the longest:
    =INDEX(D$1:D$122,MATCH(LARGE(LEN(D$1:D$122)+1/ROW(D$1:D$122),ROWS(F$1:F1)),LEN(D$1:D$122)+1/ROW(D$1:D$122),0))

    By adding 1/ROW(D$1:D$122) to the length, you get a tie-breaker. MATCH then returns the one and only alphabetically ordered team name with the nth longest length.

    I put my formula in column F, then used the same trick with ROWS as did Sailepaty.

  5. Brad –

    A philosophical difference here. Excel now gives us more columns than I know how to use. Helper columns allow me to track each stage of calculation, and next week I’ll still be able to understand what I was trying to do.

    I used to go for the megaformula too, and then I read this advice:
    ———-
    Keep in mind that a complex formula such as this is virtually impossible to understand. Therefore, use this type of formula only when you are absolutely certain that it works correctly and you are sure that you will never need to modify it in the future. Better yet, keep a copy of those intermediate formulas — just in case.

    J-Walk on http://spreadsheetpage.com/index.php/tip/creating_a_megaformula/
    ———-
    Changed my life. ;-)

    …mrt

  6. First there’s the classic bug-in-waiting of using ROW() when ROWS(…) is much safer, e.g.,

    =LARGE($E$1:$E$122,ROW())

    would return undesired results if any rows were inserted above row 1, but

    F1: =LARGE($E$1:$E$122,ROWS(F$1:F1))

    with F1 filled down into F2:F122 would return the correct results no matter how many rows were inserted above. Simple rule of thumb: ROW(), COLUMN() and CELL(“attribute”) without range reference arguments are almost always errors. Bad policy to get lazy and use them even in simple examples just to avoid a little typing.

    How many times was this asked and ANSWERED BETTER in the newsgroups?!

    If you don’t need to show the lengths in column E, change the column E formulas to

    E1: =LEN(D1)-(ROWS(E$1:E1)-1)/ROWS(D$1:D$122)

    fill E1 down. Then use the following

    F1: =LARGE($E$1:$E$122,ROWS(F$1:F1))
    G1: =INDEX(D$1:D$122,MATCH(F1,E$1:E$122,0))

    fill F1:G1 down into F2:G122. The column G results produce the desired sorted list. With NO volatile function calls.

    Why go out of your way to provide an approach inferior to the average newsgroup responses of old?

  7. Complaining about megaformulas on grounds of incomprehensibility while advocating the use of INDIRECT is just a wee bit hypocritical, no?

  8. fzz-

    How do you access the newsgroups of old? My ISP no longer supports USENET, and Microsoft shifted to forums in 2010. There appears to have been little if any migration. A search today on MATCH() returned all of one page of hits.

    I don’t go out of my way to provide inferior work. I go out of my way to provide well-explained work at an intermediate level, with the expectation that the advanced approaches from you and others will materialize. I’ve not been wrong.

    …mrt

  9. Google Groups. Their archive is still there. And they support new posts. Maybe it’s time to revive comp.apps.spreadsheets, which is pure/true USENET rather than vendor-sponsored nntp sites which may or may not be carried by ISPs.

    Use Google Groups advanced search, search words MATCH LARGE ROWS tie, and author Tom Ogilvy or Frank Kabel. You’ll get at least several dozen hits.

    I should also note that Brad Yundt used a similar approach in his formula. Tie breaking on original position is an old subject.

  10. Final critique. The final col H formula

    =IF(G2G1,G2,H1+MATCH(F2,INDIRECT(“E”&(H1+1)&”:E122″),0))

    could be rewritten without INDIRECT as

    =IF(G2G1,G2,H1+MATCH(F2,INDEX(INDEX(E$1:E$122,H1+1):E$122),0))

    Open question whether INDEX(INDEX(..)..) is clearer than INDIRECT(..), but it’s not volatile, so there’s much less recalc lag penalty. Possibly OFFSET would be clearest of all, but still volatile.

    =IF(G2G1,G2,H1+MATCH(F2,OFFSET(E$1,H1,0,122-H1,1),0))

  11. A joy of Excel is the many ways to fill a need. The moon would have turned blue many times before I thought of indexing an index.

    With 2.5 GHz processors and up so common, volatility in the small spreadsheets is not what it once was.

    …mrt

  12. “Complaining about megaformulas on grounds of incomprehensibility while advocating the use of INDIRECT is just a wee bit hypocritical, no?”

    Not in my opinion, no. Stating a personal preference isn’t even a little bit hypocritical if you don’t suggest that it’s the only way to do it.

  13. I’ve learned something useful from Michael and also from fzz. That’s a great score. It would suck for me if Michael had not posted his “inferior” approach. I’m trying to build a linear regression model that is dependent on a “sliding” range. fzz’s approach would not work and I already figured it out, but validation certainly makes me feel better that my model is the best one for the situation.

    That said, megaformulas are a waste of time. You will always have to revisit them and you will never remember the mechanics of them. That time/filesize savings you achieved with the megaformula will be negated when your client wants a simple update that you cannot reasonably bill for more than two hours though it takes you all of those two hours to understand what the heck you were doing in the first place.

  14. Point is clarity.

    INDIRECT isn’t clear. Most forms of multireferent indexing are unclear. Complex formulas aren’t clear. So if it were only a question of clarity, different people would make different choices.

    However, using several columns of intermediate calculations introduces greater chance for error just by spanning more cells. And it doesn’t avoid complexity entirely: it’s still necessary to understand the flow of calculations through multiple cells.

    If one of the basic goals is NOT reinventing the wheel, then udfs would arguably be best. In this case, using Laurent Longre’s MOREFUNC.XLL add-in. With the names in D1:D122, select E1:E122 and enter the array formula

    =INDEX(D1:D122,VSORT.IDX(LEN(D1:D122),0,D1:D122,1))

    All you need to know is VSORT.IDX’s syntax [odd arguments are sort keys, even arguments specify descending(0) or ascending(1) on the preceding key]. You could even get tricksy and use SQL.REQUEST with the sort expressed as a SQL query which would arguably be the clearest way to do this.

  15. Let’s consider the ‘megaformula’ in this problem (modified):

    =INDEX(names,MATCH(LARGE(LEN(names)-1/(1+ROWS(E1:E$122)),ROWS(E$1:E1)),LEN(names)-1/(1+ROWS(E1:E$122)),0))

    The INDEX(names,…) piece should be obvious, and is unavoidable anyway.

    The MATCH(LARGE(array,j),array,0) piece may take a little thought, but should become clear that it returns the position of the j_th largest item in array. It’s also unavoidable. Putting the array piece into a separate range does mean it needn’t be calculated twice, which it needs to be in the formula above. Therefore, selecting E1:E122 and entering the array formula =LEN(D1:D122)-1/(124-ROW(D1:D122)) would arguably be optimal.

    The key bit, however, is the definition of one array for the composite sort key of length (descending) then original position/alpha order (ascending). It’s simpler to combine original position with length than to calculate dynamic ranges referring to the rest of the lookup column below the row of the current match.

    Complaints about megaformulas just due to length have some merit, but it’s akin to advice to write all prose at the level 10-year-old children could understand. No multiple independent clauses in the same sentence. No passive voice. Very sparing use of dependent clauses. Few sentences per paragraph. Maybe everyone can read ‘See Spot run! Run, Spot, run!’, but most adults tire of it and welcome more challenging prose. No more than 4 function calls per formula analagous to children’s picture books, more than 12 per formula analagous to German philosophy texts, but up to 12 a magical middle area allowing some moderate complexity in order to provide expressiveness.

  16. Interesting discussion. Aside: Funny how everything is now appearing in italics.

    Looking at the page source, it seems this might be caused by an accidental italics tag Michael has used when he signed off his comment on December 6, 2011 at 10:37 am

    It shows like this (assuming the PRE tags I’m placing here do their thing): If so, who’d have thought that xhmtl tags in the comments can screw with the rest of your site.

    Maybe I can get it to reset by throwing in a closing Italic. Here goes nuttin…

  17. Hi Jeff –

    Guilty as charged, it would seem. I fixed my sig. The slash was on the wrong side of closing out the i. Can’t repeat it without perpetuating it.

    Now to see if you have done so ;-)

    …mrt

  18. First time posting on this site, so hopefully this will be readable!

    How about this method?

    With a new workbook:

    Sheet name = “N-Match”

    Copy this data into E1:F7
    john 69
    joe 33
    3sara 6
    joe 84
    suzy 55
    judy 27
    john 29

    Create Name “MatchRange”;
    Refers to: =(‘N-Match’!$E$1:$E$7=’N-Match’!$B$1)*ROW(‘N-Match’!$E$1:$E$7)

    Create Name “Convert0toZ” (this is to convert non-matches to “z” from 0, so they are ignored in the SMALL formula I will use shortly)
    Refers to: =IF(MatchRange=0,”z”,MatchRange)

    Create Name CountMatches (this is to tell you how many matches there were [not necessary but useful])
    Refers to: =SUMPRODUCT(–(Convert0toZ”z”))

    Copy these labels into A1:A4
    Item to Match
    Total Matches
    Match to Return
    Result

    Cells B1:B4
    B1.value – “joe” (this will pull into the name “MatchRange”)
    B2.formula – =CountMatches
    B3.value – 2 (this will pull the second match of joe, if it exists)
    B4.formula – =IF(B3>B2,”There are only “&B2&” matches”,INDEX(E1:F7,SMALL(Convert0toZ,B3),2)) (pulls 84, the number related to the second occurrence of joe; enter 1 in B3 and B4 will pull 33)

    With this set-up it will only find exact matches but that could be adjusted by including SEARCH/FIND and IFERROR/IF(ISERROR) in “MatchRange”

  19. It would probably help if I re-read the topic before posting so my answer is on point. What I posted before can be used to find the nth Match, but is not really what Michael posted about. So, to modify my previous post:

    Copy this to Column E: (already alpha-sorted)
    joe1
    joe2
    john12
    john2
    judy1
    sara1
    suzie1

    Copy this into F1 and the fill down to F7
    =LEN(E1)

    Copy this into G1 and fill down to G7
    =SMALL(($F$1:$F$7),ROW())

    define name: “MatchRange2”
    =(‘N-Match’!$F$1:$F$7=’N-Match’!$G1)*ROW(‘N-Match’!$F$1:$F$7)

    define name: “Convert0toZtwo”
    =IF(MatchRange2=0,”z”,MatchRange2)

    Copy this into H1 and fill down to G7
    =INDEX($E$1:$E$7,SMALL(Convert0toZtwo,COUNTIF($G$1:G1,G1)),1)

    Two small modifications here:
    1/ MatchRange2 has a relative reference now
    2/ Counter for SMALL is now a COUNTIF based on the sorted LEN column

  20. Kyle,
    I ran into a couple of issues when testing your second suggestion. The first was when defining MatchRange2–because there is a relative reference to row 1, the active cell must be in row 1 when you create the named range. The second is that you need to type the double quotes surrounding “z” when defining named range Convert0toZtwo. If you just copy and paste the formula from the web post, you get a #NAME? error value as a result.

    Having made those corrections, I then noticed that you had solved a problem similar to the one posed by Michael–but not exactly the same. Your formula returns the names of the same length in alphabetical order with the shortest lengths coming first. Michael’s original problem, however, was to start with the longest lengths first. You can rectify this issue by changing the formula in column F to:
    =1/LEN(E1)

    Brad

  21. Brad,
    Thanks for the reply. I noticed too, after I posted my second message, that I did it backwards. I didn’t post again but it can also be changed by using LARGE instead of SMALL in column G and it will go from largest to smallest. You can then easily swap between asc & desc sorts with a CHOOSE function with the SMALL & LARGE functions, if desired.
    Quotes copied from the web always need to be replaced, I find.
    Finally, I should mention my test data was already alpha-sorted but you can of course use the COUNTIF(xx,”<"range) method that mrt and fzz used.

    Thanks,
    Kyle

  22. Just going to post a link to a copy of the workbook so people don’t have to recreate my method and apply it to the Teams file linked to by mrt. I made a couple changes: I combined the two named ranges I posted about, changed Row() to Rows() and added an offset to the named range so rows can be inserted at will above the data without messing it up. It is the Teams.xlsx at the other end of this link
    https://skydrive.live.com/redir.aspx?cid=9a0721b634391421&resid=9A0721B634391421!143&parid=9A0721B634391421!136
    The other file TeamsLargeIf.xlsx relates to mrt’s other post on “LARGEIF” and “MAXIF”. Same method can be used there, will post about that on the other topic though.

    Let me know if this Skydrive thing doesn’t work, first time using it.
    Thanks.

  23. I want to find the second, third values using this index formula, please help

    =IFERROR(INDEX(‘2013 MBR’!M3:M6000,MATCH(“1″&$F$2&”Day”,’2013 MBR’!$S$3:$S$6000&’2013 MBR’!$Z$3:$Z$6000&’2013 MBR’!$AA$3:$AA$6000,0)),IFERROR(INDEX(‘2013 MBR’!M3:M6000,MATCH(“1″&$F$2&”Day”,’2013 MBR’!$S$3:$S$6000&’2013 MBR’!$AC$3:$AC$6000&’2013 MBR’!$AD$3:$AD$6000,0)),IFERROR(INDEX(‘2013 MBR’!M3:M6000,MATCH(“1″&$F$2&”Day”,’2013 MBR’!$S$3:$S$6000&’2013 MBR’!$AF$3:$AF$6000&’2013 MBR’!$AG$3:$AG$6000,0)),IFERROR(INDEX(‘2013 MBR’!M3:M6000,MATCH(“1″&$F$2&”Day”,’2013 MBR’!$S$3:$S$6000&’2013 MBR’!$AI$3:$AI$6000&’2013 MBR’!$AJ$3:$AJ$6000,0)),IFERROR(INDEX(‘2013 MBR’!M3:M6000,MATCH(“1″&$F$2&”Day”,’2013 MBR’!$S$3:$S$6000&’2013 MBR’!$AL$3:$AL$6000&’2013 MBR’!$AM$3:$AM$6000,0)),IFERROR(INDEX(‘2013 MBR’!M3:M6000,MATCH(“1″&$F$2&”Day”,’2013 MBR’!$S$3:$S$6000&’2013 MBR’!$AO$3:$AO$6000&’2013 MBR’!$AP$3:$AP$6000,0)),”NA”))))))

  24. karthik – a formula this long and with that many layers of IFERROR is pretty much well impenetrable to an outsider. If you need help, I suggest you post on a suitable forum such as http://www.MrExcel.com/forum and make sure you clearly set out what your dataset looks like, and what you are trying to achieve.

    One thing I know for certain is that there will be a much simpler formula than the above that does what you want.

  25. yes you can get second third and any value for match. please let me have a time for two days and will upload a detailed example.

  26. The method of finding the second or next values involves the use of COUNTIF:

    =A3&COUNTIF(A3:A13,A3)
    code item cost code&count
    100 Russian Vodca 5 1002
    101 Indian spices 12 1013
    105 Italian dress 25 1052
    108 Burger 2 1081
    101 Indian spices 16 1012
    103 Chinese medicine 25 1031
    110 Japanese vehicle 5000 1101
    125 Indian dress 45 1251
    105 Italian dress 30 1051
    101 Indian spices 15 1011
    100 Russian Vodca 4 1001

    in the above method you need not have any data in ascending or descending order. it is independent of order.

    the first column is a code letter given to an item and is unique as in most ERP’s and stores or manufacturing units. the item was purchased a number of times at varying rates and the data is a continuous record.

    in the second table we have a middle colomn that will give the required value of match. This gives in reverse order. that is the largest number of purchase is actually the first purchase value. the smallest #, which is always 1, gives the latest value. Item 101 was purchased first time at 12 and this appears by entering # 3. # 4 will give #N/A as this is appearing only three times.
    Note that we need a helper column in the main data table above, which is a combination of item code and count of the number of times its appearance.

    =INDEX($C$3:$C$13,MATCH(F3&G3,$D$3:$D$13,0),0)
    code # value cost
    101 3 12
    105 1 30
    100 2 5
    100 3 #N/A
    101 1 15

    the data are dynamic and more entries of the same item shall change the value in the above column.

    Any comments?

  27. I have another method that I think is quite useful.

    What you do is you get a pen, and you point it at the screen on top of the first letter. After this initial step you proceed to count the rest of the letters, including the spaces, across until you are done with that name. Then you can use notepad (under programs) to record this, or you can write it on an irl-notepad.

    Do this for all the names, after you are done you will have the names with all their lengths. Then you can insert the names into excel in their order. Make sure to check for mistakes :)

    Happy Halloween!

    Henry

  28. Don’t know if anyone is willing to tell me how to do this, but i’ll give it a try.

    I have a truly massive spreadsheet (45000+ lines) of part specification data, with a separate page set up as an interface for looking up matching values.

    Sometimes, a value has only one match in the table, but sometimes there are multiple values.

    I’ve learned how to lookup and list multiple matches in a separate table where the user enters the value to be matched in one cell and this separate table looks up all the matches in multiple rows beneath.

    The problem is that the sheet is looking up as many as 500 values at a time, each on a separate row. If there is a single match, then a certain cell needs to show the matching value. If there are multiple matches, I need to create a userform or similar such that the user can click on the cell where the formula returned multiple matches and have a form pop up listing all the multiple matches…the user would then click on the match they want and that selected value would go into the same cell. Alternatively, the user could manually type in the value they want into the same cell.

    When the multiple matches come up in the userform or whatever popup, the list needs to list multiple column to enable the user to choose the right one.

    Can anyone point me to an example where someone has solved something similar to this?

    I’m a real newbie when it comes to learning VBA programming (I just today managed to successfully create my first combo box! Yay!) but I’ve gotten good at finding something similar to what I need online and adapting it.

    Any help would be sincerely appreciated.

  29. It’s a bit hard to conceptualize this, Bryan. Suggest you whip up a sample file showing what you want, post it in one online help forum like chandoo.org/forum or the MrExcel or Excel Guru forums, and post a link here to your question.

Leave a Reply

Your email address will not be published. Required fields are marked *