Working with the MAXIF() and LARGEIF() functions

Oh, Wait! Excel doesn’t have those functions. We’ll have to roll our own. We’re going to use the Double Unary tip found here. In Excel, 1*TRUE = 1, 0+TRUE = 1, TRUE^1 = 1, and minus-minus TRUE (–TRUE, the double unary) = 1. It’s two sign changes, and thought to be the best way to turn an array of TRUE/FALSE into ones/zeros. Multiplying an array of parameters by an array of ones and zeros leaves only the TRUE parameters non-zero.

Back to our list of the 122 major teams in professional sports, the Boss has decided that sorting by name length, then alphabetically, isn’t going to do. He wants it by name length by stadium size. So, thanks to Wikipedia, your data looks like this, Column(F) is arena size:

  D E F
1 Anaheim Ducks 13 17,174
2 Arizona Cardinals 17 63,400
3 Arizona Diamondbacks 20 48,633
4 Atlanta Braves 14 50,097
5 Atlanta Falcons 15 71,228
6 Atlanta Hawks 13 18,729
7 Baltimore Orioles 17 45,363
8 Baltimore Ravens 16 71,008
9 Boston Bruins 13 17,565
10 Boston Celtics 14 18,624

Using fzz’s comment about the ROW() function, we’ll sort the length by G1=LARGE($E$1:$E$122,ROWS($G$1:$G1)) filled down. The table looks much like before:

  D E F G
1 Anaheim Ducks 13 17,174 29
2 Arizona Cardinals 17 63,400 22
3 Arizona Diamondbacks 20 48,633 22
4 Atlanta Braves 14 50,097 22
5 Atlanta Falcons 15 71,228 21
6 Atlanta Hawks 13 18,729 21
7 Baltimore Orioles 17 45,363 21
8 Baltimore Ravens 16 71,008 20
9 Boston Bruins 13 17,565 20
10 Boston Celtics 14 18,624 20

We know there are only three teams of name length 22. A formula – -($E$1:$E$122=22), using the double unary operation, gives us an array of 119 zeros, and just 3 ones that line up on 22. If we multiply that array by the array of arena capacities (F1:F122) we have 119 multiplications by zero, and three multiplications by 1. A MAX() function or an equivalent LARGE(,1) function on that array product returns the capacity of largest stadium hosting a team name 22 letters long. Those equations formulas, array entered, would look like this:

  • {=LARGE(- -($E$1:$E$122=22)*$F$1:$F$122,1)}
  • {=MAX(- -($E$1:$E$122=22)*$F$1:$F$122)}

We want to modify those for different teams. Changing the =22 to =$G1 is a start, and we’ve gone about as far as we can with our ‘MAXIF().’ It’s going to return the same thing three times. A scheme using COUNTIF() starting from the top and working down will improve the ‘LARGEIF()’. Using mixed references, $G$1:$G1 will grow as we fill down. COUNTIF($G$1:$G2,$G2) will only count one 22, COUNTIF($G$1:$G3,$G3) will count two 22s, and COUNTIF($G$1:$G4,$G4) will count 3, and that’s all there are. This array-entered equationformula then, filled down, is ‘LARGEIF()’:

  • {=LARGE(- -($E$1:$E$122=$G1)*$F$1:$F$122,COUNTIF($G$1:$G1,$G1))}

In I1 filldown =MATCH(H1,$F$1:$F$122,0) and in J1 filldown =INDEX(D:D,I1). Your table should look like this:

  D E F G H I J K
1 Anaheim Ducks 13 17,174 29 45,389 52 Los Angeles Angels of Anaheim  
2 Arizona Cardinals 17 63,400 22 19,980 95 Portland Trail Blazers  
3 Arizona Diamondbacks 20 48,633 22 19,356 63 Minnesota Timberwolves  
4 Atlanta Braves 14 50,097 22 18,144 30 Columbus Blue Jackets  
5 Atlanta Falcons 15 71,228 21 43,651 89 Philadelphia Phillies  
6 Atlanta Hawks 13 18,729 21 19,596 42 Golden State Warriors  
7 Baltimore Orioles 17 45,363 21 18,203 83 Oklahoma City Thunder  
8 Baltimore Ravens 16 71,008 20 68,756 69 New England Patriots  
9 Boston Bruins 13 17,565 20 67,164 49 Jacksonville Jaguars  
10 Boston Celtics 14 18,624 20 65,857 108 Tampa Bay Buccaneers  

Column(I) shows the indices reordered by size of the arena.

There are some problems with this approach. It only works with positive numbers, such as our capacities. Negative numbers will be less than a FALSE, and a zero will be returned in their place. And if ever capacities are equal, it will always return only the first (same old problem). Nicely here, hockey and basketball held in the same arena draw to different capacity. The file is available at http://wl.filegenie.com/~JMOprof/LargeIF.xls

…mrt
©¿©¬

6 thoughts on “Working with the MAXIF() and LARGEIF() functions

  1. Here is a link to a copy of the workbook posted by mrt with 2 extra tabs. My method is the same as seen in the other topic (finding 2nd, 3rd, 4th match). It doesn’t have the negative number limitation and with the 3rd tab I added another named range similar to the first (just one extra column) to solve the ‘same old problem’ mrt mentioned above. It is basically the same as mrt’s method except my array formula is ‘tucked’ away in a named range.

    https://skydrive.live.com/redir.aspx?cid=9a0721b634391421&resid=9A0721B634391421!143&parid=9A0721B634391421!136

    Thanks.

  2. Quibble: they’re not equations, they’re formulas.

    As for the last formula,

    {=LARGE(–($E$1:$E$122=$G1)*$F$1:$F$122,COUNTIF($G$1:$G1,$G1))}

    It appears to have a typo, a single minus rather than two. Point is as soon as you introduce the multiplication operation the unary minuses become pointless. That is, –(a=x)*b will ALWAYS return the same result as (a=x)*b because the multiplication operation automatically converts the boolean array (a=x) to numeric 1s and 0s. This should have been obvious to any Excel expert.

    First, MAX(IF(condition,array)) may be 1 char longer than MAX((condition)*array), but MAX(IF(..)) does less work because IF returns FALSE for entries where condition is FALSE, and MAX simply skips those entries. MAX((..)*..) includes the zero results when condition is FALSE in its calculations. Also, MAX((..)*..) is a bug-in-waiting in situations where the largest value could be a negative number, e.g., total returns by stock for stock portfolios since the formula would return the incorrect result zero. The same is true for LARGE(..,k). When you need to use array formulas anyway, IF is much more efficient than multiplying by condition arrays. Multiplying conditions only make sense in SUMPRODUCT where it allows one to avoid array formula entry.

    This is not really any different than the other posting about finding nth largest/smallest matches. You’re just replacing the second sort key original position/alpha order ascending with stadium capacity descending. If by happenstance 2 teams have stadiums with the same capacity, I’ll even throw in a third sort key of original position/alpha order ascending. Since I don’t believe any stadiums yet exceed 250,000 seats,

    E1:E122: {=262144*LEN($D$1:$D$122)+$F$1:$F$122-1/(1+ROW($D$1:$D$122))}

    G1:G122: {=LARGE($E$1:$E$122,ROW($E$1:$E$122)-ROW($E$1)+1)}

    This gives a single composite sort key with team name length first, then stadium capacity, then original position, and the resulting ordered indices. Then just use

    J1: =INDEX($D$1:$D$122,G1)

    with J1 filled down into J2:J122.

  3. fzz, your comments such as ‘This should have been obvious to any Excel expert’ do seem rather confrontational. By all means point out issues as we’re all here to learn but perhaps some diplomacy in the way you deliver the advice would sugar the pill a little.

  4. Typo of an unusual sort: The post as written had minus-minus abutted together (copied out of the worksheet). The browser rendered that as a single “minus” of ascii 208, per Cell View, and that doesn’t make sense to me. Anyway, I edited it to separate.

  5. Well, it turns out that the second setup I used in my original post didn’t work as I thought. I decided to compare it to a manual, descending sort of the data [by LEN, by Capacity, by Team] and found that it was off with quite a few teams. I then compared mrt’s method to the manual sort and noticed that New York Giants pulled in twice, since the Giants and Jets use the same stadium. So back to the drawing board I went! This one took a few days to figure out but I finally found a setup that works. It was tested for accuracy by comparing to the manual sort on the last tab that I added. The file is uploaded to my SkyDrive account; hopefully the link above in my first post still takes you to the “Excel Files” folder that I created where you will find “TeamsLargeIf v2.xlsx”. If the link doesn’t work let me know. v2 has 4 tabs: 1/ mrt’s LargeIF, 2/ my first attempt that matches mrt’s 3/the latest attempt I am about to explain and 4/a manual sort of the data for comparison. I tried to incorporate fzz’s formulas but I wasn’t able to get it to work as it was posted.

    Columns A:H are the basic data seen in previous workbooks, so I will skip over them

    Column I has the formula:
    =SIGN(G2)*(ROWS($I$1:I2)-1)/RowDivisor

    This formula, copied to the end of the data, houses the relative row to be used in the INDEX formula utilized in Column J. It could be added as part of said INDEX formula without causing it to be a CSE formula but I kept it separate for clarity.
    RowDivisor is defined as:
    =10^LEN(MATCH(1E+300,’kmc (3)’!$G:$G))
    10 is raised to the power of the LEN of the last row number used. In this workbook row 123 is the last used row in Column G so 10^3 or 1000.
    The result of the above is multiplied by the sign of the capacity, as this number will be added to the matching capacity in the INDEX formula and therefore needs to be the same sign.

    Column J (J2 to J123)
    =(LARGE(INDEX(((H2=$F$2:$F$123)=FALSE)*-9E+300+(H2=$F$2:$F$123)*$G$2:$G$123+(H2=$F$2:$F$123)*$I$2:$I$123,,),COUNTIF($H$2:H2,H2)))

    The LARGE and COUNTIF pieces are known so on to the ‘new’ stuff. I am housing the arrays in an INDEX formula so CSE doesn’t have to be used.

    The first piece of the INDEX:
    ((H2=$F$2:$F$123)=FALSE)*-9E+300 – This is multiplying all non-matches of LEN by a sufficiently large negative number (has to be bigger than the largest possible negative value in the data). These large negative numbers will effectively replace the 0s in the final array, allowing the LARGE function to pull in negatives.

    The second piece:
    +(H2=$F$2:$F$123)*$G$2:$G$123 – This is returning all of the capacities where the LEN found a match and is added to the first piece. We now have an array of capacities and very large negative numbers.

    The third and final piece:
    +(H2=$F$2:$F$123)*$I$2:$I$123 – This is adding the INDEX row numbers calculated in Column I to the capacities. Since we divided the numbers by the divisor, none of the numbers are greater than 1, so the row numbers are effectively delimited by the decimal point.

    We then pull the largest amount in with the LARGE & COUNTIF functions.

    Finally, the row numbers are extracted from the numbers in column J and multiplied by the divisor to bring them back to the actual number rather than a fraction of one and this number is used to pull the team.

    An obvious limitation of this method is if the data used already has a decimal point and we try to add the fractional row number. This will of course create a text value, such as 12345.11.003 and LARGE will pull an error. Other than that it seems pretty sound.

  6. Hi Kyle –

    I think I would have found a pregnant lady in the Giants crowd and bumped their capacity up 0.5 ;-)

    Very nice. Happy New Year.

    …mrt

Leave a Reply

Your email address will not be published. Required fields are marked *