Working with the MAXIF() and LARGEIF() functions

Posted on December 15, 2011 (updated December 19, 2011) by Michael

Oh, wait! Excel doesn’t have those functions. We’ll have to roll our own. We’re going to use the Double Unary tip found here. In Excel, 1*TRUE = 1, 0+TRUE = 1, TRUE^1 = 1, and minus-minus TRUE (--TRUE, the double unary) = 1. It’s two sign changes, and thought to be the best way to turn an array of TRUE/FALSE into ones/zeros. Multiplying an array of parameters by an array of ones and zeros leaves only the TRUE parameters non-zero.
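For readers who think better in code, here is a minimal Python sketch of the same coercion idea. It is only an analogy: Python booleans happen to behave like Excel's TRUE/FALSE under arithmetic, and the three capacities are sample values borrowed from the post's data.

```python
# Python booleans coerce to integers under arithmetic, much like Excel's TRUE/FALSE.
flags = [True, False, True]

via_multiply = [1 * f for f in flags]      # 1*TRUE
via_add = [0 + f for f in flags]           # 0+TRUE
via_power = [f ** 1 for f in flags]        # TRUE^1
via_double_unary = [--f for f in flags]    # two sign changes: -(-True) == 1

# Multiplying values by the ones/zeros leaves only the TRUE positions non-zero.
values = [17174, 63400, 48633]             # sample capacities
masked = [f * v for f, v in zip(flags, values)]

print(via_double_unary)  # [1, 0, 1]
print(masked)            # [17174, 0, 48633]
```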

Back to our list of the 122 major teams in professional sports, the Boss has decided that sorting by name length, then alphabetically, isn’t going to do. He wants it by name length, then by stadium size. So, thanks to Wikipedia, your data looks like this, where column F is arena size:

	D	E	F
1	Anaheim Ducks	13	17,174
2	Arizona Cardinals	17	63,400
3	Arizona Diamondbacks	20	48,633
4	Atlanta Braves	14	50,097
5	Atlanta Falcons	15	71,228
6	Atlanta Hawks	13	18,729
7	Baltimore Orioles	17	45,363
8	Baltimore Ravens	16	71,008
9	Boston Bruins	13	17,565
10	Boston Celtics	14	18,624

Using fzz’s comment about the ROW() function, we’ll sort the length by G1=LARGE($E$1:$E$122,ROWS($G$1:$G1)) filled down. The table looks much like before:

	D	E	F	G
1	Anaheim Ducks	13	17,174	29
2	Arizona Cardinals	17	63,400	22
3	Arizona Diamondbacks	20	48,633	22
4	Atlanta Braves	14	50,097	22
5	Atlanta Falcons	15	71,228	21
6	Atlanta Hawks	13	18,729	21
7	Baltimore Orioles	17	45,363	21
8	Baltimore Ravens	16	71,008	20
9	Boston Bruins	13	17,565	20
10	Boston Celtics	14	18,624	20
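As a rough Python analogy for the fill-down LARGE() trick, the sketch below uses only the ten sample rows shown, so its output will not match the table above, which is computed from all 122 teams. The helper name large is my own, not an Excel or Python built-in.

```python
lengths = [13, 17, 20, 14, 15, 13, 17, 16, 13, 14]  # column E, sample rows 1-10


def large(values, k):
    """Return the k-th largest value (1-based), like Excel's LARGE(values, k)."""
    return sorted(values, reverse=True)[k - 1]


# ROWS($G$1:$G1) is 1 in row 1, 2 in row 2, and so on as the formula fills down,
# so k simply counts the rows.
column_g = [large(lengths, k) for k in range(1, len(lengths) + 1)]

print(column_g)  # [20, 17, 17, 16, 15, 14, 14, 13, 13, 13]
```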

We know there are only three teams of name length 22. The formula - -($E$1:$E$122=22), using the double unary operation, gives us an array of 119 zeros and just 3 ones that line up on the 22s. If we multiply that array by the array of arena capacities (F1:F122) we have 119 multiplications by zero, and three multiplications by 1. A MAX() function or an equivalent LARGE(,1) function on that array product returns the capacity of the largest stadium hosting a team with a name 22 letters long. Those ~~equations~~ formulas, array entered, would look like this:

{=LARGE(- -($E$1:$E$122=22)*$F$1:$F$122,1)}
{=MAX(- -($E$1:$E$122=22)*$F$1:$F$122)}
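The same masking step can be sketched in Python. This is an illustration on the ten sample rows only (none of them has a 22-letter name, so lengths 13 and 17 are used instead), and max_if is a hypothetical helper name.

```python
lengths = [13, 17, 20, 14, 15, 13, 17, 16, 13, 14]        # column E, sample rows
capacities = [17174, 63400, 48633, 50097, 71228,
              18729, 45363, 71008, 17565, 18624]          # column F, sample rows


def max_if(lengths, capacities, target):
    """Largest capacity among rows whose name length equals target."""
    mask = [--(n == target) for n in lengths]             # ones and zeros
    return max(m * c for m, c in zip(mask, capacities))   # MAX of the array product


print(max_if(lengths, capacities, 13))  # 18729
print(max_if(lengths, capacities, 17))  # 63400
```

Note that a length with no matching rows returns 0 here, just as the Excel product would.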

We want to modify those for different teams. Changing the =22 to =$G1 is a start, but that’s about as far as we can go with our ‘MAXIF()’: it’s going to return the same thing three times. A scheme using COUNTIF() starting from the top and working down will improve the ‘LARGEIF()’. Using mixed references, $G$1:$G1 will grow as we fill down. COUNTIF($G$1:$G2,$G2) will only count one 22, COUNTIF($G$1:$G3,$G3) will count two 22s, and COUNTIF($G$1:$G4,$G4) will count 3, and that’s all there are. This array-entered ~~equation~~ formula then, filled down, is ‘LARGEIF()’:

{=LARGE(- -($E$1:$E$122=$G1)*$F$1:$F$122,COUNTIF($G$1:$G1,$G1))}
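Here is a hedged Python sketch of the running-COUNTIF idea, again on the ten sample rows rather than the full 122. The names large and large_if are hypothetical; row is 0-based in the code where the worksheet is 1-based.

```python
lengths = [13, 17, 20, 14, 15, 13, 17, 16, 13, 14]        # column E, sample rows
capacities = [17174, 63400, 48633, 50097, 71228,
              18729, 45363, 71008, 17565, 18624]          # column F, sample rows


def large(values, k):
    """k-th largest value (1-based), like Excel's LARGE."""
    return sorted(values, reverse=True)[k - 1]


def large_if(lengths, capacities, sorted_lengths, row):
    """Capacity for one filled-down row of the 'LARGEIF()' formula."""
    target = sorted_lengths[row]
    k = sorted_lengths[:row + 1].count(target)            # COUNTIF($G$1:$G1,$G1)
    products = [(n == target) * c for n, c in zip(lengths, capacities)]
    return large(products, k)


column_g = sorted(lengths, reverse=True)
column_h = [large_if(lengths, capacities, column_g, r) for r in range(len(column_g))]

print(column_h)
# [48633, 63400, 45363, 71008, 71228, 50097, 18624, 18729, 17565, 17174]
```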

With the ‘LARGEIF()’ formula in H1 and filled down, enter =MATCH(H1,$F$1:$F$122,0) in I1 and =INDEX(D:D,I1) in J1, and fill both down. Your table should look like this:

	D	E	F	G	H	I	J
1	Anaheim Ducks	13	17,174	29	45,389	52	Los Angeles Angels of Anaheim
2	Arizona Cardinals	17	63,400	22	19,980	95	Portland Trail Blazers
3	Arizona Diamondbacks	20	48,633	22	19,356	63	Minnesota Timberwolves
4	Atlanta Braves	14	50,097	22	18,144	30	Columbus Blue Jackets
5	Atlanta Falcons	15	71,228	21	43,651	89	Philadelphia Phillies
6	Atlanta Hawks	13	18,729	21	19,596	42	Golden State Warriors
7	Baltimore Orioles	17	45,363	21	18,203	83	Oklahoma City Thunder
8	Baltimore Ravens	16	71,008	20	68,756	69	New England Patriots
9	Boston Bruins	13	17,565	20	67,164	49	Jacksonville Jaguars
10	Boston Celtics	14	18,624	20	65,857	108	Tampa Bay Buccaneers

Column(I) shows the indices reordered by size of the arena.
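The MATCH/INDEX lookup can be sketched in Python as follows. The ten sample rows stand in for the full list, column_h is the ordering I worked out by hand for those ten rows (name length descending, then capacity descending), and match_exact and index are hypothetical helper names mimicking the Excel functions.

```python
teams = ["Anaheim Ducks", "Arizona Cardinals", "Arizona Diamondbacks",
         "Atlanta Braves", "Atlanta Falcons", "Atlanta Hawks",
         "Baltimore Orioles", "Baltimore Ravens", "Boston Bruins",
         "Boston Celtics"]                                 # column D, sample rows
capacities = [17174, 63400, 48633, 50097, 71228,
              18729, 45363, 71008, 17565, 18624]           # column F, sample rows

# Capacities ordered by name length, then capacity (sample rows only).
column_h = [48633, 63400, 45363, 71008, 71228, 50097, 18624, 18729, 17565, 17174]


def match_exact(value, values):
    """1-based position of the first exact match, like MATCH(value, values, 0)."""
    return values.index(value) + 1


def index(values, position):
    """Value at a 1-based position, like INDEX(values, position)."""
    return values[position - 1]


column_i = [match_exact(h, capacities) for h in column_h]
column_j = [index(teams, i) for i in column_i]

print(column_i)     # [3, 2, 7, 8, 5, 4, 10, 6, 9, 1]
print(column_j[0])  # Arizona Diamondbacks
```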

There are some problems with this approach. It only works with positive numbers, such as our capacities. A negative number is less than the zero that every FALSE row contributes, so a zero will be returned in its place. And if two capacities are ever equal, MATCH() will always return only the first (same old problem). Nicely here, hockey and basketball played in the same arena have different capacities. The file is available at http://wl.filegenie.com/~JMOprof/LargeIF.xls
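To make the equal-capacity problem concrete, here is a small Python sketch. The team names and capacities are made up for illustration; the point is only that a first-match lookup returns the same row twice.

```python
teams = ["Team A", "Team B", "Team C"]       # hypothetical names
capacities = [50000, 80000, 80000]           # hypothetical: B and C share a stadium

ordered = sorted(capacities, reverse=True)   # what LARGE() would hand back

# list.index, like MATCH(..., 0), stops at the first exact match.
positions = [capacities.index(c) + 1 for c in ordered]
picked = [teams[p - 1] for p in positions]

print(positions)  # [2, 2, 1]
print(picked)     # ['Team B', 'Team B', 'Team A']  (Team C never appears)
```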

…mrt
©¿©¬

6 thoughts on “Working with the MAXIF() and LARGEIF() functions”

Kyle McGhee says:

December 17, 2011 at 7:25 pm

Here is a link to a copy of the workbook posted by mrt with 2 extra tabs. My method is the same as seen in the other topic (finding 2nd, 3rd, 4th match). It doesn’t have the negative number limitation and with the 3rd tab I added another named range similar to the first (just one extra column) to solve the ‘same old problem’ mrt mentioned above. It is basically the same as mrt’s method except my array formula is ‘tucked’ away in a named range.

https://skydrive.live.com/redir.aspx?cid=9a0721b634391421&resid=9A0721B634391421!143&parid=9A0721B634391421!136

Thanks.

fzz says:

December 18, 2011 at 11:57 am

Quibble: they’re not equations, they’re formulas.

As for the last formula,

{=LARGE(–($E$1:$E$122=$G1)*$F$1:$F$122,COUNTIF($G$1:$G1,$G1))}

It appears to have a typo, a single minus rather than two. Point is as soon as you introduce the multiplication operation the unary minuses become pointless. That is, --(a=x)*b will ALWAYS return the same result as (a=x)*b because the multiplication operation automatically converts the boolean array (a=x) to numeric 1s and 0s. This should have been obvious to any Excel expert.

First, MAX(IF(condition,array)) may be 1 char longer than MAX((condition)*array), but MAX(IF(..)) does less work because IF returns FALSE for entries where condition is FALSE, and MAX simply skips those entries. MAX((..)*..) includes the zero results when condition is FALSE in its calculations. Also, MAX((..)*..) is a bug-in-waiting in situations where the largest value could be a negative number, e.g., total returns by stock for stock portfolios, since the formula would return the incorrect result zero. The same is true for LARGE(..,k). When you need to use array formulas anyway, IF is much more efficient than multiplying by condition arrays. Multiplying conditions only makes sense in SUMPRODUCT, where it allows one to avoid array formula entry.
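A short Python sketch of the difference, under the assumption that filtering is a fair stand-in for IF() returning FALSE values that MAX() skips. The group labels and returns are invented, and both function names are hypothetical.

```python
groups = ["a", "b", "a", "b"]                 # hypothetical portfolio labels
returns = [-0.12, 0.05, -0.03, 0.08]          # hypothetical total returns


def max_by_product(groups, values, target):
    """Analogue of MAX((condition)*array): non-matches become zeros."""
    return max((g == target) * v for g, v in zip(groups, values))


def max_by_if(groups, values, target):
    """Analogue of MAX(IF(condition,array)): non-matches are skipped."""
    return max(v for g, v in zip(groups, values) if g == target)


print(max_by_product(groups, returns, "a"))   # 0.0, wrong: a zero from a non-match wins
print(max_by_if(groups, returns, "a"))        # -0.03
```

One difference from Excel worth knowing: the filtered version raises ValueError in Python when nothing matches, so a real implementation would need a default.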

This is not really any different than the other posting about finding nth largest/smallest matches. You’re just replacing the second sort key original position/alpha order ascending with stadium capacity descending. If by happenstance 2 teams have stadiums with the same capacity, I’ll even throw in a third sort key of original position/alpha order ascending. Since I don’t believe any stadiums yet exceed 250,000 seats,

E1:E122: {=262144*LEN($D$1:$D$122)+$F$1:$F$122-1/(1+ROW($D$1:$D$122))}

G1:G122: {=LARGE($E$1:$E$122,ROW($E$1:$E$122)-ROW($E$1)+1)}

This gives a single composite sort key with team name length first, then stadium capacity, then original position, and the resulting ordered indices. Then just use

J1: =INDEX($D$1:$D$122,G1)

with J1 filled down into J2:J122.
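The composite-key idea can be sketched in Python on the ten sample rows. This is not a translation of the formulas above: the fractional row term is left out of the numeric key, and a tuple key is swapped in to show the three sort levels (length descending, capacity descending, original row ascending) explicitly.

```python
lengths = [13, 17, 20, 14, 15, 13, 17, 16, 13, 14]        # name lengths, sample rows
capacities = [17174, 63400, 48633, 50097, 71228,
              18729, 45363, 71008, 17565, 18624]          # capacities, sample rows
rows = range(1, len(lengths) + 1)                         # 1-based row numbers

# Numeric key: length dominates because 262144 (2**18) exceeds every capacity.
numeric_key = [262144 * n + c for n, c in zip(lengths, capacities)]
order_by_number = sorted(rows, key=lambda r: -numeric_key[r - 1])

# Tuple key: the same ordering spelled out level by level, with the row number
# as the final tie-breaker.
order_by_tuple = sorted(rows, key=lambda r: (-lengths[r - 1], -capacities[r - 1], r))

print(order_by_tuple)  # [3, 2, 7, 8, 5, 4, 10, 6, 9, 1]
```

Because Python's sort is stable, the numeric-key version also keeps tied rows in their original order, which is why the two orderings agree.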

Justin says:

December 19, 2011 at 2:18 am

fzz, your comments such as ‘This should have been obvious to any Excel expert’ do seem rather confrontational. By all means point out issues as we’re all here to learn but perhaps some diplomacy in the way you deliver the advice would sugar the pill a little.

Michael says:

December 19, 2011 at 7:16 am

Typo of an unusual sort: The post as written had minus-minus abutted together (copied out of the worksheet). The browser rendered that as a single “minus” of ascii 208, per Cell View, and that doesn’t make sense to me. Anyway, I edited it to separate.

Kyle McGhee says:

December 23, 2011 at 1:32 pm

Well, it turns out that the second setup I used in my original post didn’t work as I thought. I decided to compare it to a manual, descending sort of the data [by LEN, by Capacity, by Team] and found that it was off with quite a few teams. I then compared mrt’s method to the manual sort and noticed that the New York Giants were pulled in twice, since the Giants and Jets use the same stadium. So back to the drawing board I went! This one took a few days to figure out but I finally found a setup that works. It was tested for accuracy by comparing to the manual sort on the last tab that I added. The file is uploaded to my SkyDrive account; hopefully the link above in my first post still takes you to the “Excel Files” folder that I created where you will find “TeamsLargeIf v2.xlsx”. If the link doesn’t work let me know. v2 has 4 tabs: 1/ mrt’s LargeIF, 2/ my first attempt that matches mrt’s, 3/ the latest attempt I am about to explain, and 4/ a manual sort of the data for comparison. I tried to incorporate fzz’s formulas but I wasn’t able to get them to work as posted.

Columns A:H are the basic data seen in previous workbooks, so I will skip over them.

Column I has the formula:
=SIGN(G2)*(ROWS($I$1:I2)-1)/RowDivisor

This formula, copied to the end of the data, houses the relative row to be used in the INDEX formula utilized in Column J. It could be added as part of said INDEX formula without causing it to be a CSE formula but I kept it separate for clarity.
RowDivisor is defined as:
=10^LEN(MATCH(1E+300,'kmc (3)'!$G:$G))
10 is raised to the power of the LEN of the last row number used. In this workbook row 123 is the last used row in Column G so 10^3 or 1000.
The result of the above is multiplied by the sign of the capacity, as this number will be added to the matching capacity in the INDEX formula and therefore needs to be the same sign.
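A rough Python sketch of the RowDivisor arithmetic, assuming the last used row is 123 as in this workbook. The function names are hypothetical, and relative_row stands for ROWS($I$1:I2)-1, which is 1 on the first data row.

```python
def row_divisor(last_row):
    """10 raised to the number of digits in the last used row number."""
    return 10 ** len(str(last_row))


def sign(x):
    """-1, 0 or 1, like Excel's SIGN."""
    return (x > 0) - (x < 0)


def row_fraction(capacity, relative_row, divisor):
    """Row number pushed behind the decimal point, carrying the capacity's sign."""
    return sign(capacity) * relative_row / divisor


divisor = row_divisor(123)
print(divisor)                            # 1000
print(row_fraction(17174, 1, divisor))    # 0.001
print(row_fraction(-5, 12, divisor))      # -0.012
```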

Column J (J2 to J123)
=(LARGE(INDEX(((H2=$F$2:$F$123)=FALSE)*-9E+300+(H2=$F$2:$F$123)*$G$2:$G$123+(H2=$F$2:$F$123)*$I$2:$I$123,,),COUNTIF($H$2:H2,H2)))

The LARGE and COUNTIF pieces are known so on to the ‘new’ stuff. I am housing the arrays in an INDEX formula so CSE doesn’t have to be used.

The first piece of the INDEX:
((H2=$F$2:$F$123)=FALSE)*-9E+300 – This is multiplying all non-matches of LEN by a sufficiently large negative number (it has to be more negative than the most negative value possible in the data). These large negative numbers will effectively replace the 0s in the final array, allowing the LARGE function to pull in negatives.

The second piece:
+(H2=$F$2:$F$123)*$G$2:$G$123 – This is returning all of the capacities where the LEN found a match and is added to the first piece. We now have an array of capacities and very large negative numbers.

The third and final piece:
+(H2=$F$2:$F$123)*$I$2:$I$123 – This is adding the INDEX row numbers calculated in Column I to the capacities. Since we divided the numbers by the divisor, none of the numbers are greater than 1, so the row numbers are effectively delimited by the decimal point.

We then pull the largest amount in with the LARGE & COUNTIF functions.

Finally, the row numbers are extracted from the numbers in column J and multiplied by the divisor to bring them back to the actual number rather than a fraction of one and this number is used to pull the team.
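The whole scheme can be sketched in Python under a few assumptions: the three teams and their capacities are invented (two of them tie on purpose), rows are numbered from 1, and the function names are my own. It illustrates the tagging idea, not the exact worksheet formula.

```python
SENTINEL = -9e300   # stands in for non-matching rows, far below any real value


def large(values, k):
    """k-th largest value (1-based), like Excel's LARGE."""
    return sorted(values, reverse=True)[k - 1]


def tagged_large(lengths, capacities, target, k, divisor):
    """k-th largest capacity for target, with the row number as its decimal part."""
    tagged = []
    for row, (n, c) in enumerate(zip(lengths, capacities), start=1):
        if n == target:
            sign = (c > 0) - (c < 0)
            tagged.append(c + sign * row / divisor)
        else:
            tagged.append(SENTINEL)
    return large(tagged, k)


def split_tag(tagged, divisor):
    """Separate a tagged value back into (capacity, row)."""
    whole = int(tagged)                              # truncates toward zero
    row = round(abs(tagged - whole) * divisor)       # round() absorbs float error
    return whole, row


teams = ["Team A", "Team B", "Team C"]               # hypothetical
lengths = [15, 13, 15]                               # hypothetical name lengths
capacities = [80000, 50000, 80000]                   # hypothetical: A and C tie
divisor = 10 ** len(str(len(lengths)))               # 10 for three rows

picks = [split_tag(tagged_large(lengths, capacities, 15, k, divisor), divisor)
         for k in (1, 2)]
print(picks)                                         # [(80000, 3), (80000, 1)]
print([teams[row - 1] for _, row in picks])          # ['Team C', 'Team A']
```

With tied capacities the larger row number sorts first, so both teams appear once each instead of one team appearing twice.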

An obvious limitation of this method is data that already has a decimal part. Adding the fractional row number to it, say 12345.11 + 0.003, gives 12345.113, and the row number can no longer be separated from the data’s own decimals. Other than that it seems pretty sound.

Michael says:

December 30, 2011 at 8:59 am

Hi Kyle –

I think I would have found a pregnant lady in the Giants crowd and bumped their capacity up 0.5 ;-)

Very nice. Happy New Year.

…mrt