Formula Challenge

I don’t know how to comment on Chandoo’s site, so I’ll post it here.

I have data in three columns: A,B,C and I want to get the average of the closest two values out of three in each row. Could you help me with a formula for this?

Here’s my contribution. A bit long.

=IF(ABS(LARGE(A1:C1,2)-MAX(A1:C1))<abs (LARGE(A1:C1,2)-MIN(A1:C1)),AVERAGE(LARGE(A1:C1,2),MAX(A1:C1)),AVERAGE(LARGE(A1:C1,2),MIN(A1:C1)))

I figured that the middle number LARGE(A1:C1,2) will always be in the equation, so I check it against MAX and MIN to see which is closer.

30 thoughts on “Formula Challenge

  1. This works for me.

    =(IF(IF(MIN(ABS(LARGE(A2:C2,2) – LARGE(A2:C2,1)), ABS(LARGE(A2:C2,2) – LARGE(A2:C2,3))) = ABS(LARGE(A2:C2,2) – LARGE(A2:C2,1)),1,2) = 1, LARGE(A2:C2,1), LARGE(A2:C2,3)) + LARGE(A2:C2,2)) / 2

  2. Since you are worried about the length of the formula, you can remove two of the LARGE calls, 2 of the MIN/MAX calls, and the ABS calls…

    =(LARGE(A1:C1,2)+IF(SUM(A1:C1)<3*LARGE(A1:C1,2),MAX(A1:C1),MIN(A1:C1)))/2
  3. why not use MEDIAN?

    =AVERAGE(MEDIAN(A1:C1),IF(ABS(MAX(A1:C1)-MEDIAN(A1:C1))<ABS(MEDIAN(A1:C1)-MIN(A1:C1)),MAX(A1:C1),MIN(A1:C1)))

    I’m guessing this is one of those that has a beautiful mathematical solution…

  4. Ooh, I feel like one could write a very short recursive function to do this for n numbers. The problem is, to understand recursive algorithms, first you have to understand recursive algorithms.

  5. It is shorter to compare the MEDIAN to the AVERAGE. This is equivalent to comparing the sum to 3 times the median:

    =(MEDIAN(A1:C1)+IF(SUM(A1:C1)>=3*MEDIAN(A1:C1),MIN(A1:C1),MAX(A1:C1)))/2
  6. A single if statment…

    =IF(ABS(MAX(A2:C2)-AVERAGE(A2:C2))<=ABS(MIN(A2:C2)-AVERAGE(A2:C2)),(SUM(A2:C2)-MIN(A2:C2))/2,(SUM(A2:C2)-MAX(A2:C2))/2)

    You have to escape your less-than and greater-than symbols on Chandoo’s site.

  7. =IF(MAX(A1:C1)-MEDIAN(A1:C1)&gt;MEDIAN(A1:C1)-MIN(A1:C1),(MEDIAN(A1:C1)+MIN(A1:C1)/2),(MAX(A1:C1)+MEDIAN(A1:C1))/2)

    After reading the challenge, I went directly to Excel and came up with this. Very similar to Leonel Q’s whose formula divides the final answer by 2 which makes his formula shorter. Looks like you have many options here.

  8. This is a fun competition, but the problem is that there are already too many clever but incomprehensible formulae like this in business spreadsheets. It makes much more sense to do it step by step in a few cells, making it clear what is going on.

    This is because
    1. avoiding errors is crucial
    2. the best way to pick up errors is by checking everything
    3. most checkers will have difficulty checking formulae like this

  9. =QUARTILE(A1:C1,1+(3*MEDIAN(A1:C1)>SUM(A1:C1))*2)

    Dermot’s right but in terms of creative thinking these sorts of exercises may provide insights that are useful elsewhere.

  10. {=SUM((A1:C1<=(AVERAGE(A1:C1)+STDEV(A1:C1)))*(A1:C1=>(AVERAGE(A1:C1)-STDEV(A1:C1)))*A1:C1)/2}

    One of the three numbers has to be more than one standard devation away from the mean, so eliminate it and average the other two.

  11. The devil, as always, is in the details. What should the result be when the median is the exact midpoint between the min and max? Jeremy’s and Ihm’s formulas return the average of min and median. I’d think in that situation you’d want the median = mean (average). To produce that and to toss out the ‘outlier’ when median and mean differ, here’s another formula dermotb can hate (piggybacking on Ihm’s).

    =QUARTILE(A1:C1,2-SIGN(SKEW(A1:C1)))

    dermotb’s concern for clarity is derivative. Accuracy is the underlying end for which clarity is just the means. The potential for errors increases as the number of cells used increases, AND using multiple cells rather than just one cell ADDS the potential for reference errors between intermediate calculations to the potential for design/algebra errors.

    If the intended result is the average of the closest 2 values when mean and median differ but median = mean otherwise, then the question becomes whether a single lookup would be clearer than 2 IFs. To me one lookup is clearer AND better than 2 IFs. Thus, the alternative to taking advantages of the quirks of QUARTILE would be

    =(MEDIAN(A1:C1)+LARGE(A1:C1,MATCH(SIGN(SKEW(A1:C1)),{-1,0,1},0)))/2

    5 function calls.

    If this particular problem were spread into multiple cells with no more than a single function call per cell,

    D1:  =MEDIAN(A1:C1)
    E1:  =ABS(A1-$D1)
    F1:  =ABS(B1-$D1)
    G1:  =ABS(C1-$D1)
    H1:  =MAX(E1:G1)
    I1:  =N(E1<>$H1)
    J1:  =N(F1<>$H1)
    K1:  =N(G1<>$H1)
    L1:  =SUM(I1:K1)
    M1:  =SUMPRODUCT(A1:C1,I1:K1)/L1

    In total, 10 function calls. This could be reduced to 5 function calls by replacing E1:G1 with the array formula

    =ABS(A1:C1-D1)

    and replacing I1:K1 with the array formula

    =--(E1:G1<>H1)

    But is that really clearer?

  12. fzz: nice tweak, that wins on brevity. For a general formula that extends to any number of values, maybe try:

    =AVERAGE(IF(IF(COLUMN(A1:C1)<TRANSPOSE(COLUMN(A1:C1)),(A1:C1-TRANSPOSE(A1:C1))^2)=MIN(IF(COLUMN(A1:C1)<TRANSPOSE(COLUMN(A1:C1)),(A1:C1-TRANSPOSE(A1:C1))^2)),(A1:C1+TRANSPOSE(A1:C1))/2))

    entered with {ctrl+shift+enter}. Just replace C1 with D1 to find the average of the closest two of four values. Helper cells won’t really help here the only other option is a udf.

  13. If you want the average of the closest k of N values, the problem screams out for a udf because the number of subsets needing to be checked grows exponentially, and instead of just checking whether median = mean, you could have situations like this for 4 values: {1,3,97,99} — should the average of the closest pair be 2 or 98, or should the result be 50?

  14. The formula returns the average of 50 for {1,3,97,99} but you could just change the first function to min, max or median to get other results.

  15. This array formula works for me:

    {=SUM(((ABS($A1:$C1-MEDIAN(A1:C1))<MAX(ABS($A1:$C1-MEDIAN(A1:C1))))*A1:C1))/2}
  16. First – I read this as being 3 and only 3 values which may or may not be equal.
    Second – special cases occur when
    a) intervals are equal
    b) two or more values are equal.

    My test data was thusly:

    ABCAnswer
    1251.5
    1454.5
    1353
    1151
    1555
    3333

    =+SUM(Min*((Max-Med)>(Med-Min)),Med,Max*((Max-Med)<(Med-Min))) / SUM((Max-Med)>(Med-Min),(Max-Med)<(Med-Min),1)

    Where
    Med = MEDIAN(A1:C1)
    Min = MAX(A1:C1)
    Max = MIN(A1:C1)

    Or in full:

    =+SUM(MIN(A16:C16)*((MAX(A16:C16)-MEDIAN(A16:C16))>(MEDIAN(A16:C16)-MIN(A16:C16))),MEDIAN(A16:C16),MAX(A16:C16)*((MAX(A16:C16)-MEDIAN(A16:C16))<(MEDIAN(A16:C16)-MIN(A16:C16))))    /   SUM((MAX(A16:C16)-MEDIAN(A16:C16))>(MEDIAN(A16:C16)-MIN(A16:C16)),(MAX(A16:C16)-MEDIAN(A16:C16))<(MEDIAN(A16:C16)-MIN(A16:C16)),1  )

    Most of the solutions posted above worked fine most of the time.

    Mpemba

  17. OK – checked out all the above formulae (except for the one by Ramesh I can’t access) and mine is the only one that gives the “correct” answer for all six cases.

    Most get 5 out of 6 but the major disagreement is about whether (1, 3, 5) should return 2 or 4

    coachexcel’s gets 5/6 but returns zero for (3,3,3)
    fzz’s also gets 5/6 but returns #DIV/0 for (3,3,3)

    Mpemba’s solution is the only one Mpemba tested that got 6 / 6 ;o)

  18. Mpemba is solving his own problem so it’s not surprising he gets the “top” marks.
    The problem as stated

    “I have data in three columns: A,B,C and I want to get the average of the closest two values out of three in each row. Could you help me with a formula for this?”

    does not comprehend the case where all 3 values are equal since it presupposes the existence of 2 values which are closest.

  19. Alf …

    Much as I was writing “tongue in cheek” … the serious error in your reply is that you miss the point where the three values are equally spaced …

    For example: if the columns happen to contain 1, 3 and 5 then what is the answer?
    Since the intervals are both “2? which pair are closes?

    The various other attempts at the formula have yielded either 2 or 4 as an answer.
    Mpemba’s solution arbitrarily plumps for 3 (the median).

    So OK, let’s assume the original post exclude the possiblity of equal spacing (or 3 equal values ever) then yes, any of the other formulae will indeed work.

    M

  20. Sure, Mpemba. I didn’t want to labour the point that the question itself was in error. Clearly the 1,3,5 set doesn’t meet the spec, either.
    Alf

  21. =IF(SUM(N(A1:C1=AVERAGE(A1:C1)))=1,AVERAGE(A1:C1),IF(SUM(N(A1:C1<AVERAGE(A1:C1)))=2,AVERAGE(IF(A1:C1AVERAGE(A1:C1)))=2,AVERAGE(IF(A1:C1>AVERAGE(A1:C1),A1:C1,””))))) array entered. A little long, but fits all cases.

  22. Yeah, I had skipped the all equal scenario, in which SKEW gives #DIV/0!.

    OK, try

    =QUARTILE(A1:C1,2-SIGN(SUM(A1:C1)/3-MEDIAN(A1:C1)))

    or if you want more clarity,

    =(MEDIAN(A1:C1)+LARGE(A1:C1,2+SIGN(SUM(A1:C1)/3-MEDIAN(A1:C1))))/2

    As others have already stated, the key point is that the median is always included in the average. The min or max should only be included when it's strictly closer to the median than the other. When min and max are equidistant from the median, the result should be the median = mean. Note: when there's a mode, the result should be the mode.

    @AlfD - ordinal comparisons have 3 states, not 2: less than, greater than and equal to. You can believe that the original question implied that median and mean would always differ, but experience fielding questions like this teaches that the person posing this question didn't appreciate there were 3 possible states. In and of itself, that's no big deal because there's hope that person could learn from the bugs that will eventually manifest from assuming there were only two states.

    OTOH, the earlier respondents who were silent about this but instead just dove into providing formulas were upholding the fine tradition of militant amateurism among spreadsheet 'experts'. Failure to recognize/point out exceptions and mention how to handle them is one of the many reasons real programmers don't take spreadsheet users seriously.

  23. Aside from the subjective criteria, Mpemba clearly didn’t check all formulas as the array formula i posted returns exactly the same values, is shorter and extends to any number of values. And in any case, it’s easy to adapt most exisiting formulas by using sign instead of . It’s also worth asking how solutions would be judged against other criteria such as readability, efficiency, generality or concision. I’m not sure that one would be so high ;)

  24. Wow !,

    I have a come up with a different approach to my previous post. This solution can handle situations like (3,3,3).

    Any hint on shortening formula would be appreciated.

    CHOOSE(MATCH(MIN(ABS(A1-B1),ABS(A1-C1),ABS(B1-C1)),CHOOSE({1,2,3},ABS(A1-B1),ABS(A1-C1),ABS(B1-C1)),0),A1+B1,A1+C1,B1+C1)/2
  25. lhm

    Darn it … I tested your formula (Well, the one that went)

    =QUARTILE(A103:C103,1+(3*MEDIAN(A103:C103)>SUM(A103:C103))*2)

    For some reason I overlooked the array one you refer to which does indeed give the “correct” results.

    Sorry about that — there were a lot to test and they didn’t all cut and paste smoothly.

    OK – Belatedly I’ll have to congratulate you. Yes – definitely the winner for brevity.

    Re the general point of these exercises.
    Other than for trivial questions I always prepare a table showing the “cases” I can think of and expected results. That way I can validate my formulae against them. (including in this case negative and mixed positive negative versions of the numbers I used).

    Mpemba

  26. Coachexcel

    Your new formula still returns 2 for the triplet {1,3,5).

    OK, that’s just as correct as 4, but I still would plump for 3 being the answer …

    M


Posting code? Use <pre> tags for VBA and <code> tags for inline.

Leave a Reply

Your email address will not be published.