Pop quiz

Question One

You want to calculate a running (i.e. cumulative) total of the Data column.
Which of these formulas should you put in B2 and drag down, and why?
=SUM(A$2:A2)
=SUM(A2,B1)

Question Two

You want to calculate a running (i.e. cumulative) total of the Data column, and subtract 1 from it.
Which of these formulas should you put in B2 and drag down, and why?
=SUM(A$2:A2)-1
=SUM(A2,B1)-1

—edit—

If you’re ambivalent as to the approach you would take, download and try out a slightly revised thought experiment in the attached file. Then you won’t be ambivalent.

Pop-quiz_20140502-V6

This has a more realistic data set, where instead of subtracting 1 from the cumulative total, you want to subtract a varying list of cumulative expenses, so you can work out the cumulative net profit:
Pop Quiz v5

Take it that inserting/deleting rows is not an issue (assume the structure is locked down).

Note that this is intended to be a thought experiment/illustration about a common approach which happens to be very resource intensive, and a better solution that works just fine provided you don’t do anything else within that formula but calculate a cumulative total.

But feel free to post alternatives.

33 thoughts on “Pop quiz

  1. I’d probably use the first calculation in both cases because it’s more resilient to rows being deleted. I’ve never stopped to think about it before, but I guess that the second query would be quicker to calculate, despite the long dependency tree, because Excel won’t normally need to recalculate the whole tree and summing two numbers has to be quicker than summing the column.

  2. I do see one issue with #2, which might make me change my answer… If I delete a row in the table then the formulas below Ref out. Still interested to know the answer.

  3. I’d go for #1 in both questions.

    Firstly, because it doesn’t break if you insert or add a row.

    In Question 2, the second formula will give the wrong answer – in the first cell you’ll get the correct answer, but the next row will give (running total less 2), then (running total less 3) and so on.

  4. In second formula when substracting 1, will it not substract 1 multiple times ?

    I think (like Jonà formula #2 may cause problem on inserting and deleting row.
    It’s not optimal but I will tend to change it to = SUM(A2,OFFSET(B2,-1,0)) – 1

    Still, imho, the less error prone formula is the first using structured reference : =SUM(Table[[#header];[data]]:[@data])
    Since header in a table can’t be a number total will not be wrong. I think it’s simpler and better than using index to get first row. (I will post this on Zack Baresse article to see what he thinks !)

  5. In a table do don’t need to ‘drag down’; entering the formula in B2 suffices.

    I’d use in B2: =A2+B1

  6. Jon: Deleting is not the issue.

    For Question one, try a really long list. Say put =row() in cells A1 to A20000 and then try both iterations.

    For Question two, just use a really short list, but try both approaches side by side.

    Then I imagine you’ll want to post back.

  7. The point I’m driving at is far more apparent for Question One if after you’ve entered the formula down all 20,000 rows, you go back to B2 and double-click the fill handle again.

    And for question two, make sure you try out your suggested approach.

  8. SNB: And what formula would you use for question 2? In B1 would you put = A1-1 and in B2 would you put = A2+B1-1 ? Try it on the sample data.

  9. For Question 1 – the second option – =SUM(A2,B1) – provides much faster calculation than the first – I’m guessing using a ‘step by step’ calculation requires exponentially less processing power than calculating a ‘fresh’ SUM for each cell in the range.

    For Question 2 – Here you would have to revert to the first option – =SUM(A$2:A2)-1 – otherwise you’re subtracting one from each and every ‘cumulative total’ – so with 50 rows, you would be subtracting 50 from the final total. Alternatively, use the second option, but add the ‘-1’ to the first entry only, then use the =SUM(A2,B1) formula style from the 1st Question for the rest of the table.

  10. Jeff, your second formula is obviously going to be significantly faster over a larger range as the first formula will have to calculate more cells, increasing exponentially with each additional row. I had to use 35,000 rows to notice a marked difference and even then the first formula calculated in a few seconds.

    SNB’s formula is effectively the same as your option 2.

    For your question 2 (cumulative total -1), I used the following:
    – In cell B2: =Sum(A2,B1)-1
    – In cell B3 (then copied down): =sum(A3,B2)

  11. In a somewhat more general formula, I’d use:

    =SUM(A2,B1)-IF(B1=Table1[#Headers],1,0), which also works way more efficiently than the =SUM(A$2:A2)-1 formula

  12. Peter, I too found that the first formula filled down in a matter of seconds. But then when I refilled it i.e. clicked the fill handle of the one at the top, it took a heck of a lot longer. What happens if you double click the fill handle after the initial populating?

    Yep, realise that SNB’s is the same. I just always hated having to have two different formulas…especially in an Excel Table given column formulas. So today I used my option 2. Then I thought to myself ‘Hey, why not subtract this other column from it at the same time, so I can see the net difference’. Then I spend hours scratching my head trying to work out why Option two broke as soon as you added in some more arithmetic.

  13. My example of subtracting 1 is a bit trivial really. In the real world situation that this came up for me today, I had another column of wildly different amounts, and just thought ‘Hey, I can subtact those from my cumulative total to give me net cumulative’. I could not for the life of me work out what was going wrong, until hours later I whipped up the above simple example.

    Alan, Matth78, Chris, Peter, JLeno: Congrats. You are all smarter than I was earlier today for what it’s worth.

    I’ll amend my sample data pic in the main article to make it slightly less trivial.

    On avoiding the =SUM(A$2:A2) construct, I first found out about the effects from Charles William’s great article Improving Performance in Excel 2007

  14. If inserting/deleting rows is not an issue
    B2=N(B1)+A2
    C2=N(C1)+A2-(ROWS(C$1:C1)=1)

    If it does
    B2=N(INDEX(B:B,ROWS(B$1:B1)))+A2
    C2=N(INDEX(C:C,ROWS(C$1:C1)))+A2-(ROWS(C$1:C1)=1)

    Regards

  15. @All: All new and improved daft scenario and sample file now added to original post for your perverse pleasure. Take it that inserting/deleting rows is not an issue (assume the structure is locked down).

    Note that this is intended to be a thought experiment/illustration about a common approach which happens to be very resource intensive, and a better solution that works just fine provided you don’t do anything else within that formula but calculate a cumulative total.

    But feel free to post alternatives.

  16. I’m not sure just how realistic this scenario is. You have presented a single sales figure with a cumulative expense figure. If they were single expense figures then you could use sum(A2-E2+F1) and maintain the calculation efficiency.

    In the real world, if I was faced with this and the calculation time of using sum(a$2:a…) proved too resource intensive, I would use VBA.

  17. PeterB read my thoughts:

    Sub M_snb()
    sn = [A2:A40000]
    For j = 2 To UBound(sn)
    sn(j, 1) = sn(j - 1, 1) + sn(j, 1)
    Next

    [K2:K40000] = sn
    End Sub

  18. Agreeing with PeterB, first formula (=SUM(A$2:A2)) will take longer since each formula requires an increasing power to calculate. Besides when a cell change in column A excel will trigger at once calculation for all below cells. (which is a lot)
    I think the second formula (=SUM(A2,B1)) will not quite do that. It will trigger calculation of running total on the same row which will trigger calculation for next row and so on. (And Excel should be less overwhelmed but I hope I’m not wrong here !)

    I think it’s similar to referencing full column which is considered bad practice.

  19. Elias: =A2+N(G1)-E2 does not work fine. Try it side by side with =SUM(A2,F1)-E2 in the sample file. If you’re anything like me, then on the face of it, you’d think “Well that looks like it would work fine”. But when you implement it, you find out that it doesn’t return the right figure. Which has you scratching your head for a while, until you realise exactly why.

    PeterB: In the very real scenario I had yesterday where this reared it’s ugly head, I was actually trying to work out the cumulative sales from two countries – Canada and the US – and the Cumulative sales from the rest of the world. Sales data came from an OLAP cube via CUBEVALUE functions.

    Into Column A I was pulling the monthly sales of Canada and the US, and aggregating them on the fly via the efficient =SUM(CUBEVALUE(GiveMeMonthlyFigureForCanadaAndUS), A1). I wanted the cumulative sales of ‘The Whole World less Canada & US’ in Column B. But because of the large number of countries, I couldn’t get write a CUBEVALUE function that directly pulled everything but Canada and US to make up that ‘The Whole World less Canada & US’ directly…too many member expressions. So I thought “That’s okay, I’ll just pull the monthly data for ‘The Whole World’ into Column B, use my efficient =SUM(CUBEVALUE(GiveMeMonthlyFigureForTheWholeWorld), B1) formula to make it cumulative, then tack -A2 on the end to subtract the cumulative figures from Canada and US, and then I’m done”.

    And then I saw that my cumulative ‘The Whole World less Canada & US’ sales figure went negative. Which was impossible. I was convinced that I’d found a bug in Excel. Then I realised that it was due to the referencing of the previous cell in B1. But I didn’t realise this quickly. So I wondered how many others might inadvertently fall into this same trap. Just a couple of people, by the looks of it.

    Sure, it’s trivial to take another approach, e.g. simply use another column (Column C) where you subtract Column A’s ‘Canada and US’ from Column B’s ‘The Whole World’ to get Column C’s ‘The Whole World less Canada & US’.

    But for the life of me at the time, I was thinking “I don’t want to take another approach, because it’s your fault Excel, not mine. :-)

    PeterB & snb: Couldn’t use VBA, as this was for something that would be delivered to users via a SharePoint portal. Users don’t have permission to access the OLAP cube directly.

  20. I would prefer a structured reference equivalent of =SUM(A$2:A2):

    =SUM( OFFSET( tblSales[[#Headers],[Sales]], 1, 0):[@Sales])

    Or Matth78’s shorter solution:

    =SUM( tblSales[[#Headers],[Sales]]:[@Sales])

    Both perform just fine, within a second.
    Excel will store these formula’s only once for 20,000 cells as . Might explain the performance difference.

  21. Last sentence should have been:
    Excel will store these formula’s only once for 20,000 cells as calculatedColumnFormula. Might explain the performance difference.

  22. Well Jeff, I tried my propose before to submit it and I just tried again. Guess what? The results are the same as yours.

    F2=SUM(A2,F1)-E2 and fill down to F10

    G2=A2+N(G1)-E2 and fill down to G10

    Results in both columns are
    7
    10
    11
    4
    -5
    -20
    -37
    -61
    -88

    Do I missing something?

    Saludos

  23. Elias: My point is that both these two formulas return the wrong answer to the question:
    F2=SUM(A2,F1)-E2 and fill down to F10
    G2=A2+N(G1)-E2 and fill down to G10

    …and that out of the three, only the amended inefficient formula from Question One gets it right:
    G2=SUM($A$2:A2)-E2 and fill down to G10

    Will post a screenshot here in a moment.

  24. Elias…sorry I realise I mentioned the wrong formula when I said ‘try them side by side’. I meant to say try it side by side with =SUM($A$2:A2)-E2 in the sample file.

    Example

  25. Frans: Try inserting a new column in the workbook after you’ve done that (which I assume triggers a full recalc). Then try the same thing using just the =SUM(A2,B1) approach.
    Staggering difference.

    Or try deleting the column with that structured reference version in it, vs the one with the =SUM(A2,B1) approach.

    Entering these formulas generally takes no time at all. Doing some things to the workbook after they’re entered can be tedious.

  26. Got it Jeff. What about this in F2 and copy down.

    =A2+N(F1)-E2+N(E1)

    Regard


Posting code? Use <pre> tags for VBA and <code> tags for inline.

Leave a Reply

Your email address will not be published.