Pop quiz

Question One

You want to calculate a running (i.e. cumulative) total of the Data column.
Which of these formulas should you put in B2 and drag down, and why?
=SUM(A$2:A2)
=SUM(A2,B1)

Question Two

You want to calculate a running (i.e. cumulative) total of the Data column, and subtract 1 from it.
Which of these formulas should you put in B2 and drag down, and why?
=SUM(A$2:A2)-1
=SUM(A2,B1)-1

—edit—

If you’re ambivalent as to the approach you would take, download and try out a slightly revised thought experiment in the attached file. Then you won’t be ambivalent.

Pop-quiz_20140502-V6

This has a more realistic data set, where instead of subtracting 1 from the cumulative total, you want to subtract a varying list of cumulative expenses, so you can work out the cumulative net profit:
Pop Quiz v5

Take it that inserting/deleting rows is not an issue (assume the structure is locked down).

Note that this is intended to be a thought experiment/illustration about a common approach which happens to be very resource intensive, and a better solution that works just fine provided you don’t do anything else within that formula but calculate a cumulative total.

But feel free to post alternatives.

33 thoughts on “Pop quiz

  1. Ben

    I’d probably use the first calculation in both cases because it’s more resilient to rows being deleted. I’ve never stopped to think about it before, but I guess that the second query would be quicker to calculate, despite the long dependency tree, because Excel won’t normally need to recalculate the whole tree and summing two numbers has to be quicker than summing the column.

  2. Jon Acampora

    I do see one issue with #2, which might make me change my answer… If I delete a row in the table then the formulas below Ref out. Still interested to know the answer.

  3. Alan Taylor

    I’d go for #1 in both questions.

    Firstly, because it doesn’t break if you insert or add a row.

    In Question 2, the second formula will give the wrong answer – in the first cell you’ll get the correct answer, but the next row will give (running total less 2), then (running total less 3) and so on.

  4. Matth78

    In second formula when substracting 1, will it not substract 1 multiple times ?

    I think (like Jonà formula #2 may cause problem on inserting and deleting row.
    It’s not optimal but I will tend to change it to = SUM(A2,OFFSET(B2,-1,0)) – 1

    Still, imho, the less error prone formula is the first using structured reference : =SUM(Table[[#header];[data]]:[@data])
    Since header in a table can’t be a number total will not be wrong. I think it’s simpler and better than using index to get first row. (I will post this on Zack Baresse article to see what he thinks !)

  5. snb

    In a table do don’t need to ‘drag down’; entering the formula in B2 suffices.

    I’d use in B2: =A2+B1

  6. Jeff Weir Post author

    Jon: Deleting is not the issue.

    For Question one, try a really long list. Say put =row() in cells A1 to A20000 and then try both iterations.

    For Question two, just use a really short list, but try both approaches side by side.

    Then I imagine you’ll want to post back.

  7. Jeff Weir Post author

    The point I’m driving at is far more apparent for Question One if after you’ve entered the formula down all 20,000 rows, you go back to B2 and double-click the fill handle again.

    And for question two, make sure you try out your suggested approach.

  8. Jeff Weir Post author

    SNB: And what formula would you use for question 2? In B1 would you put = A1-1 and in B2 would you put = A2+B1-1 ? Try it on the sample data.

  9. Chris

    For Question 1 – the second option – =SUM(A2,B1) – provides much faster calculation than the first – I’m guessing using a ‘step by step’ calculation requires exponentially less processing power than calculating a ‘fresh’ SUM for each cell in the range.

    For Question 2 – Here you would have to revert to the first option – =SUM(A$2:A2)-1 – otherwise you’re subtracting one from each and every ‘cumulative total’ – so with 50 rows, you would be subtracting 50 from the final total. Alternatively, use the second option, but add the ‘-1’ to the first entry only, then use the =SUM(A2,B1) formula style from the 1st Question for the rest of the table.

  10. PeterB

    Jeff, your second formula is obviously going to be significantly faster over a larger range as the first formula will have to calculate more cells, increasing exponentially with each additional row. I had to use 35,000 rows to notice a marked difference and even then the first formula calculated in a few seconds.

    SNB’s formula is effectively the same as your option 2.

    For your question 2 (cumulative total -1), I used the following:
    – In cell B2: =Sum(A2,B1)-1
    – In cell B3 (then copied down): =sum(A3,B2)

  11. JLeno

    In a somewhat more general formula, I’d use:

    =SUM(A2,B1)-IF(B1=Table1[#Headers],1,0), which also works way more efficiently than the =SUM(A$2:A2)-1 formula

  12. Jeff Weir Post author

    Peter, I too found that the first formula filled down in a matter of seconds. But then when I refilled it i.e. clicked the fill handle of the one at the top, it took a heck of a lot longer. What happens if you double click the fill handle after the initial populating?

    Yep, realise that SNB’s is the same. I just always hated having to have two different formulas…especially in an Excel Table given column formulas. So today I used my option 2. Then I thought to myself ‘Hey, why not subtract this other column from it at the same time, so I can see the net difference’. Then I spend hours scratching my head trying to work out why Option two broke as soon as you added in some more arithmetic.

  13. Jeff Weir Post author

    My example of subtracting 1 is a bit trivial really. In the real world situation that this came up for me today, I had another column of wildly different amounts, and just thought ‘Hey, I can subtact those from my cumulative total to give me net cumulative’. I could not for the life of me work out what was going wrong, until hours later I whipped up the above simple example.

    Alan, Matth78, Chris, Peter, JLeno: Congrats. You are all smarter than I was earlier today for what it’s worth.

    I’ll amend my sample data pic in the main article to make it slightly less trivial.

    On avoiding the =SUM(A$2:A2) construct, I first found out about the effects from Charles William’s great article Improving Performance in Excel 2007

  14. Elias

    If inserting/deleting rows is not an issue
    B2=N(B1)+A2
    C2=N(C1)+A2-(ROWS(C$1:C1)=1)

    If it does
    B2=N(INDEX(B:B,ROWS(B$1:B1)))+A2
    C2=N(INDEX(C:C,ROWS(C$1:C1)))+A2-(ROWS(C$1:C1)=1)

    Regards

  15. Jeff Weir Post author

    @All: All new and improved daft scenario and sample file now added to original post for your perverse pleasure. Take it that inserting/deleting rows is not an issue (assume the structure is locked down).

    Note that this is intended to be a thought experiment/illustration about a common approach which happens to be very resource intensive, and a better solution that works just fine provided you don’t do anything else within that formula but calculate a cumulative total.

    But feel free to post alternatives.

  16. PeterB

    I’m not sure just how realistic this scenario is. You have presented a single sales figure with a cumulative expense figure. If they were single expense figures then you could use sum(A2-E2+F1) and maintain the calculation efficiency.

    In the real world, if I was faced with this and the calculation time of using sum(a$2:a…) proved too resource intensive, I would use VBA.

  17. snb

    PeterB read my thoughts:

    Sub M_snb()
        sn = [A2:A40000]
        For j = 2 To UBound(sn)
          sn(j, 1) = sn(j - 1, 1) + sn(j, 1)
        Next
       
        [K2:K40000] = sn
    End Sub
  18. Matth78

    Agreeing with PeterB, first formula (=SUM(A$2:A2)) will take longer since each formula requires an increasing power to calculate. Besides when a cell change in column A excel will trigger at once calculation for all below cells. (which is a lot)
    I think the second formula (=SUM(A2,B1)) will not quite do that. It will trigger calculation of running total on the same row which will trigger calculation for next row and so on. (And Excel should be less overwhelmed but I hope I’m not wrong here !)

    I think it’s similar to referencing full column which is considered bad practice.

  19. Jeff Weir Post author

    Elias: =A2+N(G1)-E2 does not work fine. Try it side by side with =SUM(A2,F1)-E2 in the sample file. If you’re anything like me, then on the face of it, you’d think “Well that looks like it would work fine”. But when you implement it, you find out that it doesn’t return the right figure. Which has you scratching your head for a while, until you realise exactly why.

    PeterB: In the very real scenario I had yesterday where this reared it’s ugly head, I was actually trying to work out the cumulative sales from two countries – Canada and the US – and the Cumulative sales from the rest of the world. Sales data came from an OLAP cube via CUBEVALUE functions.

    Into Column A I was pulling the monthly sales of Canada and the US, and aggregating them on the fly via the efficient =SUM(CUBEVALUE(GiveMeMonthlyFigureForCanadaAndUS), A1). I wanted the cumulative sales of ‘The Whole World less Canada & US’ in Column B. But because of the large number of countries, I couldn’t get write a CUBEVALUE function that directly pulled everything but Canada and US to make up that ‘The Whole World less Canada & US’ directly…too many member expressions. So I thought “That’s okay, I’ll just pull the monthly data for ‘The Whole World’ into Column B, use my efficient =SUM(CUBEVALUE(GiveMeMonthlyFigureForTheWholeWorld), B1) formula to make it cumulative, then tack -A2 on the end to subtract the cumulative figures from Canada and US, and then I’m done”.

    And then I saw that my cumulative ‘The Whole World less Canada & US’ sales figure went negative. Which was impossible. I was convinced that I’d found a bug in Excel. Then I realised that it was due to the referencing of the previous cell in B1. But I didn’t realise this quickly. So I wondered how many others might inadvertently fall into this same trap. Just a couple of people, by the looks of it.

    Sure, it’s trivial to take another approach, e.g. simply use another column (Column C) where you subtract Column A’s ‘Canada and US’ from Column B’s ‘The Whole World’ to get Column C’s ‘The Whole World less Canada & US’.

    But for the life of me at the time, I was thinking “I don’t want to take another approach, because it’s your fault Excel, not mine. :-)

    PeterB & snb: Couldn’t use VBA, as this was for something that would be delivered to users via a SharePoint portal. Users don’t have permission to access the OLAP cube directly.

  20. Frans Bus

    I would prefer a structured reference equivalent of =SUM(A$2:A2):

    =SUM( OFFSET( tblSales[[#Headers],[Sales]], 1, 0):[@Sales])

    Or Matth78’s shorter solution:

    =SUM( tblSales[[#Headers],[Sales]]:[@Sales])

    Both perform just fine, within a second.
    Excel will store these formula’s only once for 20,000 cells as . Might explain the performance difference.

  21. Frans Bus

    Last sentence should have been:
    Excel will store these formula’s only once for 20,000 cells as calculatedColumnFormula. Might explain the performance difference.

  22. Elias

    Well Jeff, I tried my propose before to submit it and I just tried again. Guess what? The results are the same as yours.

    F2=SUM(A2,F1)-E2 and fill down to F10

    G2=A2+N(G1)-E2 and fill down to G10

    Results in both columns are
    7
    10
    11
    4
    -5
    -20
    -37
    -61
    -88

    Do I missing something?

    Saludos

  23. Jeff Weir Post author

    Elias: My point is that both these two formulas return the wrong answer to the question:
    F2=SUM(A2,F1)-E2 and fill down to F10
    G2=A2+N(G1)-E2 and fill down to G10

    …and that out of the three, only the amended inefficient formula from Question One gets it right:
    G2=SUM($A$2:A2)-E2 and fill down to G10

    Will post a screenshot here in a moment.

  24. Jeff Weir Post author

    Elias…sorry I realise I mentioned the wrong formula when I said ‘try them side by side’. I meant to say try it side by side with =SUM($A$2:A2)-E2 in the sample file.

    Example

  25. Jeff Weir Post author

    Frans: Try inserting a new column in the workbook after you’ve done that (which I assume triggers a full recalc). Then try the same thing using just the =SUM(A2,B1) approach.
    Staggering difference.

    Or try deleting the column with that structured reference version in it, vs the one with the =SUM(A2,B1) approach.

    Entering these formulas generally takes no time at all. Doing some things to the workbook after they’re entered can be tedious.

Leave a Reply

Your email address will not be published. Required fields are marked *

To create code blocks or other preformatted text, indent by four spaces:

    This will be displayed in a monospaced font. The first four 
    spaces will be stripped off, but all other whitespace
    will be preserved.
    
    Markdown is turned off in code blocks:
     [This is not a link](http://example.com)

To create not a block, but an inline code span, use backticks:

Here is some inline `code`.

For more help see http://daringfireball.net/projects/markdown/syntax