Pop quiz

Question One

You want to calculate a running (i.e. cumulative) total of the Data column.
Which of these formulas should you put in B2 and drag down, and why?
=SUM(A$2:A2)
=SUM(A2,B1)

Question Two

You want to calculate a running (i.e. cumulative) total of the Data column, and subtract 1 from it.
Which of these formulas should you put in B2 and drag down, and why?
=SUM(A$2:A2)-1
=SUM(A2,B1)-1

—edit—

If you’re ambivalent as to the approach you would take, download and try out a slightly revised thought experiment in the attached file. Then you won’t be ambivalent.

Pop-quiz_20140502-V6

This has a more realistic data set, where instead of subtracting 1 from the cumulative total, you want to subtract a varying list of cumulative expenses, so you can work out the cumulative net profit:
Pop Quiz v5

Take it that inserting/deleting rows is not an issue (assume the structure is locked down).

Note that this is intended to be a thought experiment/illustration about a common approach which happens to be very resource intensive, and a better solution that works just fine provided you don’t do anything else within that formula but calculate a cumulative total.

But feel free to post alternatives.

33 Comments

  1. Ben says:

    I’d probably use the first calculation in both cases because it’s more resilient to rows being deleted. I’ve never stopped to think about it before, but I guess that the second query would be quicker to calculate, despite the long dependency tree, because Excel won’t normally need to recalculate the whole tree and summing two numbers has to be quicker than summing the column.

  2. Jon Acampora says:

    Very cool trick Jeff! I’m going with option #2, and I’m guessing it has something to do with calculation time.

    I just learned another awesome way to do this with structured reference formulas from Zack Barresse in this comment…

    http://www.excelcampus.com/tips/absolute-formula-references-excel-structured-table/#comment-3436

  3. Jon Acampora says:

    I do see one issue with #2, which might make me change my answer… If I delete a row in the table then the formulas below Ref out. Still interested to know the answer.

  4. Alan Taylor says:

    I’d go for #1 in both questions.

    Firstly, because it doesn’t break if you insert or add a row.

    In Question 2, the second formula will give the wrong answer – in the first cell you’ll get the correct answer, but the next row will give (running total less 2), then (running total less 3) and so on.

  5. Matth78 says:

    In second formula when substracting 1, will it not substract 1 multiple times ?

    I think (like Jonà formula #2 may cause problem on inserting and deleting row.
    It’s not optimal but I will tend to change it to = SUM(A2,OFFSET(B2,-1,0)) – 1

    Still, imho, the less error prone formula is the first using structured reference : =SUM(Table[[#header];[data]]:[@data])
    Since header in a table can’t be a number total will not be wrong. I think it’s simpler and better than using index to get first row. (I will post this on Zack Baresse article to see what he thinks !)

  6. snb says:

    In a table do don’t need to ‘drag down’; entering the formula in B2 suffices.

    I’d use in B2: =A2+B1

  7. Jeff Weir says:

    Jon: Deleting is not the issue.

    For Question one, try a really long list. Say put =row() in cells A1 to A20000 and then try both iterations.

    For Question two, just use a really short list, but try both approaches side by side.

    Then I imagine you’ll want to post back.

  8. Jeff Weir says:

    The point I’m driving at is far more apparent for Question One if after you’ve entered the formula down all 20,000 rows, you go back to B2 and double-click the fill handle again.

    And for question two, make sure you try out your suggested approach.

  9. Jeff Weir says:

    SNB: And what formula would you use for question 2? In B1 would you put = A1-1 and in B2 would you put = A2+B1-1 ? Try it on the sample data.

  10. Chris says:

    For Question 1 – the second option – =SUM(A2,B1) – provides much faster calculation than the first – I’m guessing using a ‘step by step’ calculation requires exponentially less processing power than calculating a ‘fresh’ SUM for each cell in the range.

    For Question 2 – Here you would have to revert to the first option – =SUM(A$2:A2)-1 – otherwise you’re subtracting one from each and every ‘cumulative total’ – so with 50 rows, you would be subtracting 50 from the final total. Alternatively, use the second option, but add the ‘-1′ to the first entry only, then use the =SUM(A2,B1) formula style from the 1st Question for the rest of the table.

  11. PeterB says:

    Jeff, your second formula is obviously going to be significantly faster over a larger range as the first formula will have to calculate more cells, increasing exponentially with each additional row. I had to use 35,000 rows to notice a marked difference and even then the first formula calculated in a few seconds.

    SNB’s formula is effectively the same as your option 2.

    For your question 2 (cumulative total -1), I used the following:
    – In cell B2: =Sum(A2,B1)-1
    – In cell B3 (then copied down): =sum(A3,B2)

  12. JLeno says:

    In a somewhat more general formula, I’d use:

    =SUM(A2,B1)-IF(B1=Table1[#Headers],1,0), which also works way more efficiently than the =SUM(A$2:A2)-1 formula

  13. Jeff Weir says:

    Peter, I too found that the first formula filled down in a matter of seconds. But then when I refilled it i.e. clicked the fill handle of the one at the top, it took a heck of a lot longer. What happens if you double click the fill handle after the initial populating?

    Yep, realise that SNB’s is the same. I just always hated having to have two different formulas…especially in an Excel Table given column formulas. So today I used my option 2. Then I thought to myself ‘Hey, why not subtract this other column from it at the same time, so I can see the net difference’. Then I spend hours scratching my head trying to work out why Option two broke as soon as you added in some more arithmetic.

  14. Jeff Weir says:

    My example of subtracting 1 is a bit trivial really. In the real world situation that this came up for me today, I had another column of wildly different amounts, and just thought ‘Hey, I can subtact those from my cumulative total to give me net cumulative’. I could not for the life of me work out what was going wrong, until hours later I whipped up the above simple example.

    Alan, Matth78, Chris, Peter, JLeno: Congrats. You are all smarter than I was earlier today for what it’s worth.

    I’ll amend my sample data pic in the main article to make it slightly less trivial.

    On avoiding the =SUM(A$2:A2) construct, I first found out about the effects from Charles William’s great article Improving Performance in Excel 2007

  15. Elias says:

    If inserting/deleting rows is not an issue
    B2=N(B1)+A2
    C2=N(C1)+A2-(ROWS(C$1:C1)=1)

    If it does
    B2=N(INDEX(B:B,ROWS(B$1:B1)))+A2
    C2=N(INDEX(C:C,ROWS(C$1:C1)))+A2-(ROWS(C$1:C1)=1)

    Regards

  16. Jeff Weir says:

    @All: All new and improved daft scenario and sample file now added to original post for your perverse pleasure. Take it that inserting/deleting rows is not an issue (assume the structure is locked down).

    Note that this is intended to be a thought experiment/illustration about a common approach which happens to be very resource intensive, and a better solution that works just fine provided you don’t do anything else within that formula but calculate a cumulative total.

    But feel free to post alternatives.

  17. PeterB says:

    I’m not sure just how realistic this scenario is. You have presented a single sales figure with a cumulative expense figure. If they were single expense figures then you could use sum(A2-E2+F1) and maintain the calculation efficiency.

    In the real world, if I was faced with this and the calculation time of using sum(a$2:a…) proved too resource intensive, I would use VBA.

  18. Elias says:

    =A2+N(C1)

    AND

    =A2+N(G1)-E2

    Work fine.

    Regards

  19. snb says:

    PeterB read my thoughts:

    Sub M_snb()
        sn = [A2:A40000]
        For j = 2 To UBound(sn)
          sn(j, 1) = sn(j - 1, 1) + sn(j, 1)
        Next
       
        [K2:K40000] = sn
    End Sub
  20. Matth78 says:

    Agreeing with PeterB, first formula (=SUM(A$2:A2)) will take longer since each formula requires an increasing power to calculate. Besides when a cell change in column A excel will trigger at once calculation for all below cells. (which is a lot)
    I think the second formula (=SUM(A2,B1)) will not quite do that. It will trigger calculation of running total on the same row which will trigger calculation for next row and so on. (And Excel should be less overwhelmed but I hope I’m not wrong here !)

    I think it’s similar to referencing full column which is considered bad practice.

  21. Jeff Weir says:

    Elias: =A2+N(G1)-E2 does not work fine. Try it side by side with =SUM(A2,F1)-E2 in the sample file. If you’re anything like me, then on the face of it, you’d think “Well that looks like it would work fine”. But when you implement it, you find out that it doesn’t return the right figure. Which has you scratching your head for a while, until you realise exactly why.

    PeterB: In the very real scenario I had yesterday where this reared it’s ugly head, I was actually trying to work out the cumulative sales from two countries – Canada and the US – and the Cumulative sales from the rest of the world. Sales data came from an OLAP cube via CUBEVALUE functions.

    Into Column A I was pulling the monthly sales of Canada and the US, and aggregating them on the fly via the efficient =SUM(CUBEVALUE(GiveMeMonthlyFigureForCanadaAndUS), A1). I wanted the cumulative sales of ‘The Whole World less Canada & US’ in Column B. But because of the large number of countries, I couldn’t get write a CUBEVALUE function that directly pulled everything but Canada and US to make up that ‘The Whole World less Canada & US’ directly…too many member expressions. So I thought “That’s okay, I’ll just pull the monthly data for ‘The Whole World’ into Column B, use my efficient =SUM(CUBEVALUE(GiveMeMonthlyFigureForTheWholeWorld), B1) formula to make it cumulative, then tack -A2 on the end to subtract the cumulative figures from Canada and US, and then I’m done”.

    And then I saw that my cumulative ‘The Whole World less Canada & US’ sales figure went negative. Which was impossible. I was convinced that I’d found a bug in Excel. Then I realised that it was due to the referencing of the previous cell in B1. But I didn’t realise this quickly. So I wondered how many others might inadvertently fall into this same trap. Just a couple of people, by the looks of it.

    Sure, it’s trivial to take another approach, e.g. simply use another column (Column C) where you subtract Column A’s ‘Canada and US’ from Column B’s ‘The Whole World’ to get Column C’s ‘The Whole World less Canada & US’.

    But for the life of me at the time, I was thinking “I don’t want to take another approach, because it’s your fault Excel, not mine. :-)

    PeterB & snb: Couldn’t use VBA, as this was for something that would be delivered to users via a SharePoint portal. Users don’t have permission to access the OLAP cube directly.

  22. Frans Bus says:

    I would prefer a structured reference equivalent of =SUM(A$2:A2):

    =SUM( OFFSET( tblSales[[#Headers],[Sales]], 1, 0):[@Sales])

    Or Matth78′s shorter solution:

    =SUM( tblSales[[#Headers],[Sales]]:[@Sales])

    Both perform just fine, within a second.
    Excel will store these formula’s only once for 20,000 cells as . Might explain the performance difference.

  23. Frans Bus says:

    Last sentence should have been:
    Excel will store these formula’s only once for 20,000 cells as calculatedColumnFormula. Might explain the performance difference.

  24. Elias says:

    Well Jeff, I tried my propose before to submit it and I just tried again. Guess what? The results are the same as yours.

    F2=SUM(A2,F1)-E2 and fill down to F10

    G2=A2+N(G1)-E2 and fill down to G10

    Results in both columns are
    7
    10
    11
    4
    -5
    -20
    -37
    -61
    -88

    Do I missing something?

    Saludos

  25. Jeff Weir says:

    Elias: My point is that both these two formulas return the wrong answer to the question:
    F2=SUM(A2,F1)-E2 and fill down to F10
    G2=A2+N(G1)-E2 and fill down to G10

    …and that out of the three, only the amended inefficient formula from Question One gets it right:
    G2=SUM($A$2:A2)-E2 and fill down to G10

    Will post a screenshot here in a moment.

  26. Jeff Weir says:

    Elias…sorry I realise I mentioned the wrong formula when I said ‘try them side by side’. I meant to say try it side by side with =SUM($A$2:A2)-E2 in the sample file.

    Example

  27. Jeff Weir says:

    Frans: Try inserting a new column in the workbook after you’ve done that (which I assume triggers a full recalc). Then try the same thing using just the =SUM(A2,B1) approach.
    Staggering difference.

    Or try deleting the column with that structured reference version in it, vs the one with the =SUM(A2,B1) approach.

    Entering these formulas generally takes no time at all. Doing some things to the workbook after they’re entered can be tedious.

  28. Elias says:

    Got it Jeff. What about this in F2 and copy down.

    =A2+N(F1)-E2+N(E1)

    Regard

  29. Elias says:

    Or this one.

    =A2+SUM(E1:F1)-E2

    Regards

  30. Jeff Weir says:

    Yeah, they’re great. Or dare I say it we could just use a helper column :-)

  31. snb says:

    Don’t think so:

    =SUM(A2;F1;E1)-E2

  32. Elias says:

    Or

    =SUM(A2,E1:F1,-E2)

    Regards

Posting code or formulas in your comment? Use <code> tags!

  • <code lang="vb">Block of code goes here</code>
  • <code lang="vb" inline="true">Inline code goes here</code>
  • <code>Formula goes here</code>

Leave a Reply

You must be logged in to post a comment.

Here's how to update your reports of company and nearly any web data: