UnPivot Shootout

Jeff here, again. PivotTables again. Sorry ’bout that.

snb posted a very concise bit of code to unwind crosstabs over at Unpivot by SQL and so I got to wondering how my much longer routine handled in comparison.

My approach used SQL and lots of Union All statements to do the trick. And lots and lots of code. Whereas snb uses arrays to unwind the crosstab, which is fine so long as you don’t run out of worksheet to post the resulting flat-file in. Which is going to be the case 99.999999% of the time. And frankly, crosstabs in the other 0.000001% of cases deserve to be stuck as crosstabs.

At the same time, I thought I’d also test a previous approach of mine that uses the Multiple Consolidation trick that Mike Alexander outlines at Transposing a Dataset with a PivotTable. This approach:

  1. copies the specific contiguous or non-contiguous columns of data that the user want to turn into a flat file to a new sheet.
  2. concatenates all the columns on the left into one column, while putting the pipe character ‘|’ between each field so that later we can split these apart into separate columns again.
  3. creates a pivot table out of this using Excel’s ‘Multiple Consolidation Ranges’ option. Normally this type of pivot table is used for combining data on different sheets, but it has the side benefit of taking horizontal data and providing a vertical extract once you double click on the Grand Total field. This is also known as a ‘Reverse Pivot’.
  4. splits our pipe-delimited column back into seperate columns, using Excel’s Text-to-Column funcionality.

snb’s approach

snbs’ code for a dataset with two non-pivot fields down the left looked like this:

…which I’m sure you’ll all agree falls somewhere on the spectrum between good looking and positivity anorexic. So I put a bit of meat on it’s bones so that it prompts you for ranges and handles any sized cross-tab:

Talk about yo-yo dieting!

Multiple Consolidation Trick approach

And here’s my code that uses the Multiple Consolidation trick:

The SQL appoach is the same as I published here.

And the winner is…

…snb. By a long shot. With the ever-so-slight caveat that you’re crosstabs are not so stupidly fat that the resulting flat file exceeds the number of rows in Excel.

Here’s how things stacked up on a 53 Column x 2146 Row crosstab, which gives a 117,738 row flat-file:

Approach Time (M:SS)
snb 0:01
UnPivotByConsolidation 0:04
UnPivotBySQL 0:14

And here’s how things stacked up on a 53 Columns x 19,780 Row crosstab, giving a 1,048,340 row flat-file (i.e. practically the biggest sized crosstab that you can unwind):

Approach Time (M:SS)
snb 0:19
UnPivotByConsolidation 0:42
UnPivotBySQL 2:17

So there you have it. Use snb’s code. Unless you have no choice but to use my longer, slower SQL approach.

Update 26 November 2013
It was remiss of me not to mention the Data Normalizer routine in Doug Glancy’s great yoursumbuddy blog, which is just about as fast as snb’s approach below. Go check it out, and subscribe to Doug’s blog while you’re there if you haven’t already.

If you don’t want the hassle of working out which to use, here’s a routine that uses snb’s if possible, and otherwise uses mine:

31 thoughts on “UnPivot Shootout

  1. Sub TransposeConsolidated()

    'Step 1: Declare your Variables
    Dim SourceRange As Range
    Dim GrandRowRange As Range
    Dim GrandColumnRange As Range
    Dim TempSheetName As String

    'Step 2: Define your data source range
    Set SourceRange = Sheets("Sheet1").Range("A4:M87")

    'Step 3: Build Multiple Consolidation Range Pivot Table
    ActiveWorkbook.PivotCaches.Create(SourceType:=xlConsolidation, _
    SourceData:=SourceRange.Address(ReferenceStyle:=xlR1C1), _
    Version:=xlPivotTableVersion14).CreatePivotTable _
    TableDestination:="", _
    TableName:="TempPvt2", _
    DefaultVersion:=xlPivotTableVersion14

    'Step 4: Find the Column and Row Grand Totals
    TempSheetName = ActiveSheet.Name
    ActiveSheet.PivotTables(1).PivotSelect "'Row Grand Total'"
    Set GrandRowRange = Range(Selection.Address)

    ActiveSheet.PivotTables(1).PivotSelect "'Column Grand Total'", xlDataAndLabel, True
    Set GrandColumnRange = Range(Selection.Address)

    'Step 5: Drill into the intersection of Row and Column
    Intersect(GrandRowRange, GrandColumnRange).ShowDetail = True

    'Step 6: Delete temp sheet
    Application.DisplayAlerts = False
    Sheets(TempSheetName).Delete
    Application.DisplayAlerts = True

    End Sub

  2. Most impressive, and useful as well (which of course doesn’t always follow). Is it possible to retain field types, e.g., not have categorical values changed to numbers? For example, retain the text “022” instead of it becoming the number 22.

  3. Jeff:

    Oops I didn’t explain the code I posted. (DK I used the [vb] and [/vb] tags, but it didn’t get formatted. Can you fix?)

    Firstly, good post. You’re quite the prolific blogger these days. What’s your secret? Adderall and Mountain Dew? Fear and Adrenaline? Are you in a quest to bed sexy-sexy Excel groupies?

    Anyway, the code I posted is nothing more than a simplified method to Transpose with Consolidated Pivot tables.

    I’m sure it’s not as fast as the snb method, and it’s definitely not as comprehensive as your code. But it’s easier to read and works for most purposes.

  4. Hi Mike. I’m prolific because I don’t have a life. If I don’t pull back on the blogging, then I won’t have a wife, either, judging by the muttering going on around here. But on the upside I’ll be able to get more blogging done, as the dirt and unwashed cutlery piles up around me and I no longer get prompted to take time out and do inconsequential wife-pleasing stuff like vacuuming and showering.

    Yeah, my code is quite long on account of being weaponized so that it is point and shoot (and needs to be, because I’m thinking of an addin for all my pivot improvements). Plus it handles multiple ‘dimension’ columns. Plus I erred on the side of caution in explaining stuff, so that I’m not scratching my head in 2 years time, saying “What is this bit supposed to do?” Rather, I can say “Why the hell did I do it that way? Two years ago, I was a complete hack”. In fact, I did write this consolidation routine about two years ago, and it probably needs a re-tune in light of what I know now.

    Thanks for bringing the PivotSelect method to my attention. Never knew that existed. Very handy. Cheers for the comment. I was considering signing in as someone else and leaving a comment, just to egg myself on. Now I don’t have to. :-)

  5. Jeff, Here’s some code that I posted on my blog in a “Data Normalizer post” and which is one of my more successful SO answers (http://stackoverflow.com/a/10922351/293078). On my laptop, with the size ranges you mention, it times about the same, about one second for the smaller range and 12 seconds for the xlsx-filling data set:

    You’d call it like this:

  6. I made a more flexible approach:
    – assuming a table which can be referred to
    – a parameter with which you can indicate how many columns have to be ‘fixed’

    Sub M_snb(sn, st, x As Integer)
    ReDim sp(1 To (UBound(sn, 2) - x) * UBound(sn), 1 To x + 2)

    For j = 1 To UBound(sp)
    m = (j - 1) Mod UBound(sn) + 1
    n = (j - 1) \ UBound(sn) + x + 1
    For jj = 1 To x
    sp(j, jj) = sn(m, jj)
    Next
    sp(j, jj) = st(1, n)
    sp(j, jj + 1) = sn(m, n)
    Next

    Cells(20, 1).Resize(UBound(sp), UBound(sp, 2)) = sp
    End Sub

    calling the macro using:
    Sub M_call_snb()
    M_snb ListObjects(1).DataBodyRange.Value, ListObjects(1).HeaderRowRange.Value, 2
    End Sub

  7. Hi Doug. I forgot about your previous post. Will add a link to it above. On my system, to return a flat file of 1,044,048 records your code took 22 seconds against snb’s 19 seconds. So not a lot in it. My SQL approach took 2:38 but as per my last code block above, my SQL approach would only ever get called if there were going to be more than 1,048,576 records (or 65538 records for earlier versions of Excel).

    I’m going to re-write my SQL approach, so instead of it doing a whole bunch of unioned UNION ALL’s it uses SNB’s approach with the revision that if there’s more data than the sheet will handle, it will break up the data into several temporary sheets, then join those with a couple of UNION ALL’s. Should bring the time down from minutes to seconds.

  8. Hi Jeff. Thanks for your great and inspiring posts. And so are the comments. Here’s my contribution.
    To normalize I often use the “Moving Ranges” technique, utilizing properties and methods of the listobject. Basically it does a block transfer directly from source to target table. No intermediate storage is involved, which makes it fast. It did 10 to 15 seconds on your biggest crosstab on my machines.

    Hope this helps.
    – Frans

  9. I don’t think it’s faster (although on my system it is); just to illustrate another technique

    assume
    – a table: 16 rows, 9 columns
    – something in cell A19 to demarcate from where the results should be written.

    Sub tst_002()
    M_snb_003 ListObjects(1), 2
    End Sub

    Sub M_snb_003(c00, x)
    With c00.DataBodyRange
    .Columns(1).Name = "c_1"
    .Columns(2).Name = "c_2"
    c00.HeaderRowRange.Offset(, x).Name = "h_1"
    .Offset(, x).Resize(, .Columns.Count - x).Name = "d_2"
    End With

    sn = [index(c_1 & "|" & c_2 & "|" & h_1 & "|" & d_2,)]
    For j = 1 To UBound(sn, 2) - x
    Cells(Rows.Count, 1).End(xlUp).Offset(1).Resize(UBound(sn)) = Application.Index(sn, 0, j)
    Next
    Cells(20, 1).CurrentRegion.TextToColumns , , , , 0, 0, 0, 0, 1, "|"
    End Sub

  10. @Jeff,

    Sorry, I wasn’t able to resist the queeste for a oneliner. Alas it ended up in a twoliner.

    -assuming 1 table (listobject) in the worksheet; it’s name “Table1” (default); it’s size e.g. A1:I16

    then this is all you need:

    Sub M_start()
    M_snb 2
    End Sub

    Sub M_snb(x)
    Cells(1).Resize([table1].Rows.Count * ([table1].Columns.Count - x), 4).Name = "d_1"
    [d_1].Offset(20) = Application.Index([table1[#All]], [if(column(d_1)=columns(d_1)-1,1,mod(row(d_1)-1,rows(table1))+2)], [if(column(d_2)< (columns(d_2)-2),column(d_1),int((row(d_1)-1)/rows(table1))+3)]) End Sub

    NB. references to not active worksheets can be added simply.

  11. @sam

    No I shouldn’t provided I restored the typos:

    Sub M_snb(x)
    Cells(1).Resize([table1].Rows.Count * ([table1].Columns.Count - x), 4).Name = "d_1"
    [d_1].Offset(20) = Application.Index([table1[#All]], [if(column(d_1)=columns(d_1)-1,1,mod(row(d_1)-1,rows(table1))+2)], [if(column(d_1)< (columns(d_1)-2),column(d_1),int((row(d_1)-1)/rows(table1))+3)]) End Sub

    Thank you for pointing that out to me.!

  12. @snb Shouldn’t the resize statement use something like x+2 for the column width, being number of fixed columns (x) plus category + value columns?

  13. I got a question about this from a reader of my blog, and Google sent me right here. SNB’s approach broke on the very first try, and Dick’s looked awfully long and convoluted. Also, I thought, half the time there are multiple header rows as well as header columns. So I put together this little function that inputs the crosstab data range, number of label rows at the top, and number of label columns at the left, and outputs an array with one header row, then each record contains each row and column label for each given value. You need to call it from another routine that will put the output where you want it.

    I have already written code that pops up a dialog, which gets the user’s range, numbers of rows and columns, where to output the data (a range on the same sheet, or a new sheet), and whether to link the output data to the original crostab.

    I didn’t bother timing it, since my data is usually small and simple enough that I do it by hand, and this is way faster than that. I also haven’t tested the holy hell out of it, just run through a few simple crosstabs. But I like it, so I’m going to put it into a future update of my Chart Utilities.

  14. SMB –

    I used a simple 5×5 range with four row headers and four column headers:

    FRUIT JAN APRIL AUGUST DEC
    APPLE 10 22 44 55
    ORANGE 13 7 4 18
    BANANA 36 42 56 18
    PEAR 11 15 5 27

    I used to first SNB routine, from the top of the article, without wading through the editions in the comments.

    The code ran without crashing, but the output was not right:

    APPLE 10 FRUIT APPLE
    ORANGE 13 FRUIT ORANGE
    BANANA 36 FRUIT BANANA
    PEAR 11 FRUIT PEAR
    APPLE 10 JAN 10
    ORANGE 13 JAN 13
    BANANA 36 JAN 36
    PEAR 11 JAN 11
    APPLE 10 APRIL 22
    ORANGE 13 APRIL 7
    BANANA 36 APRIL 42
    PEAR 11 APRIL 15
    APPLE 10 AUGUST 44
    ORANGE 13 AUGUST 4
    BANANA 36 AUGUST 56
    PEAR 11 AUGUST 5
    APPLE 10 DEC 55
    ORANGE 13 DEC 18
    BANANA 36 DEC 18
    PEAR 11 DEC 27

    If I delete the first four rows and the second column, and add headers, it’s fine. Of course, that’s because your initial routine was hard-coded for a specific input and output layout. As I said, it wasn’t flexible in terms of where’s the input, where’s the output, and how many header rows and columns are there.

    This image shows the data I used, the output from the SNB routine I tried, the output from my routine, and the dialog I’m wrapping around it:
    http://peltiertech.com/images/2015-02/DailyDoseUnpivotDataSMBDialog.pngPivot

    It offers linking of the output to the original data, because you might decide that 7 oranges for April was 8, and it’s easier to find and replace in the crosstab table. Also it offers the option to turn the output range into a Table, because Tables are the bomb.

    I’m working on an algorithm to autofill the entries for Rowes of Labels and Columns of Labels with something smarter than ones.

    This is cool. Rarely does it take less than a day to build a whole new function for my utility.

  15. The inputs and outputs in my comment were nicer, but WordPress turned the tabs into spaces, so they came out ugly. Click on the link to the image, and you’ll see it.

  16. @Jon. Dick’s looked awfully long and convoluted. No Dick’s code doesn’t. Because it’s my awfully long and convoluted code, not his. :-)

    My approach is actually several approaches in one, and a lot of its length is due to end cases that would otherwise not produce a PivotTable. I did a separate blog on it at http://dailydoseofexcel.com/archives/2013/11/19/unpivot-via-sql/ that discussed the approach. For instance, if unwinding a very large crosstab would result in a flat file that exceeds the amount of rows in Excel, it uses some dynamic SQL to create lots and lots of Union All statements, and then writes the result directly to a PivotTable. Otherwise it uses a second approach of straight out array manipulation. And it does a whole lot of checking around number formats and some other things that would otherwise cause an error in some circumstances. So it’s long, but it’s robust to quite a few things that your shorter code won’t handle.

  17. @Jon

    You might have noticed that the structure of your table is quite different form the one Jeff used and on which my suggestion was based.
    If you want to convert a matrix with rowlabels and columnlabels with the same appraoch you can use:

  18. Jonathan: That’s a good requirement. One of the routines above has an option to skip blanks, but I never thought about also letting it skip zeros. I’ll rewrite the routine so that you can feed it an array of things to skip. Watch this space…

  19. Hello Jeff,

    Hoping you will pick up on this comment from your 2013 post! Thanks for the fantastic post, I’m utilizing your code for the “Multiple Consolidation Trick approach” and it’s working great. I was wondering if you’ll be willing to share a version of your code that is able to handle multiple header ROWS as well as columns? Something like this:

    Country Sector 1990 1990 … 2009
    Jan Feb … Jan
    ==============================================
    Australia Energy 290,872 296,887 … 417,355
    New Zealand Energy 23,915 25,738 … 31,361
    United States Energy 5,254,607 5,357,253 … 5,751,106
    Australia Manufacturing 35,648 35,207 … 44,514
    New Zealand Manufacturing 4,389 4,845 … 4,907
    United States Manufacturing 852,424 837,828 … 735,902
    Australia Transport 62,121 61,504 … 83,645
    New Zealand Transport 8,679 8,696 … 13,783
    United States Transport 1,484,909 1,447,234 … 1,722,501

    Into this:

    ‘ Country Sector Year1 Year2 Value
    ‘ ====================================================
    ‘ Australia Energy 1990 Jan 290,872
    ‘ New Zealand Energy 1990 Jan 23,915
    ‘ United States Energy 1990 Jan 5,254,607
    ‘ Australia Manufacturing 1990 Jan 35,648
    ….
    ‘ Australia Energy 1990 Feb 296,887
    ‘ New Zealand Energy 1990 Feb 25,738
    ‘ United States Energy 1990 Feb 5,357,253
    ‘ Australia Manufacturing 1990 Feb 35,207
    ….

    I know it shouldn’t be too hard because it’s just doing the same “consolidate then split using text-to-column” approach that you did for the column headers, but I’m not very proficient with VBA so figured I should ask the guru(s).

  20. Hi William. These days PowerQuery (built in to Excel 2016 and later and available as a free addin for 2010 or 2013) is the best tool for that particular job. Google PowerQuery and unpivot multiple columns and you should turn up heaps of tutorials. Look out for anything from Ken Puls on this subject. In fact, buy his book and consider his training…you won’t find a better guide!


Posting code? Use <pre> tags for VBA and <code> tags for inline.

Leave a Reply

Your email address will not be published. Required fields are marked *