UnPivot via SQL

Howdy folks. Jeff Pivot…err…Weir here again.

Recently Ken Puls did a handy post on how to unpivot data using PowerQuery. Jan Karel commented that you can do this using Multiple Consolidation Ranges. That’s true, but what I like about the PowerQuery approach is that you can translate the currently selected columns into attribute-value pairs, combined with the rest of the values in each row. That is, you can have multiple hierarchical columns down the left of your CrossTab as well as the column headers across the top that you want to amalgamate. Which is great if you have a crosstab like this:

CrossTab

Whereas the Multiple Consolidation trick only handles one column down the left out of the box.

Mike Alexander posted a great bacon-tasting morsel of a trick to get around that issue way back in 2009 when he used to blog. He simply concatenating all the non-column-oriented fields into one dimension field into one temporary column. Check out his post Transposing a Dataset with a PivotTable. But as commenter dermotb said…it’s like a magic spell that you have to write down somewhere, and try to find when you need it, because it’s complex. (I love Mike’s reply to that: Come on. Excel is full of magic syntax, mystical hot keys, and vba voodoo that requires some level of memorizing steps. That’s why I can make a living peddling “tips and tricks”.)

Another problem with the Multiple Consolidation trick is that you might well end up with more data than fits in your sheet, by the time you flatten it out. Especially in old Excel. Because the number of rows you end up with in a flat file is the number of rows you start off with times the number of columns that you’re going to amalgamate. So for say a time-series dataset that covers quite a few items and a reasonable period of time, you could be in trouble.

So a while ago I had a crack at writing a SQL routine that unpivots by doing lots of UNION ALL joins, and writes the data directly to a PivotTable. The UNION ALLs are required because the pidgin English version of SQL that Excel speaks (and Access too, I guess) doesn’t have a UNPIVOT command.

I struck a few hurdles along the way. For instance, it turns out that the Microsoft JET/ACE Database engine has a hard limit of 50 ‘UNION ALL’ clauses, which you will soon exceed if you have a big crosstab with multiple columns down the left. I found a great thread over at MrExcel at which Fazza overcame this hard limit by creating sub-blocks of UNION ALL statements, then stiching them all together with another UNION ALL. Another problem is that SQL didn’t like dates (and sometimes numbers) in the headers. So I turn them into text with an apostrophe.

And another thing I do is save a temp version of the file somewhere, and then query that temp version rather than querying the open workbook. Even though the Memory Leak issue that this avoids has been largely fixed in New Excel, I still found that querying the open book was causing grief occasionally.

Anyway, here’s the result. I’ve turned it into a function, and you can pre-specify inputs if you like. Otherwise you’ll be prompted for the following:

20131119_UnPivot_Select Entire Crosstab

20131119_UnPivot_Select Left Column Headers

20131119_UnPivot_Select Crosstab Column Headers

20131119_UnPivot_FieldName

…and then you’ll get a Pivot:

20131119_UnPivot_Output

Take it for a spin, let me know of any issues in the comments. Note that I’ve tried to code it to handle Excel 2003 and earlier, but I don’t have old Excel anymore so couldn’t test it. In fact, that’s why the TabularLayout sub is separate – I had to put it in a subroutine because if someone has ‘old’ Excel then the main function wouldn’t compile.

—Edit 11 March 2014—
I’ve updated the below code to incorporate snb’s approach using array manipulation from Unpivot Shootout where possible.

Cheers

Jeff

25 thoughts on “UnPivot via SQL

  1. That’s a lot of code Jeff.

    Assume a table of 11 rows and 10 columns.
    Remove the fieldnames of the columns that have to be repeated for every record.
    The remaining columns have to be read into separate records.
    The last column will contais valuen (as in your example)
    The one to last column of each record will contain the columnlabel (fieldname) of the values (as in your example)

    Sub M_snb()
    sn = Cells(1).CurrentRegion
    x = Cells(1).CurrentRegion.Rows(1).SpecialCells(2).Count
    y = UBound(sn, 2) - x

    sp = Application.Index(sn, Evaluate("index(mod(row(1:" & x * (UBound(sn) - 1) & ")-1," & UBound(sn, 2) + y & ")+" & y & ",)"), Evaluate("if(column(A1:" & Chr(66 + y) & x * (UBound(sn) - 1) & ")=" & y + 2 & ",int((row(1:" & x * (UBound(sn) - 1) & ")-1)/" & UBound(sn, 2) + y & ")+" & y + 1 & ",column(A1:" & Chr(66 + y) & x * (UBound(sn) - 1) & "))"))
    For j = 1 To UBound(sp) - 1
    sp(j, UBound(sp, 2) - 1) = sn(1, (j - 1) \ (UBound(sn) - 1) + y + 1)
    Next

    Cells(20, 1).Resize(UBound(sp), UBound(sp, 2)) = sp
    End Sub

  2. Note, this is provided without warranty….

    Mine isn’t quite as elegant as snb’s but its generalized…. I don’t pull the data into an array so it does have a slight performance hit.

  3. Hi snb. That looks elegant. I haven’t had a chance to step through it yet, but looking forward to it.

    Yeah, mine is pretty complex compared to yours. That said, I built it because I needed to handle the case where unwinding a big crosstab would result in a flat file that would exceed the row limit in Excel. I originally wrote this when my old workplace had Excel 2003, and so was coming up on that limit all the time. And more recently I had a private client that had a crosstab filled with economic data so big that it would have exceeded the row limit in new Excel if unwound the traditional way. So this code was perfect.

    (Of course, they should never have got into the situation in the first place of having such a big crosstab. But they simply didn’t know any better until it got so bad that they were forced to pay me to fix it. I love organizations like that who misuse Excel, because they pay me to fix what shouldn’t have been broken in the first place).

    Your code works fine for me with 11 rows times 8 columns, but if I have 10 columns then I get some REF errors. Haven’t stepped through it yet to find out why.

    I’d be interested in how you would amend it so it would handle any size crosstab (within the row limit of Excel, of course). And I’d also be interested at comparing speed on very large crosstabs…particularly because I’m building a commercial add-in and want to have routines that are as fast as possible across a range of different datasets.

  4. This is all very useful, Jeff! I’ve used John W’s one-column flattening in the past, and also Bacon’s concatenation trick. But this seems much more powerful.

    Juanito

  5. snb: your code isn’t working so well for me.
    This is the table I’m using it on:

    1990 1991 1992 1993 1994 1995 1996 1997
    Australia Energy 289014 290872 296887 300178 301736 313486 320795 331023
    Australia Transport 62121 61503 62695 63987 65614 68357 70751 72152
    Australia Other Sectors 14744 14884 15300 15762 15669 16318 16680 16930
    Australia Industrial Processes 24627 23911 24552 24350 24551 24334 24170 24334
    Australia Agriculture 86812 86957 85253 84735 85256 86190 86487 87695
    Australia Waste 18016 17909 17688 17570 16983 16949 15622 15439
    Belarus Energy 102242 95782 88907 76539 64290 57259 58225 59522
    Belarus Transport 13074 12818 10583 8547 5368 4840 4815 4386
    Belarus Other Sectors 14792 14792 14771 14362 13428 11141 10547 10806
    Belarus Industrial Processes 3614 3614 3504 3292 2639 2004 2035 2136

    …and this is what it is returning:

    Australia Energy 1990 289014
    Australia Transport 1990 62121
    Australia Other Sectors 1990 14744
    Australia Industrial Processes 1990 24627
    Australia Agriculture 1990 86812
    Australia Waste 1990 18016
    Belarus Energy 1990 102242
    Belarus Transport 1990 13074
    Belarus Other Sectors 1990 14792
    Belarus Industrial Processes 1990 3614
    ‘#REF! #REF! 1991 #REF!
    #REF! #REF! 1991 #REF!
    Australia Energy 1991 290872
    Australia Transport 1991 61503
    Australia Other Sectors 1991 14884
    Australia Industrial Processes 1991 23911
    Australia Agriculture 1991 86957
    Australia Waste 1991 17909
    Belarus Energy 1991 95782
    Belarus Transport 1991 12818
    Belarus Other Sectors 1992 14792
    Belarus Industrial Processes 1992 3614
    #REF! #REF! 1992 #REF!
    #REF! #REF! 1992 #REF!
    Australia Energy 1992 296887
    Australia Transport 1992 62695
    Australia Other Sectors 1992 15300
    Australia Industrial Processes 1992 24552
    Australia Agriculture 1992 85253
    Australia Waste 1992 17688
    Belarus Energy 1993 88907
    Belarus Transport 1993 10583
    Belarus Other Sectors 1993 14771
    Belarus Industrial Processes 1993 3504
    #REF! #REF! 1993 #REF!
    #REF! #REF! 1993 #REF!
    Australia Energy 1993 300178
    Australia Transport 1993 63987
    Australia Other Sectors 1993 15762
    Australia Industrial Processes 1993 24350
    Australia Agriculture 1994 84735
    Australia Waste 1994 17570
    Belarus Energy 1994 76539
    Belarus Transport 1994 8547
    Belarus Other Sectors 1994 14362
    Belarus Industrial Processes 1994 3292
    #REF! #REF! 1994 #REF!
    #REF! #REF! 1994 #REF!
    Australia Energy 1994 301736
    Australia Transport 1994 65614
    Australia Other Sectors 1995 15669
    Australia Industrial Processes 1995 24551
    Australia Agriculture 1995 85256
    Australia Waste 1995 16983
    Belarus Energy 1995 64290
    Belarus Transport 1995 5368
    Belarus Other Sectors 1995 13428
    Belarus Industrial Processes 1995 2639
    #REF! #REF! 1995 #REF!
    #REF! #REF! 1995 #REF!
    Australia Energy 1996 313486
    Australia Transport 1996 68357
    Australia Other Sectors 1996 16318
    Australia Industrial Processes 1996 24334
    Australia Agriculture 1996 86190
    Australia Waste 1996 16949
    Belarus Energy 1996 57259
    Belarus Transport 1996 4840
    Belarus Other Sectors 1996 11141
    Belarus Industrial Processes 1996 2004
    #REF! #REF! 1997 #REF!
    #REF! #REF! 1997 #REF!
    Australia Energy 1997 320795
    Australia Transport 1997 70751
    Australia Other Sectors 1997 16680
    Australia Industrial Processes 1997 24170
    Australia Agriculture 1997 86487
    Australia Waste 1997 15622
    Belarus Energy 1997 58225
    Belarus Transport 13074 4815
  6. Basically this is the code you need:

    Sub M_snb()
    sn = Cells(1).CurrentRegion
    x = Cells(1).CurrentRegion.Rows(1).SpecialCells(2).Count
    y = UBound(sn, 2) - x

    ReDim sp(1 To x * (UBound(sn) - 1), 1 To 4)

    For j = 1 To UBound(sp)
    m = (j - 1) Mod (UBound(sn) - 1) + 2
    n = (j - 1) \ (UBound(sn) - 1) + y + 1
    sp(j, 1) = sn(m, 1)
    sp(j, 2) = sn(m, 2)
    sp(j, 3) = sn(1, n)
    sp(j, 4) = sn(m, n)
    Next

    Cells(20, 1).Resize(UBound(sp), UBound(sp, 2)) = sp
    End Sub

  7. The original suggestion should be rephrased into

    Sub M_snb()
    sn = Cells(1).CurrentRegion
    x = Cells(1).CurrentRegion.Rows(1).SpecialCells(2).Count
    y = UBound(sn, 2) - x

    sp = Application.Index(sn, Evaluate("index(mod(row(1:" & x * (UBound(sn) - 1) & ")-1," & UBound(sn) - 1 & ")+2,)"), Evaluate("if(column(A1:" & Chr(66 + y) & x * (UBound(sn) - 1) & ")=" & y + 2 & ",int((row(1:" & x * (UBound(sn) - 1) & ")-1)/(" & UBound(sn) - 1 & "))+" & y + 1 & ",column(A1:" & Chr(66 + y) & x * (UBound(sn) - 1) & "))"))

    For j = 1 To UBound(sp)
    sp(j, UBound(sp, 2) - 1) = sn(1, (j - 1) \ (UBound(sn) - 1) + y + 1)
    Next

    Cells(20, 1).Resize(UBound(sp), UBound(sp, 2)) = sp
    End Sub

  8. Call UnpivotRange(oWksSource.Range(“Source”), oWksTarget.Range(“A1”), True, 1, 2, 3, 4)

    This would call if if you want columns 1,2,3,4 to be repeatable

  9. The limit of 255 Columns is not there with Power Queries Unpivot – I just tried it on a data set with 305 Columns – Happened in a flash !

  10. snb: that’s fast. Originally I was doing it via manipulation of ranges, but of course that was too slow on large crosstabs. And at the time I didn’t have the programming chops to do the array version you posted above, so went with the multiple consolidation route instead, and then the SQL route to handle super big crosstabs for a client. But 99.99999% of the time, your code will suffice. Thanks for posting it.

  11. @Jeff

    The 65536 boundary can be tackled rather simply using:

    Sub M_snb()
    sn = Cells(1).CurrentRegion
    x = Cells(1).CurrentRegion.Rows(1).SpecialCells(2).Count
    y = UBound(sn, 2) - x

    For j = 1 To UBound(sn) \ 65000 + 1
    ReDim sq(1 To 65000, 1 To 4)

    For jj = 1 To 65000
    m = (65000 * (j - 1) + jj - 1) Mod (UBound(sn) - 1) + 2
    n = (65000 * (j - 1) + jj - 1) \ (UBound(sn) - 1) + y + 1
    sq(jj, 1) = sn(m, 1)
    sq(jj, 2) = sn(m, 2)
    sq(jj, 3) = sn(1, n)
    sq(jj, 4) = sn(m, n)
    Next
    Cells(20, 1).Offset(, j * 8).Resize(UBound(sq), UBound(sq, 2)) = sq
    Next
    End Sub

  12. Hi snb. I might not have made my intent clear. My unpivot code is usually just a vehicle to get a crosstab into a pivot so that I can do easier analysis. But that 65k row limit in old excel – or 1m row limit in new excel – means you can’t always get from a crosstab to a pivot via your fast method of creating a flat file in the worksheet, because the flat file might well be too long to fit on the worksheet. Which is why I wrote the SQL approach.

    So producing a flat file is a means to an ends, and not the end goal itself. Thinking about this some more, I think a much quicker way than my UNION ALL approach would be to populate a disconnected recordset, then using that to set up the pivotcache. I’ll give that a spin.

  13. I find a PivotTable is perfect in the case that users want to be able to filter, aggregate, view different dimensions etc on the fly. In short, pivots give the user a way to explore the data, and see what insights they discover.

    At the same time, I find Pivots perfect for fixed reports too in my reporting apps. If a user comes up and says “Hey this output is great, but it would be good if I also had another table that broke this down by Capex/Opex and by Cost Type” then I simply make a copy of the pivot, add some filters, and then say “There you go, all done”. Or I put in a ‘Custom View’ sheet where users can roll their own reports.

    Your code is perfect for practically all crosstab datasets I come across, where the client wants to put it into a pivot. Occasionally I strike a client with a particulary large crosstab that requires another approach. Sometimes this is because they have bad data practices e.g. putting way too much data into Excel instead of learning how to use Access. Sometimes this is due to how 3rd party data arrives at their machines e.g. e.g. downloading some weather data over a long timespan for many sites, and the web interface spits it out as a crosstab rather than say a flat file csv. Either way, they need to get this into a pivot on their own, without anyone’s help. They have no VBA or SQL skills, and perhaps limited or slow IT support for data transformations. That’s the nut that my code tackles.

  14. This might be an even simpler method:

    Assuming a Table
    – of which the first 2 columns have to be repeated
    – the column label has to be adde in the 3 colum of the resulting list
    – the 4th column contains the data for each column in the original Table.

    Sub M_snb()
    With sheet1.Cells(1).CurrentRegion
    sn = .Resize(, .Columns.Count + 1)
    End With

    For j = 3 To UBound(sn, 2) - 1
    With Sheet2.Cells(2 + (UBound(sn) - 1) * (j - 3), 1)
    .Resize(UBound(sn) - 1, 4) = Application.Index(sn, Evaluate("row(2:" & UBound(sn) & ")"), Array(1, 2, UBound(sn, 2), j))
    .Resize(UBound(sn) - 1, 1).Offset(, 2) = sn(1, j)
    End With
    Next
    End Sub

Leave a Reply

Your email address will not be published. Required fields are marked *