Much ado about ADO

Jeff here again. I had a crack at rewriting my Unpivot via SQL routine, to see if I could make it simpler and faster. The point of the routine is to let you turn a very large cross-tab directly into a PivotTable in the case that a flat file would be too long to fit in Excel’s grid. The original routine works by making a temp copy of the file (to avoid Memory Leak) and then doing lots and lots of UNION ALLs against that temp copy to unwind it one cross-tab column at a time. (The reason we need those UNION ALLs is that there is no UNPIVOT command in the pigeon English dialect of SQL that Excel and Access speak.) My routine then executes the SQL via ADO, and creates a Pivot directly out of the resulting RecordSet.

So if we had a data set that looked like this:

CrossTab

…then the resulting SQL looks something like this:

SELECT [Country], [Sector], [Base year (Convention)], [1990] AS Total, '1990' AS [Year] FROM [Data$A18:H28]
UNION ALL SELECT [Country], [Sector], [Base year (Convention)], [1991] AS Total, '1991' AS [Year] FROM [Data$A18:H28]
UNION ALL SELECT [Country], [Sector], [Base year (Convention)], [1992] AS Total, '1992' AS [Year] FROM [Data$A18:H28]
UNION ALL SELECT [Country], [Sector], [Base year (Convention)], [1993] AS Total, '1993' AS [Year] FROM [Data$A18:H28]
UNION ALL SELECT [Country], [Sector], [Base year (Convention)], [1994] AS Total, '1994' AS [Year] FROM [Data$A18:H28]

But as per my previous post, the code to accomplish this is pretty long. This is partly because the Microsoft JET/ACE Database engine has a hard limit of 50 ‘UNION ALL’ clauses, which you will soon exceed if you have a big cross-tab. I get around this limit by creating sub-blocks of SELECT/UNION ALL statements under this limit, and then stitching these sub-blocks with an outer UNION ALL ‘wrapper’. But that results in fairly complicated code and presumably quite a bit of work for the JET/ACE driver.

So I got to thinking that rather than using all those UNION ALLs to create the RecordSet with SQL, why not just populate a disconnected RecordSet directly from an amended version of snb’s elegant code, like so:

This routine worked fine on a small number of records. For instance, it unwound a cross-tab of 1000 rows x 100 cross-tab columns = 100,000 records in 8 seconds. Not exactly lightning, but it got there.

But it did not work fine on larger ones: at around 2500 rows x 100 cross-tab columns = 250,000 records it returned an Out of Memory error. So that rules the Disconnected RecordSet approach out for unwinding super-size cross-tabs. Unless you’re manipulating small data sets, steer clear of disconnected RecordSets.

Not to be deterred, I thought I’d try a different approach: I amended snb’s original approach so that it split a large flat file across multiple tabs in need, and then wrote a seperate routine to mash together the data in those tabs with SQL. This will result in far fewer UNION ALL’s (one per sheet) than my original code (one per column), and hopefully much faster performance.

Here’s how I revised SNB’s code:

That part works a treat. Takes around 38 seconds to take a 19780 Row x 100 Column crosstab = 1,977,900 records and spit it out as a flat file in two sheets.

And here’s the code that stiches those together into one PivotTable:

I tested this routine on some sheets with smaller datasets in them initially. Works just fine.

200,000 records in 2 sheets, no problem

But on bigger stuff, weirdsville:
For instance, here’s what I got when I tried to run it on 250,000 records split across two sheets:
DDOE_MuchAdoAboutADO_ExternalTableIsNotInTheExpectedFormat
What do you mean, “External table is not in the expected format“? It was just a minute ago!

Pushing debug shows that the oConn.Open sConnection line is highlighted. When I pushed F5 then the code ran without hitch, and produced the pivot as expected. So who knows what that was about.

But when I tried it on a larger dataset of 600,000 records in each sheet, I got an ‘Unexpected error from external database driver (1)’ message:

DDOE_MuchAdoAboutADO_UnexpectedErrorFromExternalDatabaseDriver

You betcha it’s unexpected! Googling didn’t turn up much, apart from some people having issues trying to get very big datasets from Excel into SQL Server. One person’s problem was solved by adding in imex=1 in the connection string, but it didn’t do anything for me.

I tried running the sub on several sheets with various amounts of records in each. About the maximum I could pull through was 3 sheets of 400,000 rows. But anything bigger, then I got that Unexpected Error From External Database Driver error again.

Next I tried running the code on just one sheet with various rows of data in it, to see what happened. After I push F5 to ignore the External Table Is Not In The Expected Format error, it did manage to produce a pivot in all cases, but the pivot may or may not contain all the data, depending on how many records were in the source sheet. For instance:

  • If there’s 500,000 records in the flat file I 500,000 records in the pivot.
  • If there’s 750,000 records in the flat file I only 613,262 records in the pivot. wtf?
  • If there’s 1,000,000 records in the flat file, I 613,262 records in the pivot again. wtfa?

Whereas my original UNPIVOT BY SQL routine could handle much larger datasets than either the disconnected RecordSet approach or the above one without complaining.

Well screw you, code…I’m giving you an error back:
DDOE_MuchAdoAboutADO_OutOfPatience

9 thoughts on “Much ado about ADO

  1. Did you consider ?

    – to unpivot the crosstab into a string
    – save that string to a csv-file
    – import that string into an access db
    – create a privottable in Excel, linked to (based on) the Access DB

  2. Yeah, I thought briefly about saving it to an .accdb file. But I really want something stand-alone that I could distribute easily, and thought that having to rely on having an .accdb file saved somewhere added too much complexity – particularly given that I want to build this into an add-in along with a whole bunch of other pivot-centric routines. That said, now you’ve got me wondering whether your approach might be faster, and whether I can bypass the .accdb file bit altogether and build a pivot directly based on a CSV.

    Any thoughts on why my 2nd approach above doesn’t work as expected, snb?

  3. Ah well. Guess I’ll just stick with my complicated UNION ALL method in the rare instances that the crosstab is too big for your much simpler/faster approach.

    I wonder why the ADO approach above isn’t working? Very strange that a SELECT ALL does not in fact select all.

  4. I have a similar problem. I have a dBase file with almost 2 million records. I need write this dBase file into txt file. I was just using vba excel for this. But When it reaches around 1,100,000 records, I receive a out of memory error. In the first line of the subroutine, after openning the file, the recordcount statement already gives a out of memory error. It looks like I can’t read more than the rows limit of a worksheet. But, again, I am not using a worksheet. Just using vba in excel to write a txt file.

  5. Interesting point about 50 UNION ALL clauses: I haven’t met that particular limit.

    I met a different limit: “Too many fields defined”

    The JET.OLEDB.4.0 drivers have a hard limit of 255 fields across all your UNION clauses: that is to say: you can have twenty-five union clauses returning ten fields each, but not twenty-six of them.

  6. Hi Nigel. These are edge cases. You obviously live on the edge too :-)

    Sam advises that fortunately PowerQuery doesn’t suffer from this hard limit in his comment at http://dailydoseofexcel.com/archives/2013/11/19/unpivot-via-sql/#comment-96724 so there’s our escape clause I guess.

    I timed my VBA/Array based unpivot from that link against PowerQuery on a dataset that (when unwound) had more than 1,000,000 rows some time back, and found that they both run in the exact same time. I keep meaning to write a post called “If you’re finding VBA slow, then you’re not finding the *right* VBA. “.

    Colin Banefield talks about this in his insightful comment at https://www.powerpivotpro.com/2015/04/secret-pot-roast-recipe-power-query-vs-vba-macros/#comment-124541


Posting code? Use <pre> tags for VBA and <code> tags for inline.

Leave a Reply

Your email address will not be published.