Disconnected Recordsets in VBA Classes

I read “VBA Has No Class” over at Excel Do, Dynamic Does with interest. Bob says:

As you can see, the disconnected recordset can do everything a collection class can do, but without having to hand-roll any of the methods, recordset has them builtin.

I’m not ready to give up collection classes yet. But I did think about using a disconnected recordset as the storage mechanism rather than a collection. It doesn’t take long to see that it’s just not practical. Collections hold objects and using a recordset means building up and tearing down objects too often.

However, what if I used disconnected recordsets inside my collection class? The way I use collection classes, I end up doing a lot of looping for such operations as sorting and filtering. I could create a disconnected recordset to do those operations for me and fill a new class instance with the result.

An example, you say? OK.

I have some contact data in A1:I1001. I won’t go over filling the class and all the normal class operations. Let’s look at this example procedure in a standard module.
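
A minimal sketch of such a procedure, with assumed class and member names (CContacts, CContact, FillFromRange, LastName, State, Sales are stand-ins, not necessarily what's in the download):

Sub TestFilterSort()

    Dim clsContacts As CContacts
    Dim clsResult As CContacts
    Dim clsContact As CContact

    ' Fill the parent class from the contact data in A1:I1001
    Set clsContacts = New CContacts
    clsContacts.FillFromRange Sheet1.Range("A2:I1001")

    ' Part one: filter on sales greater than $4,970,000
    Set clsResult = clsContacts.Filtered("Sales > 4970000")
    For Each clsContact In clsResult
        Debug.Print clsContact.LastName, clsContact.Sales
    Next clsContact

    ' Part two: sort the filtered results by Sales, descending
    Set clsResult = clsResult.Sorted("Sales DESC")

    ' Part three: filter and sort in one line, with a different filter
    Set clsResult = clsContacts.Filtered("State = 'TX'").Sorted("Sales DESC")

End Sub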

This fills the class with all the records and runs through some sorting and filtering, sending the results to the Immediate Window. Before we get to the underlying properties, here’s what the three parts do.

The first part filters on sales greater than $4,970,000. The second part sorts the results of the first part by Sales in descending order. The last part does both operations in one line, although with a different filter.

In the Filtered property, a WHERE clause is passed in as an argument. The recordset is filled with all the records, filtered, then a new class is created with the results. The Sorted property isn’t much more complicated.
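
A sketch of what the two properties might look like; the signatures are assumptions, but ADO’s client-side Filter and Sort do the real work:

Public Property Get Filtered(ByVal sWhere As String) As CContacts

    Dim rs As ADODB.Recordset

    ' Fill a disconnected recordset with every record, apply the
    ' filter, and build a new class instance from what survives
    Set rs = FillRs
    rs.Filter = sWhere
    Set Filtered = FillFromRs(rs)

End Property

Public Property Get Sorted(ByVal sSortBy As String) As CContacts

    Dim rs As ADODB.Recordset

    Set rs = FillRs
    rs.Sort = sSortBy
    Set Sorted = FillFromRs(rs)

End Property

Note that ADO’s client-side Sort accepts a comma-delimited field list ("State ASC, Sales DESC"), so a multi-field version may be mostly a matter of passing a longer string.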

Both properties contain the FillRs private function and the FillFromRs property.

FillRs is lifted straight from Bob’s code and modified to fit my data. It creates a new disconnected recordset, adds fields, and creates new records for all the child instances of CContact.
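
Bob’s actual code isn’t reproduced here, but a sketch of FillRs might run like this (the field list is abbreviated, since the real data has nine columns, and the CContact property names are assumed):

Private Function FillRs() As ADODB.Recordset

    Dim rs As ADODB.Recordset
    Dim clsContact As CContact

    ' A fabricated, disconnected recordset: client-side cursor, no connection
    Set rs = New ADODB.Recordset
    rs.CursorLocation = adUseClient
    rs.Fields.Append "Name", adVarChar, 50
    rs.Fields.Append "State", adVarChar, 2
    rs.Fields.Append "Sales", adDouble
    rs.Open

    ' One record per child instance
    For Each clsContact In Me
        rs.AddNew
        rs.Fields("Name").Value = clsContact.Name
        rs.Fields("State").Value = clsContact.State
        rs.Fields("Sales").Value = clsContact.Sales
        rs.Update
    Next clsContact

    Set FillRs = rs

End Function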

FillFromRs is the reverse of that function. It creates a new CContacts instance and fills it up with whatever is in the recordset.
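
Again as a sketch, with the same assumed names:

Public Property Get FillFromRs(ByVal rs As ADODB.Recordset) As CContacts

    Dim clsContacts As CContacts
    Dim clsContact As CContact

    Set clsContacts = New CContacts

    ' Walk whatever the recordset currently exposes (filtered, sorted,
    ' or everything) and build child instances from it
    If rs.RecordCount > 0 Then rs.MoveFirst
    Do While Not rs.EOF
        Set clsContact = New CContact
        clsContact.Name = rs.Fields("Name").Value
        clsContact.State = rs.Fields("State").Value
        clsContact.Sales = rs.Fields("Sales").Value
        clsContacts.Add clsContact
        rs.MoveNext
    Loop

    Set FillFromRs = clsContacts

End Property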

The Sorted property needs to be rewritten to allow sorting on more fields, but that shouldn’t be difficult. I also need to run some speed tests to see whether this is actually faster than simply looping. Even if it’s the same speed, I already like it better, because I get far more flexibility in my filtering. As it stands, I create a separate property for every field on which I want to filter: I’d have properties like FilterByState, FilterBySalesRegion, and FilterBySales. With this method I could have one Filtered property and pass in the WHERE clause.

You can download DisconnectRs.zip

28 thoughts on “Disconnected Recordsets in VBA Classes”

  1. This might be good as a learning exercise — and it is definitely on the right track — but why use a disconnected data set? You have a *real* data set. Why not connect to it using ADO?

    The discussions that preceded this post came across as people having decided up front how they would solve a problem. Now, all they had to do was force the problem to look like something that would fit the selected solution.

I would approach this somewhat differently… as I actually did for a recent project.

    There are three types of data structures that I was dealing with (and that you are dealing with).

1) The data residing in some database (in your case a workbook; in my case first a workbook that evolved into an Access database; in other instances it might be a flat file or even a hierarchical system like an XML file).

    2) “Internal” data structures that are most appropriate for massaging and analyzing the data.

    3) Data structures to interact with the client (userforms and worksheets).

    Most of us have used a data abstraction layer. In this case I had two. The first, call it DAL1, connected the stored data to the internal data structures. The second, DAL2, connected the internal data structures to the UI.

    [I also had what one might call another layer, a computational layer that worked strictly with the internal data structures.]

    Now, the UI code did not know (nor did it care) how the data were obtained. As far as it was concerned it interacted with DAL2 and got what it wanted. It was, by design, oblivious to how DAL2 did what it did.

    DAL2 in some cases processed the request by itself and provided the result to the UI code. In other cases, it might request some assistance from DAL1 and in yet other cases it simply passed on the (reformatted) request to DAL1 which did the “heavy lifting.” An example of the last category of user requests was sorting or filtering the data. A SQL sort was far easier to write than reinventing the wheel for my internal structures. And, of course, the computational layer was available as needed.

    The benefit of this approach was that I could redirect DAL1 to point to an Excel workbook, an Access table, some corporate database, or any other database without impacting DAL2 or the UI code. As long as the DAL1 procedures processed the request correctly, I did not even have to test the rest of the code! In one case, the customer wanted to be able to change the data store (it might be on the local drive or the network drive or…). I modified the UI, added a procedure in DAL2 that passed on the request to DAL1. I then changed DAL1 to open a different data source. And, that was the extent of my testing.

    Similarly, a UI change (such as a userform redesign) had only limited impact. As long as the code interacting with DAL2 did not change the format or content of the arguments to/from DAL2, I did not have to test any code other than the UI code. And, if the changes did require modifications to DAL2, again, my testing did not have to extend to DAL1.

    Well, what if we have the data stored in a form that is not amenable to ADO/ODBC? Then, I’d have to do some amount of work in DAL1 but could still leave DAL2 and the UI and computation layers alone.

    I’ve been referring to layers without even defining their implementation. That too is intentional. {grin} They could be functions or subroutines in a standard module, they could be methods and properties in class modules or they could be some combination of the two. Again, the “how” is less important than the “what.”
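
    A minimal sketch of that layering (the procedure names, the Jet connection string and the sheet name here are mine, purely for illustration):

    ' UI code: asks DAL2 for sorted contacts; it neither knows nor cares how
    Sub ShowContactsBySales()
        Dim rs As ADODB.Recordset
        Set rs = DAL2_GetContactsSorted("Sales DESC")
        ' ...bind rs to a userform or dump it to a worksheet...
    End Sub

    ' DAL2: reformats the request and passes the heavy lifting to DAL1
    Function DAL2_GetContactsSorted(ByVal sOrder As String) As ADODB.Recordset
        Set DAL2_GetContactsSorted = DAL1_Query( _
            "SELECT * FROM [Contacts$] ORDER BY " & sOrder)
    End Function

    ' DAL1: the only layer that knows the data lives in a workbook. Repoint
    ' the connection string at Access or a server and nothing above changes.
    Function DAL1_Query(ByVal sSql As String) As ADODB.Recordset
        Dim cn As ADODB.Connection
        Dim rs As ADODB.Recordset

        Set cn = New ADODB.Connection
        cn.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
                "Data Source=" & ThisWorkbook.FullName & ";" & _
                "Extended Properties=""Excel 8.0;HDR=Yes"""

        Set rs = New ADODB.Recordset
        rs.CursorLocation = adUseClient
        rs.Open sSql, cn, adOpenStatic, adLockReadOnly

        Set rs.ActiveConnection = Nothing   ' disconnect before handing back
        cn.Close

        Set DAL1_Query = rs
    End Function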

    And, as far as what seems to fascinate most developers goes, I worry about performance when — if — it becomes an issue. Why do people worry about whether a piece of code needs 10 milliseconds or 5 milliseconds or 0.05 milliseconds? I would much rather be able to tell my client “OK, the UI redesign you want will be done by tomorrow.” {grin}

  2. Until now, I knew very little about recordsets and how they may be used in Excel. This opens a whole new world for me. Maybe I’m skimming the chapters, but I don’t recall much discussion of recordsets in my core Excel VBA books.

    This was very enlightening indeed. Thank you!

  3. I just use the advanced filter function. Although it doesn’t work externally from Excel, it seems to work really well. I made a whole class that wraps it and makes it pretty simple to work with. It would be interesting to see which is faster, your method or the advanced filter. I use it quite extensively, probably more than I should; it’s just so simple to work with that it’s hard not to use it. I would consider switching over to disconnected recordsets if I knew they worked as fast as or faster than the advanced filter. It’s nice to have something that is so much more versatile than what I’m using, too. Thanks for the post!

  4. Like Jon, I too knew little about recordsets in Excel. But after reading your post, I’m confident I’ll eventually master the latest version of Excel. Thanks, Dick, for this great service.

  5. I did a simple filter test on my data and found the recordset was able to filter 1000 times in 0.9375 seconds, while the advanced filter took ~2.265625 seconds.

    Now, when I had the program take the results and paste them to Excel, the built-in AdvancedFilter method was much better (about the same time, 2.26 seconds) than the Range’s CopyFromRecordset method (>17 seconds).

    I then populated an array and used that to paste to the range, and it took even longer than the CopyFromRecordset method (about 2 seconds longer).

    So I guess which method is better depends on the application. If you aren’t working directly on the workbook and need to do a bunch of data crunching before adding your data to the worksheet, a recordset is probably better; but if you are just copying and pasting data, it’s not worth it. The built-in functions are better.

  6. k,

    I just tested the memory leak problem. I’m using Excel 2003 on XP. Every time I opened and closed a connection, memory usage increased, and just as Microsoft said, it didn’t go down until I closed the Excel application (I had two instances open, and it only affected the one I was testing). That’s a bit of a bummer. It really limits the use of recordsets. I suppose if you are working only with newer versions of Excel (like k) then it might be worth it, assuming the memory leak is only in 2003 and older. Of course, if you work with an actual database and Excel, it’s probably pretty nice.

  7. I’ve seen people say the memory leak occurs in 2007, but I didn’t see it. For my purposes, if I can do a couple hundred queries with no increase, I’m more than covered. (Actually, if I could do 20 I’d probably be fine.)

    My problem with the recordsets is that it took > 2 seconds to load and then “movenext” through the 33000 records. I need to store all possible searches so I have to go through each record. Using an array I can read 70000 records, loop through, and store a part of each record in a dictionary in about the same time it takes for the recordset to do 33000.

    I will definitely keep the recordset data structure in my toolbox; I’m just not sure I can use it right now. I like the idea of using SQL on Excel workbooks… maybe I’ll use it for that.

  8. I’ve heard about that… never tried it. Unfortunately I have 2007 at work and I’m not sure they’d be real keen on letting me install it (IT being what it is). I’ll definitely have to try it out at home though, thanks for the tip!

  9. I left this comment on the other website and then realized I would never know if someone replies! So here it is again.

    So which is faster, dictionary or recordset? (Explanation below if you want to read that much.) While I have your attention, is there a way to do web queries with ADO? I haven’t been able to find it online. Of course, I picked up “Professional ADO 2.5” and that’s been a huge help; I now know what the posters are talking about when they say shaped recordsets!

    I’m grabbing data from an open workbook and putting it in a class using dictionaries, so I wouldn’t be able to grab the data directly from the workbook (memory leak). I have to manipulate the data, since some of it comes in comma-delimited form (from a web query). After I split up the data, I reference it with my class. It takes about half a second to get the data, and up to another half second to reference it.

    I would be curious to know if recordsets would work faster than the dictionaries. Not sure exactly how I would do it, since some of the entries are in an array and the number of entries varies depending on which one I’m referencing.

  10. OK, I did a test with shaped fabricated recordsets (sfr) vs dictionaries (dic) (in a class) for a project I’m working on.

    To load the data: sfr = 6.14 s, dic = 0.59 s
    To print the data to the worksheet (dic) or an XML document (sfr): sfr = 10.53 s, dic = 1.125 s
    To perform the original purposes of the data (two separate functions):
    sfr = 1.58 s, dic = 0.109 s
    sfr = 34.125 s, dic = 8.48 s

    Needless to say I was very disappointed. I thought recordsets were supposed to be very fast compared to the alternatives. Although I liked how simple and concise the shaped recordset “class” was (especially the built in calculations that shaped recordsets can perform) the time it takes to load and work was incredibly long. I’m sure there are times when recordsets would be better, but in my application it wasn’t.

    Would a recordset that’s not shaped work even faster than a shaped recordset? It would be interesting to see.

  11. OK, I did a normal fabricated recordset (nfr). It was also pretty slow (slower than the shaped fabricated recordset), so I guess dictionaries are the way to go, unless I’m doing something wrong. I think part of the reason the nfr was so slow is that I did the calculations on the fly, whereas with the sfr I could use its built-in calculations. I suppose recordsets are good if you have a small data set, but within Excel there’s not much reason to go beyond the advanced filter or dictionaries unless you get the data from an actual database.

    Disappointing is what I think. sfr’s would have been a pretty sweet way to make a nice, concise class with built-in calculations.

    Here are the times for the nfr:

    load: 4.1875 s -> faster than sfr
    print: 4.8125 s -> faster than sfr
    function 1: 4.625 s
    function 2: 909.6 s

  12. So did you use an array with a dictionary index? The biggest reason I would want to use recordsets is to be able to learn SQL. Guess speed will be a factor though…

  13. I used the Dictionary that comes in the Microsoft Scripting Runtime. You have to set a reference to it in the VBE. Supposedly it’s faster than a Collection object, and it has more built-in features (like the Exists test). I use it to build classes (I don’t use collections very often anymore).

    This is a pretty good synopsis of the dictionary compared to the collection:
    http://visualbasic.ittoolbox.com/documents/dictionary-object-vs-collection-object-12196

    Yeah, I’m learning ADO so I can work with databases and Excel together. It would have been nice if shaped recordsets worked faster. I’m sure there are ways to speed them up; I just don’t know how. A shaped recordset can be a whole class by itself with very few lines of code; unfortunately it’s just way too slow, especially with a large data set (at least that’s how it appears from my testing).

  14. @k: “it took > 2 seconds to load and then ‘movenext’ through the 33000 records. I need to store all possible searches so I have to go through each record” — The trade term for this is RBAR = ‘row by agonizing row’ (http://www.simple-talk.com/sql/t-sql-programming/rbar–row-by-agonizing-row/). Consider that for your scenario it is extremely likely you would be able to achieve the same result using a single SQL query (think “set-based solution”). If you want to post details, I for one always appreciate a SQL challenge ;)
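
    Purely for illustration (k’s actual data isn’t shown), the two shapes might compare like this:

    ' RBAR: visit all 33,000 rows, one agonizing row at a time
    Do While Not rst.EOF
        If rst!Sales > 1000000 Then dblTotal = dblTotal + rst!Sales
        rst.MoveNext
    Loop

    ' Set-based: one statement; the provider does the row handling
    rst.Open "SELECT SUM(Sales) AS Total FROM [Data$] WHERE Sales > 1000000", _
        objConnection, adOpenStatic, adLockReadOnly, adCmdText
    dblTotal = rst!Total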

  15. @Jamie Collins,

    I used it on the shaped data set and it didn’t seem to improve the performance. Maybe since I have to search for each item one at a time (and I don’t know the order in advance), it won’t help that much. Thanks for the suggestion though.

    I did think of the perfect use for a fabricated shaped recordset this morning: error reporting. I can use it for small data sets that are organized using XML, which makes it easy to gather, store and send the errors.

  16. Sorry Jon, I guess I didn’t explain my question. I often use an array to store records, but I put the unique field from each record into a dictionary along with the record’s row. That way I don’t have to search the array. I was just wondering how you were storing records in a dictionary.

    @Jamie Collins: Sorry, can’t provide the details… it’s work-related! I’m sure I could probably do it with a bunch of SQL queries… but even so, the speed of the recordset just ruins all appreciation for it.

  17. @k,

    In my testing I used a bunch of Finds for the sfr and Filters for the nfr, so if you’re just looping through, I don’t know whether that changes things. Mine were also fabricated, so I don’t know whether accessing an actual database changes things either.

    I’m not sure I’m getting what you mean by records. Do you mean recordsets? I’m reading an ADO 2.5 book right now, and it refers to records as being related to internet stuff.

    This is what I’m doing:
    Class1 with a dictionary1 in it holding the name.
    dictionary1 –> class2
    class2 –> dictionary2
    etc.
    The classes hold all sorts of data, I have dictionaries, arrays, strings, integers, etc.

    That’s probably not what you are looking for. But if a dictionary can’t store a recordset then the class can and the dictionary can store the class.

  18. I mostly use two-dimensional arrays, so my rows are records and my columns are fields, just like in databases. Instead of looping through the array to find things, I store the unique field as a dictionary key and the array row as its item.
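
    A sketch of that pattern (the range and key column here are illustrative):

    Dim arrData As Variant
    Dim dicIndex As Scripting.Dictionary
    Dim i As Long

    ' Rows are records, columns are fields, just like a database table
    arrData = Sheet1.Range("A2:I1001").Value

    ' Map each unique key (column 1 here) to its row number
    Set dicIndex = New Scripting.Dictionary
    For i = LBound(arrData, 1) To UBound(arrData, 1)
        dicIndex(arrData(i, 1)) = i
    Next i

    ' Direct access to any record's fields, no scanning required
    If dicIndex.Exists("ACME") Then Debug.Print arrData(dicIndex("ACME"), 5)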

  19. Interesting way of doing things. I guess you need unique information for each row, then, so you know which row to grab? I would think dictionaries within a class would be easier to work with, though. If you ever test the time difference, it would be interesting to find out which is faster. I would think indexing an array would be pretty fast, as long as you are not constantly ReDimming it as you build it.

  20. It’s much handier if you have unique information. I have played around with storing multiple row numbers in the dictionary item, but that requires parsing, etc.

    Is your method similar to a tree structure?

  21. Yes, I use a tree structure. It seems to be pretty easy to use it that way. Of course, now that I’m going to being getting data from a database I’ll be doing both tree and not, depending on the data.

  22. In reply to Jon…

    I’ve tested this for a disconnected recordset of 60,000 rows: one set of tests got 28 seconds, another got 0.25 seconds. Trouble is, the two test runs were identical sets of simple For i = 1 To 1000 loops, except for Excel being restarted.

    What I *can* tell you is that in the slow runs, finding three keys in Scripting.Dictionaries was always faster than I could count with VBA.Timer.

    That is: the scripting dictionary isn’t just faster than the disconnected recordset, it’s faster than I can measure.

  23. Huh? I didn’t click the ‘Submit’ button!

    Apologies for the partial post above (which will, I hope, be removed by the moderator).

    OK, let’s try again, and hope that I get to post the whole thing…

    In reply to Jon:

    I can vouch that Recordset ‘find’ operations are slow… for values of ‘slow’ that mean about 0.01 seconds per ‘find’ operation in a recordset of 60,000 rows. That’s half a minute if your code’s so inefficient that it loops 3000 times through the entire recordset.

    Also: you’ve got to take a LOT of care in setting up consistent tests to prove it.

    One argument in this discussion is settled: finding keys in Scripting.Dictionaries was always faster than I could count with a VBA.Timer. That is to say: the scripting dictionary isn’t just faster than the disconnected recordset, it’s faster than I can measure. I could’ve tried to get a number at 100,000 iterations, but that would be meaningless: I’d be measuring the efficiency of my string-handling, as much as I’d measure anything.

    I can vouch that your indexing code works:

    rst.Fields("ISIN").Properties("Optimize") = True   ' create an index

    It reduced the runtime to a consistent 0.03 seconds for all recordset sizes exceeding 1000 rows. The overhead for creating the index was 0.2 seconds for the largest recordset (60k rows) and 0.02 seconds for 1000 rows.

    So indexing is worth doing, but it’s never going to be as fast as the Scripting.Dictionary object.

    On the other hand, I didn’t test this with a million rows: but I don’t work with that kind of data in VBA.

    On the gripping hand, taking out the rst.MoveFirst statements in my loops made everything run in 0.015 seconds. Big recordsets, small recordsets, indexed or not. The moral is: test EVERYTHING, without assumptions, and never miss an opportunity to optimise. In this specific case, avoid MoveFirst, MoveLast, and anything that runs the record pointer off to EOF; and the killer optimisation is to minimise any failed search terms that will return rst.EOF – you’d probably gain something by using a sorted recordset and sequential lookups if you did have lots of failed ‘find’ operations; but much less than you would gain by cleaning your data.

    Finally, here are the test results, followed by the code in full:

    Start Timing Test 1: 3 ‘find’ operations in 100 rows, 1000 times
    END Timing Test 1. ELAPSED: 0.0625

    Start Timing Test 2: 3 ‘find’ operations in 1000 rows, 1000 times
    END Timing Test 2. ELAPSED: 0.46875

    Start Timing Test 3: 3 ‘find’ operations in 10000 rows, 1000 times
    END Timing Test 3. ELAPSED: 4.359375

    Start Timing Test 4: 3 ‘find’ operations in 60000 rows, 1000 times
    END Timing Test 4. ELAPSED: 28.078125

    Creating index…
    END Creating index. ELAPSED: 0.1875

    Start Timing Test 5: 3 ‘find’ operations in 60000 rows, 1000 times, with an index
    END Timing Test 5. ELAPSED: 0.03125

    Start Timing Test 6: dictionary lookups 60000 items, 1000 times
    END Timing Test 6. ELAPSED: 0

    I ran these tests on identical recordsets, except for varying the sizes using "WHERE RowNum < n" for values of n at 101, 1001, 10000 and 60000:

    Dim rst As ADODB.Recordset
    Dim oDict As Scripting.Dictionary
    Dim arrData As Variant
    Dim strData As String
    Dim strTest As String, strTest1 As String, strTest2 As String, strTest3 As String
    Dim dblTimer As Double
    Dim i As Long

    ' objConnection is an ADODB.Connection opened elsewhere

    Set rst = New ADODB.Recordset
    rst.CursorLocation = adUseClient

    rst.Open "SELECT * FROM myTable WHERE RowNum < 10000", objConnection, adOpenStatic, adLockReadOnly, adCmdText

    Set rst.ActiveConnection = Nothing

    ' Pick three test keys: the last, first and middle rows
    rst.MoveLast
    strTest3 = rst!ISIN

    rst.MoveFirst
    strTest1 = rst!ISIN

    rst.Move CLng(rst.RecordCount / 2)
    strTest2 = rst!ISIN

    dblTimer = VBA.Timer
    Debug.Print "Start Timing Test 1: 3 'find' operations in " & rst.RecordCount & " rows, 1000 times"
    For i = 1 To 1000

        rst.MoveFirst
        rst.Find "ISIN='" & strTest2 & "'"
        rst.Find "ISIN='" & strTest3 & "'"
        rst.Find "ISIN='" & strTest1 & "'"

    Next i
    Debug.Print " END Timing Test 1. ELAPSED: " & VBA.Timer - dblTimer

    rst.MoveFirst   ' the last Find leaves the cursor at EOF; reposition before GetString
    strData = rst.GetString
    Set rst = Nothing

    arrData = Join2d(strData, , , 1)

    strData = vbNullString

    Set oDict = New Scripting.Dictionary
    oDict.CompareMode = BinaryCompare

    ' Key the dictionary on field 2, storing field 1 as the item
    For i = LBound(arrData, 1) To UBound(arrData, 1)

        If Not oDict.Exists(arrData(i, 2)) Then
            oDict.Add arrData(i, 2), arrData(i, 1)
        End If

    Next i

    Erase arrData

    dblTimer = VBA.Timer
    Debug.Print "Start Timing Test 2: dictionary lookups " & oDict.Count & " items, 1000 times"
    For i = 1 To 1000

        If oDict.Exists(strTest2) Then strTest = oDict(strTest2)
        If oDict.Exists(strTest3) Then strTest = oDict(strTest3)
        If oDict.Exists(strTest1) Then strTest = oDict(strTest1)

    Next i
    Debug.Print " END Timing Test 2. ELAPSED: " & VBA.Timer - dblTimer

    For completeness, the Join2d function is included below; it’s about as good as you’ll get for VBA string-handling without delving into the native string functionality of Windows:

    Public Function Join2d(ByRef strInput As String, _
            Optional RowDelimiter As String = vbCr, _
            Optional FieldDelimiter As String = vbTab, _
            Optional CoerceLowerBound As Long = 0) As Variant

    ' Splits a delimited string into a 2-dimensional array: in effect the
    ' reverse of VBA.Strings.Join, for two dimensions.
    ' Check your lower bounds on return: never assume that any array in VBA
    ' is zero-based, even if you've set Option Base 0.
    ' If in doubt, coerce the lower bounds to 0 or 1 by setting CoerceLowerBound.
    ' Note that the default delimiters are those inserted into the string
    ' returned by ADODB.Recordset.GetString.

    ' Coding note: we're not doing any string-handling in VBA.Strings -
    ' allocating, deallocating and (especially!) concatenating strings is SLOW.
    ' However, the VBA Join, Split and Replace functions are linked directly
    ' to fast (by VBA standards) functions in the native Windows code.

    Dim i As Long
    Dim j As Long

    Dim i_n As Long
    Dim j_n As Long

    Dim i_lBound As Long
    Dim i_uBound As Long
    Dim j_lBound As Long
    Dim j_uBound As Long

    Dim arrTemp1 As Variant
    Dim arrTemp2 As Variant

    arrTemp1 = Split(strInput, RowDelimiter)

    i_lBound = LBound(arrTemp1)
    i_uBound = UBound(arrTemp1)

    If VBA.LenB(arrTemp1(i_uBound)) <= 0 Then ' clip out empty last row: a common artifact of data loaded from files with a terminating row delimiter
        i_uBound = i_uBound - 1
    End If

    i = i_lBound
    arrTemp2 = Split(arrTemp1(i), FieldDelimiter)

    j_lBound = LBound(arrTemp2)
    j_uBound = UBound(arrTemp2)

    If VBA.LenB(arrTemp2(j_uBound)) <= 0 Then ' ! potential error: first row with an empty last field
        j_uBound = j_uBound - 1
    End If

    i_n = CoerceLowerBound - i_lBound
    j_n = CoerceLowerBound - j_lBound

    ReDim arrData(i_lBound + i_n To i_uBound + i_n, j_lBound + j_n To j_uBound + j_n)

    ' As we've got the first row already, populate it here and start the main loop from lbound + 1

    For j = j_lBound To j_uBound
        arrData(i_lBound + i_n, j + j_n) = arrTemp2(j)
    Next j

    For i = i_lBound + 1 To i_uBound Step 1

        arrTemp2 = Split(arrTemp1(i), FieldDelimiter)

        For j = j_lBound To j_uBound Step 1
            arrData(i + i_n, j + j_n) = arrTemp2(j)
        Next j

        Erase arrTemp2

    Next i

    Erase arrTemp1

    Join2d = arrData

    End Function

    Yes, I’m late to the game with this comment: but this is the top result for ‘Disconnected Recordset’ and ‘Index’ on Google, so it’s worth recording anyway.

