Disconnected Recordsets in VBA Classes

I read “VBA Has No Class” over at Excel Do, Dynamic Does with interest. Bob says:

As you can see, the disconnected recordset can do everything a collection class can do, but without having to hand-roll any of the methods, recordset has them builtin.

I’m not ready to give up collection classes yet. But I did think about using a disconnected recordset as the storage mechanism rather than a collection. It doesn’t take long to see that it’s just not practical. Collections hold objects and using a recordset means building up and tearing down objects too often.

However, what if I used disconnected recordsets inside my collection class? The way I use collection classes, I end up doing a lot of looping for such operations as sorting and filtering. I could create a disconnected recordset to do those operations for me and fill a new class instance with the result.

An example, you say? OK.

I have some contact data in A1:I1001. I won’t go over filling the class and all the normal class operations. Let’s look at this example procedure in a standard module.
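
A minimal sketch of such a procedure, with assumed class and member names (CContacts, CContact, FillFromRange, LastName, State, Sales are stand-ins, not necessarily what's in the download):

Sub TestFilterSort()

    Dim clsContacts As CContacts
    Dim clsResult As CContacts
    Dim clsContact As CContact

    ' Fill the parent class from the contact data in A1:I1001
    Set clsContacts = New CContacts
    clsContacts.FillFromRange Sheet1.Range("A2:I1001")

    ' Part one: filter on sales greater than $4,970,000
    Set clsResult = clsContacts.Filtered("Sales > 4970000")
    For Each clsContact In clsResult
        Debug.Print clsContact.LastName, clsContact.Sales
    Next clsContact

    ' Part two: sort the filtered results by Sales, descending
    Set clsResult = clsResult.Sorted("Sales DESC")

    ' Part three: filter and sort in one line, with a different filter
    Set clsResult = clsContacts.Filtered("State = 'TX'").Sorted("Sales DESC")

End Sub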

This fills the class with all the records and runs through some sorting and filtering, sending the results to the Immediate Window. Before we get to the underlying properties, here’s what the three parts do.

The first part filters on sales greater than $4,970,000. The second part sorts the results of the first part by Sales in descending order. The last part does both operations in one line, although with a different filter.

In the Filtered property, a WHERE clause is passed in as an argument. The recordset is filled with all the records, filtered, then a new class is created with the results. The Sorted property isn’t much more complicated.
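
A sketch of what the two properties might look like; the signatures are assumptions, but ADO’s client-side Filter and Sort do the real work:

Public Property Get Filtered(ByVal sWhere As String) As CContacts

    Dim rs As ADODB.Recordset

    ' Fill a disconnected recordset with every record, apply the
    ' filter, and build a new class instance from what survives
    Set rs = FillRs
    rs.Filter = sWhere
    Set Filtered = FillFromRs(rs)

End Property

Public Property Get Sorted(ByVal sSortBy As String) As CContacts

    Dim rs As ADODB.Recordset

    Set rs = FillRs
    rs.Sort = sSortBy
    Set Sorted = FillFromRs(rs)

End Property

Note that ADO’s client-side Sort accepts a comma-delimited field list ("State ASC, Sales DESC"), so a multi-field version may be mostly a matter of passing a longer string.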

Both properties contain the FillRs private function and the FillFromRs property.

FillRs is lifted straight from Bob’s code and modified to fit my data. It creates a new disconnected recordset, adds fields, and creates new records for all the child instances of CContact.
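
Bob’s actual code isn’t reproduced here, but a sketch of FillRs might run like this (the field list is abbreviated, since the real data has nine columns, and the CContact property names are assumed):

Private Function FillRs() As ADODB.Recordset

    Dim rs As ADODB.Recordset
    Dim clsContact As CContact

    ' A fabricated, disconnected recordset: client-side cursor, no connection
    Set rs = New ADODB.Recordset
    rs.CursorLocation = adUseClient
    rs.Fields.Append "Name", adVarChar, 50
    rs.Fields.Append "State", adVarChar, 2
    rs.Fields.Append "Sales", adDouble
    rs.Open

    ' One record per child instance
    For Each clsContact In Me
        rs.AddNew
        rs.Fields("Name").Value = clsContact.Name
        rs.Fields("State").Value = clsContact.State
        rs.Fields("Sales").Value = clsContact.Sales
        rs.Update
    Next clsContact

    Set FillRs = rs

End Function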

FillFromRs is the reverse of that function. It creates a new CContacts instance and fills it up with whatever is in the recordset.
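
Again as a sketch, with the same assumed names:

Public Property Get FillFromRs(ByVal rs As ADODB.Recordset) As CContacts

    Dim clsContacts As CContacts
    Dim clsContact As CContact

    Set clsContacts = New CContacts

    ' Walk whatever the recordset currently exposes (filtered, sorted,
    ' or everything) and build child instances from it
    If rs.RecordCount > 0 Then rs.MoveFirst
    Do While Not rs.EOF
        Set clsContact = New CContact
        clsContact.Name = rs.Fields("Name").Value
        clsContact.State = rs.Fields("State").Value
        clsContact.Sales = rs.Fields("Sales").Value
        clsContacts.Add clsContact
        rs.MoveNext
    Loop

    Set FillFromRs = clsContacts

End Property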

The Sorted property needs to be rewritten to allow sorting on more fields, but that shouldn’t be difficult. I also need to run some speed tests to see whether this is actually faster than simply looping. Even if it’s the same speed, I already like it better, because I get far more flexibility in my filtering. As it stands, I create a separate property for every field on which I want to filter: I’d have properties like FilterByState, FilterBySalesRegion, and FilterBySales. With this method I could have one Filtered property and pass in the WHERE clause.

You can download DisconnectRs.zip

28 thoughts on “Disconnected Recordsets in VBA Classes”

  1. This might be good as a learning exercise — and it is definitely on the right track — but why use a disconnected data set? You have a *real* data set. Why not connect to it using ADO?

    The discussions that preceded this post came across as people having decided up front how they would solve a problem. Now, all they had to do was force the problem to look like something that would fit the selected solution.

I would approach this somewhat differently… as I actually did for a recent project.

    There are three types of data structures that I was dealing with (and that you are dealing with).

1) The data residing in some database (in your case a workbook; in my case first a workbook that evolved into an Access database; in other instances it might be a flat file or even a hierarchical system like an XML file).

    2) “Internal” data structures that are most appropriate for massaging and analyzing the data.

    3) Data structures to interact with the client (userforms and worksheets).

    Most of us have used a data abstraction layer. In this case I had two. The first, call it DAL1, connected the stored data to the internal data structures. The second, DAL2, connected the internal data structures to the UI.

    [I also had what one might call another layer, a computational layer that worked strictly with the internal data structures.]

    Now, the UI code did not know (nor did it care) how the data were obtained. As far as it was concerned it interacted with DAL2 and got what it wanted. It was, by design, oblivious to how DAL2 did what it did.

    DAL2 in some cases processed the request by itself and provided the result to the UI code. In other cases, it might request some assistance from DAL1 and in yet other cases it simply passed on the (reformatted) request to DAL1 which did the “heavy lifting.” An example of the last category of user requests was sorting or filtering the data. A SQL sort was far easier to write than reinventing the wheel for my internal structures. And, of course, the computational layer was available as needed.

    The benefit of this approach was that I could redirect DAL1 to point to an Excel workbook, an Access table, some corporate database, or any other database without impacting DAL2 or the UI code. As long as the DAL1 procedures processed the request correctly, I did not even have to test the rest of the code! In one case, the customer wanted to be able to change the data store (it might be on the local drive or the network drive or…). I modified the UI, added a procedure in DAL2 that passed on the request to DAL1. I then changed DAL1 to open a different data source. And, that was the extent of my testing.

    Similarly, a UI change (such as a userform redesign) had only limited impact. As long as the code interacting with DAL2 did not change the format or content of the arguments to/from DAL2, I did not have to test any code other than the UI code. And, if the changes did require modifications to DAL2, again, my testing did not have to extend to DAL1.

    Well, what if we have the data stored in a form that is not amenable to ADO/ODBC? Then, I’d have to do some amount of work in DAL1 but could still leave DAL2 and the UI and computation layers alone.

    I’ve been referring to layers without even defining their implementation. That too is intentional. {grin} They could be functions or subroutines in a standard module, they could be methods and properties in class modules or they could be some combination of the two. Again, the “how” is less important than the “what.”
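
    A minimal sketch of that layering (the procedure names, the Jet connection string and the sheet name here are mine, purely for illustration):

    ' UI code: asks DAL2 for sorted contacts; it neither knows nor cares how
    Sub ShowContactsBySales()
        Dim rs As ADODB.Recordset
        Set rs = DAL2_GetContactsSorted("Sales DESC")
        ' ...bind rs to a userform or dump it to a worksheet...
    End Sub

    ' DAL2: reformats the request and passes the heavy lifting to DAL1
    Function DAL2_GetContactsSorted(ByVal sOrder As String) As ADODB.Recordset
        Set DAL2_GetContactsSorted = DAL1_Query( _
            "SELECT * FROM [Contacts$] ORDER BY " & sOrder)
    End Function

    ' DAL1: the only layer that knows the data lives in a workbook. Repoint
    ' the connection string at Access or a server and nothing above changes.
    Function DAL1_Query(ByVal sSql As String) As ADODB.Recordset
        Dim cn As ADODB.Connection
        Dim rs As ADODB.Recordset

        Set cn = New ADODB.Connection
        cn.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
                "Data Source=" & ThisWorkbook.FullName & ";" & _
                "Extended Properties=""Excel 8.0;HDR=Yes"""

        Set rs = New ADODB.Recordset
        rs.CursorLocation = adUseClient
        rs.Open sSql, cn, adOpenStatic, adLockReadOnly

        Set rs.ActiveConnection = Nothing   ' disconnect before handing back
        cn.Close

        Set DAL1_Query = rs
    End Function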

    And, as far as what seems to fascinate most developers goes, I worry about performance when — if — it becomes an issue. Why do people worry about whether a piece of code needs 10 milliseconds or 5 milliseconds or 0.05 milliseconds? I would much rather be able to tell my client “OK, the UI redesign you want will be done by tomorrow.” {grin}

  2. Until now, I knew very little about recordsets and how they may be used in Excel. This opens a whole new world for me. Maybe I’m skimming the chapters, but I don’t recall much discussion of recordsets in my core Excel VBA books.

    This was very enlightening indeed. Thank you!

  3. I just use the advanced filter function. Although it doesn’t work externally from Excel, it seems to work really well. I made a whole class that wraps it and makes it pretty simple to work with. It would be interesting to see which is faster, your method or the advanced filter. I use it quite extensively, probably more than I should; it’s just so simple to work with that it’s hard not to use it. I would consider switching over to disconnected recordsets if I knew they worked as fast as or faster than the advanced filter. It’s nice to have something that is so much more versatile than what I’m using, too. Thanks for the post!

  4. Like Jon, I too knew little about recordsets in Excel. But after reading your post, I’m confident I’ll eventually master the latest version of Excel. Thanks, Dick, for this great service.

  5. I did a simple filter test on my data and found the recordset was able to filter 1000 times in 0.9375 seconds, while the advanced filter took ~2.265625 seconds.

    Now, when I had the program take the results and paste them to Excel, the built-in AdvancedFilter method was much better (about the same time, 2.26 seconds) than the Range’s CopyFromRecordset method (>17 seconds).

    I then populated an array and used that to paste to the range, and it took even longer than the CopyFromRecordset method (about 2 seconds longer).

    So I guess which method is better depends on the application. If you aren’t working directly on the workbook and need to do a bunch of data crunching before adding your data to the worksheet, a recordset is probably better; but if you are just copying and pasting data, it’s not worth it. The built-in functions are better.

  6. k,

    I just tested the memory leak problem. I’m using Excel 2003 on XP. Every time I opened and closed a connection, memory usage increased, and just as Microsoft said, it didn’t go down until I closed the Excel application (I had two instances open, and it only affected the one I was testing). That’s a bit of a bummer. It really limits the use of recordsets. I suppose if you are working only with newer versions of Excel (like k) then it might be worth it, assuming the memory leak is only in 2003 and older. Of course, if you work with an actual database and Excel, it’s probably pretty nice.

  7. I’ve seen people say the memory leak occurs in 2007, but I didn’t see it. For my purposes, if I can do a couple hundred queries with no increase, I’m more than covered. (Actually, if I could do 20 I’d probably be fine.)

    My problem with the recordsets is that it took > 2 seconds to load and then “movenext” through the 33000 records. I need to store all possible searches so I have to go through each record. Using an array I can read 70000 records, loop through, and store a part of each record in a dictionary in about the same time it takes for the recordset to do 33000.

    I will definitely keep the recordset data structure in my toolbox; I’m just not sure I can use it right now. I like the idea of using SQL on Excel workbooks… maybe I’ll use it for that.

  8. I’ve heard about that… never tried it. Unfortunately I have 2007 at work and I’m not sure they’d be real keen on letting me install it (IT being what it is). I’ll definitely have to try it out at home though, thanks for the tip!

  9. I left this comment on the other website and then realized I would never know if someone replies! So here it is again.

    So which is faster, dictionary or recordset? (Explanation below if you want to read that much.) While I have your attention, is there a way to do web queries with ADO? I haven’t been able to find it online. Of course, I picked up “Professional ADO 2.5” and that’s been a huge help; I now know what the posters are talking about when they say shaped recordsets!

    I’m grabbing data from an open workbook and putting it in a class using dictionaries, so I wouldn’t be able to grab the data directly from the workbook (memory leak). I have to manipulate the data, since some of it comes in comma-delimited form (from a web query). After I split up the data, I reference it with my class. It takes about half a second to get the data, and up to another half second to reference it.

    I would be curious to know if recordsets would work faster than the dictionaries. Not sure exactly how I would do it, since some of the entries are in an array and the number of entries varies depending on which one I’m referencing.

  10. OK, I did a test with shaped fabricated recordsets (sfr) vs dictionaries (dic) (in a class) for a project I’m working on.

    To load the data: sfr = 6.14 s, dic = 0.59 s
    To print the data to the worksheet (dic) or an XML document (sfr): sfr = 10.53 s, dic = 1.125 s
    To perform the original purposes of the data (two separate functions):
    sfr = 1.58 s, dic = 0.109 s
    sfr = 34.125 s, dic = 8.48 s

    Needless to say I was very disappointed. I thought recordsets were supposed to be very fast compared to the alternatives. Although I liked how simple and concise the shaped recordset “class” was (especially the built in calculations that shaped recordsets can perform) the time it takes to load and work was incredibly long. I’m sure there are times when recordsets would be better, but in my application it wasn’t.

    Would a recordset that’s not shaped work even faster than a shaped recordset? It would be interesting to see.

  11. OK, I did a normal fabricated recordset (nfr). It was also pretty slow (slower than the shaped fabricated recordset), so I guess dictionaries are the way to go, unless I’m doing something wrong. I think part of the reason the nfr was so slow is that I did the calculations on the fly, whereas with the sfr I could use its built-in calculations. I suppose recordsets are good if you have a small data set, but within Excel there’s not much reason to go beyond the advanced filter or dictionaries unless you get the data from an actual database.

    Disappointing is what I think. sfr’s would have been a pretty sweet way to make a nice, concise class with built-in calculations.

    Here are the times for the nfr:

    load: 4.1875 s -> faster than sfr
    print: 4.8125 s -> faster than sfr
    function 1: 4.625 s
    function 2: 909.6 s

  12. So did you use an array with a dictionary index? The biggest reason I would want to use recordsets is to be able to learn SQL. Guess speed will be a factor though…

  13. I used the Dictionary that comes in the Microsoft Scripting Runtime. You have to set a reference to it in the VBE. Supposedly it’s faster than a Collection object, and it has more built-in features (like the Exists test). I use it to build classes (I don’t use collections very often anymore).

    This is a pretty good synopsis of the dictionary compared to the collection:
    http://visualbasic.ittoolbox.com/documents/dictionary-object-vs-collection-object-12196

    Yeah, I’m learning ADO so I can work with databases and Excel together. It would have been nice if shaped recordsets worked faster. I’m sure there are ways to speed them up; I just don’t know how. A shaped recordset can be a whole class by itself with very few lines of code; unfortunately it’s just way too slow, especially with a large data set (at least that’s how it appears from my testing).

  14. @k: “it took > 2 seconds to load and then ‘movenext’ through the 33000 records. I need to store all possible searches so I have to go through each record” — The trade term for this is RBAR = ‘row by agonizing row’ (http://www.simple-talk.com/sql/t-sql-programming/rbar–row-by-agonizing-row/). Consider that for your scenario it is extremely likely you would be able to achieve the same result using a single SQL query (think “set-based solution”). If you want to post details, I for one always appreciate a SQL challenge ;)
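
    Purely for illustration (k’s actual data isn’t shown), the two shapes might compare like this:

    ' RBAR: visit all 33,000 rows, one agonizing row at a time
    Do While Not rst.EOF
        If rst!Sales > 1000000 Then dblTotal = dblTotal + rst!Sales
        rst.MoveNext
    Loop

    ' Set-based: one statement; the provider does the row handling
    rst.Open "SELECT SUM(Sales) AS Total FROM [Data$] WHERE Sales > 1000000", _
        objConnection, adOpenStatic, adLockReadOnly, adCmdText
    dblTotal = rst!Total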

  15. @Jamie Collins,

    I used it on the shaped data set and it didn’t seem to improve the performance. Maybe since I have to search for each item one at a time (and I don’t know the order in advance), it won’t help that much. Thanks for the suggestion though.

    I did think of the perfect use for a fabricated shaped recordset this morning: error reporting. I can use it for small data sets that are organized using XML, which makes it easy to gather, store and send the errors.

  16. Sorry Jon, I guess I didn’t explain my question. I often use an array to store records, but I put the unique field from each record into a dictionary along with the record’s row. That way I don’t have to search the array. I was just wondering how you were storing records in a dictionary.

    @Jamie Collins: Sorry, can’t provide the details… it’s work-related! I’m sure I could probably do it with a bunch of SQL queries… but even so, the speed of the recordset just ruins all appreciation for it.

  17. @k,

    In my testing I used a bunch of Finds for the sfr and Filters for the nfr, so if you’re just looping through, I don’t know whether that changes things. Mine were also fabricated, so I don’t know whether accessing an actual database changes things either.

    I’m not sure I’m getting what you mean by records. Do you mean recordsets? I’m reading an ADO 2.5 book right now, and it refers to records as being related to internet stuff.

    This is what I’m doing:
    Class1 with a dictionary1 in it holding the name.
    dictionary1 –> class2
    class2 –> dictionary2
    etc.
    The classes hold all sorts of data, I have dictionaries, arrays, strings, integers, etc.

    That’s probably not what you are looking for. But if a dictionary can’t store a recordset then the class can and the dictionary can store the class.

  18. I mostly use two-dimensional arrays, so my rows are records and my columns are fields, just like in databases. Instead of looping through the array to find things, I store the unique field as a dictionary key and the array row as its item.
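
    A sketch of that pattern (the range and key column here are illustrative):

    Dim arrData As Variant
    Dim dicIndex As Scripting.Dictionary
    Dim i As Long

    ' Rows are records, columns are fields, just like a database table
    arrData = Sheet1.Range("A2:I1001").Value

    ' Map each unique key (column 1 here) to its row number
    Set dicIndex = New Scripting.Dictionary
    For i = LBound(arrData, 1) To UBound(arrData, 1)
        dicIndex(arrData(i, 1)) = i
    Next i

    ' Direct access to any record's fields, no scanning required
    If dicIndex.Exists("ACME") Then Debug.Print arrData(dicIndex("ACME"), 5)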

  19. Interesting way of doing things. I guess you need unique information for each row, then, so you know which row to grab? I would think dictionaries within a class would be easier to work with, though. If you ever test the time difference, it would be interesting to find out which is faster. I would think indexing an array would be pretty fast, as long as you are not constantly ReDimming it as you build it.

  20. It’s much handier if you have unique information. I have played around with storing multiple row numbers in the dictionary item, but that requires parsing, etc.

    Is your method similar to a tree structure?

  21. Yes, I use a tree structure. It seems to be pretty easy to use it that way. Of course, now that I’m going to being getting data from a database I’ll be doing both tree and not, depending on the data.

  22. In reply to Jon…

    I’ve tested this for a disconnected recordset of 60,000 rows: one set of tests got 28 seconds, another got 0.25 seconds. Trouble is, the two test runs were identical sets of simple For i = 1 To 1000 loops, except for Excel being restarted.

    What I *can* tell you is that in the slow runs, finding three keys in Scripting.Dictionaries was always faster than I could count with VBA.Timer.

    That is: the scripting dictionary isn’t just faster than the disconnected recordset, it’s faster than I can measure.

  23. Huh? I didn’t click the ‘Submit’ button!

    Apologies for the partial post above (which will, I hope, be removed by the moderator).

    OK, let’s try again, and hope that I get to post the whole thing…

    In reply to Jon:

    I can vouch that Recordset ‘find’ operations are slow… for values of ‘slow’ that mean about 0.01 seconds per ‘find’ operation in a recordset of 60,000 rows. That’s half a minute if your code’s so inefficient that it loops 3000 times through the entire recordset.

    Also: you’ve got to take a LOT of care in setting up consistent tests to prove it.

    One argument in this discussion is settled: finding keys in Scripting.Dictionaries was always faster than I could count with a VBA.Timer. That is to say: the scripting dictionary isn’t just faster than the disconnected recordset, it’s faster than I can measure. I could’ve tried to get a number at 100,000 iterations, but that would be meaningless: I’d be measuring the efficiency of my string-handling, as much as I’d measure anything.

    I can vouch that your indexing code works:

    rst.Fields("ISIN").Properties("Optimize") = True   ' create an index

    It reduced the runtime to a consistent 0.03 seconds for all recordset sizes exceeding 1000 rows. The overhead for creating the index was 0.2 seconds for the largest recordset (60k rows) and 0.02 seconds for 1000 rows.

    So indexing is worth doing, but it’s never going to be as fast as the Scripting.Dictionary object.

    On the other hand, I didn’t test this with a million rows: but I don’t work with that kind of data in VBA.

    On the gripping hand, taking out the rst.MoveFirst statements in my loops made everything run in 0.015 seconds. Big recordsets, small recordsets, indexed or not. The moral is: test EVERYTHING, without assumptions, and never miss an opportunity to optimise. In this specific case, avoid MoveFirst, MoveLast, and anything that runs the record pointer off to EOF; and the killer optimisation is to minimise any failed search terms that will return rst.EOF – you’d probably gain something by using a sorted recordset and sequential lookups if you did have lots of failed ‘find’ operations; but much less than you would gain by cleaning your data.

    Finally, here are the test results, followed by the code in full:

    Start Timing Test 1: 3 ‘find’ operations in 100 rows, 1000 times
    END Timing Test 1. ELAPSED: 0.0625

    Start Timing Test 2: 3 ‘find’ operations in 1000 rows, 1000 times
    END Timing Test 2. ELAPSED: 0.46875

    Start Timing Test 3: 3 ‘find’ operations in 10000 rows, 1000 times
    END Timing Test 3. ELAPSED: 4.359375

    Start Timing Test 4: 3 ‘find’ operations in 60000 rows, 1000 times
    END Timing Test 4. ELAPSED: 28.078125

    Creating index…
    END Creating index. ELAPSED: 0.1875

    Start Timing Test 5: 3 ‘find’ operations in 60000 rows, 1000 times, with an index
    END Timing Test 5. ELAPSED: 0.03125

    Start Timing Test 6: dictionary lookups 60000 items, 1000 times
    END Timing Test 6. ELAPSED: 0

    I ran these tests on identical recordsets, except for varying the sizes using "WHERE RowNum < n" for values of n at 101, 1001, 10000 and 60000:

    Dim rst As ADODB.Recordset
    Dim oDict As Scripting.Dictionary
    Dim arrData As Variant
    Dim strData As String
    Dim strTest As String, strTest1 As String, strTest2 As String, strTest3 As String
    Dim dblTimer As Double
    Dim i As Long

    ' objConnection is an ADODB.Connection opened elsewhere

    Set rst = New ADODB.Recordset
    rst.CursorLocation = adUseClient

    rst.Open "SELECT * FROM myTable WHERE RowNum < 10000", objConnection, adOpenStatic, adLockReadOnly, adCmdText

    Set rst.ActiveConnection = Nothing

    ' Pick three test keys: the last, first and middle rows
    rst.MoveLast
    strTest3 = rst!ISIN

    rst.MoveFirst
    strTest1 = rst!ISIN

    rst.Move CLng(rst.RecordCount / 2)
    strTest2 = rst!ISIN

    dblTimer = VBA.Timer
    Debug.Print "Start Timing Test 1: 3 'find' operations in " & rst.RecordCount & " rows, 1000 times"
    For i = 1 To 1000

        rst.MoveFirst
        rst.Find "ISIN='" & strTest2 & "'"
        rst.Find "ISIN='" & strTest3 & "'"
        rst.Find "ISIN='" & strTest1 & "'"

    Next i
    Debug.Print " END Timing Test 1. ELAPSED: " & VBA.Timer - dblTimer

    rst.MoveFirst   ' the last Find leaves the cursor at EOF; reposition before GetString
    strData = rst.GetString
    Set rst = Nothing

    arrData = Join2d(strData, , , 1)

    strData = vbNullString

    Set oDict = New Scripting.Dictionary
    oDict.CompareMode = BinaryCompare

    ' Key the dictionary on field 2, storing field 1 as the item
    For i = LBound(arrData, 1) To UBound(arrData, 1)

        If Not oDict.Exists(arrData(i, 2)) Then
            oDict.Add arrData(i, 2), arrData(i, 1)
        End If

    Next i

    Erase arrData

    dblTimer = VBA.Timer
    Debug.Print "Start Timing Test 2: dictionary lookups " & oDict.Count & " items, 1000 times"
    For i = 1 To 1000

        If oDict.Exists(strTest2) Then strTest = oDict(strTest2)
        If oDict.Exists(strTest3) Then strTest = oDict(strTest3)
        If oDict.Exists(strTest1) Then strTest = oDict(strTest1)

    Next i
    Debug.Print " END Timing Test 2. ELAPSED: " & VBA.Timer - dblTimer

    For completeness, the Join2d function is included below; it’s about as good as you’ll get for VBA string-handling without delving into the native string functionality of Windows:

    Public Function Join2d(ByRef strInput As String, _
            Optional RowDelimiter As String = vbCr, _
            Optional FieldDelimiter As String = vbTab, _
            Optional CoerceLowerBound As Long = 0) As Variant

    ' Splits a delimited string into a 2-dimensional array: in effect the
    ' reverse of VBA.Strings.Join, for two dimensions.
    ' Check your lower bounds on return: never assume that any array in VBA
    ' is zero-based, even if you've set Option Base 0.
    ' If in doubt, coerce the lower bounds to 0 or 1 by setting CoerceLowerBound.
    ' Note that the default delimiters are those inserted into the string
    ' returned by ADODB.Recordset.GetString.

    ' Coding note: we're not doing any string-handling in VBA.Strings -
    ' allocating, deallocating and (especially!) concatenating strings is SLOW.
    ' However, the VBA Join, Split and Replace functions are linked directly
    ' to fast (by VBA standards) functions in the native Windows code.

    Dim i As Long
    Dim j As Long

    Dim i_n As Long
    Dim j_n As Long

    Dim i_lBound As Long
    Dim i_uBound As Long
    Dim j_lBound As Long
    Dim j_uBound As Long

    Dim arrTemp1 As Variant
    Dim arrTemp2 As Variant

    arrTemp1 = Split(strInput, RowDelimiter)

    i_lBound = LBound(arrTemp1)
    i_uBound = UBound(arrTemp1)

    If VBA.LenB(arrTemp1(i_uBound)) <= 0 Then ' clip out empty last row: a common artifact of data loaded from files with a terminating row delimiter
        i_uBound = i_uBound - 1
    End If

    i = i_lBound
    arrTemp2 = Split(arrTemp1(i), FieldDelimiter)

    j_lBound = LBound(arrTemp2)
    j_uBound = UBound(arrTemp2)

    If VBA.LenB(arrTemp2(j_uBound)) <= 0 Then ' ! potential error: first row with an empty last field
        j_uBound = j_uBound - 1
    End If

    i_n = CoerceLowerBound - i_lBound
    j_n = CoerceLowerBound - j_lBound

    ReDim arrData(i_lBound + i_n To i_uBound + i_n, j_lBound + j_n To j_uBound + j_n)

    ' As we've got the first row already, populate it here and start the main loop from lbound + 1

    For j = j_lBound To j_uBound
        arrData(i_lBound + i_n, j + j_n) = arrTemp2(j)
    Next j

    For i = i_lBound + 1 To i_uBound Step 1

        arrTemp2 = Split(arrTemp1(i), FieldDelimiter)

        For j = j_lBound To j_uBound Step 1
            arrData(i + i_n, j + j_n) = arrTemp2(j)
        Next j

        Erase arrTemp2

    Next i

    Erase arrTemp1

    Join2d = arrData

    End Function

    Yes, I’m late to the game with this comment: but this is the top result for ‘Disconnected Recordset’ and ‘Index’ on Google, so it’s worth recording anyway.

