Arrays Explained

Here’s a nice explanation of arrays

Arrays, What’s the Point

On the Stack Overflow Podcast, Joel and Jeff agree that arrays are obsolete. There are far better data structures, most of which are built on arrays.

I still use arrays quite a bit. But considering I’m using an obsolete language (VBA = VB6), I guess that’s not a surprise to anyone. There’s surely no better way to read and write to an Excel range than with an array.

But I use it for simple lists too.

Sub ArrayExample()

    Dim vaInclude As Variant 'holds variant array
    Dim lMatch As Long

    vaInclude = Array("chiefs", "bengals", "broncos")

    On Error Resume Next
        lMatch = Application.WorksheetFunction.Match("packers", vaInclude, False)
    On Error GoTo 0

    Debug.Print "Is in list: " & CBool(lMatch > 0)

End Sub

One thing I plan to change about my coding habits is how I ReDim arrays. I’m the type of guy who only adds one memory location at a time so that the array is only as big as I need it. I know that’s inefficient and I’m going to stop. Joel recommended doubling the upper bound whenever you need to increase the allocation. I’d never heard that before. Is that what most people do? Or do they just add an arbitrarily large number? I need to come up with some best practices around ReDimming, like how to test whether an array is initialized, how much to add, how to loop through only what’s there, etc.
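
Something like this is what I’m picturing, for the record. It’s only a sketch, and the names (AddItem, laItems, lCount) are mine, not anything standard: a helper that doubles the allocation when it runs out of room, plus a counter so I only loop through what’s actually there.

Sub AddItem(ByRef laItems() As Long, ByRef lCount As Long, ByVal lNew As Long)

    'First use: allocate a small starting block. After that, double when full.
    If lCount = 0 Then
        ReDim laItems(1 To 4)
    ElseIf lCount = UBound(laItems) Then
        ReDim Preserve laItems(1 To UBound(laItems) * 2)
    End If

    lCount = lCount + 1
    laItems(lCount) = lNew

End Sub

Sub DoublingExample()

    Dim laItems() As Long
    Dim lCount As Long
    Dim i As Long

    For i = 1 To 100
        AddItem laItems, lCount, i * 10
    Next i

    'Loop through what's there, not the whole allocation
    For i = 1 To lCount
        Debug.Print laItems(i)
    Next i

    'Optional final trim if the exact size matters
    ReDim Preserve laItems(1 To lCount)

End Sub

Keeping a separate count also dodges the is-it-initialized question entirely, since lCount = 0 means nothing has been allocated yet.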


19 thoughts on “Arrays Explained”

  1. I tend to think that the Variant created by the Array() function is a somewhat different (and thoroughly useful) beast to the Dimmed array. I’m fairly sure it’s going to have a different internal representation, too.

    As regards the ReDimming thing, I think I’d be inclined to make myself a SmartArray class. Then I could have L/UBound equivalents, implement whatever resizing strategy suits me best and what-have-you (a rough sketch follows below).

    I wonder how it might be different if a Dictionary was used instead? I’m sure it would be slower (although Dictionaries are pretty fast) but the difference might only be milliseconds, which might not matter.
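
    Here’s the sort of thing I mean, stripped to the bones. It would live in a class module (CSmartArray here, though the name, the members, and the doubling strategy are all just for illustration):

    'Class module: CSmartArray
    Private mavItems() As Variant
    Private mlCount As Long

    Private Sub Class_Initialize()
        ReDim mavItems(1 To 4)
    End Sub

    Public Sub Add(ByVal vItem As Variant)
        'Double the allocation whenever it fills up
        If mlCount = UBound(mavItems) Then
            ReDim Preserve mavItems(1 To UBound(mavItems) * 2)
        End If
        mlCount = mlCount + 1
        mavItems(mlCount) = vItem
    End Sub

    Public Property Get Count() As Long
        Count = mlCount
    End Property

    Public Property Get Item(ByVal lIndex As Long) As Variant
        Item = mavItems(lIndex)
    End Property

    Calling code never sees a ReDim: Set sa = New CSmartArray, then sa.Add "chiefs", and sa.Count tells you how far to loop.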

  2. I too increase the Ubound by 1 when required, principally because I imagined that this would limit the amount of memory that needed to be reserved, albeit at the expense of extra Redim statements which I seem to recall are quite slow.

    Maybe in these days of 2+GB RAM I could afford to be a bit less stingy? I too am interested to hear what others consider best practice.

  3. I now tend to use the Scripting.Dictionary Object for most of my variable array requirements. It’s fast and has the benefit of being a “modern” hash table implementation.

    Tom
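
    A quick sketch of what that looks like (early bound, so it needs a reference to Microsoft Scripting Runtime; the data is just for illustration):

    Sub DictionaryExample()

        Dim dic As Scripting.Dictionary
        Set dic = New Scripting.Dictionary

        dic.Add "chiefs", 1
        dic.Add "bengals", 2

        'Exists replaces the Match-plus-error-trapping dance
        Debug.Print "Is in list: " & dic.Exists("packers")

    End Sub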

  4. If you need more room in an array, just arbitrarily doubling the size seems wasteful. I usually use arrays with no fixed size, same as you do in the code above. Once you determine the size you need, you can just use ReDim to resize it. For example:

    Sub Read_Into_Array()
        'reads worksheet values into an array
        Dim arrData As Variant
        Dim ColACount As Long
        Dim i As Long

        ColACount = Range(Range("A1"), Range("A" & Rows.Count).End(xlUp)).Count

        ReDim arrData(1 To ColACount, 1 To 2)

        For i = 1 To ColACount
            arrData(i, 1) = Range("A" & i).Value
            arrData(i, 2) = Range("B" & i).Value
        Next i

    End Sub

  5. I think the “doubling” rule is a good “rule of thumb” to follow, though there may be exceptions.

    The point here is that if you may have to ReDim more than once, the overhead involved in making the array too big ONCE is much less than the overhead required to reconstruct the array (which is what is actually being done with a ReDim Preserve) multiple times.
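
    A quick way to see this (Timer-based, so the numbers are rough, and the 10,000 is arbitrary):

    Sub TimeGrowthStrategies()

        Dim va() As Variant
        Dim lSize As Long
        Dim i As Long
        Dim dStart As Double

        'Grow by one: every ReDim Preserve copies the whole array
        dStart = Timer
        ReDim va(1 To 1)
        For i = 2 To 10000
            ReDim Preserve va(1 To i)
        Next i
        Debug.Print "Grow by 1: "; Timer - dStart

        'Double when full: only a handful of copies in total
        dStart = Timer
        lSize = 1
        ReDim va(1 To lSize)
        For i = 2 To 10000
            If i > lSize Then
                lSize = lSize * 2
                ReDim Preserve va(1 To lSize)
            End If
        Next i
        Debug.Print "Double:    "; Timer - dStart

    End Sub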

  6. A SmartArray class is fine, but you can’t write it to a range or use it as an argument to a WorksheetFunction method.

    Does it matter to anyone that Dictionary objects require a reference and Collections are native? Any chance someone wouldn’t have the right dll on their machine?
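
    For what it’s worth, late binding sidesteps the reference question entirely, at the cost of Intellisense. A minimal sketch (scrrun.dll has shipped with Windows for a long time, so it should be on any machine that matters):

    Sub LateBoundDictionary()

        Dim dic As Object
        Set dic = CreateObject("Scripting.Dictionary") 'no reference required

        dic.Add "chiefs", 1
        Debug.Print dic.Exists("chiefs"), dic.Exists("packers")

    End Sub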

  7. I use tons of arrays. They’re great for sharing data between Excel and VBA. They’re not as fast as using, say, disconnected recordsets, but that’s another whole level of complexity.

    I used to have a routine that incremented the UBound by 10 or 20 whenever necessary. Doubling makes more sense.

  8. The Collections in .NET use the strategy you describe; if you add items beyond the capacity, then the capacity gets doubled. I thought this was the case, but Google didn’t yield any proof so I wrote a quick C# console app to check that I did not dream that :)

    var list = new List<int>();
    for (int index = 0; index < 100; index++)
    {
        list.Add(index);
        Console.WriteLine(list.Capacity); // 4, 8, 16, 32, 64, 128
    }

  9. Dick –

    If you built a SmartArray class, why wouldn’t you include methods to populate the array from a range, or dump the array’s contents into a range?

  10. Mathias: Escape any less-than signs in your code or it will get cut off.

    Jon: Because I’m just doing the array work inside the class, so why not just do it where I need it. OK, there are good reasons for having it in the class, but it only moves the implementation questions.

  11. When you read a range into a variant you don’t need to redim it anyway, and this is much the most efficient way of transferring data from Excel to VBA. For example –

    Dim vArr As Variant
    vArr = Range("A1").Resize(Range("A65536").End(xlUp).Row, 2).Value

    Otherwise I either Dim the Array to the max possible size or double it.

    Usually it’s not worth bothering to do a final ReDim Preserve to shrink the array unless you are assigning the array back to a Range.

    I too am a big fan of arrays, and I don’t care at all if they are “obsolete”. Since worksheet ranges are in effect 2D arrays it makes sense to me to use them in VBA. The ease with which data is passed from a worksheet range to a VBA variant is an added bonus, and the fact that much engineering analysis involves working with matrices (which are just another name for arrays) makes them the obvious choice.

    In nearly all cases I will know (or be able to calculate) the array size, based on the size of the range/array passed from the spreadsheet, so there will only be a need for at most one ReDim at the start. The only one of my applications I can think of immediately where this doesn’t apply is functions/subs for reading text files. In this case I step through the file first to count the number of rows, then ReDim an array, then read each line into the array. That sounds hopelessly inefficient, but in practice it adds hardly anything to the execution time. An alternative would be to just dimension the array to the maximum number of rows in a spreadsheet.

    I tend to use arrays with named ranges, writing the entire contents of an array to a named range:

    With Range("MyNamedRange").Resize(NumArrayRows, NumArrayColumns)
        .Name = "MyNamedRange" 'resize the name to fit the array
        .Value = MyArray
    End With

    This works virtually instantaneously, even with huge ranges. The improvement in performance compared with writing cell by cell is likely to be a factor of tens or hundreds (or compared with cell by cell writing with auto-recalc still enabled, thousands).

    More here: http://newtonexcelbach.wordpress.com/2008/03/05/ranges-and-arrays-2/

  13. Doug –

    For a text file that has rows separated by CR or LF, you could read the entire file into a string, then use Split to construct your array.

    Depending on the application, I have sometimes first converted CSV files into an array of rows by splitting at the CR/LF, then converted each row into an array by splitting at the tab character. You reference the elements using Array(i)(j) instead of Array(i,j), but I guess that’s about the only difference.
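
    For anyone who wants to try it, roughly like this (sFile is a hypothetical path, and I’m assuming vbCrLf line endings and tab delimiters):

    Sub ReadFileWithSplit()

        Dim sFile As String
        Dim lFile As Long
        Dim sAll As String
        Dim vaRows As Variant
        Dim vaFields As Variant
        Dim i As Long

        sFile = "C:\Temp\data.txt" 'hypothetical path

        'Pull the whole file into one string in a single read
        lFile = FreeFile
        Open sFile For Input As #lFile
        sAll = Input$(LOF(lFile), lFile)
        Close #lFile

        'Split once into rows, then split each row into fields
        vaRows = Split(sAll, vbCrLf)
        For i = LBound(vaRows) To UBound(vaRows)
            vaFields = Split(vaRows(i), vbTab)
            'vaFields(0), vaFields(1), ... are the columns of row i
        Next i

    End Sub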

    The first text in which I read about doubling array bounds was Kernighan & Pike’s ‘The Practice of Programming’. I figure everyone else in Bell Labs was using this approach for years before it found its way into this book. And they probably took it from someone else back in the 1980s. Regardless, it ain’t new.

    1D arrays are very useful. 2D arrays a little less so, though very handy for interfacing with ranges. Higher dimension arrays are usually much less useful compared to other structures in my experience.

  15. Doubling sounds good, but perhaps we should consider +10% increments like SQL Server defaults to on its table space allocation growth.

    I am a combination of lazy and clean, so I prefer to increment by 1. I think if you use Excel VBA, easy to read code is much better than fast but complex code.

    Also, some functions require an array of an exact size, for example Sheets(Array("Sheet1", "Sheet2")).Select, and if you had a huge array with empty gaps the code would fail.

    On some apps, fast *is* preferable to easy to read. Charles Williams and I enhanced the performance of our Name Manager quite a bit for workbooks with many range names (over 100), and it helped make the app usable in such situations.

  17. I like having a definite upper bound, because often my arrays go back into the worksheet. Sometimes I’ll run through a loop once to count how many rows my array will need, create the array, then populate it. I don’t really know what the performance hit is, but usually in my apps, the time spent waiting for the user to scratch his head and click something is far far longer.

    I set my arrays at some pre-determined size, and if I fill it, I double it. At the end I ReDim Preserve down to the required size.

