A VBA performance class

Hi everyone!

If you write a lot of VBA code you probably sometimes run into performance issues. A customer rings you and complains that the Excel file you built for them does not perform well. I’ve written a small article on my site, inspired by a post here: Error Handling via an Error Class. I needed a quick way to log and report performance of subroutines and I also wanted to see how many times a routine was called.

See: A VBA performance class


Jan Karel Pieterse

Sending Images via WinSCP

Since my recent move to Digital Ocean for hosting, I’ve had to make a change to how I upload images for this blog. I used to create an FTP file and a batch file, but as far as I know that doesn’t support SFTP. I’m using WinSCP to transfer files manually and learned that it has a command line interface. I made a procedure called SendViaSCP to replace my SendViaFTP.

Public Sub SendViaSCP(vFname As Variant)

Dim aScript() As String
Dim i As Long

ReDim aScript(1 To 4 + UBound(vFname))

aScript(1) = "option batch abort"
aScript(2) = "option confirm off"
aScript(3) = "open sftp://username:password@"
aScript(UBound(aScript)) = "exit"

For i = LBound(vFname) To UBound(vFname)
aScript(3 + i) = "put " & Dir(vFname(i)) & " /home/wordpress/public_html/blogpix/"
Next i

Open "winscpup.txt" For Output As #1
Print #1, Join(aScript, vbNewLine)
Close #1

Shell "winscpup.bat"

End Sub

The vFname arguments is a variant array that holds all of the files I selected from Application.GetOpenFileName. The aScript array holds three lines of setup, a line for each file, and an exit line.

The commands are joined together and written to a batch file and the batch file is run. It doesn’t solve the problem that Billkamm and Haines solved of having your username and password in a batch file, but I can live with it.

You might be wondering why I don’t just use the file upload functions in WordPress. What fun would that be?

Subtracting Cells in the Status Bar

Sometimes I just want to quickly see the difference between two cells or groups of cells. Excel puts some great aggregates in the status bar.

and you can even customize them. Right click on the those aggregates.

But I wanted the difference. So I wrote some code to find it. I already had a class module with an Application object declared WithEvents, so I added this SheetSelectionChange event procedure.

Private Sub mxlApp_SheetSelectionChange(ByVal Sh As Object, ByVal Target As Range)

If TypeName(Selection) = "Range" Then
ShowDifferenceStatus Selection
End If

End Sub

That event procedure calls this procedure in a standard module.

Public Sub ShowDifferenceStatus(rSel As Range)

Dim wf As WorksheetFunction
Dim vStatus As Variant

On Error Resume Next

Set wf = Application.WorksheetFunction

If rSel.Areas.Count = 1 Then
If rSel.Columns.Count = 2 Then
vStatus = "Difference: " & Format(wf.Sum(rSel.Columns(1)) - wf.Sum(rSel.Columns(2)), "#,##0.00")
ElseIf rSel.Rows.Count = 2 Then
vStatus = "Difference: " & Format(wf.Sum(rSel.Rows(1)) - wf.Sum(rSel.Rows(2)), "#,##0.00")
vStatus = False
End If
ElseIf rSel.Areas.Count = 2 Then
If (rSel.Areas(1).Columns.Count = 1 And rSel.Areas(2).Columns.Count = 1) Or _
(rSel.Areas(1).Rows.Count = 1 And rSel.Areas(2).Rows.Count = 1) Then

vStatus = "Difference: " & Format(wf.Sum(rSel.Areas(1)) - wf.Sum(rSel.Areas(2)), "#,##0.00")
End If
vStatus = False
End If

Application.StatusBar = vStatus

End Sub

If the selection is contiguous (Areas.Count = 1), it determines if there are two columns or two rows. Then it uses the SUM worksheet function to sum up the first and subtract the sum of the second. Anything other that two columns tow rows resets the StatusBar by setting it to False. Subtracting one cell from the other is easy enough, but I wanted the ability to subtract one column from the other (or one row). Using SUM also avoids me having to check for text or other nonsense that SUM does automatically. Here’s one where I only have one Area selected and it contains two columns. It sums the numbers in column B and subtracts the sum of column C.

When the selection is not contiguous (Areas.Count = 2), then it determines if both areas have only one column or only one row. If either has more than one, it resets the status bar. But if they both have one (of either), it subtracts them. Here I’ve selected B2:B3, then held down the Control key while I selected C3:C4. That’s two areas, but each only has one column, so it assumes I want to subtract columns.

The next feature I want to add is to recognize filtered data. Often I’m working with a filtered Table and although two cells appear to be adjacent, selecting them without holding down Control really selects all those filtered cells in between. I guess I’ll need to loop through and determine what’s visible, build a range from only those cells, and sum that. For now, I’m just holding down control and using the mouse to select them. If you’re not familiar, the “mouse” is that blob of plastic several inches away from home row (aka the productivity killer). Excuse me while I get off my soap box and finish this post.

I tried to glean the NumberFormat of the cells selected and use that in the display. You can see from the code above that I punted and just used a comma and two decimals. But that stinks for really small numbers. Originally, I had something like

vStatus = "Difference: " & Format(wf.Sum(rSel.Columns(1)) - wf.Sum(rSel.Columns(2)), rSel.Cells(1).NumberFormat)

But look at the craziness when the cell as the Accounting format (_(* #,##0.00_);_(* (#,##0.00);_(* "-"??_);_(@_))

It works well for times though.

Apparently the syntax for cell formatting is slightly different than for the VBA.Format function. I haven’t worked out what the differences are, but maybe someday I will.

Identifying duplicates between multiple lists

Howdy folks. Jeff here, back from my summer holiday in the Coromandel Peninsula in the North Island of New Zealand, where I’ve been staring at this for the last 21 days:
DDOE_Identifying duplicates between lists_Opoutere

For the next 344 I’ll be staring at this:
DDOE_Identifying duplicates between lists_Excel
God, it’s good to be home.

A while back I answered this thread for someone wanting to identify any duplicate values found between 4 separate lists.

The way I understood the question, if something appears in each of the four lists, the Op wanted to know about it. If an item just appeared in 3 lists but not all 4, then they didn’t want it to be picked up. And the lists themselves might have duplicates within each list.

Say we’ve got these 4 lists:
DDOE_Identifying duplicates between lists_Names

We can’t simply use Conditional Formatting, because that will include duplicate names that don’t appear in each and every column, such as ‘Mike’:
DDOE_Identifying duplicates between lists_Wrong

Rather, we only want names that appear in every column:
DDOE_Identifying duplicates between lists_Right

I wrote a routine that handled any number of lists, using two dictionaries and a bit of shuffling between them. And the routine allows users to select either a contiguous range if their lists are all in one block, or multiple non-contiguous ranges if they aren’t.

  1. The user gets prompted for the range where they want the identified duplicates to appear:
    DDOE_Identifying duplicates between lists_Select Output Range

  3. Then they get prompted to select the first list. The items within that list get added to Dic_A. (If they select more than one columns, the following steps get executed automatically).
    DDOE_Identifying duplicates between lists_Select First Range

  5. Next they get prompted to select the 2nd list, at which point the code attempts to add each new item to Dic_A. If an item already exists in Dic_A then we know it’s a duplicate between lists, and so we add it to Dic_B. At the end of this, we clear Dic_A. Notice that any reference to selecting a contiguous range has been dropped from the InputBox:
    DDOE_Identifying duplicates between lists_Select 2nd range

  7. When they select the 3rd list, then it attempts to add each new item to Dic_B, and if an error occurs, then we know it’s a duplicate between lists, and so we add it to Dic_A. At the end of this, we clear Dic_B. We carry on in this manner until the user pushes Cancel (and notice now that the InputBox message tells them to push cancel when they’re done):
    DDOE_Identifying duplicates between lists_Select 3rd range

Pretty simple: just one input box, an intentional infinite loop, and two dictionaries that take turns holding the current list of dictionaries. Hours of fun.

Only problem is, I had forgotten to account for the fact that there might be duplicates within a list. The old code would have misinterpreted these duplicates as between-list duplicates, rather than within-list duplicates. The Op is probably completely unaware, and probably regularly bets the entire future of his country’s economy based on my bad code. Oops.

I’ve subsequently added another step where a 3rd dictionary is used to dedup the items in the list currently being processed. Here’s the revised code. My favorite line is the Do Until “Hell” = “Freezes Over” one.

Error Handling via an Error Class

A while ago I read an answer on stackoverflow about error handling. I can’t seem to find that question now, so you’ll have to take my word for it. The question was asking about error handling best practices in VBA and I found one of the answers interesting. The answerer said that you could use the Terminate event of a custom class module as your error logger. I had never thought of that.

I’ve been using the Professional Excel Development error handling method with great success for many years. This method controls the error as it moves back up the call stack, either by returning a Boolean to the calling procedure or by rethrowing the error. Without error handling, VBA sends the error back up the call stack automatically until it is handled. So, for instance, if you only handled errors in your entry point procedures, you would still have an error handler. You just wouldn’t have the information about the stack that would be critical to finding out where the error occurred.

The class method of error handling takes advantage of the built-in ability of VBA to pull an error back up the stack. It uses the fact that local variables go out of scope when the procedure is complete, such as when an unhandled error occurs and the cursor is sent to the calling procedure. If you had a local variable pointing to a class, that class’s Terminate event would fire when an unhandled error occurred.

Borrowing the setup from PED2, let’s see how this would work.

EntryPoint calls SubProc1. SubProc1 calls SubProc2. An error occurs in SubProc2. Only EntryPoint has error handling. It uses On Error Goto ErrorHandler to route program flow to the error handling section. Neither SubProc1 nor SubProc2 have any error handling. We’re going to let VBA kick the error back up to the call stack until it gets to EntryPoint.

Each procedure has a local variable that points to an instance of CError. CError is a custom class whose Terminate event I’ll be using to log the error as it moves back up the stack. When the error occurs in SubProc2, the clsError variable in SubProc2 goes out of scope and its Terminate event fires. The error is passed up to SubProc1 by VBA by design. Because there is no error handling in SubProc1, that error causes the instance of clsError in SubProc1 to go out of scope and its Terminate event fires.

Once again, VBA does it’s thing by passing control back up the stack, error in tow. EntryPoint does have error handling, so when program control reaches it, the ErrorHandler section goes into action. Assuming we’re not in debug mode, the first thing to do is terminate clsError by setting it to nothing. By the time we exit this procedure, the built-in Err object will have been reset and we won’t have anything to log. By setting clsError in EntryPoint to Nothing, we get the last entry in our log. After that, the error is displayed and program control is sent back to ErrorExit for any clean up (no clean up in this example, just the Exit Sub).

The log looks like this:

01 Jan 14 21:40:40 [errorclass2.xlsm]Module1.SubProc2, Error 11: Division by zero
01 Jan 14 21:40:40 [errorclass2.xlsm]Module1.SubProc1, Error 11: Division by zero
01 Jan 14 21:40:40 [errorclass2.xlsm]Module1.EntryPoint, Error 11: Division by zero

Of course I made it virtually identical to PED’s log entry.

Instead of putting error handling in all of the downstream procedures, I just put a local variable that will terminate when an error occurs. The class looks like this:

I’ve kept the logging pretty simple for this example. In the Class_Terminate event, I first check to see if Err.Number is zero. This method relies on the fact that the Terminate event will fire when an error occurs. But in reality, the Terminate event will fire when the subprocedure finishes without error too. It fires whenever my local variable goes out of scope and that happens when there’s an error or when the subprocedure completes. We only want to log when an error occurs, so we have to check that.

The logging is a simple Debug.Print statement. To replicate the PED method, that would need to be expanded to write to a log file.

This is a very simple example that I put together to see how this might be setup. There might be some problems with this method that I haven’t encountered. I’m not advocating that you use this method, but I am intrigued by its simplicity. If you have any thoughts on this method of error handling or on error handling in general, leave a comment below.

You can download errorclass2.zip

UnPivot Shootout

Jeff here, again. PivotTables again. Sorry ’bout that.

snb posted a very concise bit of code to unwind crosstabs over at Unpivot by SQL and so I got to wondering how my much longer routine handled in comparison.

My approach used SQL and lots of Union All statements to do the trick. And lots and lots of code. Whereas snb uses arrays to unwind the crosstab, which is fine so long as you don’t run out of worksheet to post the resulting flat-file in. Which is going to be the case 99.999999% of the time. And frankly, crosstabs in the other 0.000001% of cases deserve to be stuck as crosstabs.

At the same time, I thought I’d also test a previous approach of mine that uses the Multiple Consolidation trick that Mike Alexander outlines at Transposing a Dataset with a PivotTable. This approach:

  1. copies the specific contiguous or non-contiguous columns of data that the user want to turn into a flat file to a new sheet.
  2. concatenates all the columns on the left into one column, while putting the pipe character ‘|’ between each field so that later we can split these apart into separate columns again.
  3. creates a pivot table out of this using Excel’s ‘Multiple Consolidation Ranges’ option. Normally this type of pivot table is used for combining data on different sheets, but it has the side benefit of taking horizontal data and providing a vertical extract once you double click on the Grand Total field. This is also known as a ‘Reverse Pivot’.
  4. splits our pipe-delimited column back into seperate columns, using Excel’s Text-to-Column funcionality.

snb’s approach

snbs’ code for a dataset with two non-pivot fields down the left looked like this:

…which I’m sure you’ll all agree falls somewhere on the spectrum between good looking and positivity anorexic. So I put a bit of meat on it’s bones so that it prompts you for ranges and handles any sized cross-tab:

Talk about yo-yo dieting!

Multiple Consolidation Trick approach

And here’s my code that uses the Multiple Consolidation trick:

The SQL appoach is the same as I published here.

And the winner is…

…snb. By a long shot. With the ever-so-slight caveat that you’re crosstabs are not so stupidly fat that the resulting flat file exceeds the number of rows in Excel.

Here’s how things stacked up on a 53 Column x 2146 Row crosstab, which gives a 117,738 row flat-file:

Approach Time (M:SS)
snb 0:01
UnPivotByConsolidation 0:04
UnPivotBySQL 0:14

And here’s how things stacked up on a 53 Columns x 19,780 Row crosstab, giving a 1,048,340 row flat-file (i.e. practically the biggest sized crosstab that you can unwind):

Approach Time (M:SS)
snb 0:19
UnPivotByConsolidation 0:42
UnPivotBySQL 2:17

So there you have it. Use snb’s code. Unless you have no choice but to use my longer, slower SQL approach.

Update 26 November 2013
It was remiss of me not to mention the Data Normalizer routine in Doug Glancy’s great yoursumbuddy blog, which is just about as fast as snb’s approach below. Go check it out, and subscribe to Doug’s blog while you’re there if you haven’t already.

If you don’t want the hassle of working out which to use, here’s a routine that uses snb’s if possible, and otherwise uses mine:

Repeating Values in Pivot Tables

Often I’m using a PivotTable to aggregate some data to use elsewhere. I’ll take a PivotTable that looks like this

and make it look like a grid so that I can copy and paste it somewhere else. To do that, I first go to the PivotTable Options – Display tab and change it to Classic PivotTable layout.

Then I’ll go to each PivotItem that’s a row and remove the subtotal

and check the Repeat item labels checkbox.

And I get a PivotTable that’s ready for copying and pasting.

After about 50 times of doing that, I got sick of it. Now I just run this code.