Subtracting Cells in the Status Bar

Sometimes I just want to quickly see the difference between two cells or groups of cells. Excel puts some great aggregates in the status bar, and you can even customize them: right-click on the aggregates.

But I wanted the difference. So I wrote some code to find it. I already had a class module with an Application object declared WithEvents, so I added this SheetSelectionChange event procedure.

That event procedure calls this procedure in a standard module.

If the selection is contiguous (Areas.Count = 1), it determines if there are two columns or two rows. Then it uses the SUM worksheet function to sum the first and subtract the sum of the second. Anything other than two columns or two rows resets the StatusBar by setting it to False. Subtracting one cell from the other is easy enough, but I wanted the ability to subtract one column from the other (or one row). Using SUM also saves me from checking for text or other nonsense, which SUM ignores automatically. Here’s one where I only have one Area selected and it contains two columns. It sums the numbers in column B and subtracts the sum of column C.

When the selection is not contiguous (Areas.Count = 2), then it determines if both areas have only one column or only one row. If either has more than one, it resets the status bar. But if they both have one (of either), it subtracts them. Here I’ve selected B2:B3, then held down the Control key while I selected C3:C4. That’s two areas, but each only has one column, so it assumes I want to subtract columns.
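The two cases above can be sketched roughly as follows. This is not the original listing, just a minimal reconstruction from the description: the procedure name ShowDifference is hypothetical, and the fixed "#,##0.00" format matches the comma-and-two-decimals choice discussed further down.

```vba
' Standard module. A sketch only; ShowDifference is a hypothetical name.
Option Explicit

Public Sub ShowDifference(ByVal rSel As Range)

    Dim wf As WorksheetFunction
    Dim vDiff As Variant   ' stays Empty unless a supported shape is selected

    Set wf = Application.WorksheetFunction

    Select Case rSel.Areas.Count
        Case 1
            ' One contiguous block: two columns or two rows
            If rSel.Columns.Count = 2 Then
                vDiff = wf.Sum(rSel.Columns(1)) - wf.Sum(rSel.Columns(2))
            ElseIf rSel.Rows.Count = 2 Then
                vDiff = wf.Sum(rSel.Rows(1)) - wf.Sum(rSel.Rows(2))
            End If
        Case 2
            ' Two areas: both a single column, or both a single row
            If rSel.Areas(1).Columns.Count = 1 And rSel.Areas(2).Columns.Count = 1 Then
                vDiff = wf.Sum(rSel.Areas(1)) - wf.Sum(rSel.Areas(2))
            ElseIf rSel.Areas(1).Rows.Count = 1 And rSel.Areas(2).Rows.Count = 1 Then
                vDiff = wf.Sum(rSel.Areas(1)) - wf.Sum(rSel.Areas(2))
            End If
    End Select

    If IsEmpty(vDiff) Then
        Application.StatusBar = False   ' hand the status bar back to Excel
    Else
        Application.StatusBar = "Difference: " & Format$(vDiff, "#,##0.00")
    End If

End Sub
```

In this sketch a SheetSelectionChange handler would simply call ShowDifference Target. Note that WorksheetFunction.Sum raises a run-time error if the selection contains error values, so a real version would probably want to trap that.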

The next feature I want to add is to recognize filtered data. Often I’m working with a filtered Table and although two cells appear to be adjacent, selecting them without holding down Control really selects all those filtered cells in between. I guess I’ll need to loop through and determine what’s visible, build a range from only those cells, and sum that. For now, I’m just holding down control and using the mouse to select them. If you’re not familiar, the “mouse” is that blob of plastic several inches away from home row (aka the productivity killer). Excuse me while I get off my soap box and finish this post.

I tried to glean the NumberFormat of the cells selected and use that in the display. You can see from the code above that I punted and just used a comma and two decimals. But that stinks for really small numbers. Originally, I had something like

But look at the craziness when the cell has the Accounting format (_(* #,##0.00_);_(* (#,##0.00);_(* "-"??_);_(@_))

It works well for times though.

Apparently the syntax for cell formatting is slightly different than for the VBA.Format function. I haven’t worked out what the differences are, but maybe someday I will.

Identifying duplicates between multiple lists

Howdy folks. Jeff here, back from my summer holiday in the Coromandel Peninsula in the North Island of New Zealand, where I’ve been staring at this for the last 21 days:
[Image: Opoutere]

For the next 344 I’ll be staring at this:
[Image: Excel]
God, it’s good to be home.

A while back I answered this thread for someone wanting to identify any duplicate values found between 4 separate lists.

The way I understood the question, if something appears in each of the four lists, the Op wanted to know about it. If an item just appeared in 3 lists but not all 4, then they didn’t want it to be picked up. And the lists themselves might have duplicates within each list.

Say we’ve got these 4 lists:
[Image: Names]

We can’t simply use Conditional Formatting, because that will include duplicate names that don’t appear in each and every column, such as ‘Mike’:
[Image: Wrong]

Rather, we only want names that appear in every column:
[Image: Right]

I wrote a routine that handled any number of lists, using two dictionaries and a bit of shuffling between them. And the routine allows users to select either a contiguous range if their lists are all in one block, or multiple non-contiguous ranges if they aren’t.

  1. The user gets prompted for the range where they want the identified duplicates to appear:
    [Image: Select Output Range]

  2. Then they get prompted to select the first list. The items within that list get added to Dic_A. (If they select more than one column, the following steps get executed automatically.)
    [Image: Select First Range]

  3. Next they get prompted to select the 2nd list, at which point the code attempts to add each new item to Dic_A. If an item already exists in Dic_A then we know it’s a duplicate between lists, and so we add it to Dic_B. At the end of this, we clear Dic_A. Notice that any reference to selecting a contiguous range has been dropped from the InputBox:
    [Image: Select 2nd range]

  4. When they select the 3rd list, it attempts to add each new item to Dic_B, and if an error occurs, then we know it’s a duplicate between lists, and so we add it to Dic_A. At the end of this, we clear Dic_B. We carry on in this manner until the user pushes Cancel (and notice now that the InputBox message tells them to push Cancel when they’re done):
    [Image: Select 3rd range]

Pretty simple: just one input box, an intentional infinite loop, and two dictionaries that take turns holding the current list of duplicates. Hours of fun.

Only problem is, I had forgotten to account for the fact that there might be duplicates within a list. The old code would have misinterpreted these duplicates as between-list duplicates, rather than within-list duplicates. The Op is probably completely unaware, and probably regularly bets the entire future of his country’s economy based on my bad code. Oops.

I’ve subsequently added another step where a 3rd dictionary is used to dedup the items in the list currently being processed. Here’s the revised code. My favorite line is the Do Until “Hell” = “Freezes Over” one.
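To show the idea without the InputBox plumbing, here is a minimal sketch of the same intersection logic. It is not the revised routine itself: the function name CommonItems is hypothetical, the lists are assumed to arrive as arrays (for example, the .Value of a multi-cell range), and the dictionaries use the default case-sensitive comparison.

```vba
' Standard module. A sketch only; CommonItems is a hypothetical name.
Option Explicit

' Returns a zero-based array of the items found in every list passed in.
' Duplicates within a single list are counted once.
Public Function CommonItems(ParamArray vaLists() As Variant) As Variant

    Dim dicCommon As Object   ' items found in every list processed so far
    Dim dicNext As Object     ' survivors after the current list
    Dim dicSeen As Object     ' dedups the list currently being processed
    Dim vItem As Variant
    Dim i As Long

    Set dicCommon = CreateObject("Scripting.Dictionary")

    For i = LBound(vaLists) To UBound(vaLists)
        Set dicSeen = CreateObject("Scripting.Dictionary")
        Set dicNext = CreateObject("Scripting.Dictionary")

        For Each vItem In vaLists(i)
            If Not IsEmpty(vItem) Then
                If Not dicSeen.Exists(vItem) Then
                    dicSeen.Add vItem, Empty
                    ' The first list seeds the result; later lists filter it
                    If i = LBound(vaLists) Or dicCommon.Exists(vItem) Then
                        dicNext.Add vItem, Empty
                    End If
                End If
            End If
        Next vItem

        Set dicCommon = dicNext
    Next i

    CommonItems = dicCommon.Keys

End Function
```

For example, CommonItems(Array("Mike", "Bob"), Array("Bob", "Bob", "Sue")) returns an array containing only "Bob": the repeated "Bob" in the second list is deduped before it is compared against the first.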

Error Handling via an Error Class

A while ago I read an answer on stackoverflow about error handling. I can’t seem to find that question now, so you’ll have to take my word for it. The question was asking about error handling best practices in VBA and I found one of the answers interesting. The answerer said that you could use the Terminate event of a custom class module as your error logger. I had never thought of that.

I’ve been using the Professional Excel Development error handling method with great success for many years. This method controls the error as it moves back up the call stack, either by returning a Boolean to the calling procedure or by rethrowing the error. Without error handling, VBA sends the error back up the call stack automatically until it is handled. So, for instance, if you only handled errors in your entry point procedures, you would still have an error handler. You just wouldn’t have the information about the stack that would be critical to finding out where the error occurred.

The class method of error handling takes advantage of the built-in ability of VBA to pull an error back up the stack. It uses the fact that local variables go out of scope when the procedure is complete, such as when an unhandled error occurs and the cursor is sent to the calling procedure. If you had a local variable pointing to a class, that class’s Terminate event would fire when an unhandled error occurred.

Borrowing the setup from PED2, let’s see how this would work.

EntryPoint calls SubProc1. SubProc1 calls SubProc2. An error occurs in SubProc2. Only EntryPoint has error handling. It uses On Error Goto ErrorHandler to route program flow to the error handling section. Neither SubProc1 nor SubProc2 has any error handling. We’re going to let VBA kick the error back up the call stack until it gets to EntryPoint.

Each procedure has a local variable that points to an instance of CError. CError is a custom class whose Terminate event I’ll be using to log the error as it moves back up the stack. When the error occurs in SubProc2, the clsError variable in SubProc2 goes out of scope and its Terminate event fires. The error is passed up to SubProc1 by VBA by design. Because there is no error handling in SubProc1, that error causes the instance of clsError in SubProc1 to go out of scope and its Terminate event fires.

Once again, VBA does its thing by passing control back up the stack, error in tow. EntryPoint does have error handling, so when program control reaches it, the ErrorHandler section goes into action. Assuming we’re not in debug mode, the first thing to do is terminate clsError by setting it to Nothing. By the time we exit this procedure, the built-in Err object will have been reset and we won’t have anything to log. By setting clsError in EntryPoint to Nothing, we get the last entry in our log. After that, the error is displayed and program control is sent back to ErrorExit for any clean up (no clean up in this example, just the Exit Sub).
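The call chain described above might look something like this. It is a sketch reconstructed from the prose, not the downloadable code: it assumes a class module named CError, the ProcName property is a hypothetical way of telling the class which procedure it lives in, and the debug-mode check is left out.

```vba
' Standard module (Module1). A sketch only; assumes a class module named
' CError exists with a hypothetical ProcName property.
Option Explicit

Public Sub EntryPoint()

    Dim clsError As CError

    On Error GoTo ErrorHandler

    Set clsError = New CError
    clsError.ProcName = "Module1.EntryPoint"

    SubProc1

ErrorExit:
    Exit Sub

ErrorHandler:
    ' Terminate the class now, while Err still holds the error details
    Set clsError = Nothing
    MsgBox Err.Description
    Resume ErrorExit

End Sub

Private Sub SubProc1()

    Dim clsError As CError

    Set clsError = New CError
    clsError.ProcName = "Module1.SubProc1"

    SubProc2   ' no handler here, so the error passes straight through

End Sub

Private Sub SubProc2()

    Dim clsError As CError
    Dim lZero As Long

    Set clsError = New CError
    clsError.ProcName = "Module1.SubProc2"

    Debug.Print 1 / lZero   ' raises error 11, Division by zero

End Sub
```

The only error handler is in EntryPoint; the two downstream procedures rely entirely on their local clsError variables going out of scope.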

The log looks like this:

01 Jan 14 21:40:40 [errorclass2.xlsm]Module1.SubProc2, Error 11: Division by zero
01 Jan 14 21:40:40 [errorclass2.xlsm]Module1.SubProc1, Error 11: Division by zero
01 Jan 14 21:40:40 [errorclass2.xlsm]Module1.EntryPoint, Error 11: Division by zero

Of course I made it virtually identical to PED’s log entry.

Instead of putting error handling in all of the downstream procedures, I just put a local variable that will terminate when an error occurs. The class looks like this:

I’ve kept the logging pretty simple for this example. In the Class_Terminate event, I first check to see if Err.Number is zero. This method relies on the fact that the Terminate event will fire when an error occurs. But in reality, the Terminate event will fire when the subprocedure finishes without error too. It fires whenever my local variable goes out of scope and that happens when there’s an error or when the subprocedure completes. We only want to log when an error occurs, so we have to check that.
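A class along those lines could be as small as this. Treat it as a sketch under assumptions rather than the class from the download: the ProcName property is hypothetical, and the output format simply imitates the log lines shown earlier.

```vba
' Class module named CError. A sketch only; ProcName is a hypothetical
' property that each procedure sets right after creating the instance.
Option Explicit

Private msProcName As String

Public Property Let ProcName(ByVal sProcName As String)
    msProcName = sProcName
End Property

Public Property Get ProcName() As String
    ProcName = msProcName
End Property

Private Sub Class_Terminate()

    ' Terminate also fires when the procedure ends normally,
    ' so only log when there is an error in flight
    If Err.Number <> 0 Then
        Debug.Print Format$(Now, "dd mmm yy hh:mm:ss") & " [" & _
            ThisWorkbook.Name & "]" & msProcName & ", Error " & _
            Err.Number & ": " & Err.Description
    End If

End Sub
```

Each procedure then needs only three lines: declare the variable, create the instance, and set ProcName.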

The logging is a simple Debug.Print statement. To replicate the PED method, that would need to be expanded to write to a log file.

This is a very simple example that I put together to see how this might be set up. There might be some problems with this method that I haven’t encountered. I’m not advocating that you use this method, but I am intrigued by its simplicity. If you have any thoughts on this method of error handling or on error handling in general, leave a comment below.

You can download errorclass2.zip

UnPivot Shootout

Jeff here, again. PivotTables again. Sorry ’bout that.

snb posted a very concise bit of code to unwind crosstabs over at Unpivot by SQL and so I got to wondering how my much longer routine handled in comparison.

My approach used SQL and lots of Union All statements to do the trick. And lots and lots of code. Whereas snb uses arrays to unwind the crosstab, which is fine so long as you don’t run out of worksheet to post the resulting flat-file in. Which is going to be the case 99.999999% of the time. And frankly, crosstabs in the other 0.000001% of cases deserve to be stuck as crosstabs.
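For anyone who hasn’t seen the array idea before, here is a minimal sketch of unwinding a crosstab held in an array. It is not snb’s code: the function name UnpivotArray is hypothetical, and it assumes a 1-based two-dimensional array (such as the one Range.Value returns) with the headers in the first row and the non-pivot fields in the leftmost columns.

```vba
' Standard module. A sketch only; UnpivotArray is a hypothetical name.
Option Explicit

' vaIn:      1-based 2D array, row 1 holds the column headers
' lLeftCols: number of non-pivot columns on the left
' Returns a 2D array with the left columns, then header, then value.
Public Function UnpivotArray(ByRef vaIn As Variant, ByVal lLeftCols As Long) As Variant

    Dim vaOut() As Variant
    Dim lRows As Long, lCols As Long
    Dim r As Long, c As Long, k As Long
    Dim lOut As Long

    lRows = UBound(vaIn, 1)
    lCols = UBound(vaIn, 2)

    ' Nothing to unwind without a data row and a data column
    If lRows < 2 Or lCols <= lLeftCols Then Exit Function

    ReDim vaOut(1 To (lRows - 1) * (lCols - lLeftCols), 1 To lLeftCols + 2)

    For r = 2 To lRows
        For c = lLeftCols + 1 To lCols
            lOut = lOut + 1
            For k = 1 To lLeftCols
                vaOut(lOut, k) = vaIn(r, k)          ' repeat the row labels
            Next k
            vaOut(lOut, lLeftCols + 1) = vaIn(1, c)  ' column header
            vaOut(lOut, lLeftCols + 2) = vaIn(r, c)  ' the value itself
        Next c
    Next r

    UnpivotArray = vaOut

End Function
```

The result can be written back to a sheet in one assignment to a range of the same dimensions, which is where the row limit mentioned above comes in.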

At the same time, I thought I’d also test a previous approach of mine that uses the Multiple Consolidation trick that Mike Alexander outlines at Transposing a Dataset with a PivotTable. This approach:

  1. copies the specific contiguous or non-contiguous columns of data that the user wants to turn into a flat file to a new sheet.
  2. concatenates all the columns on the left into one column, while putting the pipe character ‘|’ between each field so that later we can split these apart into separate columns again.
  3. creates a pivot table out of this using Excel’s ‘Multiple Consolidation Ranges’ option. Normally this type of pivot table is used for combining data on different sheets, but it has the side benefit of taking horizontal data and providing a vertical extract once you double click on the Grand Total field. This is also known as a ‘Reverse Pivot’.
  4. splits our pipe-delimited column back into separate columns, using Excel’s Text to Columns functionality.

snb’s approach

snb’s code for a dataset with two non-pivot fields down the left looked like this:

…which I’m sure you’ll all agree falls somewhere on the spectrum between good looking and positively anorexic. So I put a bit of meat on its bones so that it prompts you for ranges and handles any sized cross-tab:

Talk about yo-yo dieting!

Multiple Consolidation Trick approach

And here’s my code that uses the Multiple Consolidation trick:

The SQL approach is the same as I published here.

And the winner is…

…snb. By a long shot. With the ever-so-slight caveat that your crosstabs are not so stupidly fat that the resulting flat file exceeds the number of rows in Excel.

Here’s how things stacked up on a 53 Column x 2146 Row crosstab, which gives a 117,738 row flat-file:

Approach                  Time (M:SS)
snb                       0:01
UnPivotByConsolidation    0:04
UnPivotBySQL              0:14

And here’s how things stacked up on a 53 Columns x 19,780 Row crosstab, giving a 1,048,340 row flat-file (i.e. practically the biggest sized crosstab that you can unwind):

Approach                  Time (M:SS)
snb                       0:19
UnPivotByConsolidation    0:42
UnPivotBySQL              2:17

So there you have it. Use snb’s code. Unless you have no choice but to use my longer, slower SQL approach.

Update 26 November 2013
It was remiss of me not to mention the Data Normalizer routine in Doug Glancy’s great yoursumbuddy blog, which is just about as fast as snb’s approach below. Go check it out, and subscribe to Doug’s blog while you’re there if you haven’t already.

If you don’t want the hassle of working out which to use, here’s a routine that uses snb’s if possible, and otherwise uses mine:

Repeating Values in Pivot Tables

Often I’m using a PivotTable to aggregate some data to use elsewhere. I’ll take a PivotTable that looks like this

and make it look like a grid so that I can copy and paste it somewhere else. To do that, I first go to the PivotTable Options – Display tab and change it to Classic PivotTable layout.

Then I’ll go to each PivotField that’s a row and remove the subtotal

and check the Repeat item labels checkbox.

And I get a PivotTable that’s ready for copying and pasting.

After about 50 times of doing that, I got sick of it. Now I just run this code.
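Something along these lines would automate the steps above. It is a sketch, not the original listing: the procedure name MakePivotGridLike is hypothetical, it acts on the PivotTable under the active cell, and it assumes a regular (non-OLAP) PivotTable in Excel 2010 or later, since RepeatLabels is not available before that.

```vba
' Standard module. A sketch only; MakePivotGridLike is a hypothetical name.
Option Explicit

Public Sub MakePivotGridLike()

    Dim pt As PivotTable
    Dim pf As PivotField

    ' ActiveCell.PivotTable raises an error outside a PivotTable
    On Error Resume Next
    Set pt = ActiveCell.PivotTable
    On Error GoTo 0
    If pt Is Nothing Then Exit Sub

    ' Equivalent of ticking Classic PivotTable layout on the Display tab
    pt.InGridDropZones = True
    pt.RowAxisLayout xlTabularRow

    For Each pf In pt.RowFields
        ' Skip the Values field if it happens to sit in the row area
        If pf.Name <> pt.DataPivotField.Name Then
            ' Twelve Falses switch off every subtotal type
            pf.Subtotals = Array(False, False, False, False, False, False, _
                False, False, False, False, False, False)
            pf.RepeatLabels = True
        End If
    Next pf

End Sub
```

Select any cell in the PivotTable and run it; anywhere else on the sheet it exits quietly.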

The Encyclopedia of Dictionaries

Frequent commenter snb has written a thorough page on the Scripting.Dictionary object. I resisted Dictionaries for the longest time, but lately I’ve been using them in almost every situation where I would have used a Collection object. Here’s the table of contents:

Contents
– What is a dictionary ?
– What is the purpose of a dictionary ?
– Where does the Dictionary derive from ?
– How to create a Dictionary ?
– How to populate a Dictionary ?
– Add or replace ?
– Keys
– When is a key unique ?
– Create unique keys automatically
– A list of unique elements
– Items
– The number of elements in a Dictionary
– Check the existence of a key / item in the Dictionary
– How to retrieve 1 element from the Dictionary ?
– How to use the array .Keys ?
– How to use the array .Items collection ?
– Change the key of an item
– Copy an item inside the Dictionary
– Remove an item from the Dictionary
– Adapt the contents of an item in the Dictionary
– Remove all items from the Dictionary
– Early binding and late binding
– Examples

If you use Dictionaries, you should bookmark this page. If not, you should start.

Multiple Substitute UDF

Have you ever written this formula?

=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(UPPER(A2),"AND","")," INC","")," LLC","")," LTD","")," DBA","")," ",""),".",""),",",""),"&",""),"-",""),"/",""),"'","")

I just did. It gets the job done, but it stinks. Here’s its replacement.

=subst(UPPER(A3),""," AND "," INC"," LLC"," LTD"," DBA"," CO"," ",".",",","&","-","/","'")

That’s a little better (assuming it works). On a side note, I wish Excel had built-in constants for formulas, so the above formula would look like this.

=subst(UPPER(A3),xlNULLSTRING," AND "," INC"," LLC"," LTD"," DBA"," CO",xlSPACE,".",xlCOMMA,"&","-","/",xlSINGLEQ)

Maybe I’ll create a Sheet template with those names defined. Or is that better in a Book template? Anyway, here’s the code for the poorly named Subst function.

The ParamArray argument takes as many arguments as you want to throw at it. For some reason, I couldn’t pass OldText by reference to the sorting procedure, so I had to copy it to another variable first. I sort the terms by length so that “corporation” gets replaced before “corp”. Otherwise, I’ll be left with “oration”, which is just silly.

Once sorted, I simply replace all of the old with the one new, and return the string. It worked well for the one application I’ve used it for and it was a heck of a lot easier to update. Thoughts?
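Here is a minimal sketch of a function with that shape, for anyone who wants to experiment. It is not the original listing: the argument names are hypothetical, the sort is a plain in-line exchange sort rather than a separate procedure, and the replacement is case-sensitive.

```vba
' Standard module. A sketch only; the argument names are hypothetical.
Option Explicit

Public Function Subst(ByVal sText As String, ByVal sNewText As String, _
    ParamArray vaOldText() As Variant) As String

    Dim asTerms() As String
    Dim sTemp As String
    Dim i As Long, j As Long

    ' No terms supplied: return the text untouched
    If UBound(vaOldText) < LBound(vaOldText) Then
        Subst = sText
        Exit Function
    End If

    ' Copy the ParamArray into a plain string array so it can be sorted
    ReDim asTerms(LBound(vaOldText) To UBound(vaOldText))
    For i = LBound(vaOldText) To UBound(vaOldText)
        asTerms(i) = CStr(vaOldText(i))
    Next i

    ' Longest terms first, so "CORPORATION" is replaced before "CORP"
    For i = LBound(asTerms) To UBound(asTerms) - 1
        For j = i + 1 To UBound(asTerms)
            If Len(asTerms(j)) > Len(asTerms(i)) Then
                sTemp = asTerms(i)
                asTerms(i) = asTerms(j)
                asTerms(j) = sTemp
            End If
        Next j
    Next i

    ' Replace every old term with the single new text
    For i = LBound(asTerms) To UBound(asTerms)
        If Len(asTerms(i)) > 0 Then
            sText = Replace(sText, asTerms(i), sNewText)
        End If
    Next i

    Subst = sText

End Function
```

With this sketch, =subst("A.B,C","",".",",") returns ABC, and passing "CORP" before "CORPORATION" still strips the longer term first.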

A new tool: Trusted Document Manager

Hi everyone!

I have just published a new tool today, Trusted Document Manager. This little tool enables you to manage your list of trusted documents. Currently, Excel only allows you to either leave the list intact, or delete the entire list. This means all of your currently trusted documents become untrusted again so you have to enable macros on all of them once again. The tool allows you to remove just one file, remove an entire folder or even an entire drive. It also offers the possibility to remove files which no longer exist from the list.

This is what the tool looks like:

[Image: screenshot of Trusted Document Manager]

Enjoy!

Regards,

Jan Karel Pieterse
www.jkp-ads.com