Identifying duplicates between multiple lists

Howdy folks. Jeff here, back from my summer holiday in the Coromandel Peninsula in the North Island of New Zealand, where I’ve been staring at this for the last 21 days:
DDOE_Identifying duplicates between lists_Opoutere

For the next 344 I’ll be staring at this:
DDOE_Identifying duplicates between lists_Excel
God, it’s good to be home.

A while back I answered this thread for someone wanting to identify any duplicate values found between 4 separate lists.

The way I understood the question, if something appears in each of the four lists, the Op wanted to know about it. If an item just appeared in 3 lists but not all 4, then they didn’t want it to be picked up. And the lists themselves might have duplicates within each list.

Say we’ve got these 4 lists:
DDOE_Identifying duplicates between lists_Names

We can’t simply use Conditional Formatting, because that will include duplicate names that don’t appear in each and every column, such as ‘Mike’:
DDOE_Identifying duplicates between lists_Wrong

Rather, we only want names that appear in every column:
DDOE_Identifying duplicates between lists_Right

I wrote a routine that handled any number of lists, using two dictionaries and a bit of shuffling between them. And the routine allows users to select either a contiguous range if their lists are all in one block, or multiple non-contiguous ranges if they aren’t.

  1. The user gets prompted for the range where they want the identified duplicates to appear:
    DDOE_Identifying duplicates between lists_Select Output Range
  2.  

  3. Then they get prompted to select the first list. The items within that list get added to Dic_A. (If they select more than one columns, the following steps get executed automatically).
    DDOE_Identifying duplicates between lists_Select First Range
  4.  

  5. Next they get prompted to select the 2nd list, at which point the code attempts to add each new item to Dic_A. If an item already exists in Dic_A then we know it’s a duplicate between lists, and so we add it to Dic_B. At the end of this, we clear Dic_A. Notice that any reference to selecting a contiguous range has been dropped from the InputBox:
    DDOE_Identifying duplicates between lists_Select 2nd range
  6.  

  7. When they select the 3rd list, then it attempts to add each new item to Dic_B, and if an error occurs, then we know it’s a duplicate between lists, and so we add it to Dic_A. At the end of this, we clear Dic_B. We carry on in this manner until the user pushes Cancel (and notice now that the InputBox message tells them to push cancel when they’re done):
    DDOE_Identifying duplicates between lists_Select 3rd range
  8.  

Pretty simple: just one input box, an intentional infinite loop, and two dictionaries that take turns holding the current list of dictionaries. Hours of fun.

Only problem is, I had forgotten to account for the fact that there might be duplicates within a list. The old code would have misinterpreted these duplicates as between-list duplicates, rather than within-list duplicates. The Op is probably completely unaware, and probably regularly bets the entire future of his country’s economy based on my bad code. Oops.

I’ve subsequently added another step where a 3rd dictionary is used to dedup the items in the list currently being processed. Here’s the revised code. My favorite line is the Do Until “Hell” = “Freezes Over” one.

A date with PivotItems

Howdy, folks. Jeff here again. I’ve been sharpening up some code to manually filter a PivotField based on an external list in a range outside the PivotTable. It works blazingly fast. Unless there’s dates as well as non-dates in my PivotField. In which case it errors out.

Try this:
Put “Pivot” in A1
Put =TODAY() in A2
Put =VALUE(TODAY()) in A3

Now make a PivotTable out of that data.

Pivot_DDOE

Now put this code into the VBE and step through it:

If the same thing happens to you as happens to me, you will either be speaking aloud the title of this routine, or you will be speaking aloud this:
Unable to get the PivotItems property of the PivotField class.

Go type these in the immediate pane:

What the…?

Now try these:

So it seems can’t do certain stuff to a PivotItem if that PivotItem is a date but your PivotField number format is set to General.

That’s weird.

Equally weird, try this:
Select the PivotTable, and record a macro as you change it’s number format back to General.

What the …? Change the PivotField Number Format, and you get a macro that tells you that you changed the PivotField name!

So what happens if you run that macro? Well, it changes the name of the PivotField:
PivotField_20131110

It does nothing to the number format.

Strangely enough, I found some info on this problem at one of my most revisited blogposts that I had somehow missed: Jon Peltier’s Referencing Pivot Table Ranges in VBA

Stranger still, the answer was by Jon Peltier back in 2009 in relation to a question asked by….wait for it…me. Don’t know how I missed that. Must have been sleeping.
So I’ve come across this problem before, found an answer, completely forgotten about it, and then wasted 2 days trying to work out what the problem was, purely so I could Google my own answered question.

I’m going to read through all 238 ( and counting) comments in that thread and see what else Jon has told me over the years I’ve been learning VBA.
There’s also something on this at stackoverflow

Jeff

–edit–
Jon’s method was to loop through the pivot items, and compare the pivot item caption to what he was looking for:

But now I know it’s probably easier just to change the format of the PivotField.

—edit 30 May 2014 —
Prompted by Jerry Sullivan’s comment I found that this was only an issue for non-US regional settings, and that this issue is now fixed in Excel 2013. (Thank you, Microsoft. But why the heck didn’t you tell me you’d fixed it?)

According to IronyAaron in the comments at this thread:

When VBA recognizes the dates in the pivot cache, it reads the US version after parsing although the item is read as a locally formatted string. This therefore causes VBA to fail when recognizing Date variables.

Bummer! So write some code that filters PivotItems, and you might find that non-US users have issues, unless they change their regional settings in Windows to US beforehand. Good luck with that.

— edit over —

Excel isn’t fully cooked.

It’s still raw in the middle, a bit.

  • PivotTables are great, but why can’t I filter one based on an external range?
  • Shape Styles would be cool, if I could add my own styles.
  • Slicers are great, but why isn’t there an easy way to instantly get an overview of what Slicers are connected to what PivotTables, and at the same time use the same interface to effortlessly manage this and set up exclusions?
  • Charts are great, but why do I have to manually cut and paste them many times in order to make up the equivalent of a Small Multiple chart?

Fortunately, in most cases Excel is pretty programmable. If it doesn’t do something out of the box – or if using the built-in functionality is tedious – then given the requisite skills or the right Google search then you can almost always work around such bottlenecks fairly efficiently.

Unfortunately, Excel isn’t fully programmable no matter what skills you have. The Partial Text Search box that you can use to manually filter PivotFields on partial text pretty much instantly is awesome. But you can’t address it via VBA…it’s for humans only. Which means you’ve got to iterate through potentially large PivotItems collections s-l-o-w-l-y in maybe minutes in order to do something that users can do in seconds. (I won’t mention VBA support for PowerPivot as another example, because PowerPivot is still fairly new. Rough rule of thumb: Don’t complain about something until it’s been sitting around at least 5 years in an unfinished state. At least, that’s what I tell my wife).

I’ve got some great code to handle the Filtering Pivots and the Slicers things. And I can code up a cut and paste thing for Charts pretty easily. But that Partial Text Search box I can’t help you with. Maybe this will help get it on someone’s radar, but obviously including such ‘niceties’ in the in-box functionality isn’t trivial. Otherwise the dialog box for MSQuery (which I still use) would let me expand it. And the Name Manager would let me do the things that the better Name Manager will do. And so on.

So presumably these things are not trivial to enhance, or they would be enhanced already. But here’s the thing: by comparison, my code for filtering pivots or managing Slicers is trivial to roll out to users compared to the effort required to add this functionality natively.

So question: Why don’t MS supplement their great unfinished app by building and offering to users useful workarounds in one of the most agile-ready platforms there is…VBA? Why aren’t they monitoring forums and blogs for the best and brightest productivity enhancements, and buying the IP from content creators at a song, then offering it to users as add-ins that plug the gap until they get around to finishing Excel?

Regards,
Jeff Weir.
Cosmetic Surgeon.