Remove Duplicats Duplicates From Two Dimension Arrays

Posted on January 16, 2006 by Dick Kusleika

Earlier I posted my subs for removing duplicates from single dimension arrays. I’ve attempted to modify that code to handle two dimension arrays. I also added code to remove empty arrays from the primary array. That is, two secondary arrays contained exactly the same elements. That probably should have been in the original code too.

There are limitations to this code, such as it assumes the second dimension is the one to do an element-number-based sort on, which should be generally true as the second dimension is the only one you can change in a dynamic array. It also assumes zero based arrays in some cases, which could be changed with a little effort. I decided recently to never have one-based arrays simply because the Split function returns zero-based.

Sub StartArraysM()

Dim vaOne As Variant
Dim vaTwo As Variant
Dim vaThree As Variant
Dim vaFour As Variant
Dim vaMain As Variant

‘Set up some secondary arrays
ReDim vaOne(0 To 2, 0 To 4)
vaOne(0, 0) = 1: vaOne(1, 0) = “One”: vaOne(2, 0) = “zOne”
vaOne(0, 1) = 2: vaOne(1, 1) = “Two”: vaOne(2, 1) = “zTwo”
vaOne(0, 2) = 3: vaOne(1, 2) = “Three”: vaOne(2, 2) = “zThree”
vaOne(0, 3) = 4: vaOne(1, 3) = “Four”: vaOne(2, 3) = “zFour”
vaOne(0, 4) = 5: vaOne(1, 4) = “Five”: vaOne(2, 4) = “zFive”

ReDim vaTwo(0 To 1, 0 To 2)
vaTwo(0, 0) = 5: vaTwo(1, 0) = “Five”
vaTwo(0, 1) = 6: vaTwo(1, 1) = “Six”
vaTwo(0, 2) = 7: vaTwo(1, 2) = “Seven”

ReDim vaThree(0 To 1, 0 To 3)
vaThree(0, 0) = 1: vaThree(1, 0) = “One”
vaThree(0, 1) = 5: vaThree(1, 1) = “Five”
vaThree(0, 2) = 8: vaThree(1, 2) = “Eight”
vaThree(0, 3) = 9: vaThree(1, 3) = “Nine”

ReDim vaFour(0 To 1, 0 To 2)
vaFour(0, 0) = 5: vaFour(1, 0) = “Five”
vaFour(0, 1) = 6: vaFour(1, 1) = “Six”
vaFour(0, 2) = 7: vaFour(1, 2) = “Seven”

‘load the primary array with the secondary ones
vaMain = Array(vaOne, vaTwo, vaThree, vaFour)

‘pass the primary array to remove the dupes
‘and display the results
RemoveDuplicatesM vaMain, 1
ShowResultsM vaMain

End Sub

Sub RemoveDuplicatesM(ByRef vaMain As Variant, ByVal lElem As Long)

‘Assumes second dimension is the largest
‘vaMain is an array whose elements are arrays
‘lElem is which element in the first dimension where duplicates must be found

Dim i As Long, j As Long, k As Long, idx As Long
Dim vItm As Variant
Dim cDupes As Collection
Dim aTemp() As Variant
Dim vaSort As Variant
Dim vCheckElem As Variant

‘check for array arguments
If Not IsArray(vaMain) Then Exit Sub

For i = LBound(vaMain) To UBound(vaMain)
If Not IsArray(vaMain(i)) Then Exit Sub
Next i

‘Make sure lElem is valid for all secondary arrays
On Error Resume Next
For i = LBound(vaMain) To UBound(vaMain)
vCheckElem = vaMain(i)(lElem, 0)
If Err.Number > 0 Then
Exit Sub
End If
Next i
On Error GoTo 0

‘Sort arrays by number of elements so as to remove duplicates from
‘the most populous arrays
For i = LBound(vaMain) To UBound(vaMain) – 1
For j = i + 1 To UBound(vaMain)
If UBound(vaMain(i), 2) > UBound(vaMain(j), 2) Then
vaSort = vaMain(i)
vaMain(i) = vaMain(j)
vaMain(j) = vaSort
End If
Next j
Next i

‘blank out array elements that are duplicates. Later
‘I’ll test the length of the element to remove those
‘elements that were duplicates and are now empty strings
Set cDupes = New Collection

For i = LBound(vaMain) To UBound(vaMain)
For j = LBound(vaMain(i), 2) To UBound(vaMain(i), 2)
On Error Resume Next
cDupes.Add vaMain(i)(lElem, j), CStr(vaMain(i)(lElem, j))

If Err.Number > 0 Then
vaMain(i)(lElem, j) = “”
Err.Clear
End If
On Error GoTo 0
Next j
Next i

‘put non-blank elements in a temp array and reassign
‘the temporary array back to the secondary array
For i = LBound(vaMain) To UBound(vaMain)

‘reinitialize variables for each secondary array
ReDim aTemp(0 To UBound(vaMain(i), 1), 0 To 0)
k = 0
For j = LBound(vaMain(i), 2) To UBound(vaMain(i), 2)

‘skip elements that are empty strings
If Len(vaMain(i)(lElem, j)) > 0 Then

‘assign non-empty strings to the temporary array and
‘increment the counter
ReDim Preserve aTemp(0 To UBound(vaMain(i), 1), 0 To k)
For idx = 0 To UBound(vaMain(i), 1)
aTemp(idx, k) = vaMain(i)(idx, j)
Next idx
k = k + 1
End If
Next j

‘clear out the secondary array once the temp array is filled
Erase vaMain(i)

‘assign the temporary array back to the recently-erased secondary array
vaMain(i) = aTemp
Next i

‘Remove an empty arrays from the primary array
ReDim aTemp(0 To 0)
k = 0

For i = LBound(vaMain) To UBound(vaMain)
If Not IsEmpty(vaMain(i)(0, 0)) Then
ReDim Preserve aTemp(0 To k)
aTemp(k) = vaMain(i)
k = k + 1
End If
Next i

Erase vaMain

vaMain = aTemp

End Sub

Sub ShowResultsM(ByRef vaMain As Variant)

Dim i As Long, j As Long, k As Long
Dim sRes As String

For i = LBound(vaMain) To UBound(vaMain)
For j = LBound(vaMain(i), 2) To UBound(vaMain(i), 2)
sRes = i & vbTab
For k = LBound(vaMain(i), 1) To UBound(vaMain(i), 1)
sRes = sRes & vaMain(i)(k, j) & vbTab
Next k
Debug.Print sRes
Next j
Debug.Print “—————“
Next i

End Sub

12 thoughts on “Remove Duplicats Duplicates From Two Dimension Arrays”

Macro Man says:

January 16, 2006 at 12:48 pm

wow.
Alex J says:

January 16, 2006 at 4:42 pm

Dick,
I think using collections (or even dictionaries) is exactly the right way to go. As an extention, had you thought of using collections of collections (rather than arrays of arrays) to manage the multi-level approach? Then you don’t have to dimension so many things.
Gary Waters says:

January 16, 2006 at 5:06 pm

Dick,

You can slightly improve your RemoveDuplicatesM subroutine. Instead of forcefully blanking out duplicates and then rebuild the array by checking if they are blanked out, you can combine the 2 as follows:

‘Skip array elements that are duplicates.
‘Add non-duplicates to temp array to rebuild vaMain(i)

Set cDupes = New Collection

For i = LBound(vaMain) To UBound(vaMain)

‘reinitialize variables for each secondary array
ReDim aTemp(0 To UBound(vaMain(i), 1), 0 To 0)
k = 0

For j = LBound(vaMain(i), 2) To UBound(vaMain(i), 2)
On Error Resume Next
cDupes.Add vaMain(i)(lElem, j), CStr(vaMain(i)(lElem, j))

If Err.Number > 0 Then ‘ Skip Duplicate
Err.Clear
Else
ReDim Preserve aTemp(0 To UBound(vaMain(i), 1), 0 To k)
For idx = 0 To UBound(vaMain(i), 1)
aTemp(idx, k) = vaMain(i)(idx, j)
Next idx
k = k + 1
End If
On Error GoTo 0
Next j

‘clear out the secondary array once the temp array is filled
Erase vaMain(i)

‘assign the temporary array back to the recently-erased secondary array
vaMain(i) = aTemp

Next i
Dick Kusleika says:

January 16, 2006 at 10:48 pm

Thanks Gary. It seems so simple now.

Alex: I thought about that, but I couldn’t figure how to keep the dimensions together.
jkpieterse says:

January 16, 2006 at 11:17 pm

Dick: I haven’t got the code, but I guess you could have created an ADO recordset and use SQL to extract the unique records?
Jamie Collins says:

January 17, 2006 at 6:06 am

JKP: You cannot query an ADO recordset using SQL. You could, however, use ADO to create an mdb file (a.k.a. Access database) in the current user’s temp folder, use SQL DDL (data definition language) to create tables, use procedural code to generate an INSERT INTO for each row of data, then use a query to get the desired resultset into an recordset, then use procedural code (or possibly the recordset’s GetRows method) to convert back to an in-memory array.

This subject is a great demonstration of the difference between a declarative language such as SQL and a procedural language such as VBA. In SQL DML the one line of code required to so this would read more like a spec; the hard work would be done by the SQL’s optimizer. To do the same in VBA, you have to take on the hard work and write the implementation code yourself.

Out of interest, what’s the reason for using in-memory arrays for this one, Dick? The old Everest (‘Because it’s there’) excuse? :)

Jamie
Dick Kusleika says:

January 17, 2006 at 4:28 pm

Re Everest: No it’s a real life scenario. This SQL solution has me thinking again, although I’m pretty sure I thought this through. I’ll try to lay it out without revealing any confidential information.

Three field tabled with fields UnID, CarID, CarDesc. The user will select from 1 to 4 CarIDs from four comboboxes (unused cbs will remain blank). I need to produce a string that is the concatentation of CarDescs that match what the user selected.

The limitations are: There can’t be more than eleven lines of text in the final string (defined by number of chars in CarDesc); There can’t be more than two CarDescs from any one CarID; the CarDescs used have to be roughly equally distribution among the CarIDs selected (I can’t have 1 CarDesc from a CarID and 4 from another); the CarDescs used from a CarID have to be randomly selected from all the CarDescs for that CarID.

Once I’ve passed through that I need to fill up to get to the 11 lines (if necessary) with CarDescs from CarID=General. Finally, there can’t be any duplicate CarDescs in the final string, but there are definitely duplicates in the whole table, such as CarID=1 has “made in USA” and CarID has “made in USA”. (Don’t tell me to remove those duplicates, because I’m not really dealing with cars and I can’t remove them.)

Now I could put the corpus of CarDescs that match CarIDs into a recordset using In() and use DISTINCT to avoid duplication, but I see two problems with this: First, I need to remove duplicates from the CarID with the most records so that I can maintain that ‘roughly equivalent’ limitation. Second, I need to split them back out by CarID so I can sort them randomly and select two.

I have no doubt that this can be done with SQL, but it seems to be well beyond my abilities. I will, however, accept any help offered and will even dummy up an example.
Jamie Collins says:

January 18, 2006 at 2:10 am

Dick,
Hmm, to me this doesn’t sound suitable for SQL (‘sort them randomly’, ‘roughly equivalent’, etc), which was kinda my earlier point. For me, SQL is sometimes Everest where, for fun, I think, ‘How far up Everest can I push this thing?’

Jamie.
jkpieterse says:

January 18, 2006 at 1:10 pm

Jamie,

OK, no SQL against an ADO recordset. Didn’t know that.

Dick: This does sounds more complicated that the post itself shows. Forget SQL for this one :-)
Jamie Collins says:

January 19, 2006 at 7:10 am

What kind of beasts are Duplicats, anyhow? ;)
Dick Kusleika says:

January 19, 2006 at 8:19 am

Schizophrenic felines?
Per Arnander says:

January 19, 2006 at 8:21 am

I have tested this on a very great array (30 k posts) and found that is better do remove duplicates first, since this subroutine goes very fast. And, as Gary Waters says, remove the blanks in the same subroutine. Then you can use the sort subroutine to sort what is left.