Remove Duplicats Duplicates From Two Dimension Arrays

Earlier I posted my subs for removing duplicates from single dimension arrays. I’ve attempted to modify that code to handle two dimension arrays. I also added code to remove empty arrays from the primary array. That is, two secondary arrays contained exactly the same elements. That probably should have been in the original code too.

There are limitations to this code, such as it assumes the second dimension is the one to do an element-number-based sort on, which should be generally true as the second dimension is the only one you can change in a dynamic array. It also assumes zero based arrays in some cases, which could be changed with a little effort. I decided recently to never have one-based arrays simply because the Split function returns zero-based.

Sub StartArraysM()
   
    Dim vaOne As Variant
    Dim vaTwo As Variant
    Dim vaThree As Variant
    Dim vaFour As Variant
    Dim vaMain As Variant
   
    ‘Set up some secondary arrays
   ReDim vaOne(0 To 2, 0 To 4)
    vaOne(0, 0) = 1: vaOne(1, 0) = “One”: vaOne(2, 0) = “zOne”
    vaOne(0, 1) = 2: vaOne(1, 1) = “Two”: vaOne(2, 1) = “zTwo”
    vaOne(0, 2) = 3: vaOne(1, 2) = “Three”: vaOne(2, 2) = “zThree”
    vaOne(0, 3) = 4: vaOne(1, 3) = “Four”: vaOne(2, 3) = “zFour”
    vaOne(0, 4) = 5: vaOne(1, 4) = “Five”: vaOne(2, 4) = “zFive”
   
    ReDim vaTwo(0 To 1, 0 To 2)
    vaTwo(0, 0) = 5: vaTwo(1, 0) = “Five”
    vaTwo(0, 1) = 6: vaTwo(1, 1) = “Six”
    vaTwo(0, 2) = 7: vaTwo(1, 2) = “Seven”
   
    ReDim vaThree(0 To 1, 0 To 3)
    vaThree(0, 0) = 1: vaThree(1, 0) = “One”
    vaThree(0, 1) = 5: vaThree(1, 1) = “Five”
    vaThree(0, 2) = 8: vaThree(1, 2) = “Eight”
    vaThree(0, 3) = 9: vaThree(1, 3) = “Nine”
   
    ReDim vaFour(0 To 1, 0 To 2)
    vaFour(0, 0) = 5: vaFour(1, 0) = “Five”
    vaFour(0, 1) = 6: vaFour(1, 1) = “Six”
    vaFour(0, 2) = 7: vaFour(1, 2) = “Seven”
   
    ‘load the primary array with the secondary ones
   vaMain = Array(vaOne, vaTwo, vaThree, vaFour)
   
    ‘pass the primary array to remove the dupes
   ‘and display the results
   RemoveDuplicatesM vaMain, 1
    ShowResultsM vaMain
   
End Sub
 
Sub RemoveDuplicatesM(ByRef vaMain As Variant, ByVal lElem As Long)
   
    ‘Assumes second dimension is the largest
   ‘vaMain is an array whose elements are arrays
   ‘lElem is which element in the first dimension where duplicates must be found
   
    Dim i As Long, j As Long, k As Long, idx As Long
    Dim vItm As Variant
    Dim cDupes As Collection
    Dim aTemp() As Variant
    Dim vaSort As Variant
    Dim vCheckElem As Variant
   
    ‘check for array arguments
   If Not IsArray(vaMain) Then Exit Sub
   
    For i = LBound(vaMain) To UBound(vaMain)
        If Not IsArray(vaMain(i)) Then Exit Sub
    Next i
   
    ‘Make sure lElem is valid for all secondary arrays
   On Error Resume Next
        For i = LBound(vaMain) To UBound(vaMain)
            vCheckElem = vaMain(i)(lElem, 0)
            If Err.Number > 0 Then
                Exit Sub
            End If
        Next i
    On Error GoTo 0
   
    ‘Sort arrays by number of elements so as to remove duplicates from
   ‘the most populous arrays
   For i = LBound(vaMain) To UBound(vaMain) – 1
        For j = i + 1 To UBound(vaMain)
            If UBound(vaMain(i), 2) > UBound(vaMain(j), 2) Then
                vaSort = vaMain(i)
                vaMain(i) = vaMain(j)
                vaMain(j) = vaSort
            End If
        Next j
    Next i
   
    ‘blank out array elements that are duplicates.  Later
   ‘I’ll test the length of the element to remove those
   ‘elements that were duplicates and are now empty strings
   Set cDupes = New Collection
   
    For i = LBound(vaMain) To UBound(vaMain)
        For j = LBound(vaMain(i), 2) To UBound(vaMain(i), 2)
            On Error Resume Next
                cDupes.Add vaMain(i)(lElem, j), CStr(vaMain(i)(lElem, j))
               
                If Err.Number > 0 Then
                    vaMain(i)(lElem, j) = “”
                    Err.Clear
                End If
            On Error GoTo 0
        Next j
    Next i
   
    ‘put non-blank elements in a temp array and reassign
   ‘the temporary array back to the secondary array
   For i = LBound(vaMain) To UBound(vaMain)
   
        ‘reinitialize variables for each secondary array
       ReDim aTemp(0 To UBound(vaMain(i), 1), 0 To 0)
        k = 0
        For j = LBound(vaMain(i), 2) To UBound(vaMain(i), 2)
       
            ‘skip elements that are empty strings
           If Len(vaMain(i)(lElem, j)) > 0 Then
           
                ‘assign non-empty strings to the temporary array and
               ‘increment the counter
               ReDim Preserve aTemp(0 To UBound(vaMain(i), 1), 0 To k)
                For idx = 0 To UBound(vaMain(i), 1)
                    aTemp(idx, k) = vaMain(i)(idx, j)
                Next idx
                k = k + 1
            End If
        Next j
       
        ‘clear out the secondary array once the temp array is filled
       Erase vaMain(i)
       
        ‘assign the temporary array back to the recently-erased secondary array
       vaMain(i) = aTemp
    Next i
       
    ‘Remove an empty arrays from the primary array
   ReDim aTemp(0 To 0)
    k = 0
   
    For i = LBound(vaMain) To UBound(vaMain)
        If Not IsEmpty(vaMain(i)(0, 0)) Then
            ReDim Preserve aTemp(0 To k)
            aTemp(k) = vaMain(i)
            k = k + 1
        End If
    Next i
   
    Erase vaMain
   
    vaMain = aTemp
   
End Sub
 
Sub ShowResultsM(ByRef vaMain As Variant)
   
    Dim i As Long, j As Long, k As Long
    Dim sRes As String
   
    For i = LBound(vaMain) To UBound(vaMain)
        For j = LBound(vaMain(i), 2) To UBound(vaMain(i), 2)
            sRes = i & vbTab
            For k = LBound(vaMain(i), 1) To UBound(vaMain(i), 1)
                sRes = sRes & vaMain(i)(k, j) & vbTab
            Next k
            Debug.Print sRes
        Next j
        Debug.Print “—————“
    Next i
   
End Sub
Posted in Uncategorized

12 thoughts on “Remove Duplicats Duplicates From Two Dimension Arrays

  1. Dick,
    I think using collections (or even dictionaries) is exactly the right way to go. As an extention, had you thought of using collections of collections (rather than arrays of arrays) to manage the multi-level approach? Then you don’t have to dimension so many things.

  2. Dick,

    You can slightly improve your RemoveDuplicatesM subroutine. Instead of forcefully blanking out duplicates and then rebuild the array by checking if they are blanked out, you can combine the 2 as follows:

    ‘Skip array elements that are duplicates.
    ‘Add non-duplicates to temp array to rebuild vaMain(i)

    Set cDupes = New Collection

    For i = LBound(vaMain) To UBound(vaMain)

    ‘reinitialize variables for each secondary array
    ReDim aTemp(0 To UBound(vaMain(i), 1), 0 To 0)
    k = 0

    For j = LBound(vaMain(i), 2) To UBound(vaMain(i), 2)
    On Error Resume Next
    cDupes.Add vaMain(i)(lElem, j), CStr(vaMain(i)(lElem, j))

    If Err.Number > 0 Then ‘ Skip Duplicate
    Err.Clear
    Else
    ReDim Preserve aTemp(0 To UBound(vaMain(i), 1), 0 To k)
    For idx = 0 To UBound(vaMain(i), 1)
    aTemp(idx, k) = vaMain(i)(idx, j)
    Next idx
    k = k + 1
    End If
    On Error GoTo 0
    Next j

    ‘clear out the secondary array once the temp array is filled
    Erase vaMain(i)

    ‘assign the temporary array back to the recently-erased secondary array
    vaMain(i) = aTemp

    Next i

  3. JKP: You cannot query an ADO recordset using SQL. You could, however, use ADO to create an mdb file (a.k.a. Access database) in the current user’s temp folder, use SQL DDL (data definition language) to create tables, use procedural code to generate an INSERT INTO for each row of data, then use a query to get the desired resultset into an recordset, then use procedural code (or possibly the recordset’s GetRows method) to convert back to an in-memory array.

    This subject is a great demonstration of the difference between a declarative language such as SQL and a procedural language such as VBA. In SQL DML the one line of code required to so this would read more like a spec; the hard work would be done by the SQL’s optimizer. To do the same in VBA, you have to take on the hard work and write the implementation code yourself.

    Out of interest, what’s the reason for using in-memory arrays for this one, Dick? The old Everest (‘Because it’s there’) excuse? :)

    Jamie

    –

  4. Re Everest: No it’s a real life scenario. This SQL solution has me thinking again, although I’m pretty sure I thought this through. I’ll try to lay it out without revealing any confidential information.

    Three field tabled with fields UnID, CarID, CarDesc. The user will select from 1 to 4 CarIDs from four comboboxes (unused cbs will remain blank). I need to produce a string that is the concatentation of CarDescs that match what the user selected.

    The limitations are: There can’t be more than eleven lines of text in the final string (defined by number of chars in CarDesc); There can’t be more than two CarDescs from any one CarID; the CarDescs used have to be roughly equally distribution among the CarIDs selected (I can’t have 1 CarDesc from a CarID and 4 from another); the CarDescs used from a CarID have to be randomly selected from all the CarDescs for that CarID.

    Once I’ve passed through that I need to fill up to get to the 11 lines (if necessary) with CarDescs from CarID=General. Finally, there can’t be any duplicate CarDescs in the final string, but there are definitely duplicates in the whole table, such as CarID=1 has “made in USA” and CarID has “made in USA”. (Don’t tell me to remove those duplicates, because I’m not really dealing with cars and I can’t remove them.)

    Now I could put the corpus of CarDescs that match CarIDs into a recordset using In() and use DISTINCT to avoid duplication, but I see two problems with this: First, I need to remove duplicates from the CarID with the most records so that I can maintain that ‘roughly equivalent’ limitation. Second, I need to split them back out by CarID so I can sort them randomly and select two.

    I have no doubt that this can be done with SQL, but it seems to be well beyond my abilities. I will, however, accept any help offered and will even dummy up an example.

  5. Dick,
    Hmm, to me this doesn’t sound suitable for SQL (‘sort them randomly’, ‘roughly equivalent’, etc), which was kinda my earlier point. For me, SQL is sometimes Everest where, for fun, I think, ‘How far up Everest can I push this thing?’

    Jamie.

    –

  6. Jamie,

    OK, no SQL against an ADO recordset. Didn’t know that.

    Dick: This does sounds more complicated that the post itself shows. Forget SQL for this one :-)

  7. I have tested this on a very great array (30 k posts) and found that is better do remove duplicates first, since this subroutine goes very fast. And, as Gary Waters says, remove the blanks in the same subroutine. Then you can use the sort subroutine to sort what is left.


Posting code? Use <pre> tags for VBA and <code> tags for inline.

Leave a Reply

Your email address will not be published.