Adding Functionality – 2b

In Adding Functionality – 2a I laid out the UI changes to support sorting of the results of Dick Kusleika’s VBE Find.

In this post, I develop the actual sort routine.

To recap, the task is to sort a 2D matrix on any number of columns. The sort columns are listed in a 1D array, with each entry in this array identifying one sort column. So, the sort routine’s signature is

Sub doSort(DataArr(), SortCols() As Integer)
    End Sub

The routine uses what is called the Bubble Sort. It is fast for “small” matrices, and will be more than adequate for our needs.

The Bubble Sort essentially goes through the matrix row by row. For each row, it compares that row with every other row between itself and the end of the matrix. If necessary, it swaps the two rows. Consequently, once we are done with a particular row, we never need to reexamine it again. And, that’s why it’s called a Bubble Sort. The results “bubble” to their correct spot.

Sub doSort(DataArr(), SortCols() As Integer)
    If ArrLen(DataArr) = 0 Or ArrLen(SortCols) = 0 Then Exit Sub
    ‘Swap the rows of DataArr as needed to perform an ascending sort
   Dim I As Long, J As Long
    For I = LBound(DataArr) To UBound(DataArr) – 1
        For J = I + 1 To UBound(DataArr)
            if row_I_key > row_J_key then swapRow DataArr, I, J
            Next J
        Next I
    End Sub

That’s it. At its core, that’s the Bubble Sort. If we had a single key to sort on, we’d use the test DataArr(I,key-column) > DataArr (J, key-column). However, in this case, we have to cater to possibly multiple keys in the SortCols array. So, we write a slightly more complex comparison routine — and in the process also convert a number provided as a string to a numeric value.

To process multiple keys, we start with the ‘outermost’ key. If we can decide on the relative positions of the 2 records, we don’t have to check the next key. However, if the 2 key values are equal, we check the next key.

Option Explicit
Option Base 0
Option Compare Text

Function ArrLen(Arr, Optional whatDim As Integer = 1)
    ArrLen = UBound(Arr, whatDim) – LBound(Arr, whatDim) + 1
    End Function

    Sub swapRow(DataArr, R1 As Long, R2 As Long)
        Dim J As Long
        For J = LBound(DataArr, 2) To UBound(DataArr, 2)
            Dim Temp
            If TypeOf DataArr(R1, J) Is Object  Then
                Set Temp = DataArr(R1, J)
                Set DataArr(R1, J) = DataArr(R2, J)
                Set DataArr(R2, J) = Temp
            Else
                Temp = DataArr(R1, J): DataArr(R1, J) = DataArr(R2, J): DataArr(R2, J) = Temp
                End If
            Next J
        End Sub
Sub doSort(DataArr(), SortCols() As Integer)
    If ArrLen(DataArr) = 0 Or ArrLen(SortCols) = 0 Then Exit Sub
    ‘Swap the rows of DataArr as needed to perform an ascending sort
   Dim I As Long, J As Long
    For I = LBound(DataArr) To UBound(DataArr) – 1
        For J = I + 1 To UBound(DataArr)
            Dim SortColIdx As Integer
            SortColIdx = LBound(SortCols)
            Dim PairDone As Boolean: PairDone = False
            Do
                Dim X1, X2, SortCol As Long
                SortCol = SortCols(SortColIdx)
                X1 = DataArr(I, SortCol): X2 = DataArr(J, SortCol)
                If IsNumeric(X1) And IsNumeric(X2) Then X1 = CDbl(X1): X2 = CDbl(X2)
                If X1 > X2 Then
                    swapRow DataArr, I, J
                    PairDone = True
                ElseIf X1 < X2 Then
                    PairDone = True
                Else
                    SortColIdx = SortColIdx + 1
                    End If
                Loop Until PairDone Or SortColIdx > UBound(SortCols)
            Next J
        Next I
    End Sub

The algorithm I eventually implemented uses an ‘index array’ to track the final sequence of the rows in the data matrix. Use of an index array means that each swap involves just 1 element rather than every column in the data matrix.

Here’s where one would see a benefit. Suppose we have 3 records with the keys 33,22, and 11, respectively. Then, the final, and desired, sequence would be the records with keys 11,22, and 33.

With a Bubble Sort we would get the following swaps: (33,22,11) -> (22,33,11) -> (22,11,33) -> (11,22,33). So, we have to swap the rows 4 times, with each swap requiring an exchange of all the columns in the 2 rows, one at a time.

Alternatively, we could build an array with the record numbers in it, i.e., (1,2,3). Now, we change only the contents of this array leaving the original array alone. So, the four swaps would yield the index array (1,2,3) -> (2,1,3) -> (2,3,1) -> (3,2,1). This gives us the sequence of the final output, i.e., row 3, row 2, and finally row 1. So, we would have 4 swaps with the index array and 3 sequence moves for the data matrix.

Posted in Uncategorized

4 thoughts on “Adding Functionality – 2b

  1. I took a different approach to sort a 2D-array on several ‘fields/columns’

    I have a general sorting routine ‘function sorteer_000’ that sorts a 1-dimensional array.

    If I want to sort a 2D-arrray I transform the 2-D array to a 1-dimensional array using ‘function sorteer_001’.
    In that routine I add the sorting datafields to the beginning of each item.

    Then I pass the resulting 1-dimensional array to the sorting routine ‘function sorteer_000’
    The resulting sorted 1-dimensional array will be retransformed to the original 2_D array.

    Example:
    the first row of the array consists of one|blue|88 (| as fieldseparator)
    if the array has to be sorted on 1. the 3rd field, 2 the first field; I transform this row to :
    88one_one|blue|88
    when the sorting routine is finished the stringpart after the underscore will be put into the 2D_array.

    The start_macro

    Sub tst_001a()
      sr = ActiveWorkbook.Sheets(1).Cells(1).CurrentRegion
      sn = Array(3, 1)

      ActiveWorkbook.Sheets(1).Cells(1).Offset(,5).Resize(UBound(sr), UBound(sr, 2)) = F_sorteer_001(sp, sr)
    End Sub

    The transforming 1D to 2D; 2D to 1D array

    Function F_sorteer_001(sq, sn)
      For J = LBound(sq) To UBound(sq)
        c00 = “”
        For jj = 0 To UBound(sn)
          c00 = c00 & sq(J, sn(jj))
        Next
        For jj = LBound(sq) To UBound(sq, 2)
          c01 = c01 & IIf(jj = LBound(sq), c00 & “_”, “~”) & sq(J, jj) & IIf(jj = UBound(sq, 2), vbCr, “”)
        Next
      Next
       
      sn = F_sorteer_000(Filter(Split(c01, vbCr), “_”))
       
      For J = 0 To UBound(sn)
        sr = Split(Split(sn(J), “_”)(1), “~”)
        For jj = 0 To UBound(sr)
          sq(J + LBound(sq), jj + LBound(sq)) = sr(jj)
        Next
      Next

      F_sorteer_001 = sq
    End Function

    The sorting function

    Function F_sorteer_000(sq)
      sn = Split(sq(IIf(sq(0) < sq(1), 0, 1)) & “|” & sq(IIf(sq(0) < sq(1), 1, 0)), “|”)

      For J = 2 To UBound(sq)
        If Not (sq(J) < sn(0) Or sq(J) > sn(UBound(sn))) Then
          y = 0
          If UBound(sn) > 10 Then y = IIf(sq(J) > sn(UBound(sn) 2 + 1), UBound(sn) 2, 0)
          For jj = y To UBound(sn) – 1
            If (sq(J) > sn(jj) And sq(J) < sn(jj + 1)) Then
              sn(jj) = sn(jj) & “|” & sq(J)
              Exit For
            End If
          Next
        End If

        sn = Split(IIf(sq(J) < sn(0), sq(J) & “|”, “”) & Join(sn, “|”) & IIf(sq(J) > sn(UBound(sn)), “|” & sq(J), “”), “|”)
      Next

      F_sorteer_000 = sn
    End Function

    As far as I can see this sorting method needs much less loops than the ‘bubble-sort’ method.

  2. 1) You have an interesting way of writing compact code, Hans. Compared to more obvious ways of doing the same, whether it is more understandable is debatable. As far as machine-efficiency goes, I doubt the compact style is better. Take this simple example:

    sn = Split(sq(IIf(sq(0) < sq(1), 0, 1)) & “|” & sq(IIf(sq(0) < sq(1), 1, 0)), “|”)

    It requires 6 array index operations, 1 string comparison in each of 2 IIf functions, 3 concatenations, and 1 string-to-array conversion creating an array in a variant. An optimizing compiler might reduce this to 4 array index operations, 1 string comparison with the result stored in a temporary variable, 2 IIf functions using this temporary variable, 3 concatenations, and 1 string-to-array conversion creating an array in a variant. [I might be wrong but I don’t see any way the compiler could do any better.]

    By contrast, the more obvious and IMO eminently more readable

    if sq(0)<sq(1) then sn(0)=sq(0):sn(1)=sq(1) else sn(0)=sq(1):sn(1)=sq(0)

    requires 4 array index operations, 1 string comparison, and 2 array indexed assignments. An optimizing compiler will do even better! It also has a lot lower probability of having a bug in the code.

    2) Sorting on concatenated fields of variable length values of variable data types is *guaranteed* to fail.


Posting code? Use <pre> tags for VBA and <code> tags for inline.

Leave a Reply

Your email address will not be published.