Adding Functionality – 2b

Posted on February 12, 2011 by Tushar Mehta

In Adding Functionality – 2a I laid out the UI changes to support sorting of the results of Dick Kusleika’s VBE Find.

In this post, I develop the actual sort routine.

To recap, the task is to sort a 2D matrix on any number of columns. The sort columns are listed in a 1D array, with each entry in this array identifying one sort column. So, the sort routine’s signature is

Sub doSort(DataArr(), SortCols() As Integer)
End Sub

The routine uses what is called the Bubble Sort. It is fast for “small” matrices, and will be more than adequate for our needs.

The Bubble Sort essentially goes through the matrix row by row. For each row, it compares that row with every other row between itself and the end of the matrix. If necessary, it swaps the two rows. Consequently, once we are done with a particular row, we never need to reexamine it again. And, that’s why it’s called a Bubble Sort. The results “bubble” to their correct spot.

Sub doSort(DataArr(), SortCols() As Integer)
If ArrLen(DataArr) = 0 Or ArrLen(SortCols) = 0 Then Exit Sub
‘Swap the rows of DataArr as needed to perform an ascending sort
Dim I As Long, J As Long
For I = LBound(DataArr) To UBound(DataArr) – 1
For J = I + 1 To UBound(DataArr)
if row_I_key > row_J_key then swapRow DataArr, I, J
Next J
Next I
End Sub

That’s it. At its core, that’s the Bubble Sort. If we had a single key to sort on, we’d use the test DataArr(I,key-column) > DataArr (J, key-column). However, in this case, we have to cater to possibly multiple keys in the SortCols array. So, we write a slightly more complex comparison routine — and in the process also convert a number provided as a string to a numeric value.

To process multiple keys, we start with the ‘outermost’ key. If we can decide on the relative positions of the 2 records, we don’t have to check the next key. However, if the 2 key values are equal, we check the next key.

Option Explicit
Option Base 0
Option Compare Text

Function ArrLen(Arr, Optional whatDim As Integer = 1)
ArrLen = UBound(Arr, whatDim) – LBound(Arr, whatDim) + 1
End Function

Sub swapRow(DataArr, R1 As Long, R2 As Long)
Dim J As Long
For J = LBound(DataArr, 2) To UBound(DataArr, 2)
Dim Temp
If TypeOf DataArr(R1, J) Is Object Then
Set Temp = DataArr(R1, J)
Set DataArr(R1, J) = DataArr(R2, J)
Set DataArr(R2, J) = Temp
Else
Temp = DataArr(R1, J): DataArr(R1, J) = DataArr(R2, J): DataArr(R2, J) = Temp
End If
Next J
End Sub
Sub doSort(DataArr(), SortCols() As Integer)
If ArrLen(DataArr) = 0 Or ArrLen(SortCols) = 0 Then Exit Sub
‘Swap the rows of DataArr as needed to perform an ascending sort
Dim I As Long, J As Long
For I = LBound(DataArr) To UBound(DataArr) – 1
For J = I + 1 To UBound(DataArr)
Dim SortColIdx As Integer
SortColIdx = LBound(SortCols)
Dim PairDone As Boolean: PairDone = False
Do
Dim X1, X2, SortCol As Long
SortCol = SortCols(SortColIdx)
X1 = DataArr(I, SortCol): X2 = DataArr(J, SortCol)
If IsNumeric(X1) And IsNumeric(X2) Then X1 = CDbl(X1): X2 = CDbl(X2)
If X1 > X2 Then
swapRow DataArr, I, J
PairDone = True
ElseIf X1 < X2 Then
PairDone = True
Else
SortColIdx = SortColIdx + 1
End If
Loop Until PairDone Or SortColIdx > UBound(SortCols)
Next J
Next I
End Sub

The algorithm I eventually implemented uses an ‘index array’ to track the final sequence of the rows in the data matrix. Use of an index array means that each swap involves just 1 element rather than every column in the data matrix.

Here’s where one would see a benefit. Suppose we have 3 records with the keys 33,22, and 11, respectively. Then, the final, and desired, sequence would be the records with keys 11,22, and 33.

With a Bubble Sort we would get the following swaps: (33,22,11) -> (22,33,11) -> (22,11,33) -> (11,22,33). So, we have to swap the rows 4 times, with each swap requiring an exchange of all the columns in the 2 rows, one at a time.

Alternatively, we could build an array with the record numbers in it, i.e., (1,2,3). Now, we change only the contents of this array leaving the original array alone. So, the four swaps would yield the index array (1,2,3) -> (2,1,3) -> (2,3,1) -> (3,2,1). This gives us the sequence of the final output, i.e., row 3, row 2, and finally row 1. So, we would have 4 swaps with the index array and 3 sequence moves for the data matrix.

4 thoughts on “Adding Functionality – 2b”

Tweets that mention Daily Dose of Excel » Blog Archive » Adding Functionality - 2b -- Topsy.com says:

February 12, 2011 at 9:02 pm

[…] This post was mentioned on Twitter by gorocube, Excel MVP. Excel MVP said: Adding Functionality – 2b http://goo.gl/fb/0ZnnB […]
hans schraven says:

February 18, 2011 at 7:43 am

I took a different approach to sort a 2D-array on several ‘fields/columns’

I have a general sorting routine ‘function sorteer_000’ that sorts a 1-dimensional array.

If I want to sort a 2D-arrray I transform the 2-D array to a 1-dimensional array using ‘function sorteer_001’.
In that routine I add the sorting datafields to the beginning of each item.

Then I pass the resulting 1-dimensional array to the sorting routine ‘function sorteer_000’
The resulting sorted 1-dimensional array will be retransformed to the original 2_D array.

Example:
the first row of the array consists of one|blue|88 (| as fieldseparator)
if the array has to be sorted on 1. the 3rd field, 2 the first field; I transform this row to :
88one_one|blue|88
when the sorting routine is finished the stringpart after the underscore will be put into the 2D_array.

The start_macro

Sub tst_001a()
sr = ActiveWorkbook.Sheets(1).Cells(1).CurrentRegion
sn = Array(3, 1)

ActiveWorkbook.Sheets(1).Cells(1).Offset(,5).Resize(UBound(sr), UBound(sr, 2)) = F_sorteer_001(sp, sr)
End Sub

The transforming 1D to 2D; 2D to 1D array

Function F_sorteer_001(sq, sn)
For J = LBound(sq) To UBound(sq)
c00 = “”
For jj = 0 To UBound(sn)
c00 = c00 & sq(J, sn(jj))
Next
For jj = LBound(sq) To UBound(sq, 2)
c01 = c01 & IIf(jj = LBound(sq), c00 & “_”, “~”) & sq(J, jj) & IIf(jj = UBound(sq, 2), vbCr, “”)
Next
Next

sn = F_sorteer_000(Filter(Split(c01, vbCr), “_”))

For J = 0 To UBound(sn)
sr = Split(Split(sn(J), “_”)(1), “~”)
For jj = 0 To UBound(sr)
sq(J + LBound(sq), jj + LBound(sq)) = sr(jj)
Next
Next

F_sorteer_001 = sq
End Function

The sorting function

Function F_sorteer_000(sq)
sn = Split(sq(IIf(sq(0) < sq(1), 0, 1)) & “|” & sq(IIf(sq(0) < sq(1), 1, 0)), “|”)

For J = 2 To UBound(sq)
If Not (sq(J) < sn(0) Or sq(J) > sn(UBound(sn))) Then
y = 0
If UBound(sn) > 10 Then y = IIf(sq(J) > sn(UBound(sn) 2 + 1), UBound(sn) 2, 0)
For jj = y To UBound(sn) – 1
If (sq(J) > sn(jj) And sq(J) < sn(jj + 1)) Then
sn(jj) = sn(jj) & “|” & sq(J)
Exit For
End If
Next
End If

sn = Split(IIf(sq(J) < sn(0), sq(J) & “|”, “”) & Join(sn, “|”) & IIf(sq(J) > sn(UBound(sn)), “|” & sq(J), “”), “|”)
Next

F_sorteer_000 = sn
End Function

As far as I can see this sorting method needs much less loops than the ‘bubble-sort’ method.
Tushar Mehta says:

February 18, 2011 at 5:03 pm

1) You have an interesting way of writing compact code, Hans. Compared to more obvious ways of doing the same, whether it is more understandable is debatable. As far as machine-efficiency goes, I doubt the compact style is better. Take this simple example:

sn = Split(sq(IIf(sq(0) < sq(1), 0, 1)) & “|” & sq(IIf(sq(0) < sq(1), 1, 0)), “|”)

It requires 6 array index operations, 1 string comparison in each of 2 IIf functions, 3 concatenations, and 1 string-to-array conversion creating an array in a variant. An optimizing compiler might reduce this to 4 array index operations, 1 string comparison with the result stored in a temporary variable, 2 IIf functions using this temporary variable, 3 concatenations, and 1 string-to-array conversion creating an array in a variant. [I might be wrong but I don’t see any way the compiler could do any better.]

By contrast, the more obvious and IMO eminently more readable

if sq(0)<sq(1) then sn(0)=sq(0):sn(1)=sq(1) else sn(0)=sq(1):sn(1)=sq(0)

requires 4 array index operations, 1 string comparison, and 2 array indexed assignments. An optimizing compiler will do even better! It also has a lot lower probability of having a bug in the code.

2) Sorting on concatenated fields of variable length values of variable data types is *guaranteed* to fail.
Jan Karel Pieterse says:

February 19, 2011 at 12:38 pm

Hans: Have you ever done any tests on how much time your sort takes versus a more conventionally coded bubble sort?