Building a self-sorting list

Posted on September 3, 2010 by Michael

I haven’t posted in a good while. I’ve done over half the Euler problems, but haven’t had the time to research the high-numbered ones considering I might even understand them. Now that I’ve retired from my second career, and my third is only part-time, maybe I’ll get back to it. Today is something perhaps more useful…to have a list of names, specifications, part numbers, etc and have them self-sort, de-duplicate, and collapse without ever having to go to the Sort Menu. Add a member to the bottom, and it flows through. Replace or paste over the data, and Excel turns the crank. Unique values appear, and no VBA is involved. This was a monthly chore in Job #2 for sometimes as many as 8000 email addresses.

Assume in Column A are 1000 (fantasy) names, in random order. These names were generated by using this website four times. Rice University and Dr. Chris Brown have provided name generation tools for many spoken languages. Your Column A might look something like this:

	A	B
1	Alfred Sundagger	=IF(LEN(A1)>0,COUNTIF(A:A,”<“&A1)+COUNTIF($A$1:$A1,”=”&A1),””)
2	Aki Stonesaber	=IF(LEN(A2)>0,COUNTIF(A:A,”<“&A2)+COUNTIF($A$1:$A2,”=”&A2),””)
3	Yrre the Carver
4	Alfred Birchleaf
5	Olaf the Dagger
6	Isen Grimboldson
7	Eagle Arianson
8	Besyrwan Odonson
9	Onund Boarherder
10	Berdoi of the Radiant Face
11	Azhar Fahim
12	Osric the Hostler
13	Yasha the Hare
14	Warian Pikethrower
15	Konrad Firelash
16	Faran Stillstoke

The formula in B1 tests if there in anything of length in A1. I’ve found that the LEN() function always returns a value that agrees with my ol’ Mark 1 Mod 0 eyeballs, and empty text doesn’t mislead me. If there is something of length in A1, then COUNTIF() everything in A:A that is less than A1. This uses Excel’s lexicographic algorithms, and that opinion may differ from what you want, but I’ve never had a problem with it. Add to that count using mixed references the COUNTIF() of everything between $A$1 and $A1 inclusive that equals $A1. When we fill down this will become COUNTIF($A$1:$A2,”=”&A2) in A2. Otherwise, if there is nothing of length, place empty text (“”-double double quotes). Fill down well below the extent of values in A:A.

Column B now looks something like this:

	A	B	C
1	Alfred Sundagger	43	=IF(ROW()<=COUNT(B:B),SMALL(B:B,ROW()),””)
2	Aki Stonesaber	25
3	Yrre the Carver	971
4	Alfred Birchleaf	39
5	Olaf the Dagger	660
6	Isen Grimboldson	470
7	Eagle Arianson	240
8	Besyrwan Odonson	144
9	Onund Boarherder	683
10	Berdoi of the Radiant Face	136
11	Azhar Fahim	110
12	Osric the Hostler	693
13	Yasha the Hare	960
14	Warian Pikethrower	944
15	Konrad Firelash	541
16	Faran Stillstoke	303

The formula in C1 tests if the row number is less than or equal to the count of the numbers in Column B. If it is, put the the smallest number there from Column B:B in ROW() order, otherwise put empty text. After filling down as far as in Column B, Column C is just 1 through the extent of your data in Column A. The use of SMALL() is what accommodates blank cells in A:A.

	A	B	C	D
1	Alfred Sundagger	43	1	=IF(ROW()<=COUNT(B:B),INDEX(A:A,MATCH(C1,B:B,0),1),””)
2	Aki Stonesaber	25	2
3	Yrre the Carver	971	3
4	Alfred Birchleaf	39	4
5	Olaf the Dagger	660	5
6	Isen Grimboldson	470	6
7	Eagle Arianson	240	7
8	Besyrwan Odonson	144	8
9	Onund Boarherder	683	9
10	Berdoi of the Radiant Face	136	10
11	Azhar Fahim	110	11
12	Osric the Hostler	693	12
13	Yasha the Hare	960	13
14	Warian Pikethrower	944	14
15	Konrad Firelash	541	15
16	Faran Stillstoke	303	16

The formula in D1 tests the ROW() against the COUNT() again, and if ROW() is less than or equal to the COUNT(), specify zero as the third argument and find the exact MATCH() of the number in Column C:C within Column B:B, and then INDEX down Column A:A that far and return that result. Otherwise, put empty text. Fill down as far as in Columns B and C. After filling down, the list is sorted, and might look something like this:

	A	B	C	D
1	Alfred Sundagger	43	1	Aart Moonhammer
2	Aki Stonesaber	25	2	Aart Moonhammer
3	Yrre the Carver	971	3	Aart the Millwright
4	Alfred Birchleaf	39	4	Aart Whiteson
5	Olaf the Dagger	660	5	Aberri of the White Heart
6	Isen Grimboldson	470	6	Aberri the Thieving Wizard
7	Eagle Arianson	240	7	Acennan Badgerrunner
8	Besyrwan Odonson	144	8	Acennan Kenricsson
9	Onund Boarherder	683	9	Adei of the Red Ruins
10	Berdoi of the Radiant Face	136	10	Adolphus Blackbird
11	Azhar Fahim	110	11	Adolphus Blackbird
12	Osric the Hostler	693	12	Adolphus Stillearth
13	Yasha the Hare	960	13	Adrik Yakovovich
14	Warian Pikethrower	944	14	Aethelred Awierganson
15	Konrad Firelash	541	15	Aethelred Darkseed
16	Faran Stillstoke	303	16	Ageio the Fastidious Sorceror

Next post we’ll de-dupe the list and then collapse it to unique entries.

…mrt

19 thoughts on “Building a self-sorting list”

fzz says:

September 3, 2010 at 8:13 pm

There are a lot of ways to do this, and A LOT of those ways could be found in the newsgroup archives. Given the redundant calculations involved in this, udfs could be faster than formulas.

A more efficient pure formula approach would be to use an array formula in column B, and definitely not use entire column references. If the names in col A spanned rows 1 to 2000, select B1:B2000 and enter the array formula

=IF(A1:A2000?”,COUNTIF(A1:A2000,”<“&A1:A2000))

Then you only need one more column.

C1:
=INDEX(A$1:A$2000,MATCH(0,$B$1:$B$100,0))

C2:
=IF(ISNUMBER(MATCH(ROWS(C$1:C1),B$1:B$2000,0)),INDEX(A$1:A$2000,MATCH(ROWS(C$1:C1),B$1:B$2000,0)),
IF(ROWS(C$1:C1)”&C1),INDEX(A$1:A$2000,MATCH(COUNTIF(A$1:A$2000,”<=”&C1),$B$1:$B$2000,0)),””)
Michael says:

September 4, 2010 at 8:18 am

Hi fzz –

There are pluses and minuses with array formulas, and one of the minuses is that they are very hard, relatively, to edit. Easier to fill down.

Your indexing number in Column B will only point to the first of a kind, and then the match will repeatedly return that first found. In the example, this meets the intent, but if the adjacent cells are unique and need to be carried with the sort, you need a unique id. That’s the purpose of the COUNTIF($A$1:$A1,”=”&A1) part. Trivial example: Column A is an ordered sort of last names. Column B is first names. One name is Doe|John and another is Smith|John and the desired output is a sort on first names because that’s how they show up in the address book (from real life!). Without the unique id, you’ll get John|Doe twice.

Excel gives me lots of columns. I like to use them to reveal intermediate steps. Tends to keep the hair in my head. I can noodle out your C2 but it’ll cost me some gray hair ;-)

…mrt
fzz says:

September 4, 2010 at 3:27 pm

So why use formulas to sort? Wouldn’t Change and/or Calculate event handlers calling the Sort method of specified range objects be far more efficient? Don’t do so in order to avoid macro security?

If you must have a unique key including duplicates, then with just formulas,

B1:B2000:
=IF(A1:A2000?”,COUNTIF(A1:A2000,”B#,B$1:B$2000)) to find the next smallest after the value in cell B#, and that requires single cell array formulas.
fzz says:

September 4, 2010 at 3:28 pm

So why use formulas to sort? Wouldn’t Change and/or Calculate event handlers calling the Sort method of specified range objects be far more efficient? Don’t do so in order to avoid macro security?

If you must have a unique key including duplicates, then with just formulas,

B1:B2000:
=IF(A1:A2000<>“”,COUNTIF(A1:A2000,“<“&A1:A2000)+(ROW(B1:B2000)-ROW(B$1))/ROWS(B1:B2000))

This is an array formula. To me no more difficult to edit than nonarray formulas. It just takes pressing 3 keys rather than one to enter is. The point here is to avoid calling COUNTIF more than once for each cell in B1:B2000.

This requires changing the col C formulas as well, to use MIN(IF(B$1:B$2000>B#,B$1:B$2000)) to find the next smallest after the value in cell B#, and that requires single cell array formulas.
Michael says:

September 4, 2010 at 3:36 pm

fzz – on the strong probability that I don’t know something here: to expand the array that an array formula applies to, I select the array, delete the formula, select the new array and re-enter the new formula. Anything else I’ve ever tried gives me a “Can’t edit parts of an array” complaint.

How do you do it? I’m talking about more that a CSE press.

…mrt
sam says:

September 5, 2010 at 8:38 am

Michael,

To overcome the problem of CSE you can use Names

Define Rand_Name =Sheet1!$A$1:INDEX(Sheet1!$A:$A,COUNTA(Sheet1!$A:$A))
Define Rank_Lst = COUNTIF(Rand_Lst,”<=”&Rand_Lst)

In B1 type = Index(Rand_Lst,Match(Row(),Rank_Lst,0)

No need of CSE
sam says:

September 5, 2010 at 8:39 am

Opps forgot a close bracket

In B1 type = Index(Rand_Lst,Match(Row(),Rank_Lst,0))
Michael says:

September 5, 2010 at 11:54 am

Hi Sam –

Pressing CSE isn’t the problem…but maybe my approach is ;-) Thanks.

Both the definition of Rank_Lst and B1 have a Rand_Lst inside. Believe you meant Rand_Name, or you meant Rand_Name to be Rand_Lst?

…mrt
sam says:

September 5, 2010 at 9:51 pm

Michael….yes more typos

Define Rand_Lst =Sheet1!$A$1:INDEX(Sheet1!$A:$A,COUNTA(Sheet1!$A:$A))
Define Rank_Lst = COUNTIF(Rand_Lst,”<=”&Rand_Lst)

In B1 type = Index(Rand_Lst,Match(Row(),Rank_Lst,0))
Jan Karel Pieterse says:

September 6, 2010 at 12:16 am

To change the size of the array, just select all cells you want the array formula to span and hit F2, CSE.
Mario says:

September 7, 2010 at 6:53 am

Jan

In my experience you can only _increase_ the size of an array that way. I’ve always had to do what Michael says to shrink an array. What is the cool way to shrink an array?

Nice post by the way – this is something I have to do a lot of. I have my own hacky way of sorting but I’ve never had to deal with missing spaces or duplicates before. I’ll mark this page!
fzz says:

September 7, 2010 at 7:57 am

To increase the size of an array formula’s result, select the larger result range, make one of the cells containing the array active and edit it, then enter as an array formula.

To shrink the size of an array formula’s result, I find it easiest to select a cell in the array, insert an apostrophe before the initial =, enter as an array formula. That converts all cells in the array to the same string constant, and the result range ceases to be an array. Select the smaller result range, re-edit the formula, delete the initial apostrophe, enter as an array formula. Finally, clear the contents of the cells which were part of the original array result but not the new, smaller result. If this needs to be done often, it’s a good candidate for a macro.

Sub foo()
Dim na As Range, oa As Range, af As String

If Not TypeOf Selection Is Range Then Exit Sub

Set na = Selection
If na.Areas.Count > 1 Then Exit Sub

On Error Resume Next
Set oa = ActiveCell.CurrentArray
If oa Is Nothing Then Exit Sub
On Error GoTo 0

af = ActiveCell.FormulaR1C1
oa.ClearContents
na.FormulaArray = af

End Sub

This macro can shrink or expand the array result range. To use it, you need to select the new result range and ensure that the active cell contains the array formula. Error checking left as an exercise.

Moral: don’t try to do everything manually/interactively. There are many situations in which macros will be vastly more efficient than any human.
Mario says:

September 7, 2010 at 8:07 am

Very nice fzz, thank you – that will be definitely be added to my toolbox.
Jan Karel Pieterse says:

September 7, 2010 at 8:40 am

Instead of changing the formula to a string, you can also edit the array, hit Control+enter instead of CSE to make them all “normal” formulas.
Dick Kusleika says:

September 7, 2010 at 8:54 am

You can download the sample workbook in Part 2

http://www.dailydoseofexcel.com/archives/2010/09/06/building-a-self-sorting-list-part-2/
fzz says:

September 7, 2010 at 9:10 am

The [Ctrl]+[Enter] approach has one potential drawback: the resulting formulas differ from cell to cell if there were any partly or fully relative cell references. For example, if B1:B10 contained =2^(A1:A10-1), B1:B10 were selected with B1 the active cell, [F2] [Ctrl]+[Enter] produces =2^(A1:A10-1) in A1, =2^(A2:A11-1) in A2, =2^(A3:A12-1) in A3, etc. So as long as the 1st (leftmost and topmost) cell of the new array result range is the active cell when using [F2] [Ctrl]+[Enter], all’s well, but any other cell active can screw things up. Converting all cells to the same string has the advantage that it doesn’t matter which cell was active. It’s a question of which approach is more robust.
sam says:

September 8, 2010 at 6:10 am

But why use array formulas.
If you let use “array” names the we can use them in formulas and dont have to array enter the formula
fzz says:

September 8, 2010 at 9:16 am

@sam – if you’re going to push . . . the original article’s IF(LEN(A#)>0, test implies there could be many blank cells interspersed within the data in col A. If so, a fair amount of special processing is needed to exclude cells evaluating to “”, and some of that special processing defeats using defined names as replacements for array expressions.
Pingback: Dynamic Charting from disparate data