Building a self-sorting list

I haven’t posted in a good while. I’ve done over half the Euler problems, but haven’t had the time to research the high-numbered ones considering I might even understand them. Now that I’ve retired from my second career, and my third is only part-time, maybe I’ll get back to it. Today is something perhaps more useful…to have a list of names, specifications, part numbers, etc and have them self-sort, de-duplicate, and collapse without ever having to go to the Sort Menu. Add a member to the bottom, and it flows through. Replace or paste over the data, and Excel turns the crank. Unique values appear, and no VBA is involved. This was a monthly chore in Job #2 for sometimes as many as 8000 email addresses.

Assume in Column A are 1000 (fantasy) names, in random order. These names were generated by using this website four times. Rice University and Dr. Chris Brown have provided name generation tools for many spoken languages. Your Column A might look something like this:

  A B
1 Alfred Sundagger =IF(LEN(A1)>0,COUNTIF(A:A,”<“&A1)+COUNTIF($A$1:$A1,”=”&A1),””)
2 Aki Stonesaber =IF(LEN(A2)>0,COUNTIF(A:A,”<“&A2)+COUNTIF($A$1:$A2,”=”&A2),””)
3 Yrre the Carver
4 Alfred Birchleaf
5 Olaf the Dagger
6 Isen Grimboldson
7 Eagle Arianson
8 Besyrwan Odonson
9 Onund Boarherder
10 Berdoi of the Radiant Face
11 Azhar Fahim
12 Osric the Hostler
13 Yasha the Hare
14 Warian Pikethrower
15 Konrad Firelash
16 Faran Stillstoke

 

The formula in B1 tests if there in anything of length in A1. I’ve found that the LEN() function always returns a value that agrees with my ol’ Mark 1 Mod 0 eyeballs, and empty text doesn’t mislead me. If there is something of length in A1, then COUNTIF() everything in A:A that is less than A1. This uses Excel’s lexicographic algorithms, and that opinion may differ from what you want, but I’ve never had a problem with it. Add to that count using mixed references the COUNTIF() of everything between $A$1 and $A1 inclusive that equals $A1. When we fill down this will become COUNTIF($A$1:$A2,”=”&A2) in A2. Otherwise, if there is nothing of length, place empty text (“”-double double quotes). Fill down well below the extent of values in A:A.

Column B now looks something like this:

  A B C
1 Alfred Sundagger 43 =IF(ROW()<=COUNT(B:B),SMALL(B:B,ROW()),””)
2 Aki Stonesaber 25
3 Yrre the Carver 971
4 Alfred Birchleaf 39
5 Olaf the Dagger 660
6 Isen Grimboldson 470
7 Eagle Arianson 240
8 Besyrwan Odonson 144
9 Onund Boarherder 683
10 Berdoi of the Radiant Face 136
11 Azhar Fahim 110
12 Osric the Hostler 693
13 Yasha the Hare 960
14 Warian Pikethrower 944
15 Konrad Firelash 541
16 Faran Stillstoke 303

 

The formula in C1 tests if the row number is less than or equal to the count of the numbers in Column B. If it is, put the the smallest number there from Column B:B in ROW() order, otherwise put empty text. After filling down as far as in Column B, Column C is just 1 through the extent of your data in Column A. The use of SMALL() is what accommodates blank cells in A:A.

  A B C D
1 Alfred Sundagger 43 1 =IF(ROW()<=COUNT(B:B),INDEX(A:A,MATCH(C1,B:B,0),1),””)
2 Aki Stonesaber 25 2
3 Yrre the Carver 971 3
4 Alfred Birchleaf 39 4
5 Olaf the Dagger 660 5
6 Isen Grimboldson 470 6
7 Eagle Arianson 240 7
8 Besyrwan Odonson 144 8
9 Onund Boarherder 683 9
10 Berdoi of the Radiant Face 136 10
11 Azhar Fahim 110 11
12 Osric the Hostler 693 12
13 Yasha the Hare 960 13
14 Warian Pikethrower 944 14
15 Konrad Firelash 541 15
16 Faran Stillstoke 303 16

 

The formula in D1 tests the ROW() against the COUNT() again, and if ROW() is less than or equal to the COUNT(), specify zero as the third argument and find the exact MATCH() of the number in Column C:C within Column B:B, and then INDEX down Column A:A that far and return that result. Otherwise, put empty text. Fill down as far as in Columns B and C. After filling down, the list is sorted, and might look something like this:

  A B C D
1 Alfred Sundagger 43 1 Aart Moonhammer
2 Aki Stonesaber 25 2 Aart Moonhammer
3 Yrre the Carver 971 3 Aart the Millwright
4 Alfred Birchleaf 39 4 Aart Whiteson
5 Olaf the Dagger 660 5 Aberri of the White Heart
6 Isen Grimboldson 470 6 Aberri the Thieving Wizard
7 Eagle Arianson 240 7 Acennan Badgerrunner
8 Besyrwan Odonson 144 8 Acennan Kenricsson
9 Onund Boarherder 683 9 Adei of the Red Ruins
10 Berdoi of the Radiant Face 136 10 Adolphus Blackbird
11 Azhar Fahim 110 11 Adolphus Blackbird
12 Osric the Hostler 693 12 Adolphus Stillearth
13 Yasha the Hare 960 13 Adrik Yakovovich
14 Warian Pikethrower 944 14 Aethelred Awierganson
15 Konrad Firelash 541 15 Aethelred Darkseed
16 Faran Stillstoke 303 16 Ageio the Fastidious Sorceror

 

Next post we’ll de-dupe the list and then collapse it to unique entries.

…mrt

18 Comments

  1. fzz says:

    There are a lot of ways to do this, and A LOT of those ways could be found in the newsgroup archives. Given the redundant calculations involved in this, udfs could be faster than formulas.

    A more efficient pure formula approach would be to use an array formula in column B, and definitely not use entire column references. If the names in col A spanned rows 1 to 2000, select B1:B2000 and enter the array formula

    =IF(A1:A2000?”,COUNTIF(A1:A2000,”<“&A1:A2000))

    Then you only need one more column.

    C1:
    =INDEX(A$1:A$2000,MATCH(0,$B$1:$B$100,0))

    C2:
    =IF(ISNUMBER(MATCH(ROWS(C$1:C1),B$1:B$2000,0)),INDEX(A$1:A$2000,MATCH(ROWS(C$1:C1),B$1:B$2000,0)),
    IF(ROWS(C$1:C1)”&C1),INDEX(A$1:A$2000,MATCH(COUNTIF(A$1:A$2000,”<=”&C1),$B$1:$B$2000,0)),””)

  2. Michael says:

    Hi fzz -

    There are pluses and minuses with array formulas, and one of the minuses is that they are very hard, relatively, to edit. Easier to fill down.

    Your indexing number in Column B will only point to the first of a kind, and then the match will repeatedly return that first found. In the example, this meets the intent, but if the adjacent cells are unique and need to be carried with the sort, you need a unique id. That’s the purpose of the COUNTIF($A$1:$A1,”=”&A1) part. Trivial example: Column A is an ordered sort of last names. Column B is first names. One name is Doe|John and another is Smith|John and the desired output is a sort on first names because that’s how they show up in the address book (from real life!). Without the unique id, you’ll get John|Doe twice.

    Excel gives me lots of columns. I like to use them to reveal intermediate steps. Tends to keep the hair in my head. I can noodle out your C2 but it’ll cost me some gray hair ;-)

    …mrt

  3. fzz says:

    So why use formulas to sort? Wouldn’t Change and/or Calculate event handlers calling the Sort method of specified range objects be far more efficient? Don’t do so in order to avoid macro security?

    If you must have a unique key including duplicates, then with just formulas,

    B1:B2000:
    =IF(A1:A2000?”,COUNTIF(A1:A2000,”B#,B$1:B$2000)) to find the next smallest after the value in cell B#, and that requires single cell array formulas.

  4. fzz says:

    So why use formulas to sort? Wouldn’t Change and/or Calculate event handlers calling the Sort method of specified range objects be far more efficient? Don’t do so in order to avoid macro security?

    If you must have a unique key including duplicates, then with just formulas,

    B1:B2000:
    =IF(A1:A2000<>“”,COUNTIF(A1:A2000,“<“&A1:A2000)+(ROW(B1:B2000)-ROW(B$1))/ROWS(B1:B2000))

    This is an array formula. To me no more difficult to edit than nonarray formulas. It just takes pressing 3 keys rather than one to enter is. The point here is to avoid calling COUNTIF more than once for each cell in B1:B2000.

    This requires changing the col C formulas as well, to use MIN(IF(B$1:B$2000>B#,B$1:B$2000)) to find the next smallest after the value in cell B#, and that requires single cell array formulas.

  5. Michael says:

    fzz – on the strong probability that I don’t know something here: to expand the array that an array formula applies to, I select the array, delete the formula, select the new array and re-enter the new formula. Anything else I’ve ever tried gives me a “Can’t edit parts of an array” complaint.

    How do you do it? I’m talking about more that a CSE press.

    …mrt

  6. sam says:

    Michael,

    To overcome the problem of CSE you can use Names

    Define Rand_Name =Sheet1!$A$1:INDEX(Sheet1!$A:$A,COUNTA(Sheet1!$A:$A))
    Define Rank_Lst = COUNTIF(Rand_Lst,”<=”&Rand_Lst)

    In B1 type = Index(Rand_Lst,Match(Row(),Rank_Lst,0)

    No need of CSE

  7. sam says:

    Opps forgot a close bracket

    In B1 type = Index(Rand_Lst,Match(Row(),Rank_Lst,0))

  8. Michael says:

    Hi Sam -

    Pressing CSE isn’t the problem…but maybe my approach is ;-) Thanks.

    Both the definition of Rank_Lst and B1 have a Rand_Lst inside. Believe you meant Rand_Name, or you meant Rand_Name to be Rand_Lst?

    …mrt

  9. sam says:

    Michael….yes more typos

    Define Rand_Lst =Sheet1!$A$1:INDEX(Sheet1!$A:$A,COUNTA(Sheet1!$A:$A))
    Define Rank_Lst = COUNTIF(Rand_Lst,”<=”&Rand_Lst)

    In B1 type = Index(Rand_Lst,Match(Row(),Rank_Lst,0))

  10. To change the size of the array, just select all cells you want the array formula to span and hit F2, CSE.

  11. Mario says:

    Jan

    In my experience you can only _increase_ the size of an array that way. I’ve always had to do what Michael says to shrink an array. What is the cool way to shrink an array?

    Nice post by the way – this is something I have to do a lot of. I have my own hacky way of sorting but I’ve never had to deal with missing spaces or duplicates before. I’ll mark this page!

  12. fzz says:

    To increase the size of an array formula’s result, select the larger result range, make one of the cells containing the array active and edit it, then enter as an array formula.

    To shrink the size of an array formula’s result, I find it easiest to select a cell in the array, insert an apostrophe before the initial =, enter as an array formula. That converts all cells in the array to the same string constant, and the result range ceases to be an array. Select the smaller result range, re-edit the formula, delete the initial apostrophe, enter as an array formula. Finally, clear the contents of the cells which were part of the original array result but not the new, smaller result. If this needs to be done often, it’s a good candidate for a macro.

    Sub foo()
      Dim na As Range, oa As Range, af As String

      If Not TypeOf Selection Is Range Then Exit Sub

      Set na = Selection
      If na.Areas.Count > 1 Then Exit Sub

      On Error Resume Next
      Set oa = ActiveCell.CurrentArray
      If oa Is Nothing Then Exit Sub
      On Error GoTo 0

      af = ActiveCell.FormulaR1C1
      oa.ClearContents
      na.FormulaArray = af

    End Sub

    This macro can shrink or expand the array result range. To use it, you need to select the new result range and ensure that the active cell contains the array formula. Error checking left as an exercise.

    Moral: don’t try to do everything manually/interactively. There are many situations in which macros will be vastly more efficient than any human.

  13. Mario says:

    Very nice fzz, thank you – that will be definitely be added to my toolbox.

  14. Instead of changing the formula to a string, you can also edit the array, hit Control+enter instead of CSE to make them all “normal” formulas.

  15. fzz says:

    The [Ctrl]+[Enter] approach has one potential drawback: the resulting formulas differ from cell to cell if there were any partly or fully relative cell references. For example, if B1:B10 contained =2^(A1:A10-1), B1:B10 were selected with B1 the active cell, [F2] [Ctrl]+[Enter] produces =2^(A1:A10-1) in A1, =2^(A2:A11-1) in A2, =2^(A3:A12-1) in A3, etc. So as long as the 1st (leftmost and topmost) cell of the new array result range is the active cell when using [F2] [Ctrl]+[Enter], all’s well, but any other cell active can screw things up. Converting all cells to the same string has the advantage that it doesn’t matter which cell was active. It’s a question of which approach is more robust.

  16. sam says:

    But why use array formulas.
    If you let use “array” names the we can use them in formulas and dont have to array enter the formula

  17. fzz says:

    @sam – if you’re going to push . . . the original article’s IF(LEN(A#)>0, test implies there could be many blank cells interspersed within the data in col A. If so, a fair amount of special processing is needed to exclude cells evaluating to “”, and some of that special processing defeats using defined names as replacements for array expressions.

Posting code or formulas in your comment? Use <code> tags!

  • <code lang="vb">Block of code goes here</code>
  • <code lang="vb" inline="true">Inline code goes here</code>
  • <code>Formula goes here</code>

Leave a Reply

Here's how to update your reports of company and nearly any web data: