Building a self-sorting list (Part 2)

Posted on September 6, 2010 by Michael

In Part 1 we ended up with Column D, a sorted list. One criticism was the many times we were counting the numbers in a column. We should improve it and only count once. Via Insert/Name/Define define Count_BB to =COUNT(Sheet1!$B:$B), and then select Columns C:D, and “Replace All” COUNT(B:B) with Count_BB. Column D should look something like this:

	D	E
1	Aart Moonhammer	=D1
2	Aart Moonhammer	=IF(D2<>D1,D2,””)
3	Aart the Millwright
4	Aart Whiteson
5	Aberri of the White Heart
6	Aberri the Thieving Wizard
7	Acennan Badgerrunner
8	Acennan Kenricsson
9	Adei of the Red Ruins
10	Adolphus Blackbird
11	Adolphus Blackbird
12	Adolphus Stillearth
13	Adrik Yakovovich
14	Aethelred Awierganson
15	Aethelred Darkseed
16	Ageio the Fastidious Sorceror

In Cell E1 we simply move D1 over. In E2 we check to see if D2 is not equal to the cell above it, or in other words D2 starts a new run of names. If it is equal (FALSE condition) we put empty text. Otherwise, we put D2. Fill down from E2 as far down as columns B, C, and D are filled. Column E now looks like this. We have removed the duplicates.

	D	E	F
1	Aart Moonhammer	Aart Moonhammer	=IF(LEN(E1)>0,ROW(),””)
2	Aart Moonhammer
3	Aart the Millwright	Aart the Millwright
4	Aart Whiteson	Aart Whiteson
5	Aberri of the White Heart	Aberri of the White Heart
6	Aberri the Thieving Wizard	Aberri the Thieving Wizard
7	Acennan Badgerrunner	Acennan Badgerrunner
8	Acennan Kenricsson	Acennan Kenricsson
9	Adei of the Red Ruins	Adei of the Red Ruins
10	Adolphus Blackbird	Adolphus Blackbird
11	Adolphus Blackbird
12	Adolphus Stillearth	Adolphus Stillearth
13	Adrik Yakovovich	Adrik Yakovovich
14	Aethelred Awierganson	Aethelred Awierganson
15	Aethelred Darkseed	Aethelred Darkseed
16	Ageio the Fastidious Sorceror	Ageio the Fastidious Sorceror

In F1 we check to see if there is anything of length in E1. If there is, put the row number, otherwise put empty text. As above, define Count_FF as =COUNT(Sheet1!$F$F). After filling down, Column F looks like this:

	D	E	F	G
1	Aart Moonhammer	Aart Moonhammer	1	=IF(ROW()<=Count_FF,SMALL(F:F,ROW()),””)
2	Aart Moonhammer
3	Aart the Millwright	Aart the Millwright	3
4	Aart Whiteson	Aart Whiteson	4
5	Aberri of the White Heart	Aberri of the White Heart	5
6	Aberri the Thieving Wizard	Aberri the Thieving Wizard	6
7	Acennan Badgerrunner	Acennan Badgerrunner	7
8	Acennan Kenricsson	Acennan Kenricsson	8
9	Adei of the Red Ruins	Adei of the Red Ruins	9
10	Adolphus Blackbird	Adolphus Blackbird	10
11	Adolphus Blackbird
12	Adolphus Stillearth	Adolphus Stillearth	12
13	Adrik Yakovovich	Adrik Yakovovich	13
14	Aethelred Awierganson	Aethelred Awierganson	14
15	Aethelred Darkseed	Aethelred Darkseed	15
16	Ageio the Fastidious Sorceror	Ageio the Fastidious Sorceror	16

In G1 we again test the row number, but this time it’s against the count of numbers in F:F. If the ROW() is less than or equal to Count_FF, put the numbers from F:F there is ROW() order. Fill down as before. Column G looks like this:

	D	E	F	G	H
1	Aart Moonhammer	Aart Moonhammer	1	1	=IF(ROW()<=Count_FF,INDEX(E:E,G1,1),””)
2	Aart Moonhammer			3
3	Aart the Millwright	Aart the Millwright	3	4
4	Aart Whiteson	Aart Whiteson	4	5
5	Aberri of the White Heart	Aberri of the White Heart	5	6
6	Aberri the Thieving Wizard	Aberri the Thieving Wizard	6	7
7	Acennan Badgerrunner	Acennan Badgerrunner	7	8
8	Acennan Kenricsson	Acennan Kenricsson	8	9
9	Adei of the Red Ruins	Adei of the Red Ruins	9	10
10	Adolphus Blackbird	Adolphus Blackbird	10	12
11	Adolphus Blackbird			13
12	Adolphus Stillearth	Adolphus Stillearth	12	14
13	Adrik Yakovovich	Adrik Yakovovich	13	15
14	Aethelred Awierganson	Aethelred Awierganson	14	16
15	Aethelred Darkseed	Aethelred Darkseed	15	18
16	Ageio the Fastidious Sorceror	Ageio the Fastidious Sorceror	16	19

Last Step. In H1 we again compare the row number to the count of numbers in F:F. If ROW() is less than or equal to the Count_FF, then index E:E (could also be D:D) the number of rows shown in G:G. Fill down as before. Column H looks like this:

	D	E	F	G	H
1	Aart Moonhammer	Aart Moonhammer	1	1	Aart Moonhammer
2	Aart Moonhammer			3	Aart the Millwright
3	Aart the Millwright	Aart the Millwright	3	4	Aart Whiteson
4	Aart Whiteson	Aart Whiteson	4	5	Aberri of the White Heart
5	Aberri of the White Heart	Aberri of the White Heart	5	6	Aberri the Thieving Wizard
6	Aberri the Thieving Wizard	Aberri the Thieving Wizard	6	7	Acennan Badgerrunner
7	Acennan Badgerrunner	Acennan Badgerrunner	7	8	Acennan Kenricsson
8	Acennan Kenricsson	Acennan Kenricsson	8	9	Adei of the Red Ruins
9	Adei of the Red Ruins	Adei of the Red Ruins	9	10	Adolphus Blackbird
10	Adolphus Blackbird	Adolphus Blackbird	10	12	Adolphus Stillearth
11	Adolphus Blackbird			13	Adrik Yakovovich
12	Adolphus Stillearth	Adolphus Stillearth	12	14	Aethelred Awierganson
13	Adrik Yakovovich	Adrik Yakovovich	13	15	Aethelred Darkseed
14	Aethelred Awierganson	Aethelred Awierganson	14	16	Ageio the Fastidious Sorceror
15	Aethelred Darkseed	Aethelred Darkseed	15	18	Agoztar of the Ghost Face
16	Ageio the Fastidious Sorceror	Ageio the Fastidious Sorceror	16	19	Aide of the Dead Woods

The list that started in A:A is now sorted, de-duplicated, and collapsed to unique values, ready in H:H for whatever you might need as you write your fantasy novel. All you need to remember is to fill B2:Hn down well beyond any possible extent of A:A. This can be extended to handle several field records by indexing the appropriate columns at the two appropriate points.

…mrt

You can download self_sorting_names.zip

15 thoughts on “Building a self-sorting list (Part 2)”

Alan Hutchins says:

September 7, 2010 at 1:25 am

Michael,

A very nice solution.

Can you post the final workbook so that we can see the final formulae?

This is exactly what I need to sort out a big list on an ongoin basis, and will prove very helpful.
Ivo says:

September 7, 2010 at 1:27 am

Cool tricks to de-duplicated and collapse to unique values.

I have one question. In column E you only check for one duplicate value. I currently have a dataset where a name occurs on more than two occasions. Sometimes three up to six times.

Do you have a tip to check for unique names which occur n-times in a dataset?
Jan Karel Pieterse says:

September 7, 2010 at 5:04 am

I’m not sure if this is correct, but I seem to recall that a name is evaluated every time it is being called.
If so, using the name count_BB does not mean you only do the count once.

For example: If I put a call to a UDF in a name, each call to that name triggers a call to the UDF.
Michael says:

September 7, 2010 at 7:01 am

Alan – Thank you. I’ll forward the spreadsheet to Dick with a request that he add it to this thread. He rightly takes serious regard over what gets posted here. The formulas used were copied right out of the applicable cells into the tables. I tried to make sure no curly quotes got substituted for straight ones. You should be able to copy them directly.

Jan Karel – Ooo. Didn’t know that. Well, then, it’s a fine example of a named formula instead of a named range. :roll:

…mrt
Michael says:

September 7, 2010 at 7:58 am

Hi Ivo –

The example dataset doesn’t show it (the spreadsheet will) but once the names are sorted (in Part 1) they all (1 thru many) appear grouped together in Column D. The formula in E2 will see that the 2nd is like the first and put empty text, and then the third is like the second and put empty text, and the fourth is like the third, etc. until a name comes along not like the one above it in Column D. This gets moved over, and the second gets compared etc.

That little formula in E2, filled down, does all you need.

…mrt
fzz says:

September 7, 2010 at 8:34 am

If you use COUNTIF(Range,Range), the approach is necessarily no better than O(N^2). The Sort method of the range class uses QuickSort, which is O(N log(N)). If efficiency is a goal, using event macros would be more efficient. While it may be possible the whittle away at the inefficiency of formula approaches, they’ll never come close to the efficiency of macro approaches.
Nick says:

September 8, 2010 at 6:54 am

To be honest, the simplest way would be to use a pivot table:

1) insert a header row at the top of the list called “Name”
2) create a pivot table off that plus any spare cells beloew the range you want for additional names at a later stage
3) in the pivot table, drop in name as the row. This will automatically de-dupe and sort the list by name, really really quickly
4) if you add more names to the list, just refresh the pivot table

As an added bonus, it’ll even show you the number of occurrences of each name if you then drag the “Name” field in to the data items area as it’ll default to counting them if they are text fields.
fzz says:

September 8, 2010 at 9:18 am

Pivot tables aren’t automatic. Pivot tables don’t automatically change when source data changes. If automation is a key requirement, pivot tables aren’t the answer.
Nick says:

September 8, 2010 at 10:20 am

fzz – you stated in your previous post that the Sort method and VBA automation was the answer, and I agree that a Sort plus an Advanced Filter on unique records would do the necessary work. Surely, however, using VBA to automate the refresh of a PivotCache when the range changes is an equally valid approach in that case? The additional benefits are that you get the sorting and duplication bundled together rather than a 2 step approach, and also the ability to show the count of the occurrences if you require that (and even sort by that rather than by name etc.)
Nick says:

September 8, 2010 at 10:22 am

To be clear – you’d only ever need to do steps 1-3 of the original approach once. Once the table is built you can call a Pivotcaches.Refresh from the Worksheets_Change method which would then automatically refresh the list as originally required.
fzz says:

September 8, 2010 at 1:48 pm

If all you want to do is sort a range, pivot tables are overkill. If you want to sort and filter, then pivot tables have their uses.
AlexJ says:

September 9, 2010 at 1:30 pm

I vote for Nick – Pivot table with event based update of the pivot cache is not overkill – it’s ‘just-right’ kill. I use this all the time.
BryanD says:

September 17, 2010 at 8:38 am

Couldn’t you avoid part 2 of this post by including a FALSE in the match formula?

In D1 – =if(row()<=count(B:B),index(A:A,match(C1,B:B,FALSE),1),””)
Michael says:

September 18, 2010 at 7:54 am

Hi Bryan –

The MATCH() third arguments are 1, 0, -1. Never False. 1 would work, I prefer 0.

…mrt
Jonathan D Price says:

October 2, 2012 at 1:00 pm

I know this is very late to the party, but I just found this while looking for a similar solution. I noticed though that when I have many duplicates that some can make their way through to the final list so I made a simple change in E2 to make the formula

=(COUNTIF($E$1:$E1,D2)<1,D2,"")

This way it will count all dups above the current value that are already in the column and exclude all but the first occurrence.