Building a self-sorting list (Part 2)

In Part 1 we ended up with Column D, a sorted list. One criticism was the many times we were counting the numbers in a column. We should improve it and only count once. Via Insert/Name/Define define Count_BB to =COUNT(Sheet1!\$B:\$B), and then select Columns C:D, and “Replace All” COUNT(B:B) with Count_BB. Column D should look something like this:

D E
1 Aart Moonhammer =D1
2 Aart Moonhammer =IF(D2<>D1,D2,””)
3 Aart the Millwright
4 Aart Whiteson
5 Aberri of the White Heart
6 Aberri the Thieving Wizard
8 Acennan Kenricsson
9 Adei of the Red Ruins
14 Aethelred Awierganson
15 Aethelred Darkseed
16 Ageio the Fastidious Sorceror

In Cell E1 we simply move D1 over. In E2 we check to see if D2 is not equal to the cell above it, or in other words D2 starts a new run of names. If it is equal (FALSE condition) we put empty text. Otherwise, we put D2. Fill down from E2 as far down as columns B, C, and D are filled. Column E now looks like this. We have removed the duplicates.

D E F
1 Aart Moonhammer Aart Moonhammer =IF(LEN(E1)>0,ROW(),””)
2 Aart Moonhammer
3 Aart the Millwright Aart the Millwright
4 Aart Whiteson Aart Whiteson
5 Aberri of the White Heart Aberri of the White Heart
6 Aberri the Thieving Wizard Aberri the Thieving Wizard
8 Acennan Kenricsson Acennan Kenricsson
14 Aethelred Awierganson Aethelred Awierganson
15 Aethelred Darkseed Aethelred Darkseed
16 Ageio the Fastidious Sorceror Ageio the Fastidious Sorceror

In F1 we check to see if there is anything of length in E1. If there is, put the row number, otherwise put empty text. As above, define Count_FF as =COUNT(Sheet1!\$F\$F). After filling down, Column F looks like this:

D E F G
1 Aart Moonhammer Aart Moonhammer 1 =IF(ROW()<=Count_FF,SMALL(F:F,ROW()),””)
2 Aart Moonhammer
3 Aart the Millwright Aart the Millwright 3
4 Aart Whiteson Aart Whiteson 4
5 Aberri of the White Heart Aberri of the White Heart 5
6 Aberri the Thieving Wizard Aberri the Thieving Wizard 6
8 Acennan Kenricsson Acennan Kenricsson 8
9 Adei of the Red Ruins Adei of the Red Ruins 9
14 Aethelred Awierganson Aethelred Awierganson 14
15 Aethelred Darkseed Aethelred Darkseed 15
16 Ageio the Fastidious Sorceror Ageio the Fastidious Sorceror 16

In G1 we again test the row number, but this time it’s against the count of numbers in F:F. If the ROW() is less than or equal to Count_FF, put the numbers from F:F there is ROW() order. Fill down as before. Column G looks like this:

D E F G H
1 Aart Moonhammer Aart Moonhammer 1 1 =IF(ROW()<=Count_FF,INDEX(E:E,G1,1),””)
2 Aart Moonhammer   3
3 Aart the Millwright Aart the Millwright 3 4
4 Aart Whiteson Aart Whiteson 4 5
5 Aberri of the White Heart Aberri of the White Heart 5 6
6 Aberri the Thieving Wizard Aberri the Thieving Wizard 6 7
8 Acennan Kenricsson Acennan Kenricsson 8 9
9 Adei of the Red Ruins Adei of the Red Ruins 9 10
14 Aethelred Awierganson Aethelred Awierganson 14 16
15 Aethelred Darkseed Aethelred Darkseed 15 18
16 Ageio the Fastidious Sorceror Ageio the Fastidious Sorceror 16 19

Last Step. In H1 we again compare the row number to the count of numbers in F:F. If ROW() is less than or equal to the Count_FF, then index E:E (could also be D:D) the number of rows shown in G:G. Fill down as before. Column H looks like this:

D E F G H
1 Aart Moonhammer Aart Moonhammer 1 1 Aart Moonhammer
2 Aart Moonhammer   3 Aart the Millwright
3 Aart the Millwright Aart the Millwright 3 4 Aart Whiteson
4 Aart Whiteson Aart Whiteson 4 5 Aberri of the White Heart
5 Aberri of the White Heart Aberri of the White Heart 5 6 Aberri the Thieving Wizard
6 Aberri the Thieving Wizard Aberri the Thieving Wizard 6 7 Acennan Badgerrunner
8 Acennan Kenricsson Acennan Kenricsson 8 9 Adei of the Red Ruins
14 Aethelred Awierganson Aethelred Awierganson 14 16 Ageio the Fastidious Sorceror
15 Aethelred Darkseed Aethelred Darkseed 15 18 Agoztar of the Ghost Face
16 Ageio the Fastidious Sorceror Ageio the Fastidious Sorceror 16 19 Aide of the Dead Woods

The list that started in A:A is now sorted, de-duplicated, and collapsed to unique values, ready in H:H for whatever you might need as you write your fantasy novel. All you need to remember is to fill B2:Hn down well beyond any possible extent of A:A. This can be extended to handle several field records by indexing the appropriate columns at the two appropriate points.

…mrt

Posted in Uncategorized

15 thoughts on “Building a self-sorting list (Part 2)”

1. Alan Hutchins says:

Michael,

A very nice solution.

Can you post the final workbook so that we can see the final formulae?

This is exactly what I need to sort out a big list on an ongoin basis, and will prove very helpful.

2. Ivo says:

Cool tricks to de-duplicated and collapse to unique values.

I have one question. In column E you only check for one duplicate value. I currently have a dataset where a name occurs on more than two occasions. Sometimes three up to six times.

Do you have a tip to check for unique names which occur n-times in a dataset?

3. I’m not sure if this is correct, but I seem to recall that a name is evaluated every time it is being called.
If so, using the name count_BB does not mean you only do the count once.

For example: If I put a call to a UDF in a name, each call to that name triggers a call to the UDF.

4. Michael says:

Alan – Thank you. I’ll forward the spreadsheet to Dick with a request that he add it to this thread. He rightly takes serious regard over what gets posted here. The formulas used were copied right out of the applicable cells into the tables. I tried to make sure no curly quotes got substituted for straight ones. You should be able to copy them directly.

Jan Karel – Ooo. Didn’t know that. Well, then, it’s a fine example of a named formula instead of a named range. :roll:

…mrt

5. Michael says:

Hi Ivo –

The example dataset doesn’t show it (the spreadsheet will) but once the names are sorted (in Part 1) they all (1 thru many) appear grouped together in Column D. The formula in E2 will see that the 2nd is like the first and put empty text, and then the third is like the second and put empty text, and the fourth is like the third, etc. until a name comes along not like the one above it in Column D. This gets moved over, and the second gets compared etc.

That little formula in E2, filled down, does all you need.

…mrt

6. fzz says:

If you use COUNTIF(Range,Range), the approach is necessarily no better than O(N^2). The Sort method of the range class uses QuickSort, which is O(N log(N)). If efficiency is a goal, using event macros would be more efficient. While it may be possible the whittle away at the inefficiency of formula approaches, they’ll never come close to the efficiency of macro approaches.

7. Nick says:

To be honest, the simplest way would be to use a pivot table:

1) insert a header row at the top of the list called “Name”
2) create a pivot table off that plus any spare cells beloew the range you want for additional names at a later stage
3) in the pivot table, drop in name as the row. This will automatically de-dupe and sort the list by name, really really quickly
4) if you add more names to the list, just refresh the pivot table

As an added bonus, it’ll even show you the number of occurrences of each name if you then drag the “Name” field in to the data items area as it’ll default to counting them if they are text fields.

8. fzz says:

Pivot tables aren’t automatic. Pivot tables don’t automatically change when source data changes. If automation is a key requirement, pivot tables aren’t the answer.

9. Nick says:

fzz – you stated in your previous post that the Sort method and VBA automation was the answer, and I agree that a Sort plus an Advanced Filter on unique records would do the necessary work. Surely, however, using VBA to automate the refresh of a PivotCache when the range changes is an equally valid approach in that case? The additional benefits are that you get the sorting and duplication bundled together rather than a 2 step approach, and also the ability to show the count of the occurrences if you require that (and even sort by that rather than by name etc.)

10. Nick says:

To be clear – you’d only ever need to do steps 1-3 of the original approach once. Once the table is built you can call a Pivotcaches.Refresh from the Worksheets_Change method which would then automatically refresh the list as originally required.

11. fzz says:

If all you want to do is sort a range, pivot tables are overkill. If you want to sort and filter, then pivot tables have their uses.

12. AlexJ says:

I vote for Nick – Pivot table with event based update of the pivot cache is not overkill – it’s ‘just-right’ kill. I use this all the time.

13. BryanD says:

Couldn’t you avoid part 2 of this post by including a FALSE in the match formula?

In D1 – =if(row()<=count(B:B),index(A:A,match(C1,B:B,FALSE),1),””)

14. Michael says:

Hi Bryan –

The MATCH() third arguments are 1, 0, -1. Never False. 1 would work, I prefer 0.

…mrt

15. Jonathan D Price says:

I know this is very late to the party, but I just found this while looking for a similar solution. I noticed though that when I have many duplicates that some can make their way through to the final list so I made a simple change in E2 to make the formula

=(COUNTIF(\$E\$1:\$E1,D2)<1,D2,"")

This way it will count all dups above the current value that are already in the column and exclude all but the first occurrence.

Posting code? Use <pre> tags for VBA and <code> tags for inline.