Dynamic Text Sorting

A few days ago I solved a problem that I’ve been working on for more than five years: How to create a multi-cell array formula that returns a sorted list of text entries. It’s easy enough to do this with numeric entries, but the text solution has eluded me.

Range A2:A21 is named data. Select C2:C21 and type (or paste) the formula below. Then press Ctrl+Shift+Enter (rather than Enter).

=INDEX(data,MATCH(ROW(INDIRECT(“1:”&ROWS(data)))-1,COUNTIF(data,”< ” &data),0))

Text entries made in the data range appear in the C2:C21 range in alphabetical order.

Caveat: It doesn’t work for numeric entries, or duplicated text entries.

If you’d like to get rid of the #N/A display, wrap it up in an IF function:

=IF(ISNA(INDEX(data,MATCH(ROW(INDIRECT(“1:”&ROWS(data)))-1,COUNTIF(data,”< ” &data),0))),””,INDEX(data,MATCH(ROW(INDIRECT(“1:”&ROWS(data)))-1,COUNTIF(data,”<” &data),0)))

Here’s how to do it using the new IFERROR function (and this may be the best reason yet to upgrade to Excel 2007):

=IFERROR(INDEX(data,MATCH(ROW(INDIRECT(“1:”&ROWS(data)))-1,COUNTIF(data,”< ” &data),0)),””)

Why did it take me so long to figure this out? I was fixated on an entirely different approach — one that I’m convinced would never work. I hadn’t thought about it for about a year, but then all of a sudden it came to me.

I can’t think of any good reason to actually use this, but I feel pretty good about figuring it out.

Posted in Uncategorized

60 thoughts on “Dynamic Text Sorting

  1. Too bad your array formula can’t correct the spelling of “javelina”.

    Seriously, this is a great achievement, and one of which you can be justly proud. As soon as you can get it to work with numeric entries and duplicated text entries, that is. Have it on my desk in the morning. Thanks.

  2. Ouch. It’s hard to look at someones prepacked solution to something you’ve worked on for so long. But there aren’t really any shortcuts. Mr Google and his cut-and-paste confections can’t replace the experience gained through 5 years of determined effort. Kudos to John (as though he needs any more).

  3. Maybe you can’t think of any use for it, but I can!!

    In my work I WILL use this quite often — Thank you very much.

  4. actually I might have some uses for this, it is a perfect way to feed a sorted combobox or validation list (I dont think you can autosort validation lists)

    so it might be more usefull than you think (in my book, everything to prevent using a vba macro for small details is better, because it means that the end user will probably not disable anything and mess with the end result)

  5. after a quick test, it works indeed like a charm to sort validation lists, I have some worksheets were user always have to add vendors and stuff like that and it will help a lot.
    (actually MS should make a feature of auto-sorting validation lists but it is another point)

  6. Good stuff. I had not read about the IFERROR in 2007. And right when I finally made myself a button to add that automatically.

  7. Very nice!

    Only tenously related – I wrote this to avoid wrapping IF(ISERROR( )) around everything (I won’t see Excel 2007 in my workplace for 10 years). However, I’m probably not recognizing a fault with this. What’s bad about this UDF, please?

    Function ErrTrap(Returned, Default)
    ‘Function to trap returned errors and substitute Default
    If WorksheetFunction.IsError(Returned) Then
    ErrTrap = Default
    ErrTrap = Returned
    End If
    End Function


  8. Brett, the only potential problem with that function is if the user disables macros. And, it would probably increase recalc time significantly if you used it a lot. VBA functions are MUCH slower than built-in functions.

  9. Nice formula – guess it could be useful for the source of a self sorting chart perhaps?
    Also replacing ‘INDEX(‘ by ‘HLOOKUP(“žž”,’ should return an array usable in a named formula (assuming no duplicates).

  10. Brett,

    I would add that UDFs in general are just too glitchy to bother if you need to share the workbook. That’s just my personal opinion based on too many bad experiences, and I will use a UDF when there is no other way, but usually there is. My approach to the problem is a toolbar macro to put an IF-ISERROR wrapper on for me. IFERROR sounds neat, though I’d much rather just have a workbook level option to just suppress errors in the same way you can suppress zeros. Most of the time the error is not an issue, it’s just a matter of not wanting the ugly thing on your final printout.

    Sub IsErrorTheFormula()
    Dim rngCell As Range
    Dim intState As Integer

        On Error Resume Next
        Application.ScreenUpdating = False
        intState = Application.Calculation
        Application.Calculation = xlCalculationManual

        For Each rngCell In Selection
            If Left(rngCell.Formula, 11)  “=IF(ISERROR” Then
               rngCell.Formula = “=if(iserror(“ &amp; Mid(rngCell.Formula, 2, 10000) _
                  &amp; “),0,” &amp; Mid(rngCell.Formula, 2, 10000) &amp; “)”
            End If

        Application.Calculation = intState
        Application.ScreenUpdating = True
    End Sub

  11. John,

    Thanks for the feedback!


    Thanks for the example!! I just needed to put back in the angle brackets for the inequality and deal with the ampersands and it works like a charm.


  12. I doubt John expected this discussion to drift so far from his original post… {grin}

    I’ve never understood this fascination that people have with (a) hiding errors and (b) cramming what at times seems like all the world’s knowledge into a single cell.

    Addressing the the first issue. Errors are *good.* Stop hiding them. They help people discover problems with the software model. And, with their business (or scientific) model. They also help people discover inconsistencies in their operating environment and their data. Essentially, blindly hiding errors is wrong. Flat out wrong.

    For example, if a particular division leads to a “divide by zero” error, it should be an invitation to find out why that happens, not an excuse to mask the error and pretend everything is OK. Yes, after investigating the error, it may be perfectly legitimate to work around the error but that should be the end of the process not an excuse to flood a worksheet with error hiding formulas.

    The bottom line is I cannot think of a single good designer — of code, of other software model, or of business or scientific or engineering process or in any other discipline — who hides errors.

    And, that brings me to the next point. What’s with this frenzy to cram all the world’s knowledge into a single cell? I dislike this kind of design because it degrades readability and maintainability of a worksheet. A secondary effect is that the performance of that Excel model deteriorates. And, that is particular ironic since most people who use these kinds of put-everything-into-a-single-formula are also among those who get excited with formulas that use 3.14159 fewer CPU cycles and 2.718 fewer keystrokes.

    Look at a simple formula =IF(ISERROR(xxx), “”, xxx). At the very least, maintenance suffers since a change to xxx requires the developer to remember to change both instances of xxx. Also, from a performance perspective Excel has to evaluate xxx twice. It would be so much more elegant to use 2 cells, the first with the =xxx formula, the 2nd with =if(ISERROR(xxx-cell), “”, xxx-cell). Now, maintenance and performance both benefit.

    Yes, there are a few legitimate reasons for complicated formulas. But, by and large, maybe someone can help figure out if
    1) people are confusing soil conservation with cell conservation,
    2) it’s a way for the expert to put the lay person in their place, and/or
    3) it’s a macho thing about “Mine’s bigger than yours.”

    Frivolity aside, the simpler software model, by being more transparent and providing more intermediate results, can provide significantly better business support.

    For an example where sacrificing cell conservation benefits the software model see my posts in http://www.mrexcel.com/board2/viewtopic.php?t=216489

    For an example of improving performance see my suggestions in http://mrexcel.com/board2/viewtopic.php?t=221217.

    For an example of where an overly complex software model masks potential design and implementation bugs and also dramatically degrades business support see my suggestions and comments in http://www.mrexcel.com/board2/viewtopic.php?t=117743.

  13. Yikes. The ampersand is easy enough to deal with, but the loss of the confused me, and it’s my code. I had to go back to my original to figure out what I was trying to do. I hate to clutter up the site, but I want to give a workaround a shot.

    if x  y
    if x “” y
    if x “”y
    if x []y
    if x []y
    if x &gt; y
    if x &gt;y

    this &amp; that
    this “&amp;” that
    this [&amp;] that
    this {&amp;} that
    this  that
    this  that

    Hopefully one of these will work.

  14. John
    nice work on the sort formula, I’m sure I needed that recently

    Zach in 2003 (and XP – maybe others?) you can set errors to be not displayed in printouts on the sheet tab in print preview.

    for iferror (and other simple UDFs) I’ve gone retro and started (re)using XLMs as they are much faster than VBA, heres an iferror one:


    Professional Excel Development has an iferror xll which is the best solution if you can get people to install it.


  15. Tushar –

    “What’s with this frenzy to cram all the world’s knowledge into a single cell?”

    I often wonder about this. I’m not dumb (not TOO dumb) and still my brain overheats when I try to decipher some of these one-cell formulas that people have cobbled together. One-cell is a good design for bacteria, but for us higher life forms, multi-cell is better.

    I’m also struck by the refusal of people to use multiple ranges to allow different arrangements of the same data. The optimum arrangements for a pivot table data source, a chart, and a tabular report are all different (in fact, a tabular report for printed output is different than for on-screen display). Rather than trying to make one arrangement fit all uses poorly, start with a flat table as the primary database. Then use whatever dynamic or static formulas are needed to generate linked ranges for the other purposes. Hide the ranges you don’t want the user to mess with; for example, the printed output need never be visible, as it’s only required when printing the report. Sure, the file’s a bit larger, but I can’t possibly fill my 200 GB hard drive, and the time and effort required to force fit one range into multiple functions is excessive.

  16. totally agree, anyway once you get to the point where you are unable to understand your OWN formula, you take a step back from the one cell formula.

    Actually I am usually avoiding array formula if I can, though this exemple (not the error catching part, the sorting one) is really usefull to me and understandable too.

    on the other part, when you need two dimensional lookup for exemple, you quite often end up with big formulas, and so far I dont know any way around it except for UDF, and I refuse to use UDF as I think it is the most unreliable thing in excel today.

  17. Jon,

    You’ve mentioned this type of multi-table layout here before and I was, and am, interested in seeing an example if you ever have time to put one up on your site.

  18. Jon & Tushar, creating incomprehensible one-cell solutions is all about the challenge of being able to do it, and pushing Excel to its limits. I agree that it’s not always the best way to do things, but I’m more of an Excel hobbyist than someone who actually uses it for useful work.

    And, I’ve found that using a huge incomprehensible “megaformula” (rather than a bunch of intermediate formulas) reduces calculation time and makes a significantly smaller file.

  19. Jon –
    “but I can’t possibly fill my 200 GB hard drive”

    That’s what I said about my 400MB hard drive back in the 90’s… after I last installed XP Pro I checked disk usage b4 installing anything else it was around 2.5GB for a bare install.


    P.S. I miss DOS….

  20. Where else should all the world’s knowledge be if not in a single spreadsheet cell? It is the perfect location for it. Engraved on an atom.
    Word and Access MVPs might disagree of course. What do they know :-)

  21. Simon, that’s exactly what I want. I can live with the errors on screen, but not on final printouts. I can’t believe that was there all along and I missed it. I guess that’s what I get for being an XL2000 holdout for so long.

    jk, does the use of a static variable make it perform faster? I rarely have performance issues with such a simple thing, but it’d be worth noting anyway. I was amused to see you use right(x,len(x)-1) to get the cell value starting from the second position. It seems more elegant than using mid with an arbitrary value, and so obvious too. I guess it kind of goes back to John’s original point about being so stuck on one way of doing something that you somehow block that far simpler solution from popping up.

    Tushar, I don’t disagree about hiding errors, I just disagree on the definition of an error. Sometimes the #DIV/0 would be better replaced with a 100% or (100%) when doing a variance analysis. I know it’s not mathematically correct, but I’m in finance, not physics. Another example would be a failed vlookup. I don’t consider an unfound value to be an error and would appreciate some sort of default instead of that #N/A.

    I do agree with you on the megaformula point. I wouldn’t qualify megaformulas based on length, though. I think a very long sumproduct formula doing a 2 dimensional lookup can be highly readable and wouldn’t call it a megaformula. I see megaformulas more as those that combine different steps of a process into one, and the only reason I see for doing so is to make your spreadsheet hard to audit and I’m sure there’s a lot of not so good reasons people would want to do such a thing. In my experience dealing with auditors (big 4 firms, I might add) a nice spreadsheet is given far too much credibility.

  22. John –

    I know, those megaformulas help to sell books!

    “using a huge incomprehensible “megaformula” (rather than a bunch of intermediate formulas) reduces calculation time and makes a significantly smaller file.”

    As I stated, I hardly care about file size. For most of my programs, the worksheets aren’t too crowded, there’s a mix of formulas and VBA, and more time is wasted waiting for the user to click a button than waiting for a formula to recalc. So I’m not too worried about speed either.

  23. Charles –

    “That’s what I said about my 400MB hard drive back in the 90’s…”

    I suppose if I loaded up the PC with multimedia files, I’d be able to fill up 200 GB. I notice that 200 GB external drives are now down below $100, so that’s not too critical these days.

  24. Tushar: to help you understand-

    I’m modeling reimbursement changes for health plan fee schedules. I’ve got two or more lists which can vary in length and reimbursement value as management dictates. I want changes to flow to the summary, so I keep utilization liked to reimbursement with Vlookups. Often if fees are not on one schedule, but are on the other, that fee gets to play. If neither exists a manual fee has been calculated and that is used. I use the ISERROR to catch the NA from VLookup and lookup in another schedule. I’m sure there a many different ways to do this. I could create a comprehensive list of the reimbursement codes to be looked up, use helper columns to figure the fee to use, then use lookup to pull a text version of the result cell address, make the address values and use text-to-columns to make the text real cell references. This would do away with lookups altogether in the final model. I’m always ready to learn new techniques, though.
    If I really want to suppress an error from printing, I’ll use conditional formatting to change the font color, blending it to the cell color. No doubt errors should be traced to their causes in almost all circumstances.


  25. John:

    “using a huge incomprehensible “megaformula” (rather than a bunch of intermediate formulas) reduces calculation time “

    That strongly depends on the type of calc. I once speeded up a workbook by a factor 4 (!) by splitting up single-cell array formulas into about 5 columns of intermediate calculations. FIlezize of course increased, but it was great to have calc time down from about 8 seconds to 2 (which is still slow).

  26. Whilst I agree with Tushar (Reply #16 October 12th, 2006 at 6:49 pm) that “blindly hiding errors is wrong”, the solution is not to allow errors to occur and then display them as suggested, but to build mechanisms into your model that will handle foreseeable errors and thereby prevent them (eg. By using Data Validation, or VBA).
    In addition, often a formulae may return an error value when the underlying data is not incorrect – just the method of dealing with that data. (eg. Percentage calculations often return “#DIV/0!” errors because of zero value denominators. It is not incorrect to have these zeroes, but it is bad modelling to allow the error value to show – because the error then flows through to any subsequent formulae that are dependent upon that cell. Accordingly, use IF statements to deal with the error so that a zero is returned rather than “#DIV/0!”.
    I believe that errors in models are never good – it means that the model builder hasn’t done enough work to make the model sufficiently robust. Imagine the airline industry if error handling wasn’t part of the design process – we’d have planes falling out of the sky on a daily basis because there were no duplicate systems, or automatic actions like the deployment of oxygen masks, etc.!!

  27. I have a solution that works numbers, texts, duplicate numbers, duplicate texts and blanks.
    However, I am unable to pack it into a single megaformula. I need some extra columns for intermediate calculations. Can it be of any interest?

  28. Suppose that Range A1 to A10 contains


    As you can see, there are numbers, texts, blanks, duplicate numbers and duplicate texts.

    Then key in, in B1, the following formula:
    and copy it down to B10.

    Eventually, in C1 key in:

  29. For any unknown reason, the previous post did not appear complete. Here is what was left:

    Eventually, in C1 key in:

  30. John’s original post DOES work for numeric values! I needed to do what John Walkenbach does, but I needed it for numeric values. My data were in cells F14:F25. I entered the following, and it worked perfectly. …but it still does not work for duplicated numeric entries. Note I used “>” rather than “” &F14:F25),0))

  31. Thank you!
    I came across your post by chance and it solved a problem that has been bugging me for ages (lots of time anyway….). I have made a list of content of DVD where each DVD contained more than one item. I wanted to keep the DVD’s in numerical order and added somewhere a dropdown box which allows me to look up the content and get the DVD number. but it wasnt alphabetized (which sucked). I think now it should work perfectly.

    Now all I need to do is understand what it does :)

  32. I have been reading a lot about excel since 1 year ago, but first time i post something:

    We can eliminate repeated entries with a column with this formula (i have placed it on D4 to D65:


    Original information is stored from A4 to A65. Then we create a new column without repeated values, in this case with the results on E4 to E65:


    Then, apply to E4 to E65 a formula to sort the text, and the result is without repeated entries.

    Hope im clear, if you have doubts contact me at alberto.enciso , from gmail.

    Greetings to all you.

  33. I’ve been finding this post tremendously interesting and wanted to find further support with a query along similar lines, as I want create a multi-cell array formula that will sort a column of % values and hour values.

    A1 Colleagues
    Range A2:A21 is named students and each cell has a student name.

    A2 Hours
    Range B2:B21 is named hours and each cell has a hour value based on a 7 hour study time a day. Values ranged from 1 to 147.

    C3 Percentage
    Range C2:C21 is named percentage and each cell has a value that scores the students latest evaluations. Values ranged from 1 to 100%.

    What I would like to know is this:

    1) Is it possible to have the data in the student & hours and student & percentage ranges sorted from top to bottom so I can produce a chart tracking performance?

    2) Also, if some students have no data in hours or percentage how can I have the data in the student & hours and student & percentage ranges sorted so that those values appear at the bottom and do not appear the chart?

    Is this possible?

  34. I hope to get help in my work. I wanted to save time in manually checking duplicate names in a 7,000 datebase of names I received on a bi-weekly basis i.e.,in an excel format. I’ve started reading excel books including that of john Walkenbach, trying to find a simple way in excel to do it but can’t figure it out …

  35. Vicky Ternida,

    1. Use Excel Sort ; then, duplicate names are grouped in one place.

    2. Use Excel Conditional Format ; then, duplicate names are highlighted thereof in respective groups.

    3. Use Excel AutoFilter ; then, you can examine respective groups of duplicate names in turn.

    4. Recurring names like J.R._Junior, J.R-Junior, JR.Junior etc are admittedly more difficult to detect as duplicate names.

    5. If you would allow limited disclosure, please send a copy of your Excel worksheet of 7000 names (truncated if need be) to,

    6. However, before I could offer any assistance at all, please exercise discretion (by your good-self) while sharing the list …… it should not incur any infringement of interests upon the parties involved.


  36. The formulation is really great!!!

    =INDEX(Data,MATCH(ROW(INDIRECT(“1:”&ROWS(Data)))-1,COUNTIF(Data,”<” &Data),0))

    Thanks …

  37. Hi,

    I tried to use the formula but it brings the first entry in the range “data” always when i copy to extend the formula in the column “sorted” for the remaining rows.

    What might be the reason behind it?

    thks in advance,


  38. Awesome! I’m impressed.


    It works great, and I’ve also taken some inspiration from the suggestion of PaulB to get it working with numbers and other characters. I’m using it to create dynamic range names, using the formula below, in order to create validation lists.



    Now I have the next problem… (Never satisfied I know).

    I have an initial list that is created as a formula.
    The formular is a simple concatenation of additional other data wrapped up in an IF statement.

    =IF(OR(ISBLANK(H93),ISBLANK(I93)),””,H93 & ” (” & I93 & “)”)

    Now, when I try to created a sorted list it doesn’t create a list at all.
    The problem with the first line was the ISBLANK statement which I modified to ($G$6:$G$105=””) to get it working, see below.


    I then tried the same for the following lines which are also wrapped up in an IF statement to avoid the unwanted #NA’s. Here my solution doesn’t work. The formula is below.


    So, with a bit of debugging, I’ve tracked the problem to the section of the formula reading ” COUNTIF($S$6:$S$105,”<“&$S$6:$S$105) “. This formula returns 1 when all blank cells are deleted (empty) and a larger number when the blank cells still contain the formula (with empty string set).

    How do I solve this?


    Can someone explain me what the array formula is actually doing here to point me in the right direction to solving this. I simply don’t understand the logic!



    1) The first line writes this statement with an additional &”” appended to the criteria. No change.

    2) Setting the formula column to an array formula created more problems to try and deal and didn’t seem like an appropriate or effective solution to I quit this line of tests.

    3) Considered using a macro to run the concatenation, but seemed pointless having spent so much time getting this formula working so as to avoid macro warnings for the user.

    4) Considered dynamic validations to avoid the need for concatenation to begin with but that doesn’t suit my application.

    5) … a bunch of other things that I don’t think of now.


    Any help or pointers would be appreciated. This is a seriously cool formula and application I’ve found to use it, just need to tweak this last little bit :-)



  39. Okay, maybe I over complicated the problem – same question in a different wording…

    I modified the first line formula to deal with list cells containing empty strings (from IF statement). The same modification to the supplimentary formulas doesn’t work.

    1) Is there a better way to deal with empty strings since you can’t set the cell to a true blank as it contains a formula?

    2) Can someone explain the logic of the formula below. I don’t follow it at all.


    Thanks again


  40. I find the original formula not to work for me. I believe the problem lies in this portion “INDIRECT(“1:”&ROWS(data))”. If I change it to “INDIRECT(ROW(A1)&”:”&ROWS(data))”. It then works.

    Below is the formula that works for me.


  41. My solution is much simpler.

    1. My unsorted numbers ( or words ) are listed horizontally. e.g. B29 – G29 ( 6 numbers ). I choose 29 so that it wont be confused with the 1 used in RANK function :D

    2. My sorted numbers shall be in cells J29-O29.

    3. The formula for cell J29 is

    3. The formula for cell K29 is … just convert all the “=1″ into “=2″

    4. The formula for the rest is “=3″ for L29 and so on till “=6″ for O29.

    5. The RANK function will rank every cell in the range. There will not be any unranked.

    6. The last part .. ,IF(RANK($G29,$B29:$G29,1)=1,$G29,$Q29)
    If there are more than one same number .. meaning there are more than one number of the same rank.. it would duplicated the first number of the same rank

    7. To sort words, you need to first convert words into ASCII using the CODE function. The RANK function works only with numbers.

    Hope this would help u guys.

  42. Hi
    I did use the formula that =INDEX(data,MATCH(ROW(INDIRECT(“1:”&ROWS(data)))-1,COUNTIF(data,”< ” &data),0))

    but it appear to be come out with #name? for C:C28

    my data is D2:D28

    My formula

    =INDEX(D2:D28,MATCH(ROW(INDIRECT(“1:”&ROWS(D2:D28)))-1,COUNTIF(D2:D28,”< ” &D2:D28),0))

    I am not sure what when wrong.

    I select C2:C28

    paste that formula and Control Shift Enter and the whole C2:C28 become #name?

    Please help

  43. 7.75 years since JW’s original Post – I can see a very good use for this array and that is to sort the entries for a drop down list, where the entries come from user input. Great work JW – you are still the MAN!

  44. Thanks John! This worked perfectly, and really helped me out of a jam. I’m a really big fan of what you do. Keep sharing the Excel love! :)

Leave a Reply

Your email address will not be published. Required fields are marked *