String manipulation with worksheet formulas – Part 1

Deutsche Version / German version

A very common task is to extract a specific element (sub-string) from a longer string values. This first part shows some functions to extract the numeric part of a string while the next part will concentrate on getting the nth substring.

So let’s start with an assumption: The string to be processed does not contain 2 or more numeric parts but only one. e.g.

“This is part 123 of 234 parts”

is not allowed. Though

“This is part 123 of all parts”

is a valid string for the following functions.

First we try to evaluate the starting and ending positions of the numeric part. But before we can start using the formulas we have to define a name (Menu ‘Insert – Name – Define’) to make them more readable:

  • Name: seq
  • RefersTo Formula: =ROW(INDIRECT(“1:1024”))

We start with evaluating the beginning and ending position of our numeric part. For our example

“This is part 123 of all parts”

we should get 14 and 16 as result. Formulas to achieve this result are:

  1. Getting the starting position of the numeric part:
    • Array formula: =MIN(IF(ISNUMBER(-MID(A1,seq,1)),seq))
    • Non-array formula: LOOKUP(2,1/MID(A1,1024-seq,1),1024-seq)
  2. Getting the last position of the numeric part:
    • Array formula: =MAX(IF(ISNUMBER(-MID(A1,seq,1)),seq))
    • Non-array formula: =LOOKUP(2,1/(ISNUMBER(-MID(A1,seq,1))),seq)

To extract the numeric part all we have to do is combine the above functions together with the MID function:

  • Array formula:
    =IF(MIN(IF(ISNUMBER(-MID(A1,seq,1)),seq)),–MID(A1,MIN(IF(ISNUMBER(-MID(A1,seq,1)),seq)),MAX(IF(ISNUMBER(-MID(A1,seq,1)),seq))-MIN(IF(ISNUMBER(-MID(A1,seq,1)),seq))+1),””)
  • Non-array formula:
    =IF(ISNUMBER(LOOKUP(2,1/MID(A1,seq,1),seq)),MID(A1,LOOKUP(2,1/MID(A1,1024-seq,1),1024-seq),LOOKUP(2,1/MID(A1,seq,1),seq)-LOOKUP(2,1/MID(A1,1024-seq,1),1024-seq)+1),””)

I have to admit these formulas do not look simple and of course you can achieve the same goal with some VBA code quite easily. But in case you can’t use VBA this provides you with a non-macro alternative.
If performance is an issue you may consider using the following array formula (nearly twice as fast than the other two formula combinations):

=–MID(A1,MATCH(TRUE,ISNUMBER(-MID(A1,seq,1)),0),ROWS(seq)-SUM(–ISERROR(-MID(A1,seq,1))))

The above formulas can be simplified if you can make some additional assumptions about your string value:

  1. Numeric part is always at the beginning of your string:
    • Array formula:
      =IF(MIN(IF(ISNUMBER(-MID(A1,seq,1)),seq)),–LEFT(A1,MAX(IF(ISNUMBER(-MID(A1,seq,1)),seq))),””)
    • Non-array formula:
      =IF(ISNUMBER(LOOKUP(2,1/(ISNUMBER(-MID(A1,seq,1))),seq)),–LEFT(A1,LOOKUP(2,1/(ISNUMBER(-MID(A1,seq,1))),seq)),””)
    • Or a very nice non-array alternative (I think first posted by Aladin Akyurek):
      =–LEFT(A1,SUMPRODUCT((LEN(A1)-LEN(SUBSTITUTE(A1,{“.”,0,1,2,3,4,5,6,7,8,9},””)))))
  2. Numeric part is always at the end of your string:
    • Array formula:
      =IF(MIN(IF(ISNUMBER(–MID(A1,seq,1)),seq)),–MID(A1,MIN(IF(ISNUMBER(-MID(A1,seq,1)),seq)),20),””)
    • Non-array formula:
      =–MID(A1, LOOKUP(2,1/MID(A1,1024-seq,1),1024-seq),20)

This is only a compilation of some common used worksheet formulas for extracting numbers from strings. You may explore these functions in your own spreadsheet.
But keep in mind: For complicated string processing you may switch to VBA (together with using regular expressions).

Frank Kabel

Pages: 1 2

Logical operations in array formulas

Deutsche Version/German version
Array formulas (committed with CTRL+SHIFT+ENTER) have one restriction: You canĂ­t use Excel’s logical operations AND, OR, etc. within an array formula. So for example the following formula to conditional evaluate the minimum value of column C is not a valid array formula:

=MIN(IF(AND(A1:A10=”value1″,B1:B10=”value2″),C1:C10))

But you can replace these logical operations in array formulas. Use the following mathematical approaches to mimic the logical functions:

  1. AND statement:
    • Normal usage: AND(condition_1,condition_2)
    • Within an array formula: (condition_1) * (condition_2)
  2. OR statement:
    • Normal usage: OR(condition_1,condition_2)
    • Within an array formula: ((condition_1) + (condition_2)>0)
  3. NOT statement:
    • Normal usage: NOT(condition_1)
    • Within an array formula: (1-condition_1)
  4. NAND statement: This is the negative AND statement. It returns TRUE if neither of the conditions or only one condition are TRUE:
    • Normal usage: NOT(AND(condition_1,condition_2))
    • Within an array formula: ((condition_1) + (condition_2)<>2)
  5. XOR statement: Returns TRUE if only one of the conditions is met:
    • Normal usage: N/A
    • Within an array formula: MOD((condition_1) + (condition_2),2)=1

So using the first formula as example you can use the above approach as follows:

=MIN(IF((A1:A10=”value1″)*(B1:B10=”value2″),C1:C10))

to create a valid array formula.

Pages: 1 2

Getting the last value

Within Excel there is not a single formula to get the last element within a column or a row, but it can be achieved by a combination of worksheet functions within a variety of differing requirements. The fact that this is achieved in Excel by combining a number of functions can be seen as a strength of Excel (a strong base set of functions that provide a large degree of extensibility), or a weakness (lack of commonly required functions), depending upon your viewpoint.

The following is an extract of a comprehensive White Paper Bob Phillips and I created together. If you want more details about getting the last value including alternative solutions, benchmarks, VBA coding, etc. you may take a look at it.

  1. Getting the last numeric value in a column:
    • Formula: =INDEX(A:A,MATCH(9.99999999999999E307,A:A))
    • The INDEX/MATCH formula searches for the value 9.99999999999999E307. This value is the highest value that can be represented in Excel. Therefore this formula returns the last numeric value that is smaller than or equal to this number.
    • If no numeric entry exists within the range, the MATCH formula will return the #N/A error.
  2. Getting the last text value in a column:
    • Formula: =INDEX(A:A,MATCH(REPT(“Z”,255),A:A))
    • The INDEX/MATCH formula uses a string consisting of 255 ‘Z’ characters to find the last text entry. For Excel, this string evaluates to the ‘largest’ string value.
    • You cannot use the function REPT(CHAR(255),255), as the largest value. Whilst you might suspect that Excel evaluates this to the ‘largest’ string value, Excel evaluates the following formula:
      =REPT(“Z”,255)<rept (CHAR(255),255)
      to FALSE
  3. Getting the last value of any type in a column:
  4. FK_LastValue_1

    • Formula: =LOOKUP(2,1/(1-ISBLANK(A1:A65535)),A1:A65535)
    • This formula uses LOOKUP in its vector syntax form, with the lookup value as the first parameter, the lookup vector as second, and the result vector as the last parameter.
    • The most interesting part of this formula is the lookup vector (the 2nd parameter). The formula element
      1/(1-ISBLANK(A1:A65535))
      in this example returns the following array
      {1;1;1;1;1;1;1;#DIV/0!;1;1;#DIV/0!;#DIV/0!;…;#DIV/0!}
      that is, the ISBLANK function returns an array of TRUE (blank cell) or FALSE (non-blank cell) values. Subtracting this from 1 converts the array to an array of 0 (blank) or 1 (non-blank) values. Dividing 1 by this array then returns an array of #DIV/0 (blank) or 1 (non-blank) values.
    • The LOOKUP searches for the value ‘2’ within the array (which now consists only of ‘1’ and #DIV/0 values). The LOOKUP will not find this value, so it matches the last value that is less than or equal to lookup value. This is the last ‘1’ within the range which represents the last filled cell.
    • Restriction: you can’t use a complete column reference such as A:A for this type of formula.
    • This type of formula can be used for a lot of similar problems using the second parameter to create a lookup vector consisting of either ‘1’ or ‘#DIV/0’ errors by setting the Boolean expression accordingly. I personally saw this usage first in a posting from Aladin Akyurek.

All these formulas can easily adapted for searching in a row by changing the range reference. For examples see the above mentioned White Paper.

INDIRECT and closed workbooks

As Stephen I also like to thank Dick for opening his blog to others. So I’ll take the chance to cover some medium to advanced formula issues in my postings.
Excel provides a very powerful function – INDIRECT. It just lacks the functionality to access closed workbooks. The following is a compilation of common alternative solutions presented in the Excel newsgroups:

  1. Laurent Longre has developed the free add-in MOREFUNC.XLL which includes the function INDIRECT.EXT
    • You can use it in the same way as INDIRECT. e.g.:
      =INDIRECT.EXT("'C: emp[book1.xls]sheet!'A1")
    • Note that in this formula you also specify the path/directory information
    • If you use INDIRECT.EXT with an open workbook it behaves the same way as INDIRECT does

    There’re some limitations to this function:

    • It does not work reliable on all computers. So you have to try it in your individual environment.
    • INDIRECT.EXT can only return a single cell reference from a closed workbook. So you can’t use it for example as second parameter in a VLOOKUP function:
      =VLOOKUP("search_text",INDIRECT.EXT("'C: emp[book1.xls]sheet!'A1:B20"),2,0)
      won’t work.
    • INDIRECT.EXT does not work with defined names within closed workbooks.
    • If you have to access several closed workbooks your spreadsheet can become slow while re-calculating.
  2. Use SQL.REQUEST:
    • The usage is described here.
    • This function is relatively slow and the data has to be organized in a database like structure (that is a single area with field names as top row).
    • Note: Microsoft does not support this addin anymore.
  3. Use Harlan Grove’s PULL function:
    • Code can be found here.
    • The function creates separate Excel application instances to “pull” data from closed workbooks.
    • This function is more robust, can deal with non-database like layouts and can also deal with cell ranges.
    • Example usage:
      =VLOOKUP("search_text",PULL("'C: emp[book1.xls]sheet!'A1:B20"),2,0)

All these solutions have one common drawback: They’re quite slow. So use them carefully and don’t use them with large cell ranges.