Naming Table Columns

I was reading Contextures post about duplicate entries. In it, she creates a named range that refers to a Table column. That named range will expand and contract as the Table does.

Tables have their own built-in names and they are so handy that I find myself using Tables whenever possible. Here’s me using the table name and an open bracket to get a list of columns while entering a formula.

You may have guessed that I prefix all of my Tables with “tbl”. I also use the singular noun, so it’s tblCustomer and not tblCustomers. One of my favorite uses of Table references is using them in VLOOKUPS. The third argument to VLOOKUP is the column from which to pull. I’ve never been a fan of making that dynamic. Instead, I would hard code the number and change it if the columns change. But with Table references, I find myself making that argument dynamic. It’s a little less error prone, a lot more self-documenting, and while it’s more typing, I don’t have to count columns. An old VLOOKUP of mine might look like

=VLOOKUP(G1133,$A$1:$J$331,2,FALSE)

And now it looks like this

=VLOOKUP(G1133,tblCustomer,MATCH("Carrier",tblCustomer[#Headers],FALSE),FALSE)

Instead of hard coding “2” for the column, I’m hard coding “Carrier”, which is the column name. Obviously that could change, but it will give me an #N/A! rather than pulling wrong information.

The autocomplete feature of the table references makes this easy to enter. I can type “tbl” and get a list of all the tables in my workbook.

Then I can hit the tab key to finish the job. Next, the open bracket gives me all the elements in the table, including #Headers.

But this was supposed to be about naming columns and I got off on this VLOOKUP thing. One of the limitations of Table references is that you can’t use them directly in data validation and conditional formatting. The way around that, as Contextures employs, is to create a named range that refers to the column. I use a convention when naming Table columns.

The name is prefixed with “dv” (for data validation). Then the table name without the “tbl” prefix. Then the column name. The big advantage to me to using this convention is that it’s easier to type the named range in the Data Validation box. There’s no intellisense in the Data Validation box, so you have to remember the named range exactly. Also, when you’re entering a formula in the formula bar, it’s nice to type “dv” and see all of the named ranges that start with that. Or to type “dvCustomer” to see all of the columns in that table that have a named range.

One of the downsides is that the named range can get long. If you have tblCustomer that has a CustomerName column, the named range becomes dvCustomerCustomerName. I think the benefits outweigh the costs.

Leave a comment with your conventions for naming ranges.

Calculating the Number of Unique Items in a Delimited String

This is a guest post by David Hager. You can learn more about David on LinkedInd

There has recently been interest in handling row data that contain delimited strings. An elegant solution to split a delimited row into multiple rows using DAX queries can be found here.

http://www.sqljason.com/2013/06/split-delimited-row-into-multiple-rows.html

Another question was asked on LinkedIn about how to count the number unique items of a delimited string. I could not figure out how to do this in PowerPivot, and I am not sure it can be done with a single formula. However, I undertook the challenge to do this in Excel and I came up with a solution, albeit a rather lengthy one.

So, if the string a,c,d,a,b,c,d,e,f,g,h,I,j is in cell A1, the following formula will return the value 10, which is the number of unique items in the delimited string.

=SUM(N(MATCH(TRIM(MID(SUBSTITUTE(A1,",",REPT(" ",999)) ,ROW(INDIRECT("1:"&LEN(A1)-LEN(SUBSTITUTE(A1,",",""))+1))*999-998,999)),TRIM(MID(SUBSTITUTE(A1,",",REPT(" ",999)) ,ROW(INDIRECT("1:"&LEN(A1)-LEN(SUBSTITUTE(A1,",",""))+1))*999-998,999)),0)=ROW(INDIRECT("1:"&LEN(A1)-LEN(SUBSTITUTE(A1,",",""))+1))))

In order to explain how this formula works, a simplified version is shown below:

=SUM(N(MATCH(DelimitedStringArray,DelimitedStringArray,0)=RowArray))

The MATCH function returns an array of n elements. Each value in the array is the MATCH function result for the nth element. For the example string, the array will look like this:

{1;2;3;1;5;2;3;8;9;10;11;12;13}

If every element in the delimited string is unique, this array would be filled by consecutive numbers from 1 to 13. It can be easily seen which elements do not fit that pattern. In order to calculate this, the array from the MATCH function must be set equal to the unique array, or:

{1;2;3;1;5;2;3;8;9;10;11;12;13} = {1;2;3;4;5;6;7;8;9;10;11;12;13}

which affords an array of Boolean values:

{TRUE;TRUE;TRUE;FALSE;TRUE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE}

Use of the N function converts this array into ones and zeros, and the SUM function returns the desired result.
Well, that was the easy part. :)

The hard part of this formula is to convert a delimited string into an array of each element in the delimited string. This was accomplished by using the following formula:

=TRIM(MID(SUBSTITUTE(A1,",",REPT(" ",999)),ROW(INDIRECT("1:"&LEN(A1)-LEN(SUBSTITUTE(A1,",",""))+1))*999-998,999))

To give credit where credit is due, the core of this formula was created by Rick Rothstein, see:

The truly amazing function of this formula is that it converts a delimited string into an array! I’m not going to go into an explanation here of how this formula works, since it is explained at the provided link. The original formula

=TRIM(MID(SUBSTITUTE(A1,",",REPT(" ",999)),n*999-998,999))

was designed to return the nth element from a delimited string, but in this case all of the elements are returned in an array by replacing n with ROW(INDIRECT("1:"&LEN(A1)-LEN(SUBSTITUTE(A1,",",""))+1)), which in this example returns {1;2;3;4;5;6;7;8;9;10;11;12;13}. This is also the array used for RowArray in the simplified example.

So, this formula works great in Excel, but how could it be used in PowerPivot? For those using Excel 2013, and assuming that your column of delimited strings resides within a table in your DataModel, you can use a DAX query to bring the column into Excel, add the formula demonstrated here to the Excel table, and link the table back into the DataModel. A comprehensive example of this can be found at:

http://www.sqlbi.com/articles/linkback-tables-in-powerpivot-for-excel-2013/

So, for situations where solutions may not be possible in PowerPivot (or just incredibly complex), don’t forget about the power of Excel formulas.

BTW, an offshoot of the creation of this formula is another array formula that sums (or averages, whatever) the values in a delimited string:

=SUM(N(VALUE(TRIM(MID(SUBSTITUTE(A1,",",REPT(" ",999)) ,ROW(INDIRECT("1:"&LEN(A1)-LEN(SUBSTITUTE(A1,",",""))+1))*999-998,999)))))

A solution to this problem has been pursued in the Excel community for many years, so I am happy to present this formula here.

Averaging Best 4 of 5 Golf Scores

Here’s part of a spreadsheet for a golf league.

The formula in AA is just an AVERAGE(), but the one in AB is a little more interesting. It looks like this:

{=IF(COUNT(B2:Z2)<5,AA2,(SUM((Z2:INDEX(A2:Z2,LARGE(COLUMN(A2:Z2)*(A2:Z2<>""),5))))-LARGE((Z2:INDEX(A2:Z2,LARGE(COLUMN(A2:Z2)*(A2:Z2<>""),5))),1))/4)}

This formula takes the last five scores, throws out the highest one, and averages the remaining four scores. If the player has less than five scores, it’s just a simple average. I did some similar calculations in Golf Handicap and Sum Last Three Golf Scores if you’re interested. Let’s break this one down.

Array Formula

First, this formula is an array formula. When you enter it correctly, using Ctrl+Shift+Enter rather than just Enter, Excel puts the curly braces around the formula. If you don’t enter it correctly, Excel will still usually return a result, just not the right result. So please enter it correctly. Stuff like this is why Enron doesn’t have a golf league anymore.

Some formulas act on numbers. 1+2 = 3 takes two numbers and adds them together. Array formulas act on arrays of numbers. {1,10,100}+{2,20,200} = {3,30,300}. As you can see, when the arrays are added, each element of the array is added to its brother and the result isn’t a number, it’s an array of numbers. If you want to see a cleaner example of an array formula and how they work, go read Anatomy of an Array Formula.

The IF Part

{=IF(COUNT(B2:Z2)<5,AA2,*Some Array Stuff*)}

The IF part is pretty straight forward. The COUNT function returns how many numbers are in the range to see if there are enough scores. If there are, do the array magic. If there aren't, get the average that we've computed in AA2.

The SUM Part

(SUM(*Stuff that returns an array*)-LARGE(*Stuff that returns an array*,1)/4

Remember when I said that adding arrays returns an array. The same is true for multiplication. The purpose of an array formula is to return an array into a function that does something with an array. We don't want to get an array back from our formula, we want to get an array to plug into the SUM function so that we can add it up and get one number back.

In this part of the formula, we're summing up the array, subtracting out the 1st largest value (LARGE(x,1) returns the largest value in x, while LARGE(x,3) returns the third largest value.), and finally dividing the whole mess by 4 to get an average.

The INDEX Part

Z2:INDEX(A2:Z2,*Some array stuff that returns a column number*)

Cell references that span more than one cell look like B2:Z2, with a colon separating the first cell in the range with the last cell in the range. What you may not know is that Excel doesn't really care which order they're in. To Excel, Z2:B2 is the same as B2:Z2. Go try it. Go type =SUM(Z2:B2) in a worksheet and see what happens.

In this part of the formula, we're starting with Z2, then a colon. The INDEX function is going to return a cell reference that makes up the second part of our multi-cell range. For instance, on line 3 in the above screen shot, the INDEX function is going to return K3, so that when Excel evaluates the formula in AB3, it will reduce this part to Z3:K3 and then convert that to K3:Z3. K3:Z3 happens to be the range that covers Andrew's last five scores.

With INDEX, you supply a range and a number that says which cell in that range you want. In our example, we're supplying the range A3:Z3 (the entirety of the scores) and we're doing some calculations that return which cell in A3:Z3 we want to be the Z3's partner.

The LARGE Part

LARGE(*the array stuff*,5)

In The SUM part, we talked about how LARGE works because we were trying to find the largest number in an array to get rid of. Here we're using LARGE in a different part of the formula. We're finding the 5th largest value in some array. Why fifth? Because we want five golf scores. This is the part that, in line 3, helps us find K3 so we can sum up K3:Z3.

The array that we're getting the 5th largest value from is an array of column numbers that have non-blank cells. Of all the cells on line 3 between A and Z, K3 is the fifth largest column number (K=11 if column letters were actually numbers) if you only consider cells that aren't blanks. N3 would be the fourth largest and P3 the second largest, to name a few.

The Array Part

COLUMN(A2:Z2)*(A2:Z2<>"")

Finally the good part. This is the part where we multiply two arrays together and the part that makes this formula an array formula rather than the normal kind. We've kind of worked from the outside in on this formula, but I'm going to walk back from the inside out so we don't lose perspective on this part.

The array part returns an array of column numbers. That goes into the LARGE function so we can find the fifth largest column number that has a value. That cell gets married to Z3 to make a multi-cell range. The range gets put into a SUM function that adds up the five scores in the range.

You may notice that the array part is repeated in the formula. It's in the SUM part so we can get to the five values. It's also in the LARGE part so we can pick out the largest of those five values to discard.

I mentioned earlier that we can multiply arrays and get an array in return. We're multiplying two arrays that each have 26 elements (A-Z is 26).

COLUMN(A3:Z3)*(A3:Z3<>"")

As arrays, the above looks like this:

{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26}*{TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,TRUE,TRUE,TRUE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE}

The COLUMN function returns the column number for the range, 1 to 26 in this case. The second array is created by comparing each cell in A3:Z3 to an empty string, thereby testing whether the cell is empty. If it's not equal to an empty string, that element of the array gets a TRUE. If it is empty, it gets a FALSE. We end up with 26 TRUEs and FALSEs telling us which columns have values.

Now here's the magic. When you do math on a TRUE or FALSE, it converts into a number; TRUE becomes 1 and FALSE becomes 0. When we multiply each element of the arrays to each other, remembering how TRUEs and FALSEs are treated, we get

{1,2,3,4,5,6,0,0,9,10,11,0,0,14,15,16,17,0,0,0,0,0,0,0,0,0}

Every column that had a value gets the column number. Every column that had no value gets a zero. We have a 26 element array with column numbers and zeros. This array gets jammed into a LARGE function that returns the fifth largest value. What's the fifth largest value in this array? It's 11.

The 11 gets jammed into and INDEX function to return the 11th cell of A3:Z3, which is K3.

K3 gets tacked onto Z3 to make Z3:K3 and that gets converted to K3:Z3.

K3:Z3 gets jammed into a SUM function and returns 40, the sum of the last five scores. Elsewhere, K3:Z3 gets jammed into a LARGE(x,1) function to return the largest value in K3:Z3, or 14.

We take the sum, 40, subtract the large, 14, and get 26. Divide 26 by 4 and we get 6.5, the number in AB3.

And that's it. Now you know about array formulas, INDEX, LARGE, COUNT, and COLUMN. I assume you already knew about IF and SUM. Do you get how it works? OK smart guy. What do you need to change to get the best 4 out of 6?

Win your own SpeedTools Coffee Mug

SpeedTools Mug

Some of you will know that I have been developing (for the past 2 or more years!) some fast multi-threaded Excel functions using the XLL C++ interface.
After climbing this rather steep learning curve I am now very close to the final release of these functions.

FastExcel SpeedTools is designed to be a state-of-the-art set of tools to help you speed up calculation of slow workbooks:

  • New calculation methods and modes give you greater control of calculation.
  • Superfast memory lookup and Compare Lists functions make handling large data volumes easier.
  • You can eliminate many SUMPRODUCT and array formula bottlenecks with SpeedTools FILTER.IFS.
  • Additional functions include Regular Expressions, Array Stacking, Array OR/AND, Text and Information functions

The final SpeedTools Beta test is now live and I need more Beta Testers and feedback:

So download SpeedTools Beta 3 and tell me what you think!

The best 20 Beta test feedback reports received by the end of March will get their own exclusive SpeedTools  coffee mug, (and a free license of course!).
Excel 2003 users get a toolbar to choose functions and launch the Function Wizard, and Excel 2007 and later users get 2 additional groups on the Formulas tab.

SpeedToolsFuncLib

You will find more information on my website and blog, including some performance comparisons with the built-in Excel functions.

 

 

Handicap History

Back in 2005, I created a spreadsheet to calculate my USGA handicap and I’ve been using it ever since (175 scores posted). It calculates my current handicap but doesn’t give me any historical information. Until now.

GHIN calculates handicaps on a schedule, not in real time. That means many of the handicaps on my spreadsheet aren’t official. They do that so that one score doesn’t swing the calculation too much. By calculating approximately monthly, they give you a chance to post a few scores between calculations. Here’s what my spreadsheet looks like today:

The formulas up to column J are the same as they were in 2005. I’ve added formulas in K through O.

K181: =SUMPRODUCT(((H162:H181+(ROW(H162:H181)/100000))<=SMALL(H162:H181+(ROW(H162:H181)/100000),10))*(H162:H181))

The part that says +(ROW(H162:H181)/100000) is used to add a small number so that I can break ties but won't affect the score. Otherwise it's summing up the 10 smallest of the last 20. I started 20 rows down so I wouldn't have to mess with less than 20 scores. That's too much work for not enough payoff.

L181: =SUMPRODUCT(--((H162:H181+(ROW(H162:H181)/100000))<=SMALL(H162:H181+(ROW(H162:H181)/100000),10)))

I don't know why I wrote this one - it's always 10. Except that sometimes it was 11 before I added the +ROW stuff to break ties. It's principally the same formula as the previous one except for the last part. And it includes the double-negation to coerce the Trues and Falses into numbers.

M181: =TRUNC(K181/L181*0.96,1)

Divide one by t'other and take 96%.

N181: =IF((M181+(ROW(M181)/100000))=MIN(IF(((YEAR($D$27:$D$181)=YEAR(D181))*($M$27:$M$181))=0,"",($M$27:$M$181+(ROW($M$27:$M$181)/100000)))),M181,NA())
O181: =IF((M181+(ROW(M181)/100000))=MAX((YEAR($D$27:$D$181)=YEAR(D181))*(($M$27:$M$181+(ROW($M$27:$M$181)/100000)))),M181,NA())

Once I got that part done, the data was begging to be graphed. Once on a chart, I could see that I needed some Min's and Max's to clutter it up. These last two formulas compute the minimum and maximum indexes by year and return that value on the row where it exists. If it doesn't exist on a particular row, it returns NA() so it doesn't show up on the chart.

To make the chart, I selected all the data from D:O and created an XY chart. Then I deleted all the series that I didn't want. Sorry for the 2003isness of the chart. Some of the steps I took to create this chart are:

  • CumIndex: Remove markers, make Olive Green line, add Polynomial2 trendline
  • Trendline: Make 25% grey
  • Major Gridlines: Make 25% grey
  • Min: Remove lines, make makers green, make data labels = Y values, put data labels on bottom
  • Max: Remove lines, make makers red, make data labels = Y values, put data labels on top
  • Y axis: Make minimum of 4 and maximum of 16 to get rid of some white space
  • X axis: Adjust min and max to tighten it up
  • Plot area: Remove fill

Oh, how I pine for the days of a 7.5 handicap index. The season in Nebraska ends November 15, so I have a little more than two months to get that down. I'd like to get it down to, at most, an 8.0 by then.

You can download Handicap2.zip

Model business dependencies in an Excel worksheet

The Civilization game dependency tree
How to build a dependency tree when there is no obvious connection between the business model and an Excel model.

A long time ago I spend a lot of time playing the strategy game Civilization. For those not familiar with the game, one of its features was “advances.” Each advance brought with it certain additional capabilities and benefits. Of course, there was a requirement before one could acquire an advance, particularly a set of pre-requisites. For example, pre-requisites for the “Navigation” advance were the “Seafaring” and the “Astronomy” advances. In turn, the Seafaring advance required one to already know “Pottery” and “Map Making”.

I created an Excel worksheet that let me use Excel’s Precedent arrows to understand the optimal path to specific advances as in Figure 1. The advances are shown in red and the benefit(s) of each advance are in black, blue and green. The Excel blue arrows show the pre-requisites for the Seafaring advance.

Figure 1

While the dated worksheet may be of limited value even to Civilization enthusiasts, the technique for creating the dependency tree is unique enough to be of value to Excel consumers. What makes it of value is that the game dependencies (in the context of work this would be the dependencies in the business model) had nothing to do with what Excel considers as dependencies! Consequently, to use Excel’s Trace Precedents feature I had to somehow map the model dependency into Excel formula dependencies.

For a version in a page by itself (i.e., not in a scrollable iframe as below) visit http://www.tushar-mehta.com/publish_train/xl_vba_cases/0908%20CIV%20game%20dependency%20tree.shtml

Tushar Mehta

Working With Circular references in Excel

Have you ever experienced the dreaded “Circular reference warning” popping up when you opened an Excel file or entered a formula?

Excel detects a circular reference when a chain of calculations visits the same cell more than once. Many users get (very) confused by this message and have no idea what it is about. I’ll tried to demystify that message in a new article:

Working with Circular references in Excel

The article contains these chapters:

Types of circular references

Simplistically speaking there are only a few types of circular references to discern:

Deliberate circular references

Accidental circular references

Calculation settings

If you want to work with circular references, the calculation settings of Excel are very important. This page gives you some pointers!

Properly setting up circular references

Whereas I am no fan of using circular references, they can be beneficial to your model and really solve the problem you are trying to solve. So here is some advice on how to properly work with them.

Reasons why circular references may not be detected

This page shows a couple of reasons why circles are not detected.

Enjoy!

Regards,

Jan Karel Pieterse

www.jkp-ads.com