Count Active Customers

Jake wants to know, given an active date and an inactive date, how to count the customers that were active in a certain time period.

The ones we want are highlighted in yellow. The formula is

=COUNT(D2:D21)-SUMPRODUCT(($C$2:$C$21>=D25)+($D$2:$D$21< =C25))

It’s easier to figure out who is not active during that date range and subtract it from the total. The formula starts by counting everyone using the COUNT function. The SUMPRODUCT function is then subtracted from that. It gives the total of all the customers who became active after our End date plus the customers who became inactive before our Start date.

Note that this formula excludes the actual Start and End date. Customer 17 isn’t included because we’re really looking at 7/16/2010 to 7/24/2010. If you want the formula to be inclusive, simply change the <= and >= to < and >, respectively.

If you don’t like SUMPRODUCT, you can get there with COUNTIF

=COUNT(D2:D21)-(COUNTIF(C2:C21,”>=”&D25)+COUNTIF(D2:D21,”< =”&C25))

Sometimes it’s easier to turn the problem around and figure out who’s excluded.

Posted in Uncategorized

11 thoughts on “Count Active Customers

  1. subtracting is definitely the way to go… but I’ve happily cut my bad habit of combining arithmetic & logical expressions in the same Excel formula ever since Excel 2007. Ever since, I’ve used the functions countifs, sumifs, and averageifs which are much more intuitive.

  2. If some cells in C2:C21 contained 7/25/2010, the formulas offered would include them as 1 in the SUMPRODUCT result. In that way the active period is treated as >= C25 but < C26. So the date labeled End is exclusive. This would not be what a typical user would expect (they’d expect bothstart and end dates to be inclusive).

    If Start and End dates were inclusive, it’s sufficient to test that the Active Date values fall between them. The Inactive Date values are irrelevant as long as Inactive Date values are strictly larger than Active Date values. And with moderate trickery this requires only 1 pass through the array.

    =SUMPRODUCT(–(ABS(C2:C21-(C25+D25)/2)=(D25-C25)/2))

  3. DK:

    1) Please consider posting the data in a table or attach a file. It’s a PITA to recreate the data set! :(

    2) As long as active date is always < = inactive date, define an active customer as active date < = end date (use < to exclude the start date) and inactive date > = (or > ) start date. This, of course, is the opposite of the definition of an inactive customer, which is what you used.

    3) I would use a column to flag a customer as active / inactive. It serves multiple purposes:
    * Sum that column to get the total number of active customers.
    * Use the flag as the conditional format rule. This makes the c.f. and the numeric totals reference the same criterion/criteria.
    * Finally, it’s an easy way to validate the formula.

    fzz:

    That’s a nice way to check if a number is within a range. :)

    There’s a difference in definitions. You’ve defined a customer as active if the active date is between the start and end dates. But, from DK’s post and my experience in such scenarios, the more common definition is a customer is active if

    s/he became active before the start date and was active on the start date
    or s/he became active after the start date but before the end date

    To see that your definition does not match Dick’s, check row 3. (C25+D25)/2 is 7/20/2010. (D25-C25)/2 is 5. So, C3 – 7/20/2010 is 16 which is not < = 5. But, row 3 should be included in the active count.

  4. I discussed a similar problem a couple of months ago:
    http://roymacleanvba.wordpress.com/2010/06/08/sub-sequence-iterator-3/

    Here, the start-event and end-event are held as separate records, with records being appended to the table in date order. This simplifies data entry in a large table, since you don’t have to find the relevant record to add its end date – you just append. Anyway, I went for a VBA ‘cursor’ solution, which I think offers interesting possibilities. See also Sebastien Labonne’s comment, which is similar to your solution.

    I think that there are interesting issues with ‘temporal’ data like this. My wife deals with this sort of thing as a ‘Business Intelligence’ consultant. For example, an employee can hold different posts, successively, in an organization; holiday, sickness, etc, are booked against post, not employee. That’s fine until, temporarily, an employee holds two posts at the same time; do you end up with double-counted sickness, etc, and how do you correct this?

    /Roy

  5. Dick…you can improve your formula more by using named ranges :-)

    =COUNT(active_date)-SUMPRODUCT((active_date>=end)+(inactive_date<=start))
  6. I think this formula should also do the trick:

    =SUMPRODUCT((active_date>=(start-(inactive_date-active_date)))*(inactive_date<end+(inactive_date-active_date)))

    …but then I’ve been staring at this for such a long time that my wife has stopped talking to me. Come to think of it, when exactly did she stop talking to me…could have been years ago, in hindsight.

    Note that the second half has only a less than sign after the inactive_date range, to emulate Dick’s example. Personally I’d include a ‘less than or equal to’ clause here, to flag customer no. 17 (Sto Plains Holdings) given they started on your end date.

  7. I was wrong. If the Active Date is well before the Start date, it’s necessary to look at both start and end dates. However, I think directly counting active accounts is better than subtracting inactive accounts from all accounts. Since Start and End dates are cells, it’s possible to use

    =SUMPRODUCT((ActiveDate<=End)*(InactiveDate>Start))

    This raises the question how typical users would believe the record for customer 17 (in row 18) should be handled. I still hold that most users would expect that record to be included as active because it was active on 7/25 (the End date) and most users would expect the end date to be inclusive. The formula above is for inclusive Start and End dates. If the End date were exclusive, the formula would become

    =SUMPRODUCT((ActiveDate<End)*(InactiveDate>Start))
  8. fzz: Concur that the formula should be inclusive rather than exclusive. The only reason my example was exclusive is because I already took my screen shot before I noticed it. It was easier to explain it than to take another screen shot. The epitome of laziness.

  9. Here’s another approach that I think works, using a very different approach:

    {=SUM(- -(IF(end-active_date>inactive_date-start,end-active_date,inactive_date-start)<(inactive_date-active_date+end-start)))}

    It’s built up from this:

    =MAX(end-active_date, inactive_date-start)>(inactive_date-active_date+end-start)

    . How this works: Say you’ve got a pair of pants, and a shirt. The pants are 80cm long, and the shirt is 60cm long (sorry, too lazy to convert to inches). If you put these on, and the distance from the top of the shirt to the bottom of the pants is less than 140 cm (80cm + 60cm), then your shirt and pants must overlap. If the distance is more than 140cm, then people can see your belly button.

    The bit of the formula MAX(end-active_date, inactive_date-start) acconts for the fact that you may put your pants on your top half and your shirt on your bottom half. Basically, it measures the distance from the highest bit of clothing to the lowest bit.

Leave a Reply

Your email address will not be published. Required fields are marked *