Too Many Cells

There have been quite a few comments on this blog, and elsewhere, about the new limits of rows and columns. Mostly, the comments can be paraphrased as “finally” or “if you need that many rows, you should be using Access”. This is an interesting topic about which I know nothing. I never let facts get in the way of a good blog post, though and I won’t be starting now.

First, 17 billion cell is too many. On the other hand, maybe it’s not. I think we can all agree that 256 columns was too few and that increasing that limit was an important improvement. I mean, really, limiting our columns like we’re some kind of relational database or something. But the argument can be made that 1 million rows and the resulting increase in the cell count is inviting abuse. Let me make that argument.

If I were designing a spreadsheet from scratch, I would fix the number of cells and allow flexibility in the rows and columns. I’ve heard that most people either need a ton of rows or a ton of columns, but usually not both. I’m thinking a lower limit on the number of rows and columns and an upper limit on the number of cells. For instance, you can’t have less than 32 columns or less than 1024 rows. The upper limit is defined as the product of rows and columns and can’t exceed 16,777,216. The lower limits on the rows and columns are for some predictability when writing chunks to memory. (I don’t know what I’m talking about, but it sounds good.) I don’t have any issues problems that wouldn’t be solved by that arrangement. If I did, I wouldn’t have the memory/CPU to get it done anyway.

I, and others, have asked why Excel didn’t take this approach. Considering the look I got asking that question, you’d think I asked them to rebuild New Orleans with a hundred feet of bailing wire and a pair of needle nose pliers. Okay, that’s not happening. But I still think it’s right way to go.

To be fair, when I said I didn’t know what I was talking about when I mention writing to memory, I wasn’t kidding. For that reason, I don’t blame these guys for not starting from scratch. I have no concept of the problems associated with interfacing with the operating system, which presumably would manage the memory. Perhaps a change this radical would require an almost total rewrite. I can’t, in good conscience, complain about code assets and then ask MS to abandon their own. So I give them the benefit of the doubt that limiting cell count is not a workable option.

If that’s true, then we’re left with limiting rows and/or columns. Those limits will, at least to some extent, be arbitrary since the needs of Excel users are as varied as the users themselves. Increasing these limits was likely a bit of a rewrite, albeit less than total, and I think that needs to factor into the decision. They could have imposed a row limit of a quarter million (2^18), but that would be just as arbitrary as the 1 million rows (2^20). Do you want MS limiting your rows so that you don’t use Excel as a database? Do you want arbitrary limits in your software in the vendor’s attempt to get you to use the product “properly”? Neither do I. I support Microsoft’s decision to give Excel users enough rope to hang themselves.

I forget if I was for or against this many rows when I started this post, but I seem to be for it now. There’s one important consideration related to this limit, however. Excel is not just a vehicle for creating spreadsheets. I have some spreadsheets that I consider to be fairly complicated. And yet, very few of them use rows and columns that extend beyond one screen (N50 on this machine). To be sure, there are supporting worksheets that extend that far that are mostly used to store external data or fairly static information. But the sheets I look at just don’t grow that big. I think many people have spreadsheet similar to that. If your worksheet’s used range extends to BX52000, and it’s a worksheet you actively use, there’s a lot going on there that’s difficult to see and follow. They exist; they’re just not that common.

On the other hand, we’ve all hit the row limit. It’s because we didn’t, in that case, use Excel as a information gathering tool. Nor did we use it as an information reporting tool. We used it, in fact, as a temporary information processing tool. For instance, someone might have a data dump from another program. A data dump over which they have no control. And somewhere in the middle of this 120,000 line text file, there are three lines that hold all the secrets to life on earth; or at least some useful accounting information. To find the rows, you have to know the first five characters of the row and then you have to subtract the number after the fifth comma from the number after the tenth comma, but only if the last character is a ‘C’.

So you open the text file in Excel, autofilter, sort, write a couple of quick formulas, identify the rows, delete everything else, and finally save it back out as a text file. Now you can import this new text file into your CRM system, provide your boss with the information he needs, get promoted, move into the corner office, and marry your high school sweetheart. Oh, but wait. You can’t open the text file in Excel because it has too many rows. Now you have to marry your cousin and move into a single-wide trailer with a family of raccoons. But at least nobody is using Excel as a database.

Hyperbole aside, Excel is a tool that can be abused or not. Personally, I’d rather have the flexibility and I’ll deal with the Excel-runs-slow-on-my-PIII-with-256k-of-RAM questions.

Posted in Uncategorized

19 thoughts on “Too Many Cells

  1. As I was reading this, I was all set to chime in with a brilliant point. Then I saw that you beat me to it.

    The key phrase is “temporary information processing tool.” That describes how I was using Excel EVERY time I’ve ever hit the row limit.

  2. In my mind, Microsoft’s change is the equivalent of raising the minimum wage to $20 and hour. Would everbody love to have a minimum wage of $20 an hour? Of course! Will it help some people, heck yeah.

    But is it overkill? Is it practical?

    Any software’s limits is based on some percieved value vs practicality. I guess with Excel 12, Microsoft felt it was both valuable and practical to raise the row and column limits to 1 Mil and 16K+. But even that is “arbitrary”. Why stop at 1 mil?

    What happens to the guy who wants to do some temporary processing with 1.1 million rows?

    I know: “But Mike, who will need to process 1.1 million rows? That just ridiculous.” Not like that nice even 1 mil rows.

  3. I believe there is a reason the col/row limit is not unlimited.

    Consider the reference dailydoseofexcel12345
    If there were unlimited cols/rows, it would be considered a cell reference, not a named reference.

    I doubt we’ll ever see another increase in sheet size for that reason alone.

    PS. I’m getting excited for >256 cols… but as has been touched on, I only ever reach the row limit when importing/manipulting large CSV files.

    Rob

  4. The new dimensions of sheets offer more space to store data in. Excel is already used as a replacement for db, xml-files, textfiles and other storage alternatives.

    As usual when new options are available it’s in the end up to the corporates and individuals how they handle the new capacity in Excel 12.

    The only dramatical change IMO is that the average filesize propably will increase.

    For corporates it’s *still* questions about:

    * Security
    Is it acceptable to store large amount of private & confidential data in Excel workbooks in view of how easy it is to distribute them, easy to unprotect them and also that large workbooks get easier corrupted.

    * The effect on the network traffic
    An increasing issue with large file of all kind.

    * The need of information
    A general trend (at least among the larger national corporates I work with) is that the number of reports and the size of them decrease within corporates. The new dimensions will propably challenge this trend.

    The bottom line is that MSFT provide us with new facilities and it’s up to all of us to use them with judgement.

    However, I will not be surpriced to see in the future following type of questions in public forums:

    “I got a 980 MB large Excel-file and it’s running extremly slow on my laptop. How can I improve the performance…”

    Kind regards,
    Dennis

  5. XLDennis is right. Corporate decides on what size computer you have. How much file space you need. They always know what you should have. Not you. It took 3 years of convincing to get the system I have now. In my old system, I would press the button on an Excel file with 20,000 rows. Then go have coffee, a donought, go to the bathroom, and read the paper. Sometimes I may even go home and come back and the system would still be calculating. Even with the new system, I had to get smarter about how I create Excel files. Thanks to all of you and your blogs. I have seen too many people refusing to think outside the box and load up their Excel with more and more data which just keeps increasing the calculating time. Their productivity drops. They can’t transfer files, email is impossible, and current networks get clogged. The current Excel forces them to think of new ways, like the use of Access. It is like putting fertilizer on the lawn. If I put a bag and the lawn looks good, maybe I should but two and the lawn will look great. Except you end up cutting twice as much grass if you don’t kill it first.

    Maybe having all those rows and columns is good but the user needs to be aware of the consequences.

  6. My undertanding is that xlsx will come in two flavours.
    1. xml
    2. xml zipped binary

    zipped would be smaller than what we deal with currently.
    Zip also offers some password protection options, if MS decides to implement it – though I wouldn’t be surprised to see other security advances given the effort they’ve put into announced features.

    Rob

  7. Dennis
    There are already 1 Gb spreadsheets, although I think their owners keep quiet about them.

    Row limit- I’m with Mike, there is still a limit, its just a bit bigger, people will soon bump up against the new one. For temporary stuff, fair enough, but how many spreadsheets in use daily started out as temporary?

    Personally I use DTS (Data Transformation Services (SQL Server)) for cleaning and manipulating data – there is no real limit in records or fields or functionality.

    cheers
    Simon

  8. Rob – interesting but I’m not sure what the effect will be. A large file in the old format will be smaller in the coming format which will either lead to the use of even larger files or?

    Simon – Then we are at least two who like DTS and SQLIS (SQL Integration Service for SQL Server 2005) :)

    Kind regards,
    Dennis

  9. Dennis,

    I’m pretty sure that most users aren’t aware of their filesizes, even now.

    As an example, I see lots of bloated worksheets which contain hardly any data, just a lot of formatting rules extended thousands of rows.
    The 2MB XLS reduces to 50KB. You just saved close to 2MB! but the user doesn’t see the significance.

    It’s only when mail servers return “attachment too big” errors that they pay attention – perhaps by compressing them.

    Some users are better than others, so I may be generalising a little.

    Excel 12 compression should help on diskspace, but it’s cheap as dirt anyway.

    I’d pay more attention to the data crunchers. Those users who are intelligent, deal with a lot of data but have limited Excel know-how.
    While their spreadsheets will work, they’ll reproduce functionality which Excel could already provide with little-known functions…. take the power behind SUMPRODUCT and MATCH/INDEX for example.

  10. A while agon over at excel watche i mention that one day we might just get one app. that is both spreadsheet, database, word processor, etc, etc.

    With the number of rows now very high, excel will be used like a database – even more than it is today! – this will put the demands up.

    I know that MS have been splitting there products out, but i can’t see why, excel and acess could not share more of the same feauture – people are keen to point out that one is a RDMS and one a SS, but if the 2 things are close to each other, why not stick them together – or at least up in some features from each other?

    what about this?

    Msgbox “You seem to have a lot of data in your workbook which may slow down your systems performace. You can use Excels query wizard to help speed things up”

    maybe one day!

  11. Ross –

    With 30k formatted characters in a cell, we’ll see Excel used as a word processor more than ever, too. My sister-in-law told me this weekend that she did her resume in Excel, because she could control the layout better than in Word. I’m thinking, “Merged cells, gaaa!”

    – Jon

  12. “If I were designing a spreadsheet from scratch, I would fix the number of cells and allow flexibility in the rows and columns. I’ve heard that most people either need a ton of rows or a ton of columns, but usually not both. … The upper limit is defined as the product of rows and columns and can’t exceed 16,777,216. “

    In a sense, the product of rows*cells is already limited by the amount of memory you have in your computer.

    Making this explicit would end up being pretty complex: As an example, consider the case of transposing an 100,000×10 array in a spreadsheet with a 1,000,000 cell (rows*cols) limit. Transposing the array requires writing values into cells that “don’t exist” prior to the transpose operation. This opens up the question of when those cells come into being…

    The Excel approach of ‘every cell exists all the time, and memory gets allocated as required’ avoids these semantic hassles.

    -Mike

  13. All I needed was the quick fix of why pressing Tab made me jump a whole page. Some nerve they had expecting me to pay for that information.

    Thanks!

  14. Experts Exchange really ticks me off. Their advice is absolutely no better than what you may find in any no-fee forums or newsgroups. I wish their pages weren’t included in the search results.

  15. In experts exchange… You can just open an “expert” account… Then you are able to read the colutions without paying anything ;-)Hehehe

  16. I think Jon was commenting on the quality of the advice, but sometimes a solution is found there higher in the search results than elsewhere.
    And nobody needs to pay … just scroll down the page past the listings of topics and the entire thread is at the end of the page.

  17. MY PROBLEM is . . .
    I have a 12 MB spreadsheet that runs slow on my laptop (and nothing runs slow on this beast).

    When I save it in .htm format it is only 70KB!

    The problem – Although I only have 400 rows and 10 columns, I have 250 tabs! I know this is a lame way to use excel, but what other options are out there? Do you see my problem – 250 sheets, and 99% of the cells are empty! Why can’t I remove those extra rows and columns? ? ?

    Yes – I know, I would put all the data in one sheet, but I need the seperation.

Leave a Reply

Your email address will not be published. Required fields are marked *