The New Excel 2007 File Format

Most of you will know that Excel 2007 (well, Office 2007) comes with a brand new file format, based on what MSFT calls Open XML.

This suddenly enables us to write code that can easily generate/change Office 2007 files without the need for an Office installation. For instance on a server.

Whilst there is proper documentation on this file format, the document with detailed descriptions of the Open XML format (“part 4? in the aforementioned link) counts an astonishing 4721 pages !!

This is why I decided to write up a couple of basic pages on how to do stuff with these Open XML files.

My first one is about reading and editing cells:

Working With Worksheet Data In An Excel 2007 File

Enjoy!

Regards,

Jan Karel Pieterse
www.jkp-ads.com

Posted in Uncategorized

13 thoughts on “The New Excel 2007 File Format

  1. I have given several ‘closer looks’ at xml over the last year or so and am left with the nagging question: ‘is the average joe going to be able to use the product anymore?’

  2. Nice work, Jan Karel. :)

    doco, I don’t really think that the disection of the XML file is really intended for the average user. To them, they’ll just boot up Excel and work as normal, never knowing about what’s really going on under the hood. I’d have to question if that is a change at all. The only thing that they might notice is a decrease in file size.

    For programmers, though, there is some real attractants to this. The best example I’ve heard so far is where a bank could automate the process of stripping confidential information from a file (such as a SIN/SSN number) when a user tries to email it.

    For the more sophisticated users, I see a lot of benefit, but for avearge users, again, I doubt that they’ll even notice.

  3. I would consider myself an average user, and I’m not looking forward to learning how to modify the ribbon. I’ve just developed some skill at VBA and changing the toolbars! I am looking forward to the integration of the Analysis tool-pak and SUMIFS, IFERROR and any other *new* functions (*new* because they can currently be constructed). I’m hoping there is no change to the camera tool.

    Brett

  4. Ken –

    “For the more sophisticated users, I see a lot of benefit, but for avearge users, again, I doubt that they’ll even notice.”

    I agree with this assessment.

    Brett –

    “I’m not looking forward to learning how to modify the ribbon.”

    I wasn’t either, but I had a client request, so I jumped in head first. I had to, since I like to claim to be willing to learn new things.

    There are already a fair number of online resources for this, and it isn’t too difficult. Being open to digging into the XML file structure is a prerequisite. I still can’t make the ribbon work as nicely as I can make command bars work, but part of that is that (IMO) the ribbon doesn’t work as nicely as command bars work.

  5. In my firm all we care about is the speed with which Excel calculates. The move to .xlsm really slowed things down, especially calculating/saving/closing large files. We’ve since switched to the new binary format, .xlsb (an updated version of .xls), and things are great! Improved calc speeds, smaller file sizes vs. .xls and .xlsm, and much faster saving.

    I wonder if we’ll ever regret not using the new open standard format? I highly doubt though.

  6. Jeremy, I don’t understand how the file format can affect calculation speed. And are you really seeing smaller files when you use XLSB?

  7. You’ll regret as soon as you get a task that involves making similar changes to hundreds or thousands of files, e.g. when your company name or logo changes. With the new fileformat things like these are a breeze.

    Also, I don’t know whether the Office 2007 conversion tool for older Office versions supports the xlsb format. If it doesn’t you’ll have to save to a different format first so your recipient can use your xlsb files.

  8. XLB files are smaller and faster, but as Pieterse notes, you lose the ability to take advantage of the open xml.

    Here’s a real-world example. While working with Beta , I created many xlsx files that will not open in the final release of Excel 2007. Microsoft made it such that I would have to reinstall Beta 2, open them there, and save while in Beta 2. What a pain.

    Instead, I simply opened the xlsx packages and change a few namespaces in the beta-created xlsx files. All those file work in Excel 2007 now. Open XML in action! I dare say that this is one of many situations that this new technology will come in handy.

  9. I had a file that was 42MB in 2003 .xls format. It was reduced to something like 23MB in .xlsm (it has macros) format. It was reduced to 15MB in the .xlsb format. (Yes this workbook is too big and the data is already being moved to a database but it makes a good test case). Also, I didn’t test calculation speeds between the formats with a timer but every user in the office noticed an improvement when I made the switch to .xlsb. This perceived improvement could have had more to due with the faster opening and closing speeds of the binary files though.

    I’m sure most of you have already read these articles, but the benefits of .xlsb are identified here:
    http://blogs.msdn.com/excel/archive/2006/07/20/671995.aspx
    And yes, the article does say “once loaded into memory, the file format has no effect on application/calculation speed”.

    And a quote from the ‘Improving Performance in Excel 2007’ MSDN article which Charles Williams wrote.
    “XLSB is the Excel 2007 binary format. It is structured as a compressed folder that contains a large number of binary files. It is much more compact than the XLS format, but the amount of compression very much depends on the contents of the workbook. For example, ten workbooks show a size reduction factor ranging from two to eight with an average reduction factor of four. In Excel 2007, opening and saving performance is only slightly slower than the XLS format.”

    Maybe I’ll find some time to use, http://www.blog.methodsinexcel.co.uk/2007/05/11/mie-cal-tool-beta/, and do a quick calculation speed test on this file in each of the different formats. I’ll post back if I do.

    And yes, Excel 2003 does open .xlsb files when it has the converter installed. .xlsb basically gives the improvements of 2007 without sacrificing the substandard performance issues of the xml file formats.

    If we ever need the .xlsx for MOSS support or something I suppose we can save the needed files in the open standard. Or, as Microsoft continues to work with the xml formats and nails the performance issues we’ll simply switch over at that time.

  10. .Xlsb taxes processor at a factor of 10+, saves on ram and hard drive at a factor of 3 to 4.
    So you have your trade off
    .Xlsb is great unless you aren’t updating streaming data or running macros in real time at high frequency.
    I’ve built a 200mb .xls file. In .xlsb it is about 80mb, but impossible to run with real time data updates.
    I have to switch back to .xls
    .xls takes more hard drive and ram, but significantly less CPU – doesn’t overload processor

  11. My further tests of Excel2003 versus Excel2007 showed that 2007 has extended functionality at a cost of over taxing CPU beyond reason.
    Simply put it is too bloated to use when you need speed.Ribbon is an eye candy at a loss of efficiency and clarity. Charts in 2007 has controls that are more buried and less accessible. Even after SP1 release it’s still slightly buggy and greatly bloated.
    I still mainly use 2003 and 2007 for raw data storage and back/forth format conversion that fixes erred xls files.

    I can work without 2007, but not without 2003 version. Also I don’t use Vista or XP for the same reason – too bloated. I run everything on Windows 2003. That’s the best Windows OS for business so far – best functioning windows to run excel 2003. Softy is yet to come with anything better they made in 2003.

  12. Forgot to tell you I run Excel on Dual core Athlon 6400+ with 4 GB ram quad display. I have an ultimate task power system, still 2003 beats everything 2007 and later from Microsoft when it comes to hard core business performance.
    Microsoft mainly waisted last few years designing eye candies and pretty sounds chasing Mac OS. Wrong move.

  13. I agree with Greg the Thinker “My further tests of Excel2003 versus Excel2007 showed that 2007 has extended functionality at a cost of over taxing CPU beyond reason… Ribbon is an eye candy at a loss of efficiency and clarity”.

    My company switched to Office 2007 (why not 2010???) a few months ago due to communication issues with a software vendor who is doing a big install here. Since then I have spent dozens of hours reworking spreadsheets and yesterday I experienced the most entertaining visit I’ve ever had with my boss as I watched him struggle with the Ribbon Bar to add his thoughts to my annual budget worksheet out on a shared drive.

    The upshot of this is that I now do most of my work in Open Office 3 and import the files into Office 2007 if I need to send them to a collegue.

    On the home front, I bought a cheap quad-core PC, added a second drive, blew away Win 7, and loaded Ubuntu 10.10 and Open Office 3 in a Raid 0 environment. Boots up in about 15 seconds, Open Office Calc works great with no “ribbon”, NO VIRUSES, and Google Chrome Browser is a rocket.

    Big plus – I have a copy of XP-Pro that runs as a virtual machine in case I need to log into work. The last time the virtual XP session crashed – I didn’t waste a day debugging things – I just deleted the session and loaded the last good snapshot – took about a minute. When was the last time you EVER fixed anything in MS-Office or Windows in a minute?

    I love it – now if I could just convince our CIO!


Posting code? Use <pre> tags for VBA and <code> tags for inline.

Leave a Reply

Your email address will not be published.