Convert PDF to Excel

Contextures had a nice post last week on converting a PDF to an Excel file using pdftoexcelconverter.net. Talk about a URL that says what it does. Jeff Weir commented that he’s used the OCR capabilities of OneNote to do the same thing. I thought a test was in order.

Here’s the original Excel file, which I printed to PDF using CutePDF.

I followed Debra’s instructions and got this result from pdftoexcelconverter.net.

It converted the font to Times New Roman and ignored borders, bold, merged cells, underlines, and italics. For pure data, though, it did a pretty good job.

Next, I dragged and dropped the PDF into OneNote and chose the “Print Out” option. I right-clicked the image, chose “Copy Text…”, and pasted the result into Excel.

Yikes. Font conversion is the least of my worries here.

Finally, I dragged and dropped the PDF into Google Docs and chose the options to convert it to Google Docs format and to OCR it. Google Docs converts PDFs to Documents (not spreadsheets), so I wasn’t very hopeful. I didn’t see a way to convert the Doc to a spreadsheet, so I saved it as HTML, then opened it in Excel.

The whole table is one cell. My conclusion is that OCRing tables is hard.

8 thoughts on “Convert PDF to Excel”

  1. Yes, I think Text to Columns (TTC) would be OK for that. TTC can make things worse, though, for example if the row headers have spaces in them. Speaking of which, Foxit Reader has a View > Text Reader menu item. When I switch to that view, copy and paste the text, and then run TTC, I get this.

    Not bad.

  2. Hi Dick. My suggestion of using OneNote was really only for cases where someone scans a report and the file becomes a picture. I take it that wasn’t the case in your test, i.e. the items were stored as text in the PDF?

  3. Query: I tried http://www.pdftoexcelconverter.net/ and was also impressed with the accuracy of the result. But I ran into a different problem: the PDF was a long, multi-page document (over 50 pages), and the result came back as a multi-worksheet XLS file rather than a single worksheet. I know there are workarounds (“consolidate data” and others, none of them great) for combining multiple worksheets into a single worksheet, but the simple fix would be to create a single worksheet in the first place. Any suggestions? http://www.pdftoexcelconverter.net/ takes me straight to the converter; I don’t see any settings or any support link.

  4. There is surely no free lunch! Or has everyone already forgotten that? Any information that you upload to their servers is stored there, and the company can use it or do with it as it pleases. Please be aware of that before uploading your PDF in the hope of converting it to Excel for free, especially if you are uploading project estimates, reports, bank statements, etc. I work in information consulting with a Fortune 100 company, and hence I am bringing this gross invasion of privacy to the attention of unsuspecting customers. Forewarned is forearmed.
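The Text to Columns problem in the first comment comes from splitting on every space, so a row header like “North East” lands in two cells. As a rough sketch (Python, not something from the post or comments), one workaround for text copied out of a PDF is to treat a tab, or two or more spaces, as the column break. This assumes the PDF left wider gaps between columns than inside labels, which won’t hold for every file; `split_pdf_line` and the sample row are hypothetical.

```python
import re
from typing import List


def split_pdf_line(line: str) -> List[str]:
    """Split one line of text copied from a PDF into cells.

    Assumption: columns are separated by a tab or by two or more
    whitespace characters, while labels such as "North East" contain
    only single spaces.
    """
    return re.split(r"\s{2,}|\t", line.strip())


row = "North East   1,200   3,400"  # hypothetical copied line
print(row.split())          # ['North', 'East', '1,200', '3,400']: the label is split
print(split_pdf_line(row))  # ['North East', '1,200', '3,400']
```

Inside Excel, checking “Treat consecutive delimiters as one” with Space as the delimiter doesn’t help here, because the single space inside the label is still a delimiter; the Fixed width option, where you place the column breaks yourself, is the closer equivalent.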
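For the multi-worksheet result in the third comment, here is a minimal Python sketch of stacking the sheets into one table after the fact. It assumes each PDF page became its own sheet and that every page repeated the same header row(s); `stack_sheets` and `header_rows` are illustrative names, not part of any tool mentioned above. The rows could come from openpyxl, e.g. `[ws.iter_rows(values_only=True) for ws in load_workbook(path).worksheets]`, though openpyxl reads .xlsx, so an .xls file would first need to be re-saved in that format.

```python
from typing import Iterable, List, Sequence


def stack_sheets(sheets: Iterable[Iterable[Sequence[object]]], header_rows: int = 1) -> List[list]:
    """Stack the rows of several worksheets into a single table.

    Keeps the first sheet's header rows and skips the first `header_rows`
    rows of every later sheet, on the assumption that each PDF page
    repeated the same column headings. Use header_rows=0 to keep every row.
    """
    combined: List[list] = []
    for sheet_index, sheet in enumerate(sheets):
        for row_index, row in enumerate(sheet):
            if sheet_index > 0 and row_index < header_rows:
                continue  # repeated heading on a later sheet
            combined.append(list(row))
    return combined


# Hypothetical data: two "pages" that each repeat the header row.
page1 = [("Region", "Sales"), ("North", 10)]
page2 = [("Region", "Sales"), ("South", 20)]
print(stack_sheets([page1, page2]))
# [['Region', 'Sales'], ['North', 10], ['South', 20]]
```

Writing the combined rows back to a single sheet is then one `append` call per row in whatever spreadsheet library you use.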
