Convert PDF to Excel

Contextures had a nice post last week on converting a PDF to an Excel file using Talk about a URL that says what it does. Jeff Weir commented that he’s used the OCR capabilities of OneNote to do the same thing. I thought a test was in order.

The original Excel file that was printed to PDF using CutePDF

I followed Debra’s instructions and got this result from

It converted the font to Times New Roman and ignored borders, bold, merged cells, underlines, and italics. For pure data though, a pretty good job.

Next, I dragged and dropped the PDF into OneNote and chose the “Print Out” option. I right-clicked on the image, chose “Copy Text…”, and pasted into Excel.

Yikes. Font conversion is the least of my worries here.

Finally, I dragged and dropped the PDF into Google Docs and chose the options to convert to Google Docs format and to OCR it. Google Docs converts PDFs to Documents (not spreadsheets), so I wasn’t very hopeful. I didn’t see a way to convert the Doc to a spreadsheet, so I saved it as HTML and then opened it in Excel.

The whole table is one cell. My conclusion is that OCRing tables is hard.
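When the OCR'd text at least keeps one table row per line, a little post-processing can sometimes pull the columns back apart. The Python below is only a rough sketch of that idea, not something I ran against the files above. It assumes columns are separated by a tab or by two or more spaces, and the function names and sample data are made up for illustration. Real OCR output is often messier than this.

```python
import csv
import io
import re


def ocr_text_to_rows(text):
    """Split OCR'd table text into a list of rows.

    Assumption: one table row per line, with columns separated by
    a tab or by two or more whitespace characters.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rows.append(re.split(r"\s{2,}|\t", line))
    return rows


def rows_to_csv(rows):
    """Return the rows as CSV text that Excel can open."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical sample standing in for text copied out of an OCR tool.
sample = (
    "Region  Q1 Sales  Q2 Sales\n"
    "North America  1,200  1,350\n"
    "Europe  980  1,010\n"
)

rows = ocr_text_to_rows(sample)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Splitting on runs of two or more spaces keeps a label like "North America" in one column, and the csv module quotes values such as 1,200 so the comma doesn't create an extra column. If the OCR collapses column gaps to single spaces, this approach falls apart.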

Excel Power Analyst Bootcamp Omaha

Microsoft MVPs Dick Kusleika (Daily Dose of Excel) and Mike Alexander (DataPig) are joining together to bring you our acclaimed Power Analyst Boot Camp!

This two-day boot camp is designed for Excel Power Analysts who are looking to more effectively build and manage better data reporting mechanisms. During this workshop, you’ll be introduced to a wide array of tips and techniques that will muscle up your skills in Data Crunching, Reporting, and Automation.

Register early to get a $150 per seat discount. Only $700 for two days of awesome training.

Also, if you didn’t know, Omaha is my home town. That doesn’t just mean that I’ll be more rested during the training, it also means the class will fill up fast as I pressure my colleagues, friends, and family to attend. Don’t wait to sign up. Register here.