Parsing Excel documents with Python


Your best bet for parsing Excel files would be the xlrd library. The site has links and examples for xlrd and related Python Excel libraries, including a PDF document with some good examples of using xlrd. There are also many related xlrd questions on Stack Overflow that might be of use.

One caveat with the xlrd library is that it only works with the xls file format (Excel 2003 and earlier), not the more recent xlsx format. There is a newer library, openpyxl, for dealing with xlsx files, but I have never used it.

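As per John's comment, the xlrd library later gained support for both the xls and xlsx file formats. Note, however, that xlrd 2.0 and later dropped xlsx support again, so current releases read only xls files; use openpyxl for xlsx.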
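The format split can be handled with a small dispatch on the file extension. The sketch below is only an illustration: `engine_for` and the `ENGINES` table are hypothetical names, and it assumes xlrd 2.0 or later (which reads only xls) with openpyxl handling the newer formats.

```python
from pathlib import Path

# Hypothetical lookup table: file extension -> library that reads it.
# Assumes xlrd >= 2.0 (xls only) and openpyxl for the newer formats.
ENGINES = {
    ".xls": "xlrd",
    ".xlsx": "openpyxl",
    ".xlsm": "openpyxl",
}


def engine_for(path):
    """Return the name of the reader library for an Excel file path."""
    suffix = Path(path).suffix.lower()
    try:
        return ENGINES[suffix]
    except KeyError:
        raise ValueError(f"not a recognised Excel file: {path!r}")


print(engine_for("report.xls"))   # xlrd
print(engine_for("Report.XLSX"))  # openpyxl
```

If you use pandas, the returned name can be passed straight through as `pd.read_excel(path, engine=engine_for(path))`.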

Hope that helps.

The pandas library has a quick and easy way to read Excel files. If it's mostly just data and nothing too complicated, it'll work:

import pandas as pd
# The file name must be passed as a string
ex_data = pd.read_excel("excel_file.xlsx")

It reads it into a pandas DataFrame, which is handy for data munging, etc.

To go to a list:
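A minimal sketch of that conversion, assuming a small in-memory DataFrame (with made-up column names) standing in for the one `pd.read_excel` would return:

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame that pd.read_excel would return
ex_data = pd.DataFrame({"name": ["a", "b"], "qty": [1, 2]})

# One column as a plain Python list
names = ex_data["name"].tolist()

# Every row as a list of values
rows = ex_data.values.tolist()

# Every row as a dict keyed by column name
records = ex_data.to_dict("records")

print(names)
print(rows)
print(records)
```

Which form is most useful depends on what you do next: a single column list for lookups, row lists for positional access, or records when you want the column names kept with each value.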


If each worksheet holds several tables or other mixed content, you may want to use a lower-level library such as xlrd or openpyxl instead.
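As a rough illustration of the multi-table case, openpyxl's `iter_rows` accepts row and column bounds, so each table can be pulled out as its own rectangular block. The `read_block` helper and the sheet contents below are hypothetical, and the sketch assumes you already know where each table sits.

```python
from openpyxl import Workbook


def read_block(ws, min_row, max_row, min_col, max_col):
    """Return one rectangular region of a worksheet as a list of row tuples."""
    return list(ws.iter_rows(min_row=min_row, max_row=max_row,
                             min_col=min_col, max_col=max_col,
                             values_only=True))


# Hypothetical sheet holding two small tables side by side,
# separated by an empty column C
wb = Workbook()
ws = wb.active
ws.append(["id", "name", None, "code", "price"])
ws.append([1, "apple", None, "A1", 0.5])
ws.append([2, "pear", None, "P2", 0.75])

left = read_block(ws, 1, 3, 1, 2)    # columns A-B
right = read_block(ws, 1, 3, 4, 5)   # columns D-E
print(left)
print(right)
```

With a real file you would get `ws` from `load_workbook(...)` instead of building it in memory.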


openpyxl is a great library that supports reading and writing Excel 2010 xlsx files.

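Sample parsing code:

from openpyxl import load_workbook

wb = load_workbook("Book1.xlsx")
# Assumption: the original sheet selection was lost, so use the active sheet
ws = wb.active
# Walk every cell, row by row, and print its value
for row in ws.iter_rows():
    for cell in row:
        print(cell.value)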

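Sample writing code:

from openpyxl import Workbook

wb = Workbook()

dest_filename = "empty_book.xlsx"

# A new workbook starts with one sheet; rename it
ws1 = wb.active
ws1.title = "range names"

# Append 39 rows, each holding the values 0..599
for row in range(1, 40):
    ws1.append(list(range(600)))

# Write the workbook to disk
wb.save(filename=dest_filename)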
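One way to check a written file is to load it straight back. The sketch below is an illustration under assumptions, not part of the original answer: it uses a smaller sheet, a hypothetical file name, and a temporary directory so nothing is left on disk.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Build a small workbook: 3 rows, each holding the values 0..4
wb = Workbook()
ws = wb.active
ws.title = "range names"
for _ in range(3):
    ws.append(list(range(5)))

# Save into a temporary directory (hypothetical file name), then reload
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "round_trip.xlsx")
    wb.save(path)
    loaded = load_workbook(path)
    sheet = loaded["range names"]
    rows = list(sheet.iter_rows(values_only=True))
    dims = (sheet.max_row, sheet.max_column)
    loaded.close()

print(dims)
print(rows[0])
```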

You can read more in the openpyxl documentation.
