
Reading .xlsx format in python

Developer https://www.devze.com 2023-03-22 10:23 Source: Internet

I've got to read an .xlsx file every 10 minutes in Python.

What is the most efficient way to do this?

I've tried using xlrd, but it doesn't read .xlsx. According to the documentation it does, but I can't get it to work: I keep getting "Unsupported format, or corrupt file" exceptions.

What is the best way to read xlsx?

I need to read comments in cells too.


xlrd hasn't yet released a version that reads .xlsx. Until then, there is openpyxl, a package built by Eric Gazoni that reads .xlsx files and has limited support for writing them.


Use openpyxl. Some basic examples:

import openpyxl

# Open Workbook
wb = openpyxl.load_workbook(filename='example.xlsx', data_only=True)

# Get all sheet names (sheetnames replaces the deprecated get_sheet_names())
a_sheet_names = wb.sheetnames
print(a_sheet_names)

# Get a sheet object by name (indexing replaces the deprecated get_sheet_by_name())
o_sheet = wb["Sheet1"]
print(o_sheet)

# Get Cell Values
o_cell = o_sheet['A1']
print(o_cell.value)

o_cell = o_sheet.cell(row=2, column=1)
print(o_cell.value)

o_cell = o_sheet['H1']
print(o_cell.value)

# Sheet Maximum filled Rows and columns
print(o_sheet.max_row)
print(o_sheet.max_column)
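To go beyond single cells, here is a minimal sketch of reading every row of a sheet in one pass. It assumes openpyxl is installed; the helper name read_rows and its arguments are illustrative and not part of the original answer.

```python
from openpyxl import load_workbook


def read_rows(path, sheet_name=None):
    """Return all rows of one sheet as a list of tuples of cell values.

    read_rows is a hypothetical helper name used for illustration.
    """
    # read_only=True streams the sheet instead of building every cell object,
    # which tends to be lighter for large files. data_only=True returns cached
    # formula results rather than the formula strings.
    wb = load_workbook(filename=path, read_only=True, data_only=True)
    try:
        ws = wb[sheet_name] if sheet_name else wb.active
        # values_only=True yields plain values instead of cell objects.
        return [row for row in ws.iter_rows(values_only=True)]
    finally:
        wb.close()  # read-only workbooks keep the file open until closed
```

Note that read-only mode does not expose cell comments, so this sketch only suits plain data reads.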


There are multiple ways to read XLSX-formatted files using Python. Two are illustrated below. Both require openpyxl, and if you want to parse directly into pandas you need pandas as well, e.g. pip install pandas openpyxl

Option 1: pandas direct

Primary use case: load just the data for further processing.

The read_excel() function in pandas would be your best choice. Note that pandas should fall back to openpyxl automatically for .xlsx files, but in the event of format issues it's best to specify the engine directly.

import pandas as pd
df_pd = pd.read_excel("path/file_name.xlsx", engine="openpyxl")
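Since the question is about re-reading the file every 10 minutes, one possible way to avoid redundant parsing is to check the file's modification time first and only reload when it has changed. The sketch below uses only the standard library; CachedFileReader and its loader argument are hypothetical names, and whether this helps depends on how often the file actually changes.

```python
import os


class CachedFileReader:
    """Re-parse a file only when its modification time has changed.

    CachedFileReader and loader are hypothetical names for illustration;
    loader is any callable that takes a path and returns the parsed data.
    """

    def __init__(self, path, loader):
        self.path = path
        self.loader = loader
        self._mtime = None  # modification time seen at the last parse
        self._data = None   # result of the last parse

    def read(self):
        mtime = os.path.getmtime(self.path)
        if mtime != self._mtime:
            # File is new or was modified since the last read: parse again.
            self._data = self.loader(self.path)
            self._mtime = mtime
        return self._data
```

For example, create reader = CachedFileReader("path/file_name.xlsx", lambda p: pd.read_excel(p, engine="openpyxl")) once, then call reader.read() from whatever scheduler fires every 10 minutes.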

Option 2: openpyxl direct

Primary use case: getting or editing specific Excel document elements such as comments (requested by OP), formatting properties or formulas.

Load the workbook with load_workbook(), then read the comment attribute of each cell to extract its comment:

from openpyxl import load_workbook
wb = load_workbook(filename = "path/file_name.xlsx")
ws = wb.active
ws["A1"].comment # <- loop through row & columns to extract all comments
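The loop hinted at in that last line could look roughly like this. It is a sketch assuming openpyxl is installed and that only the active sheet matters; collect_comments is a hypothetical helper name.

```python
from openpyxl import load_workbook


def collect_comments(path):
    """Map cell coordinates such as "A1" to comment text for the active sheet.

    collect_comments is a hypothetical helper name used for illustration.
    """
    # Comments are only loaded in the default mode, so read_only is left off.
    wb = load_workbook(filename=path)
    ws = wb.active
    comments = {}
    for row in ws.iter_rows():
        for cell in row:
            # cell.comment is None for cells without a comment.
            if cell.comment is not None:
                comments[cell.coordinate] = cell.comment.text
    return comments
```

Each comment object also carries an author attribute if you need to know who wrote it.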