python处理Excel文件--openpyxl

发布时间 2023-06-13 17:34:53作者: 工作手记

openpyxl的介绍

读取/写如 Excel 2010 xlsx/xlsm 文件的 Python 库

写入和读取的简单例子

写入数据到excel
image

excel打开截图
image

从excel读取数据
image
image

openpyxl一些具体的用法

新建工作簿workbook

from openpyxl import Workbook
wb = Workbook()

工作簿我把它理解为一个Excel文件,一个工作簿至少有一个工作表,就像我们新建一个Excel文件打开后会默认创建一个sheet(左下角会显示sheet的名),所以工作表就相当于一个个sheet,我们平时操作Excel单元格数据就是基础于一个个表(sheet)

一个工作簿至少有一个工作表. 你可以通过 Workbook.active 来获取这个默认的工作表
ws = wb.active
也可以创建新的工作表worksheet
ws1 = wb.create_sheet("test")
新创建的默认插到最后面

ws2 = wb.create_sheet("Mysheet", 0)
指定参数0,新创建的会排到第一个
指定参数-1,新创建的会排到倒数第二个
修改工作表的名字
ws2.title = 'test1'
ws1.title = 'test2'
ws.title = 'test3'

获取所有worksheet即工作表的名字
In [9]: wb.sheetnames
Out[9]: ['test1', 'test3', 'test2']

通过worksheet的名字获得一个已存在的worksheet
In [11]: wb['test3']
Out[11]: <Worksheet "test3">

遍历工作簿,得到所有已存在的worksheet
In [12]: for sheet in wb:
    ...:     print(sheet, sheet.title)
    ...:
<Worksheet "test1"> test1
<Worksheet "test3"> test3
<Worksheet "test2"> test2

访问单元格

假设有excel文件,其中第一张表内容如下
image

得到一个worksheet后,可以通过单元格坐标来获得、修改单元格的内容

In [13]: from openpyxl import load_workbook

In [14]: wb2 = load_workbook('demo_table.xlsx')

In [15]: sheet1 = wb2['table']

In [16]: sheet1
Out[16]: <Worksheet "table">

In [16]: sheet1
Out[16]: <Worksheet "table">

通过单元格坐标,得到一个单元格对象,获得该单元格对象的内容
In [17]: sheet1['A1']
Out[17]: <Cell 'table'.A1>

In [18]: sheet1['A1'].value
Out[18]: '列1'

给单元格定义值,单元格若不存在将会被直接创建并定义值
In [19]: sheet1['A1'] = 'column1'

Worksheet.cell()的方式来访问单元格,单元格若不存在将创建并赋值
In [20]: sheet.cell(row=1, column=2, value='column2')
Out[20]: <Cell 'test2'.B1>

访问大量的单元格

ABCD代表列 1234代表行

In [24]: cell_range = sheet1['A1':'E1']

In [25]: cell_range
Out[25]:
((<Cell 'table'.A1>,
  <Cell 'table'.B1>,
  <Cell 'table'.C1>,
  <Cell 'table'.D1>,
  <Cell 'table'.E1>),)
  
 In [28]: for row in cell_range:
    ...:     for cell in row:
    ...:         print(cell.value)
    ...:
column1
列2
列3
列4
列5
In [31]: sheet1[1]
Out[31]:
(<Cell 'table'.A1>,
<Cell 'table'.B1>,
<Cell 'table'.C1>,
<Cell 'table'.D1>,
<Cell 'table'.E1>)

In [32]: sheet1[1:3]
Out[32]:
((<Cell 'table'.A1>,
 <Cell 'table'.B1>,
 <Cell 'table'.C1>,
 <Cell 'table'.D1>,
 <Cell 'table'.E1>),
(<Cell 'table'.A2>,
 <Cell 'table'.B2>,
 <Cell 'table'.C2>,
 <Cell 'table'.D2>,
 <Cell 'table'.E2>),
(<Cell 'table'.A3>,
 <Cell 'table'.B3>,
 <Cell 'table'.C3>,
 <Cell 'table'.D3>,
 <Cell 'table'.E3>))

In [33]: sheet1["A"]
Out[33]:
(<Cell 'table'.A1>,
<Cell 'table'.A2>,
<Cell 'table'.A3>,
<Cell 'table'.A4>,
<Cell 'table'.A5>,
<Cell 'table'.A6>)

In [34]: sheet1["A": "B"]
Out[34]:
((<Cell 'table'.A1>,
 <Cell 'table'.A2>,
 <Cell 'table'.A3>,
 <Cell 'table'.A4>,
 <Cell 'table'.A5>,
 <Cell 'table'.A6>),
(<Cell 'table'.B1>,
 <Cell 'table'.B2>,
 <Cell 'table'.B3>,
 <Cell 'table'.B4>,
 <Cell 'table'.B5>,
 <Cell 'table'.B6>))

In [36]: for row in sheet1.iter_rows(min_row=1, max_row=2, min_col=1, max_col=2):
   ...:     for cell in row:
   ...:         print(cell.value)
   ...:
column1
列2
a2
b2

In [38]: for col in sheet1.iter_cols(min_row=1, max_row=2, min_col=1, max_col=2):
   ...:     for cell in col:
   ...:         print(cell.value)
   ...:
column1
a2
列2
b2

如上,对于openpyxl来说,第一行是从1开始的,第一列是从1开始的

按row的顺序获取表中所有的单元格
In [41]: tuple(sheet1.rows)
Out[41]:
((<Cell 'table'.A1>,
 <Cell 'table'.B1>,
 <Cell 'table'.C1>,
 <Cell 'table'.D1>,
 <Cell 'table'.E1>),
(<Cell 'table'.A2>,
 <Cell 'table'.B2>,
 <Cell 'table'.C2>,
 <Cell 'table'.D2>,
 <Cell 'table'.E2>),
(<Cell 'table'.A3>,
 <Cell 'table'.B3>,
 <Cell 'table'.C3>,
 <Cell 'table'.D3>,
 <Cell 'table'.E3>),
(<Cell 'table'.A4>,
 <Cell 'table'.B4>,
 <Cell 'table'.C4>,
 <Cell 'table'.D4>,
 <Cell 'table'.E4>),
(<Cell 'table'.A5>,
 <Cell 'table'.B5>,
 <Cell 'table'.C5>,
 <Cell 'table'.D5>,
 <Cell 'table'.E5>),
(<Cell 'table'.A6>,
 <Cell 'table'.B6>,
 <Cell 'table'.C6>,
 <Cell 'table'.D6>,
 <Cell 'table'.E6>))

按照col的顺序迭代所有的单元格
In [43]: tuple(sheet1.columns)
Out[43]:
((<Cell 'table'.A1>,
 <Cell 'table'.A2>,
 <Cell 'table'.A3>,
 <Cell 'table'.A4>,
 <Cell 'table'.A5>,
 <Cell 'table'.A6>),
(<Cell 'table'.B1>,
 <Cell 'table'.B2>,
 <Cell 'table'.B3>,
 <Cell 'table'.B4>,
 <Cell 'table'.B5>,
 <Cell 'table'.B6>),
(<Cell 'table'.C1>,
 <Cell 'table'.C2>,
 <Cell 'table'.C3>,
 <Cell 'table'.C4>,
 <Cell 'table'.C5>,
 <Cell 'table'.C6>),
(<Cell 'table'.D1>,
 <Cell 'table'.D2>,
 <Cell 'table'.D3>,
 <Cell 'table'.D4>,
 <Cell 'table'.D5>,
 <Cell 'table'.D6>),
(<Cell 'table'.E1>,
 <Cell 'table'.E2>,
 <Cell 'table'.E3>,
 <Cell 'table'.E4>,
 <Cell 'table'.E5>,
 <Cell 'table'.E6>))
获取表中所有的值
In [43]: tuple(sheet1.columns)
Out[43]:
((<Cell 'table'.A1>,
 <Cell 'table'.A2>,
 <Cell 'table'.A3>,
 <Cell 'table'.A4>,
 <Cell 'table'.A5>,
 <Cell 'table'.A6>),
(<Cell 'table'.B1>,
 <Cell 'table'.B2>,
 <Cell 'table'.B3>,
 <Cell 'table'.B4>,
 <Cell 'table'.B5>,
 <Cell 'table'.B6>),
(<Cell 'table'.C1>,
 <Cell 'table'.C2>,
 <Cell 'table'.C3>,
 <Cell 'table'.C4>,
 <Cell 'table'.C5>,
 <Cell 'table'.C6>),
(<Cell 'table'.D1>,
 <Cell 'table'.D2>,
 <Cell 'table'.D3>,
 <Cell 'table'.D4>,
 <Cell 'table'.D5>,
 <Cell 'table'.D6>),
(<Cell 'table'.E1>,
 <Cell 'table'.E2>,
 <Cell 'table'.E3>,
 <Cell 'table'.E4>,
 <Cell 'table'.E5>,
 <Cell 'table'.E6>))


In [45]: for row in sheet1.values:
   ...:     print(row)
   ...:
('column1', '列2', '列3', '列4', '列5')
('a2', 'b2', 'c2', 'd2', 'e2')
('a3', 'b3', 'c3', 'd3', 'e3')
('a4', 'b4', 'c4', 'd4', 'e4')
('a5', 'b5', 'c5', 'd5', 'e5')
('a6', 'b6', 'c6', 'd6', 'e6')

输出是每行的值一个tupel,上面也可以在嵌套一个for循环,每次输出一个单元格的值

iter_rows() 和 iter_cols()添加values_only参数直接输出值,如
>>> for row in ws.iter_rows(min_row=1, max_col=3, max_row=2, values_only=True):
...   print(row)

(None, None, None)
(None, None, None)

保存数据到文件

在新建workbook、worksheet、修改单元格等数据后,需要保存数据到磁盘

wb = Workbook()
wb.save('xxxxxx.xlsx')

保存数据到stream

例如当使用 Pyramid, Flask 或 Django 等 web 应用程序时

>>> from tempfile import NamedTemporaryFile
>>> from openpyxl import Workbook
>>> wb = Workbook()
>>> with NamedTemporaryFile() as tmp:
        wb.save(tmp.name)
        tmp.seek(0)
        stream = tmp.read()

从文件中读取数据

from openpyxl import load_workbook

wb = load_workbook('xxxx.xlsx')