常用文件处理函数¶

一、open函数¶

1. 打开文件¶

f = open(filename, mode)

filename: 路径可以是觉得路径或相对路径

mode	说明
r	以输入模式打开（默认值）
w	以输出模式生成并打开文件
a	在文件尾部追加并打开文件
b	二进制数据处理
+	同时支持输入输出

2. 读取文件¶

方法	描述
f.read()	把整个文件读进一个字符串
f.read(N)	读取接下来的N个字符到一个字符串
f.readline()	读取下一行到一个字符串
f.readlines()	读取整个文件到一个字符串列表

>>> f = open('hello.txt')
>>> f.read()
'hello world!\nlife is short,\nI love Python.\n'
>>> f = open('hello.txt')
>>> help(f)
>>> f.read(3)
'hel'
>>> f.readline()
'lo world!\n'
>>> f.readlines()
['life is short,\n', 'I love Python.\n']

# 文件迭代器读取文件
>>> f = open('hello.txt')
>>> for line in f.readlines():
...     print(line)
... 
I love China.

hello world!

3. 写入文件¶

方法	描述
f.write(String)	把字符串写入文件
f.writelines(List)	把列表内的所有字符串写入文件

# 写入文件
>>> f = open('hello.txt', 'w')
>>> L = ['I love China.']
>>> f.writelines(L)
# 读取文件
>>> f = open('hello.txt')
>>> f.read()
'I love China.'

4. 关闭文件¶

f.close()

三、xml¶

XML 指可扩展标记语言（EXtensible Markup Language），XML 被设计用来传输和存储数据。

<bookstore>
<book category="COOKING">
  <title lang="en">Everyday Italian</title> 
  <author>Giada De Laurentiis</author> 
  <year>2005</year> 
  <price>30.00</price> 
</book>
<book category="WEB">
  <title lang="en">Learning XML</title> 
  <author>Erik T. Ray</author> 
  <year>2003</year> 
  <price>39.95</price> 
</book>
</bookstore>

1. xml基础¶

1. 元素¶

根元素是 <bookstore>

<book> 元素有 4 个子元素：<title>、< author>、<year>、<price>。

元素的命名规则

名称可以含字母、数字以及其他的字符
名称不能以数字或者标点符号开始
名称不能以字符 “xml”（或者 XML、Xml）开始
名称不能包含空格

2. 属性¶

lang="en"叫属性，XML 的属性值须加引号

3. 注释¶

1	`<!-- This is a comment -->`

4. 实体引用¶

实体引用	含义
`<`	<
`>`	>
`&`	&
`'`	'
`"`	"

2. The ElementTree XML API¶

python有三种方法解析XML：SAX，DOM，以及ElementTree。ElementTree就像一个轻量级的DOM，具有方便友好的API。代码可用性好，速度快，消耗内存少。

country_data.xml文件内容如下

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

1. 获取内容¶

1. 读取xml¶

方法一：从文件中读取

import xml.etree.ElementTree as ET
tree = ET.parse('country_data.xml')
root = tree.getroot()

方法二：从字符串读取

import xml.etree.ElementTree as ET
country_data_as_string='文件内容'
root = ET.fromstring(country_data_as_string)

2. 获取元素的标签名称 `.tag`¶

>>> root.tag 
'data'

3. 获取元素的属性 `.attrib`¶

>>> root.attrib
{}

4.获取元素的内容 `.text`¶

>>> root[0][1].text
'2008'

5.获取属性的值`.get`¶

>>> root[0].get("name") 
'Liechtenstein'

6. 遍历子元素¶

>>> for child in root:
...     print(child)
... 
<Element 'country' at 0x0000012451A447C8>
<Element 'country' at 0x0000012451D31408>
<Element 'country' at 0x0000012451D31598>
>>> for child in root:
...     print("子元素标签：",child.tag,"属性：",child.attrib)
... 
子元素标签： country 属性： {'name': 'Liechtenstein'}
子元素标签： country 属性： {'name': 'Singapore'}
子元素标签： country 属性： {'name': 'Panama'}

7. 遍历指定元素`.iter()`¶

>>> for neighbor in root.iter("neighbor"):
...     print(neighbor.attrib) 
... 
{'name': 'Austria', 'direction': 'E'}
{'name': 'Switzerland', 'direction': 'W'}
{'name': 'Malaysia', 'direction': 'N'}
{'name': 'Costa Rica', 'direction': 'W'}
{'name': 'Colombia', 'direction': 'E'}

8.查找元素`findall`和`find`¶

# findall 查找指定标签的元素
>>> for child in root.findall("country"):
...     print(child)
... 
<Element 'country' at 0x0000012451A447C8>
<Element 'country' at 0x0000012451D31408>
<Element 'country' at 0x0000012451D31598>
# find 查找指定标签下的的子元素
>>> for child in root.find("country"):
...     print(child)
... 
<Element 'rank' at 0x0000012451B42728>
<Element 'year' at 0x0000012451D17B38>
<Element 'gdppc' at 0x0000012451D17B88>
<Element 'neighbor' at 0x0000012451D2C278>
<Element 'neighbor' at 0x0000012451D313B8>

2. 修改内容¶

1. 修改`.set`¶

>>> root[0][1].text 
'2008'
# 重新副职
>>> root[0][1].text = '2018'      
>>> root[0][1].set('updated', 'yes')       
KeyboardInterrupt
>>> tree.write('output.xml')

2. 删除`.remove`¶

>>> for child in root.findall("country"):
...     if child.attrib['name'] == 'Liechtenstein':
...             root.remove(child) 
... 
>>> tree.write('output.xml') 

3.通过XPath定位元素¶

语法	含义
`tag`	选择具有给定标签的所有子元素
`*`	选择所有子元素
`.`	选择当前元素。在xpath表达式开头使用，表示相对路径。
`//`	选择当前元素下所有级别的所有子元素
`..`	选择父元素
`[@attrib]`	选择具有给定属性的所有元素,在同级下选取，并且不向子代查找
`[@attrib='value']`	选择指定属性attrib具有指定值value的元素
`[tag]`	选择所有具有名为tag的子元素的元素
`[tag='text']`	选择元素（或其子元素）名为tag，完整文本内容为指定的值text的元素。
`[position]`	选择位于给定位置的所有元素。该位置可以是整数（1是第一个位置），表达式last()用于最后一个位置，也可以是相对于最后一个位置的位置（例如last()-1）

>>> root.findall('./country/year') 
[<Element 'year' at 0x0000012451D17B38>, <Element 'year' at 0x0000012451D31C28>, <Element 'year' at 0x0000012451D31DB8>]
# 在所有子元素中选择
>>> for year in root.findall('*/year'):
...     print(year.text) 
... 
2008
2011
2011
# .. 表示选择父元素（上一级）
>>> root.findall('./country/..')[0].tag
'data'
# 选择具有给定属性的所有元素
>>> root.findall('./country/[@name]') 
[<Element 'country' at 0x0000012451D2C278>, <Element 'country' at 0x0000012451D317C8>, <Element 'country' at 0x0000012451D31D18>]
# 选择指定属性attrib具有指定值value的元素
>>> root.findall('./country/[@name="Panama"]')
[<Element 'country' at 0x0000012451D31D18>]
# 选择所有具有名为tag的子元素的元素
>>> root.findall('./country/[year]')  
[<Element 'country' at 0x0000012451D2C278>, <Element 'country' at 0x0000012451D317C8>, <Element 'country' at 0x0000012451D31D18>]
# 选择元素（或其子元素）名为tag，完整文本内容为指定的值text的元素。
>>> root.findall("./country/[year='2008']")  
[<Element 'country' at 0x0000012451D2C278>]

四、configparser模块¶

配置文件解析器

1. 创建配置文件¶

import configparser

config = configparser.ConfigParser()

config["DEFAULT"] = {'user': 'ubuntu',
                     'port': '22'}

config['web'] = {}
config['web']['manage'] = 'manageweb'

config['api'] = {}
api = config['api']
api['manage'] = 'manage.war'

with open('test.conf', 'w') as configfile:
    config.write(configfile)

创建好的配置文件

$ cat test.conf 
[DEFAULT]
user = ubuntu
port = 22

[web]
manage = manageweb

[api]
manage = manage.war

2. 读取配置文件¶

import configparser

config = configparser.ConfigParser()
# 读取配置文件
config.read('test.conf')
# 查看配置文件中的章节，不包含DEFAULT
print(config.sections())
# 查看某一个配置的值
print(config['api']['manage'])
# 判断章节是否在配置文件中存在
print('db' in config)
# 用get读取值
print(config['DEFAULT'].get('port'))
# 或
print(config.get('api', 'manage'))
# 如果没有get到，返回一个指定的值
print(config.get('test', 'test', fallback='not find'))

执行结果

$ python3 test.py 
['web', 'api']
manage.war
False
22
manage.war
not find

3. 修改配置文件¶

import configparser

import configparser

config = configparser.ConfigParser()
# 读取配置文件
config.read('test.conf')
# 添加一个章节
config.add_section('mysql')
# 添加一个键值对
config.set('mysql', 'host', '127.0.0.1')
# 保存到文件中
config.write(open('test.conf', 'w'))

五、cvs¶

import csv
# 写入
# 为了保证WindowLinux和行尾的符号一致，最好的加上newline=''
with open('hosts.csv', 'w', newline='') as hostsfile:
    # 定义字符直接的分割符为空格（delimiter=' '）
    # writer = csv.writer(hostsfile, delimiter=' ')
    writer = csv.writer(hostsfile,)
    # 写入一行
    writer.writerow(['IP', 'username', 'password'])
    writer.writerow(['192.168.1.10', 'root', '123456'])
    # 写入多行，传入参数为一个嵌套元组，每一行都存在在一个元组中
    writer.writerows([['192.168.1.11', 'admin', '123456'], ['192.168.1.12', 'admin', '123456']])
# 读
with open('hosts.csv', newline='') as hostsfile:
    reader = csv.reader(hostsfile, delimiter=' ')
    for row in reader:
        print(row)

import csv

# 使用类方法操作
# 写入
with open('hosts.csv', 'w', newline='') as hostsfile:
    fieldnames = ['IP', 'username', 'password', 'port']
    # 设置“标题”
    writer = csv.DictWriter(hostsfile, fieldnames=fieldnames)
    # 写入标题
    writer.writeheader()
    # 参数格式必须是一个字典,
    # 写入单行
    writer.writerow({'IP': '192.168.1.10', 'username': 'root', 'password': '123456', 'port': '22'})

    # 写入多行,传入的参数必须是一个可迭代对象，里面的元素是字典格式的，这里用元组
    writer.writerows([{'IP': '192.168.1.11', 'username': 'root', 'password': '123456', 'port': '22'},
                     {'IP': '192.168.1.12', 'username': 'root', 'password': '123456', 'port': '22'}])




# 读取
#
with open('hosts.csv', newline='') as hostsfile:
     reader = csv.DictReader(hostsfile)
     for row in reader:
         # 返回结果为字典格式
         print(row)