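Before writing any code, let's look at three mail service protocols:
1. SMTP
SMTP (Simple Mail Transfer Protocol) is the protocol for sending mail. It works like a relay station: the client submits a message to a mail server, and servers pass it along until it reaches the recipient's mail server. It is not used to read mail.
2. POP3
POP3 (Post Office Protocol 3), the third version of the Post Office Protocol, was the first offline protocol standard for email. It downloads mail to the local computer and does not synchronize with the server. The drawback is that it is easier to lose mail or to download the same message more than once.
3. IMAP
IMAP (Internet Message Access Protocol, originally named Interactive Mail Access Protocol) connects to the remote mailbox and operates on it directly, so the client stays in sync with the content on the server.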
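Python's standard library ships one client module per protocol. The sketch below only collects their default port constants into a lookup table; the table layout and the `describe` helper are illustrative names, not part of any library.

```python
import imaplib
import poplib
import smtplib

# One standard-library client module per protocol. The port numbers are
# read from constants that those modules define.
MAIL_PROTOCOLS = {
    "SMTP": {"module": "smtplib", "role": "send mail",
             "port": smtplib.SMTP_PORT, "ssl_port": smtplib.SMTP_SSL_PORT},
    "POP3": {"module": "poplib", "role": "download mail",
             "port": poplib.POP3_PORT, "ssl_port": poplib.POP3_SSL_PORT},
    "IMAP": {"module": "imaplib", "role": "read mail on the server",
             "port": imaplib.IMAP4_PORT, "ssl_port": imaplib.IMAP4_SSL_PORT},
}


def describe(protocol):
    """Return a one-line summary for a protocol name (hypothetical helper)."""
    info = MAIL_PROTOCOLS[protocol]
    return (f"{protocol}: {info['role']} via {info['module']}, "
            f"port {info['port']} (SSL {info['ssl_port']})")


if __name__ == "__main__":
    for name in MAIL_PROTOCOLS:
        print(describe(name))
```

The rest of this article uses IMAP over SSL, which is why port 993 appears in the login code.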
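Next, a short introduction to the email package
The central component of this package is an "object model" that represents email messages. An application interacts with the package mainly through the object model interface defined in the message submodule. The application can use this API to ask questions about an existing email, to construct a new email, or to add or remove email subcomponents that themselves use the same object model interface. In other words, following the nature of email messages and their MIME subcomponents, the email object model is a tree of objects that all provide the EmailMessage API.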
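To make the tree structure concrete, here is a small offline sketch that builds a message with one attachment, serializes it, parses it back, and walks the resulting tree. The addresses, subject, and file name are made-up sample values.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_sample_message():
    """Build a message with a text body and one attachment (sample data)."""
    msg = EmailMessage()
    msg["Subject"] = "Monthly report"
    msg["From"] = "sender@example.com"
    msg["To"] = "receiver@example.com"
    msg.set_content("See the attached file.")
    # Adding an attachment turns the message into a multipart/mixed container
    msg.add_attachment(b"col1,col2\n1,2\n", maintype="application",
                       subtype="octet-stream", filename="report.csv")
    return msg


def describe_tree(msg):
    """Return the content type of every node, root first."""
    return [part.get_content_type() for part in msg.walk()]


# policy.default makes the parser return EmailMessage objects
raw_bytes = build_sample_message().as_bytes()
parsed = message_from_bytes(raw_bytes, policy=policy.default)

if __name__ == "__main__":
    print(describe_tree(parsed))
    print([part.get_filename() for part in parsed.iter_attachments()])
```

The root node is the container, and the body and the attachment are its children. The code later in this article walks the same kind of tree with `msg.walk()`.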
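Next we write concrete code that logs in to a mailbox as a client, downloads an email, and parses its attachments.
First we define an email parsing class. The class needs three variables:
1. the IMAP server address of the mailbox;
2. the mailbox account;
3. the mailbox password [Note: different mail providers enforce different security policies. QQ Mail, for example, requires SMS verification to obtain a login authorization code, and a remote client logs in with that code instead of the plain account password.]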

    class Email_parse:
        def __init__(self, remote_server_url, email_url, password):
            # IMAP server address
            self.remote_server_url = remote_server_url
            # Mailbox account
            self.email_url = email_url
            # Mailbox password (or authorization code)
            self.password = password

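Since the third value is a secret, one option is to keep it out of the source file. The sketch below reads the three values from environment variables. The variable names `MAIL_IMAP_HOST`, `MAIL_ACCOUNT`, and `MAIL_AUTH_CODE` are hypothetical and chosen only for this example.

```python
import os


def load_mail_credentials(env=os.environ):
    """Return (imap_host, account, auth_code) read from the environment.

    The environment variable names are assumptions made for this sketch.
    """
    try:
        return env["MAIL_IMAP_HOST"], env["MAIL_ACCOUNT"], env["MAIL_AUTH_CODE"]
    except KeyError as missing:
        raise RuntimeError(f"missing environment variable: {missing}") from None
```

With those variables set, the class could then be created with `Email_parse(*load_mail_credentials())`.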
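Then we define the entry method of the class. It logs in to the remote server, selects the inbox, and searches for all messages in it. We take the subject of the email and print it. [Subjects can be encoded with different charsets, so the raw bytes have to be decoded before they display correctly.]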

    def main_parse_Email(self):
        """Entry point: log in to the IMAP service."""
        server = imaplib.IMAP4_SSL(self.remote_server_url, 993)
        server.login(self.email_url, self.password)
        server.select('INBOX')
        status, data = server.search(None, "ALL")
        if status != 'OK':
            raise Exception('read email error')
        emailids = data[0].split()
        mail_counts = len(emailids)
        print("count:", mail_counts)
        # IDs are returned oldest first, so the last ID is the newest email.
        # The slice is simply empty when the mailbox has no mail.
        for email_id in emailids[-1:]:
            status, edata = server.fetch(email_id, '(RFC822)')
            msg = email.message_from_bytes(edata[0][1])
            # Get the email subject. decode_header splits it into
            # (fragment, charset) pairs and make_header joins them again.
            subject = email.header.decode_header(msg.get('subject', ''))
            title = str(email.header.make_header(subject))
            print("title:", title)

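The choice of "the newest email" can be isolated into a small pure function that needs no server. The sketch below assumes the data has the shape that `IMAP4.search()` returns, a list whose first element is a space-separated bytes string of message IDs. `newest_ids` is a hypothetical helper name.

```python
def newest_ids(search_data, count=1):
    """Return up to `count` message IDs, newest first.

    `search_data` is the second value returned by IMAP4.search(),
    for example [b'1 2 3 4']. An empty mailbox yields an empty list.
    """
    if not search_data or not search_data[0]:
        return []
    ids = search_data[0].split()
    # IDs arrive in ascending order, so reverse before taking the first few
    return ids[::-1][:count]


if __name__ == "__main__":
    print(newest_ids([b"1 2 3 4"], count=2))
```

Raising `count` is then enough to process the latest few emails instead of only one.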
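Here the msg variable holds the parsed email message. Because the title is needed again later, we move the subject decoding into a static method of the class that takes msg and returns the title.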

    @staticmethod
    def get_email_title(msg):
        """Return the decoded subject of the email."""
        # A missing Subject header is treated as an empty title
        subject = email.header.decode_header(msg.get('subject', ''))
        # make_header decodes every fragment with its own charset
        title = str(email.header.make_header(subject))
        print("title:", title)
        return title

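The decoding step can be tried without a mailbox. This sketch encodes a sample subject the way a mail client would, then decodes it again. `decode_subject` is a hypothetical stand-alone version of the method above, and the subject text is sample data.

```python
from email.header import Header, decode_header, make_header


def decode_subject(raw_subject):
    """Decode an RFC 2047 encoded subject into a readable string."""
    if raw_subject is None:
        return ""
    # decode_header yields (fragment, charset) pairs; make_header decodes
    # each fragment with its charset and joins the results
    return str(make_header(decode_header(raw_subject)))


# Produce an encoded header value that looks like '=?utf-8?b?...?='
encoded_subject = Header("月度报告", "utf-8").encode()

if __name__ == "__main__":
    print(encoded_subject)
    print(decode_subject(encoded_subject))
```

Plain ASCII subjects pass through unchanged, so the same function covers both cases.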
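We parse the email in two parts: the body (HTML) and the attachments (xlsx and so on). When a part turns out to be an attachment, we save it under a fixed path. Parsing the spreadsheet itself is not covered here; a package such as pandas is enough for that.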
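
    @staticmethod
    def get_att(msg):
        """Find the attachments and save them under ./src."""
        os.makedirs('./src', exist_ok=True)
        for part in msg.walk():
            # get_filename() reads the Content-Disposition filename and
            # falls back to the "name" parameter of Content-Type
            file_name = part.get_filename()
            if not file_name:
                continue
            # Attachment names can be RFC 2047 encoded, like subjects
            filename = str(make_header(decode_header(file_name)))
            data = part.get_payload(decode=True)
            if data is not None:
                # basename() drops any directory part of the name
                path = os.path.join('./src', os.path.basename(filename))
                with open(path, 'wb') as att_file:
                    att_file.write(data)
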
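The same idea can be exercised offline. The sketch below parses a locally built sample message with `policy.default`, so each part is an `EmailMessage` and `iter_attachments()` is available, then writes the attachments to a directory. `save_attachments`, the sample message, and the fallback file name are all assumptions made for this example.

```python
import os
import tempfile
from email import message_from_bytes, policy
from email.message import EmailMessage


def save_attachments(msg, target_dir):
    """Write every attachment of `msg` into `target_dir`; return the paths."""
    os.makedirs(target_dir, exist_ok=True)
    saved = []
    for index, part in enumerate(msg.iter_attachments(), start=1):
        data = part.get_payload(decode=True)
        if data is None:  # for example a nested multipart attachment
            continue
        # Keep only the file name itself; fall back to a generic name
        name = os.path.basename(part.get_filename() or "") or f"attachment_{index}.bin"
        path = os.path.join(target_dir, name)
        with open(path, "wb") as handle:
            handle.write(data)
        saved.append(path)
    return saved


def sample_message_bytes():
    """Serialize a small message with one attachment (sample data)."""
    msg = EmailMessage()
    msg["Subject"] = "Monthly report"
    msg.set_content("See the attached file.")
    msg.add_attachment(b"col1,col2\n1,2\n", maintype="application",
                       subtype="octet-stream", filename="report.csv")
    return msg.as_bytes()


if __name__ == "__main__":
    parsed = message_from_bytes(sample_message_bytes(), policy=policy.default)
    with tempfile.TemporaryDirectory() as tmp_dir:
        print(save_attachments(parsed, tmp_dir))
```

Unlike the class method, this version does not overwrite one file when an email carries several attachments with distinct names, because each part is saved under its own name.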
For the body, we decode the HTML part and save the resulting text to a .txt file so that it is easy to read later.
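
    @staticmethod
    def get_text_from_HTML(msg):
        """Save the HTML body to ./src/<title>.txt and return it."""
        filename = Email_parse.get_email_name(msg)
        current_title = Email_parse.get_email_title(msg)
        print("filename:", filename, type(filename))
        os.makedirs('./src', exist_ok=True)
        for part in msg.walk():
            # Only the HTML body is wanted: skip containers, other
            # content types, and attachments
            if part.get_content_type() != 'text/html' or part.get_filename():
                continue
            result = part.get_payload(decode=True)
            # Prefer the charset declared by the part; fall back to gbk
            charset = part.get_content_charset() or 'gbk'
            result = result.decode(charset, errors='replace')
            with open(f'./src/{current_title}.txt', 'w', encoding='utf-8') as f:
                f.write(result)
            return result
        return None
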
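The saved file still contains HTML markup. If only the visible text is wanted, one possible approach is to strip the tags with the standard library's `html.parser`. This is a minimal sketch and not part of the original code; `html_to_text` is a hypothetical helper, and it does not insert separators between block elements.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace into single spaces
        return " ".join("".join(self._chunks).split())


def html_to_text(html):
    """Return the visible text of an HTML string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    print(html_to_text("<p>Hello <b>world</b></p>"))
```

Passing the returned `result` through `html_to_text` before writing it would store plain text instead of markup.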
The complete code is as follows:

    import email
    import imaplib
    import os
    from email.header import decode_header, make_header


    class Email_parse:
        def __init__(self, remote_server_url, email_url, password):
            self.remote_server_url = remote_server_url
            self.email_url = email_url
            self.password = password

        @staticmethod
        def get_att(msg):
            """Find the attachments and save them under ./src."""
            os.makedirs('./src', exist_ok=True)
            for part in msg.walk():
                file_name = part.get_filename()
                if not file_name:
                    continue
                filename = str(make_header(decode_header(file_name)))
                data = part.get_payload(decode=True)
                if data is not None:
                    path = os.path.join('./src', os.path.basename(filename))
                    with open(path, 'wb') as att_file:
                        att_file.write(data)

        @staticmethod
        def get_email_title(msg):
            """Return the decoded subject of the email."""
            subject = decode_header(msg.get('subject', ''))
            title = str(make_header(subject))
            print("title:", title)
            return title

        @staticmethod
        def get_email_name(msg):
            """Return the decoded name of the first attachment, or None."""
            for part in msg.walk():
                file_name = part.get_filename()
                if file_name:
                    filename = str(make_header(decode_header(file_name)))
                    print("attachment name:", filename)
                    return filename
            return None

        def main_parse_Email(self):
            """Entry point: log in to the IMAP service."""
            server = imaplib.IMAP4_SSL(self.remote_server_url, 993)
            server.login(self.email_url, self.password)
            server.select('INBOX')
            status, data = server.search(None, "ALL")
            if status != 'OK':
                raise Exception('read email error')
            emailids = data[0].split()
            mail_counts = len(emailids)
            print("count:", mail_counts)
            # IDs are returned oldest first, so the last ID is the newest email
            for email_id in emailids[-1:]:
                status, edata = server.fetch(email_id, '(RFC822)')
                msg = email.message_from_bytes(edata[0][1])
                Email_parse.get_att(msg)
                Email_parse.get_text_from_HTML(msg)
            server.logout()

        @staticmethod
        def get_text_from_HTML(msg):
            """Save the HTML body to ./src/<title>.txt and return it."""
            filename = Email_parse.get_email_name(msg)
            current_title = Email_parse.get_email_title(msg)
            print("filename:", filename, type(filename))
            os.makedirs('./src', exist_ok=True)
            for part in msg.walk():
                # Skip containers, other content types, and attachments
                if part.get_content_type() != 'text/html' or part.get_filename():
                    continue
                result = part.get_payload(decode=True)
                # Prefer the charset declared by the part; fall back to gbk
                charset = part.get_content_charset() or 'gbk'
                result = result.decode(charset, errors='replace')
                with open(f'./src/{current_title}.txt', 'w', encoding='utf-8') as f:
                    f.write(result)
                return result
            return None


    if __name__ == "__main__":
        remote_server_url = 'imap.qq.com'
        email_url = "*********@qq.com"
        password = "**********"
        demo = Email_parse(remote_server_url, email_url, password)
        demo.main_parse_Email()

Original link: https://blog.csdn.net/weixin_44784088/article/details/125662206