Working with Time Series Data in pandas.DataFrame (Python)

When the index of a pandas.DataFrame or pandas.Series has the datetime64[ns] dtype, it is treated as a DatetimeIndex, and the various functions for handling time series data become available.

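Rows can be selected by year or month, and a period can be extracted with a slice, which is very convenient when working with data that carries date and time information.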

This article covers the following topics.

How to set an existing column as the DatetimeIndex
How to specify the DatetimeIndex when reading a CSV file
The case of pandas.Series

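How to set an existing column as the DatetimeIndex

As an example, take a pandas.DataFrame that has the default 0-based index and a column of dates stored as strings.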

    import pandas as pd

    df = pd.read_csv('./data/26/sample_date.csv')
    print(df)
    #           date  val_1  val_2
    # 0   2017-11-01     65     76
    # 1   2017-11-07     26     66
    # 2   2017-11-18     47     47
    # 3   2017-11-27     20     38
    # 4   2017-12-05     65     85
    # 5   2017-12-12      4     29
    # 6   2017-12-22     31     54
    # 7   2017-12-29     21      8
    # 8   2018-01-03     98     76
    # 9   2018-01-08     48     64
    # 10  2018-01-19     18     48
    # 11  2018-01-23     86     70

    print(type(df.index))
    # <class 'pandas.core.indexes.range.RangeIndex'>

    print(df['date'].dtype)
    # object

Apply to_datetime() to the column of date strings to convert it to the datetime64[ns] dtype.

    df['date'] = pd.to_datetime(df['date'])
    print(df['date'].dtype)
    # datetime64[ns]
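When the layout of the date strings is known in advance, it can help to state it explicitly and to decide what should happen to values that do not parse. The following is a minimal sketch on made-up data (the Series `raw` and its contents are hypothetical and not part of the sample CSV); it uses the `format` and `errors` parameters of `pd.to_datetime()`.

```python
import pandas as pd

# Hypothetical sample data; the last entry is deliberately malformed.
raw = pd.Series(['2017-11-01', '2017-11-07', 'not a date'])

# format= states the expected layout explicitly.
# errors='coerce' turns strings that do not match into NaT instead of raising.
converted = pd.to_datetime(raw, format='%Y-%m-%d', errors='coerce')

print(converted.dtype)
print(converted.isna().sum())  # number of values that failed to parse
```

Whether silently coercing bad values to NaT is appropriate depends on the data; the default `errors='raise'` is the safer choice when malformed dates should stop processing.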

Use the set_index() method to set the datetime64[ns] column as the index.


The index is now a DatetimeIndex, and each of its elements is a Timestamp.

    # set_index() returns a new DataFrame; assign it back to df
    df = df.set_index('date')
    print(df)
    #             val_1  val_2
    # date                    
    # 2017-11-01     65     76
    # 2017-11-07     26     66
    # 2017-11-18     47     47
    # 2017-11-27     20     38
    # 2017-12-05     65     85
    # 2017-12-12      4     29
    # 2017-12-22     31     54
    # 2017-12-29     21      8
    # 2018-01-03     98     76
    # 2018-01-08     48     64
    # 2018-01-19     18     48
    # 2018-01-23     86     70

    print(type(df.index))
    # <class 'pandas.core.indexes.datetimes.DatetimeIndex'>

    print(df.index[0])
    print(type(df.index[0]))
    # 2017-11-01 00:00:00
    # <class 'pandas._libs.tslibs.timestamps.Timestamp'>
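Because each index element is a Timestamp, its date components can be read directly, and the DatetimeIndex exposes the same components for the whole index at once. Here is a small sketch on in-memory data; the names `df_demo` and `df_indexed` and the three sample rows are chosen only for illustration.

```python
import pandas as pd

# Hypothetical in-memory data standing in for the sample CSV.
df_demo = pd.DataFrame({
    'date': ['2017-11-01', '2017-12-05', '2018-01-03'],
    'val_1': [65, 65, 98],
})
df_demo['date'] = pd.to_datetime(df_demo['date'])

# set_index() returns a new DataFrame and leaves df_demo unchanged.
df_indexed = df_demo.set_index('date')

first = df_indexed.index[0]                # a single Timestamp
print(first.year, first.month, first.day)  # 2017 11 1
print(df_indexed.index.year.tolist())      # [2017, 2017, 2018]
```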

Rows can be selected by year or month, and a period can be extracted with a slice.

    # Select rows with .loc; since pandas 2.0, df['2018'] is treated as a
    # column label and no longer selects rows.
    print(df.loc['2018'])
    #             val_1  val_2
    # date                    
    # 2018-01-03     98     76
    # 2018-01-08     48     64
    # 2018-01-19     18     48
    # 2018-01-23     86     70

    print(df.loc['2017-11'])
    #             val_1  val_2
    # date                    
    # 2017-11-01     65     76
    # 2017-11-07     26     66
    # 2017-11-18     47     47
    # 2017-11-27     20     38

    # Both endpoints of the slice are inclusive
    print(df.loc['2017-12-15':'2018-01-15'])
    #             val_1  val_2
    # date                    
    # 2017-12-22     31     54
    # 2017-12-29     21      8
    # 2018-01-03     98     76
    # 2018-01-08     48     64
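The selections above can be reproduced without the sample file. The sketch below builds a small DataFrame in memory (the name `df_demo` and its four rows are made up for illustration) and selects rows by year, by month, and by a date range using `.loc`.

```python
import pandas as pd

# Hypothetical data with a DatetimeIndex, sorted in ascending order.
df_demo = pd.DataFrame(
    {'val_1': [65, 26, 65, 98]},
    index=pd.to_datetime(['2017-11-01', '2017-11-07', '2017-12-05', '2018-01-03']),
)

rows_2017 = df_demo.loc['2017']                       # every row in 2017
rows_nov = df_demo.loc['2017-11']                     # every row in November 2017
rows_range = df_demo.loc['2017-11-05':'2017-12-31']   # both endpoints inclusive

print(len(rows_2017), len(rows_nov), len(rows_range))  # 3 2 2
```

The slice endpoints do not have to exist in the index, but this relies on the index being sorted; calling `sort_index()` first is a reasonable precaution for data of unknown order.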

Rows can also be specified with date strings in various formats.

    print(df.loc['01/19/2018', 'val_1'])
    # 18

    print(df.loc['20180103', 'val_2'])
    # 76

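How to specify the DatetimeIndex when reading a CSV file

If the original data is a CSV file, the DatetimeIndex can be specified when reading it with read_csv().

Pass the name of the date and time column to use as the index (or its 0-based column number) to the index_col parameter, and set parse_dates to True.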

    df = pd.read_csv('./data/26/sample_date.csv', index_col='date', parse_dates=True)
    print(df)
    #             val_1  val_2
    # date
    # 2017-11-01     65     76
    # 2017-11-07     26     66
    # 2017-11-18     47     47
    # 2017-11-27     20     38
    # 2017-12-05     65     85
    # 2017-12-12      4     29
    # 2017-12-22     31     54
    # 2017-12-29     21      8
    # 2018-01-03     98     76
    # 2018-01-08     48     64
    # 2018-01-19     18     48
    # 2018-01-23     86     70

    print(type(df.index))
    # <class 'pandas.core.indexes.datetimes.DatetimeIndex'>
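To try the index_col and parse_dates combination without a file on disk, the CSV text can be fed to read_csv() through `io.StringIO`. This is only a sketch: the three rows in `csv_text` and the name `df_from_text` are illustrative stand-ins for the sample file.

```python
import io

import pandas as pd

# Hypothetical CSV content with ISO-formatted dates in the first column.
csv_text = """date,val_1,val_2
2017-11-01,65,76
2017-11-07,26,66
2018-01-03,98,76
"""

# index_col picks the index column; parse_dates=True parses that index as dates.
df_from_text = pd.read_csv(io.StringIO(csv_text), index_col='date', parse_dates=True)

print(type(df_from_text.index))
print(df_from_text.loc['2017-11'])
```

Passing `index_col=0` instead of the column name should select the same column here, since `date` is the first column.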

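If the date strings in the CSV file are in a non-standard format, tell read_csv() how to parse them. In pandas 2.0 and later, pass the format string to the date_format parameter. Older versions instead take a parser function, typically defined with a lambda expression, in the date_parser parameter; that parameter is deprecated as of pandas 2.0.

    # pandas 2.0 and later: pass the format string directly
    df_jp = pd.read_csv('./data/26/sample_date_cn.csv', index_col='date', parse_dates=True, date_format='%Y年%m月%d日')

    # Before pandas 2.0, the equivalent was a parser function:
    # parser = lambda date: pd.to_datetime(date, format='%Y年%m月%d日')
    # df_jp = pd.read_csv('./data/26/sample_date_cn.csv', index_col='date', parse_dates=True, date_parser=parser)

    print(df_jp)
    #             val_1  val_2
    # date
    # 2017-11-01     65     76
    # 2017-11-07     26     66
    # 2017-11-18     47     47
    # 2017-11-27     20     38
    # 2017-12-05     65     85
    # 2017-12-12      4     29
    # 2017-12-22     31     54
    # 2017-12-29     21      8
    # 2018-01-03     98     76
    # 2018-01-08     48     64
    # 2018-01-19     18     48
    # 2018-01-23     86     70

    print(type(df_jp.index))
    # <class 'pandas.core.indexes.datetimes.DatetimeIndex'>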
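An alternative that does not depend on the date-parsing arguments of read_csv() is to read the index as plain strings and convert it afterwards with `pd.to_datetime()`. The sketch below assumes dates written like `2017年11月01日`; the CSV text and the name `df_raw` are hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV content with dates in a year/month/day kanji format.
csv_text = """date,val_1,val_2
2017年11月01日,65,76
2017年11月07日,26,66
2018年01月03日,98,76
"""

# Step 1: read the date column as an ordinary string index.
df_raw = pd.read_csv(io.StringIO(csv_text), index_col='date')

# Step 2: convert the index with an explicit format string.
df_raw.index = pd.to_datetime(df_raw.index, format='%Y年%m月%d日')

print(type(df_raw.index))
print(df_raw.index[0])
```

This takes two steps instead of one, but it behaves the same way regardless of which read_csv() parsing parameters the installed pandas version supports.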

The case of pandas.Series

This may not be a common pattern in practice, but consider a pandas.Series whose index consists of date strings.

    # Read the first two columns, then squeeze the single data column into a Series.
    # read_csv(squeeze=True) was removed in pandas 2.0; use .squeeze('columns') instead.
    s = pd.read_csv('./data/26/sample_date.csv', index_col=0, usecols=[0, 1]).squeeze('columns')
    print(s)
    # date
    # 2017-11-01    65
    # 2017-11-07    26
    # 2017-11-18    47
    # 2017-11-27    20
    # 2017-12-05    65
    # 2017-12-12     4
    # 2017-12-22    31
    # 2017-12-29    21
    # 2018-01-03    98
    # 2018-01-08    48
    # 2018-01-19    18
    # 2018-01-23    86
    # Name: val_1, dtype: int64

    print(type(s))
    print(type(s.index))
    # <class 'pandas.core.series.Series'>
    # <class 'pandas.core.indexes.base.Index'>

To convert this index to a DatetimeIndex, convert it with to_datetime() and assign the result back to the index attribute.

    s.index = pd.to_datetime(s.index)
    print(s)
    # date
    # 2017-11-01    65
    # 2017-11-07    26
    # 2017-11-18    47
    # 2017-11-27    20
    # 2017-12-05    65
    # 2017-12-12     4
    # 2017-12-22    31
    # 2017-12-29    21
    # 2018-01-03    98
    # 2018-01-08    48
    # 2018-01-19    18
    # 2018-01-23    86
    # Name: val_1, dtype: int64

    print(type(s))
    print(type(s.index))
    # <class 'pandas.core.series.Series'>
    # <class 'pandas.core.indexes.datetimes.DatetimeIndex'>

    print(s['2017-12-15':'2018-01-15'])
    # date
    # 2017-12-22    31
    # 2017-12-29    21
    # 2018-01-03    98
    # 2018-01-08    48
    # Name: val_1, dtype: int64
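The same conversion can be checked on a Series built directly in memory. In this sketch the name `s_demo` and its three values are made up for illustration; the index type is inspected before and after the assignment, and a month is then selected with `.loc`.

```python
import pandas as pd

# Hypothetical Series whose index holds date strings.
s_demo = pd.Series(
    [65, 26, 98],
    index=['2017-11-01', '2017-11-07', '2018-01-03'],
    name='val_1',
)
was_datetime = isinstance(s_demo.index, pd.DatetimeIndex)  # False: plain Index of strings

# Overwrite the index with its datetime-converted version.
s_demo.index = pd.to_datetime(s_demo.index)
is_datetime = isinstance(s_demo.index, pd.DatetimeIndex)   # True after conversion

print(was_datetime, is_datetime)  # False True
print(s_demo.loc['2017-11'])      # the two November 2017 entries
```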