*Python for Data Analysis*
pd.date_range(): generates a DatetimeIndex of timestamps at a fixed frequency.
With freq='BM' (business month end), each date is the last business day of its month, e.g. 2000-04-28 rather than 2000-04-30:

```
In [15]: rng = pd.date_range('2000-01-01', '2000-06-30', freq='BM')

In [16]: rng
Out[16]:
DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31', '2000-04-28', '2000-05-31', '2000-06-30'], dtype='datetime64[ns]', freq='BM')

In [17]: pd.Series(np.random.randn(6), index=rng)
Out[17]:
2000-01-31    0.586341
2000-02-29   -0.439679
2000-03-31    0.853946
2000-04-28   -0.740858
2000-05-31   -0.114699
2000-06-30   -0.529631
Freq: BM, dtype: float64
```
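A few other ways to call pd.date_range, as a small sketch: start and end, start plus a number of periods, end plus a number of periods, and normalize=True to snap the times to midnight. It uses the 'MS' (month start) alias instead of 'BM'; note that in pandas 2.2 and later 'BM' is deprecated in favor of 'BME'. The dates are arbitrary examples.

```python
import pandas as pd

# start and end given: daily frequency ('D') is the default
daily = pd.date_range('2000-04-01', '2000-04-05')

# start plus a number of periods, at month-start frequency
from_start = pd.date_range(start='2000-04-01', periods=3, freq='MS')

# end plus a number of periods: the range is counted backwards from end
to_end = pd.date_range(end='2000-06-01', periods=3, freq='MS')

# normalize=True drops the time of day, so the range starts at midnight
normalized = pd.date_range('2000-05-02 12:56:31', periods=2, normalize=True)

print(daily)
print(from_start)
print(to_end)
print(normalized)
```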
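Date offset objects such as Hour and Minute are imported from pandas.tseries.offsets:

```python
from pandas.tseries.offsets import Hour, Minute
```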
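Offset objects can be added together, added to timestamps, and passed wherever a frequency string is accepted. A minimal sketch with arbitrary values:

```python
import pandas as pd
from pandas.tseries.offsets import Hour, Minute

# Tick offsets of different units combine into a single offset
combined = Hour(2) + Minute(30)

# Adding an offset to a Timestamp moves it forward by that amount
start = pd.Timestamp('2000-01-01')
later = start + combined

# An offset object can be used as the freq of date_range
idx = pd.date_range('2000-01-01', periods=3, freq=Hour(4))

print(combined)
print(later)
print(idx)
```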
ts.shift(): shifts the data of a time series forward or backward in time.
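The difference between shifting the values and shifting the timestamps is easiest to see side by side. A small sketch on made-up data:

```python
import pandas as pd

ts = pd.Series([1.0, 2.0, 4.0, 8.0],
               index=pd.date_range('2000-01-01', periods=4, freq='D'))

# Without freq: values move down, the index stays, a NaN appears at the start
lagged = ts.shift(1)

# With freq: the timestamps move forward and the values stay attached to them
moved = ts.shift(1, freq='D')

# A common use of shift: period-over-period percent change
pct_change = ts / ts.shift(1) - 1

print(lagged)
print(moved)
print(pct_change)
```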
Period and PeriodIndex classes
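A Period represents a span of time (a month, a quarter, and so on) rather than a single instant. A minimal sketch of Period arithmetic and of the timestamps that bound a period, using an arbitrary month:

```python
import pandas as pd

# A monthly period: the whole of January 2000
p = pd.Period('2000-01', freq='M')

# Adding an integer shifts the period by that many units of its frequency
june = p + 5

# start_time and end_time are the first and last instants the period covers
first = p.start_time
last = p.end_time

print(june, first, last)
```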
pd.period_range(): creates a range of periods at a regular frequency.
```
In [20]: rng = pd.period_range('2000-01-01', '2000-06-30', freq='M')
    ...: rng
    ...:
Out[20]: PeriodIndex(['2000-01', '2000-02', '2000-03', '2000-04', '2000-05', '2000-06'], dtype='int64', freq='M')
```
Constructor: pd.PeriodIndex()
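A sketch of building a PeriodIndex directly from strings, where freq tells pandas how to interpret each string; the quarter labels and values are arbitrary:

```python
import pandas as pd

# Quarterly periods for a fiscal year ending in December
index = pd.PeriodIndex(['2001Q3', '2002Q2', '2003Q1'], freq='Q-DEC')

# A PeriodIndex serves as a Series index just like a DatetimeIndex
s = pd.Series([1, 2, 3], index=index)
print(s)
```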
ts.asfreq(): converts a Period, PeriodIndex or time series to another frequency.
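When a period is converted to a higher frequency, asfreq has to choose which sub-period to return; the how argument ('start' or 'end') makes that choice. A sketch on made-up monthly data:

```python
import pandas as pd

p = pd.Period('2000-01', freq='M')

# how='start' picks the first day of the month, how='end' the last
first_day = p.asfreq('D', how='start')
last_day = p.asfreq('D', how='end')

# On a Series with a PeriodIndex, asfreq relabels the index; values are kept
ts = pd.Series([1.0, 2.0], index=pd.period_range('2000-01', periods=2, freq='M'))
daily_labels = ts.asfreq('D', how='end')

print(first_day, last_day)
print(daily_labels)
```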
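With the same freq='M', date_range produces month-end timestamps while period_range produces monthly periods. (In pandas 2.2 and later the month-end alias for timestamps is 'ME' and 'M' is deprecated there, while 'M' remains the monthly alias for periods.)

```
In [21]: rng = pd.date_range('2000-01-01', '2000-06-30', freq='M')

In [22]: rng
Out[22]:
DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31', '2000-04-30', '2000-05-31', '2000-06-30'], dtype='datetime64[ns]', freq='M')

In [23]: rng = pd.period_range('2000-01-01', '2000-06-30', freq='M')

In [24]: rng
Out[24]: PeriodIndex(['2000-01', '2000-02', '2000-03', '2000-04', '2000-05', '2000-06'], dtype='int64', freq='M')
```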
to_period() and to_timestamp(): convert a timestamp-indexed series to a period-indexed one, and back again.
```
In [25]: rng = pd.date_range('2000-01-01', periods=3, freq='M')
    ...: ts = pd.Series(np.random.randn(3), index=rng)
    ...: ts
    ...:
Out[25]:
2000-01-31    0.455968
2000-02-29    1.720553
2000-03-31    1.695834
Freq: M, dtype: float64

In [26]: pts = ts.to_period()
    ...: pts
    ...:
Out[26]:
2000-01    0.455968
2000-02    1.720553
2000-03    1.695834
Freq: M, dtype: float64
```
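The transcript only goes one way; the sketch below adds the return trip with to_timestamp. It uses month-start timestamps ('MS') and made-up values so the round trip lands on the original dates:

```python
import pandas as pd

ts = pd.Series([1.0, 2.0, 3.0],
               index=pd.date_range('2000-01-01', periods=3, freq='MS'))

# Timestamps -> monthly periods
pts = ts.to_period('M')

# Periods -> timestamps; how='start' uses the first instant of each period
back = pts.to_timestamp(how='start')

print(pts)
print(back)
```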
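Resampling is the process of converting a time series from one frequency to another. Aggregating higher-frequency data into a lower frequency is called downsampling, while converting lower-frequency data to a higher frequency is called upsampling (which usually involves filling in or interpolating values).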
resample(): the workhorse function for frequency conversion.
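Argument | Description |
---|---|
freq | String or DateOffset giving the target frequency, e.g. 'M', '5min' or Second(15). In the pandas signature this first argument is named rule. |
how='mean' | Function name or array function used to produce the aggregated values; defaults to 'mean'. Deprecated: it triggers "FutureWarning: how in .resample() is deprecated the new syntax is .resample(…).mean()", and current pandas no longer accepts it, so call the aggregation method on the resampler instead. |
axis=0 | Axis to resample along. |
fill_method=None | How to fill when upsampling, e.g. 'ffill' or 'bfill'; no filling by default. Current pandas no longer accepts it; call .ffill() or .bfill() on the resampler instead. |
closed=None | When downsampling, which end of each interval is closed (inclusive), 'right' or 'left'. The default is 'right' for month-end, quarter-end, year-end (and their business-day variants) and weekly frequencies, and 'left' for all others. |
label=None | When downsampling, whether to label each aggregated value with the 'right' or 'left' bin edge; the default follows the same rule as closed. |
loffset=None | Time adjustment applied to the bin labels, e.g. '-1s' or Second(-1) shifts the aggregated labels one second earlier. Removed in pandas 2.0. |
limit=None | When forward- or backward-filling, the maximum number of periods to fill. In current pandas, pass it to the fill method, e.g. .ffill(limit=2). |
kind=None | Aggregate to periods ('period') or timestamps ('timestamp'); defaults to the index type of the time series. Deprecated in pandas 2.2. |
convention='start' | When resampling periods, the convention ('start' or 'end') for converting a low-frequency period to high frequency; defaults to 'start'. |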
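Because how= is no longer accepted, the aggregation is written as a method call on the object that resample returns, and agg can apply several functions at once. A sketch with a DateOffset as the rule and made-up values at 15-second spacing:

```python
import pandas as pd
from pandas.tseries.offsets import Second

rng = pd.date_range('2000-01-01', periods=6, freq=Second(15))
ts = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=rng)

# Rule given as a DateOffset; aggregation is a method call instead of how='mean'
means = ts.resample(Second(30)).mean()

# Several aggregations at once: one output column per function name
stats = ts.resample(Second(30)).agg(['min', 'max'])

print(means)
print(stats)
```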
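resample examples

The examples below use resample. When downsampling, two things need to be decided: which side of each interval is closed, and whether each aggregated bin is labeled with the start or the end of its interval.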
```
In [27]: rng = pd.date_range('2000-01-01', periods=100, freq='D')
    ...: ts = pd.Series(np.random.randn(len(rng)), index=rng)
    ...: ts
    ...:
Out[27]:
2000-01-01   -0.189731
                ...
2000-04-09    0.283110
Freq: D, dtype: float64

In [28]: ts.resample('M').mean()
Out[28]:
2000-01-31   -0.019276
2000-02-29   -0.041192
2000-03-31   -0.214551
2000-04-30    0.411190
Freq: M, dtype: float64

In [29]: ts.resample('M', kind='period').mean()
Out[29]:
2000-01   -0.019276
2000-02   -0.041192
2000-03   -0.214551
2000-04    0.411190
Freq: M, dtype: float64
```
```
In [31]: rng = pd.date_range('2000-01-01', periods=12, freq='T')
    ...: ts = pd.Series(np.arange(12), index=rng)
    ...: ts
    ...:
Out[31]:
2000-01-01 00:00:00     0
2000-01-01 00:01:00     1
2000-01-01 00:02:00     2
2000-01-01 00:03:00     3
2000-01-01 00:04:00     4
2000-01-01 00:05:00     5
2000-01-01 00:06:00     6
2000-01-01 00:07:00     7
2000-01-01 00:08:00     8
2000-01-01 00:09:00     9
2000-01-01 00:10:00    10
2000-01-01 00:11:00    11
Freq: T, dtype: int32

In [32]: ts.resample('5min', closed='right', label='right').sum()
Out[32]:
2000-01-01 00:00:00     0
2000-01-01 00:05:00    15
2000-01-01 00:10:00    40
2000-01-01 00:15:00    11
Freq: 5T, dtype: int32

In [33]: ts.resample('5min', closed='right',
    ...:             label='right', loffset='-1s').sum()
Out[33]:
1999-12-31 23:59:59     0
2000-01-01 00:04:59    15
2000-01-01 00:09:59    40
2000-01-01 00:14:59    11
Freq: 5T, dtype: int32
```
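For a minute-based rule such as '5min', pandas defaults to closed='left' and label='left'. The sketch below compares that default with the right-closed call in In [32] and, since loffset was removed in pandas 2.0, reproduces In [33] by subtracting a Timedelta from the result's index. It uses 'min' instead of 'T', which pandas 2.2 deprecates, and range(12) in place of np.arange(12).

```python
import pandas as pd

rng = pd.date_range('2000-01-01', periods=12, freq='min')
ts = pd.Series(range(12), index=rng)

# Defaults for '5min': bins [00:00, 00:05), [00:05, 00:10), ... labeled by left edge
left = ts.resample('5min').sum()

# Right-closed, right-labeled bins, as in In [32]
right = ts.resample('5min', closed='right', label='right').sum()

# Stand-in for loffset='-1s': move the labels one second earlier
adjusted = right.copy()
adjusted.index = adjusted.index - pd.Timedelta(seconds=1)

print(left)
print(adjusted)
```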
To group by month or by weekday instead, pass groupby a function that reads that field from each timestamp in the series' index.
```
In [35]: rng = pd.date_range('2000-01-01', periods=100, freq='D')
    ...: ts = pd.Series(np.random.randn(len(rng)), index=rng)
    ...: ts

In [36]: ts.groupby(lambda x: x.month).mean()
Out[36]:
1   -0.126008
2    0.079132
3    0.026093
4    0.321457
dtype: float64

In [37]: ts.groupby(lambda x: x.weekday).mean()
Out[37]:
0    0.280289
1    0.174452
2    0.166102
3   -0.779489
4   -0.036195
5    0.086394
6    0.234831
dtype: float64
```
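Instead of a lambda, the month or weekday arrays can also be taken straight from the DatetimeIndex and passed to groupby. A sketch on deterministic made-up data (0 to 99, one value per day) so the monthly means are easy to check by hand:

```python
import pandas as pd

rng = pd.date_range('2000-01-01', periods=100, freq='D')
ts = pd.Series(range(100), index=rng, dtype=float)

# Group by arrays derived from the index rather than by a function
by_month = ts.groupby(ts.index.month).mean()
by_weekday = ts.groupby(ts.index.dayofweek).mean()  # Monday=0 ... Sunday=6

print(by_month)
print(by_weekday)
```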
```
In [38]: import pandas as pd
    ...: import numpy as np
    ...: frame = pd.DataFrame(np.random.randn(2, 4),
    ...:                      index=pd.date_range('1/1/2000', periods=2,
    ...:                                          freq='W-WED'),
    ...:                      columns=['Colorado', 'Texas', 'New York', 'Ohio'])
    ...: frame
    ...:
Out[38]:
            Colorado     Texas  New York      Ohio
2000-01-05 -0.925525 -0.434350  1.037349 -1.532790
2000-01-12  1.075744  0.237922 -0.907699  0.592211

In [39]: df_daily = frame.resample('D').asfreq()
    ...: df_daily
    ...:
Out[39]:
            Colorado     Texas  New York      Ohio
2000-01-05 -0.925525 -0.434350  1.037349 -1.532790
2000-01-06       NaN       NaN       NaN       NaN
2000-01-07       NaN       NaN       NaN       NaN
2000-01-08       NaN       NaN       NaN       NaN
2000-01-09       NaN       NaN       NaN       NaN
2000-01-10       NaN       NaN       NaN       NaN
2000-01-11       NaN       NaN       NaN       NaN
2000-01-12  1.075744  0.237922 -0.907699  0.592211

In [40]: frame.resample('D').ffill()
Out[40]:
            Colorado     Texas  New York      Ohio
2000-01-05 -0.925525 -0.434350  1.037349 -1.532790
2000-01-06 -0.925525 -0.434350  1.037349 -1.532790
2000-01-07 -0.925525 -0.434350  1.037349 -1.532790
2000-01-08 -0.925525 -0.434350  1.037349 -1.532790
2000-01-09 -0.925525 -0.434350  1.037349 -1.532790
2000-01-10 -0.925525 -0.434350  1.037349 -1.532790
2000-01-11 -0.925525 -0.434350  1.037349 -1.532790
2000-01-12  1.075744  0.237922 -0.907699  0.592211

# formerly written as frame.resample('D', how='mean')
In [41]: df_daily = frame.resample('D').mean()
    ...: df_daily
    ...:
Out[41]:
            Colorado     Texas  New York      Ohio
2000-01-05 -0.925525 -0.434350  1.037349 -1.532790
2000-01-06       NaN       NaN       NaN       NaN
2000-01-07       NaN       NaN       NaN       NaN
2000-01-08       NaN       NaN       NaN       NaN
2000-01-09       NaN       NaN       NaN       NaN
2000-01-10       NaN       NaN       NaN       NaN
2000-01-11       NaN       NaN       NaN       NaN
2000-01-12  1.075744  0.237922 -0.907699  0.592211
```
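By default ffill fills every new row when upsampling. The resampler's ffill method also takes a limit argument that caps how many consecutive rows are filled. A sketch on a hypothetical one-column frame with made-up values 1.0 and 2.0 on the same two Wednesdays:

```python
import pandas as pd

frame = pd.DataFrame({'Colorado': [1.0, 2.0]},
                     index=pd.date_range('2000-01-05', periods=2, freq='W-WED'))

# Upsample to daily; carry each value forward at most 2 days
filled = frame.resample('D').ffill(limit=2)
print(filled)
```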
```
In [42]: frame = pd.DataFrame(np.random.randn(24, 4),
    ...:                      index=pd.period_range('1-2000', '12-2001',
    ...:                                            freq='M'),
    ...:                      columns=['Colorado', 'Texas', 'New York', 'Ohio'])
    ...: frame[:5]
    ...: annual_frame = frame.resample('A-DEC').mean()
    ...: annual_frame
    ...:
Out[42]:
      Colorado     Texas  New York      Ohio
2000  0.442672  0.104870 -0.067043 -0.128942
2001 -0.263757 -0.399865 -0.423485  0.026256

In [43]: annual_frame.resample('Q-DEC', convention='end').ffill()
Out[43]:
        Colorado     Texas  New York      Ohio
2000Q4  0.442672  0.104870 -0.067043 -0.128942
2001Q1  0.442672  0.104870 -0.067043 -0.128942
2001Q2  0.442672  0.104870 -0.067043 -0.128942
2001Q3  0.442672  0.104870 -0.067043 -0.128942
2001Q4 -0.263757 -0.399865 -0.423485  0.026256
```
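The convention argument decides which high-frequency period each low-frequency period maps to when upsampling, which is why Out[43] starts at 2000Q4 rather than 2000Q1. As far as I understand, this is essentially the same start/end choice that PeriodIndex.asfreq exposes through its how argument. The sketch below shows that choice directly, using quarters to months rather than the annual data above. Note that 'A-DEC' is deprecated in favor of 'Y-DEC' in pandas 2.2 and later.

```python
import pandas as pd

quarters = pd.PeriodIndex(['2000Q1', '2000Q2'], freq='Q-DEC')

# how='start': each quarter maps to its first month
as_start = quarters.asfreq('M', how='start')

# how='end': each quarter maps to its last month
as_end = quarters.asfreq('M', how='end')

print(as_start)
print(as_end)
```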
Reposted from: http://ejoji.baihongyu.com/