[For beginners] Script within 10 lines (5. Resample of time series data using pandas)

[For beginners] Script within 10 lines (5. Resample of time series data using pandas)

If you use various libraries with python, I thought, "You can do a little thing with a little code, and you can make a little script with a little 5 steps, which is convenient." So I just listed python and other commands. I may come up with this, but I will post a 10-step script on an irregular basis.

As ** 5th **, I would like to post a resample of time series data using pandas. </ font>

I think there are many situations where you want to aggregate monthly or quarterly data on sales, financial product prices, and various other time-series data. Knowing that it can be done using pandas resample, I would like to post an example of totaling annually and quarterly using exchange rate data as an example, without forgetting myself.


【environment】 Linux: debian10 python: 3.7.3 pandas: 1.0.3 jupyterlab: 2.1.0

For the data, I would like to use fred's exchange rate data.

If you have the following data

DATEDEXJ,PUS 2015-01-01,NaN 2015-01-02,120.20 2015-01-05,119.64

** Resample syntax on average ** data frame .resample (rule ='frequency code'). mean () </ font> -Frequency code: Annual ('Y'), Quarterly ('Q'), Monthly ('M'), Weekly ('W') -As an example, when storing the annual average of the data frame "df_dexjpus" in the data frame "df_year" df_year = df_dexjpus.resample(rule='Y').mean()

1. A script that resamples exchange data on average annually and quarterly with pandas

The code ran in jupyter.

python3.7


#Exchange rate resample(Annual / Quarterly / Monthly / Weekly)
#Forex data frame(df_dexjpus)
# infile = ('./dexjpus_20200417.csv')
import pandas as pd
import matplotlib.pyplot as plt

df_dexjpus = pd.read_csv('./dexjpus_20200417.csv' ,index_col='DATE' ,parse_dates=['DATE'])

#Annual
df_year = df_dexjpus.resample(rule='Y').mean()

#2017 data
df_2017 = df_dexjpus.loc['2017-01-01' : '2017-12-31']

#2017 quarterly exchange rate average
df_quarter = df_2017.resample(rule='Q').mean()    
 

2. I would like to take a look at the scripts in order.

** 2-1. Import library, read data **

python3.7



#Exchange rate resample(Annual / Quarterly / Monthly / Weekly)
#Forex data frame(df_dexjpus)
# infile = ('./dexjpus_20200417.csv')
import pandas as pd
import matplotlib.pyplot as plt

df_dexjpus = pd.read_csv('./dexjpus_20200417.csv' ,index_col='DATE' ,parse_dates=['DATE'])

df_dexjpus


	DEXJPUS
DATE 	
2015-01-01 	NaN
2015-01-02 	120.20
2015-01-05 	119.64
2015-01-06 	118.26
2015-01-07 	119.5

** 2-2. I'll take a look every year. ** **

python3.7



#Annual
df_year = df_dexjpus.resample(rule='Y').mean()

df_year

	DEXJPUS
DATE 	
2015-12-31 	121.049084
2016-12-31 	108.656932
2017-12-31 	112.098554
2018-12-31 	110.397390
2019-12-31 	109.018835
2020-12-31 	108.775333

** 2-3. Let's take a look at 2017 as an example. ** **

python3.7



#2017 data
df_2017 = df_dexjpus.loc['2017-01-01' : '2017-12-31']

 	DEXJPUS
DATE 	
2017-01-31 	114.872105
2017-02-28 	112.911579
2017-03-31 	112.916522
2017-04-30 	110.091000
2017-05-31 	112.243636
2017-06-30 	110.914091
2017-07-31 	112.417000
2017-08-31 	109.826957
2017-09-30 	110.776000
2017-10-31 	112.914762
2017-11-30 	112.819000
2017-12-31 	112.940500

** 2-4. Finally, I would like to look at the data for each quarter of 2017. ** **

python3.7



#2017 quarterly exchange rate average
df_quarter = df_2017.resample(rule='Q').mean()

df_quarter

 	DEXJPUS
DATE 	
2017-03-31 	113.524098
2017-06-30 	111.113906
2017-09-30 	110.950476
2017-12-31 	112.891803

** This is a resample of time series data using 5.pandas. ** **

Recommended Posts