Pandas study notes.
While reading http://pandas.pydata.org/pandas-docs/stable/groupby.html, I found the example of filling missing values with groupby hard to follow, so here is a simpler one.
Preparation.
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: key = list('ABCABCABC')
In [4]: value = [1,2,3,np.nan,np.nan,np.nan,4,4,4]
In [5]: df = pd.DataFrame({'key': key, 'value': value})
In [6]: df
Out[6]:
key value
0 A 1.0
1 B 2.0
2 C 3.0
3 A NaN
4 B NaN
5 C NaN
6 A 4.0
7 B 4.0
8 C 4.0
If you call `ffill()` without grouping, all three NaNs are filled with 3.0, the `value` at `index` 2, because that is the last non-missing value above them.
In [7]: df.ffill()
Out[7]:
key value
0 A 1.0
1 B 2.0
2 C 3.0
3 A 3.0
4 B 3.0
5 C 3.0
6 A 4.0
7 B 4.0
8 C 4.0
If you group by `key` and then call `ffill()`, each NaN is filled with the most recent preceding value within its own group. The NaNs at `index` 3, 4, 5 (`key` A, B, C respectively) are therefore filled with 1.0, 2.0, 3.0, the `value` at `index` 0, 1, 2 (also `key` A, B, C respectively).
In [8]: df.groupby('key').ffill()
Out[8]:
key value
0 A 1.0
1 B 2.0
2 C 3.0
3 A 1.0
4 B 2.0
5 C 3.0
6 A 4.0
7 B 4.0
8 C 4.0
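To see the two behaviours side by side, here is a minimal script sketch built from the same data. It selects the `value` column before calling `ffill()`, which is my own choice so that the result is a Series aligned with the original index; the names `plain`, `grouped`, and `comparison` are just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "key": list("ABCABCABC"),
    "value": [1, 2, 3, np.nan, np.nan, np.nan, 4, 4, 4],
})

# Plain forward fill: each NaN takes the last non-missing value above it,
# regardless of which key the row belongs to.
plain = df["value"].ffill()

# Grouped forward fill: each NaN takes the last non-missing value
# from an earlier row that has the same key.
grouped = df.groupby("key")["value"].ffill()

# Put both results next to the key for comparison.
comparison = pd.DataFrame({"key": df["key"], "plain": plain, "grouped": grouped})
print(comparison)
```

The two columns differ only at `index` 3, 4, 5, which are exactly the rows that were NaN.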
Next, wherever `value` is NaN, fill it with the mean of its group.
In [9]: f = lambda x: x.fillna(x.mean())
In [10]: transformed = df.groupby('key').transform(f)
In [11]: transformed
Out[11]:
value
0 1.0
1 2.0
2 3.0
3 2.5
4 3.0
5 3.5
6 4.0
7 4.0
8 4.0
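The same fill can probably also be written without a lambda, by first computing a per-row group mean and then passing it to `fillna()`. This is only an alternative sketch, not the method from the documentation; `group_mean` and `filled` are names I made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "key": list("ABCABCABC"),
    "value": [1, 2, 3, np.nan, np.nan, np.nan, 4, 4, 4],
})

# One group mean per row, aligned with the original index.
# NaN values are skipped when the mean is computed.
group_mean = df.groupby("key")["value"].transform("mean")

# Replace only the missing entries with the mean of their group.
filled = df["value"].fillna(group_mean)
print(filled)
```

Rows that already had a value are left unchanged, since `fillna()` only touches the NaN entries.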
If you take the mean of each group before and after filling, you get the same values. Before filling, GroupBy.mean() [excludes NaN from the calculation](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.mean.html#pandas.core.groupby.GroupBy.mean), and adding values equal to the group mean does not change that mean.
In [12]: df.groupby('key').mean()
Out[12]:
value
key
A 2.5
B 3.0
C 3.5
In [13]: transformed.groupby(key).mean()
Out[13]:
value
A 2.5
B 3.0
C 3.5
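As a quick arithmetic check: group A has the non-missing values 1.0 and 4.0, so its mean is 2.5; after filling it holds 1.0, 2.5, 4.0, whose mean is still 2.5. The sketch below checks this for all groups at once. The variable names `before` and `after` are my own, and here the filled Series is grouped by the original `key` column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "key": list("ABCABCABC"),
    "value": [1, 2, 3, np.nan, np.nan, np.nan, 4, 4, 4],
})

# Fill NaN with the mean of each group.
filled = df.groupby("key")["value"].transform(lambda s: s.fillna(s.mean()))

# Group means before filling (NaN is skipped) and after filling.
before = df.groupby("key")["value"].mean()
after = filled.groupby(df["key"]).mean()

# The two sets of means should agree.
assert np.allclose(before, after)
print(before)
```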