TL;DR
BEFORE
dataframe_ = dataframe.loc[(dataframe.time == 'pre') &
                           (dataframe.group == 'exp') &
                           (dataframe.cond == 'a'), :]
sns.regplot(x='mood', y='score', data=dataframe_)
↓↓↓
AFTER
dataframe.by(time='pre', group='exp', cond='a').regplot(x='mood', y='score')
You can add any methods you like to pandas DataFrame (and Series) by using pandas_flavor.
**Extracting the rows that meet given conditions from long-format data is a hassle!**
For example, suppose you have this data.
The setup: 50 subjects were divided into two groups (group: exp, ctrl), and an intervention was performed in each group. A task was performed before and after the intervention (time: pre, post), and the score on the task was measured under two conditions (cond: a, b). The subjects' mood while doing the task was also measured for each condition (cond: a, b).[^1]
Summarizing the measurements in long format, as in the image above, makes subsequent analysis easier.
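To make "long format" concrete, here is a minimal sketch that builds a fake dataset with the same columns. It assumes 25 subjects per group and one row per subject × time × cond; the `subject` column is a hypothetical ID column, and the values are pure noise from the random module, like the article's own data.

```python
import random

import pandas as pd

# Fake long-format data mirroring the article's setup.
# Assumptions: 25 subjects per group, one row per subject x time x cond,
# and a hypothetical 'subject' ID column; scores and moods are random noise.
random.seed(0)
rows = []
for subject in range(50):
    group = 'exp' if subject < 25 else 'ctrl'
    for time in ('pre', 'post'):
        for cond in ('a', 'b'):
            rows.append({
                'subject': subject,
                'group': group,
                'time': time,
                'cond': cond,
                'score': random.gauss(50, 10),
                'mood': random.gauss(5, 2),
            })

# Each row is one observation; the factors (group, time, cond) are ordinary columns.
dataframe = pd.DataFrame(rows)
print(dataframe.head())
```

Because every factor is a column, any subset (for example "pre, exp, condition a") is just a row filter, which is exactly what the rest of this post is about.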
Before doing any real analysis, let's first **plot the correlation between score and mood for task condition a in the exp group at pre**.
To do that, we extract the rows that meet these conditions, so the code looks like this:
import seaborn as sns

dataframe_ = dataframe.loc[(dataframe.time == 'pre') &
                           (dataframe.group == 'exp') &
                           (dataframe.cond == 'a'), :]
sns.regplot(x='mood', y='score', data=dataframe_)
This builds a boolean Series expressing the conditions and passes it to .loc.
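The boolean-Series step can be seen in isolation on a tiny hand-made DataFrame (the four rows below are made up purely for illustration): each comparison produces a True/False Series aligned to the index, `&` combines them element-wise, and .loc keeps the rows marked True.

```python
import pandas as pd

# Tiny made-up long-format frame, for illustration only.
df = pd.DataFrame({
    'time': ['pre', 'pre', 'post', 'pre'],
    'group': ['exp', 'ctrl', 'exp', 'exp'],
    'cond': ['a', 'a', 'a', 'b'],
    'score': [10, 20, 30, 40],
})

# Each comparison yields a boolean Series; & combines them row by row.
mask = (df.time == 'pre') & (df.group == 'exp') & (df.cond == 'a')
print(mask.tolist())  # [True, False, False, False]

# .loc keeps only the rows where the mask is True.
subset = df.loc[mask, :]
print(subset.score.tolist())  # [10]
```

Note that each condition must be wrapped in parentheses, because `&` binds more tightly than `==` in Python.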
Honestly, it's a bit messy.
Using the .query() method, you can write it like this instead:
dataframe_ = dataframe.query('time == "pre" & group == "exp" & cond == "a"')
sns.regplot(x='mood', y='score', data=dataframe_)
This is much cleaner, but I wonder if it could be made a little better still.
Also, .query() is reportedly slower than indexing with a boolean Series, at least on small DataFrames.
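Rather than take the speed claim on faith, you can check it on your own data. The sketch below (on 10,000 hypothetical random rows) first confirms that both approaches select exactly the same rows and then times each with timeit; the timings will depend on your pandas version, the data size, and whether numexpr is installed, so no particular result is assumed here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 10,000 random rows with the article's factor columns.
rng = np.random.default_rng(0)
n = 10_000
df = pd.DataFrame({
    'time': rng.choice(['pre', 'post'], n),
    'group': rng.choice(['exp', 'ctrl'], n),
    'cond': rng.choice(['a', 'b'], n),
    'score': rng.normal(50, 10, n),
})

def with_mask():
    return df.loc[(df.time == 'pre') & (df.group == 'exp') & (df.cond == 'a'), :]

def with_query():
    return df.query('time == "pre" & group == "exp" & cond == "a"')

# Both approaches select exactly the same rows...
assert with_mask().equals(with_query())

# ...so the only difference is speed; time 100 calls of each.
for f in (with_mask, with_query):
    print(f.__name__, timeit.timeit(f, number=100))
```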
**In the end, extracting the rows that meet the conditions from long-format data is still a hassle!**
**So why not create a method for it?**
Let's create a **new method** that extracts the rows meeting the conditions from a DataFrame.
Specifically, we'll add a new .by() method to DataFrame that can be used like this:
dataframe.by(time='pre', group='exp', cond='a')
You can easily achieve this with a package called pandas_flavor.
Installation is a one-liner with pip:
pip install pandas_flavor
or with conda:
conda install -c conda-forge pandas_flavor
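import pandas_flavor as pf

@pf.register_dataframe_method
def by(self, **args):
    # args maps column names to required values, e.g. {'time': 'pre', 'cond': 'a'}
    df = self
    for key, value in args.items():
        # Narrow the rows down one condition at a time
        df = df.loc[df[key] == value, :]
    return df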
All you have to do is write a function and add @pf.register_dataframe_method to it as a decorator.
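How can a plain function become a method? As far as I know, pandas_flavor builds on pandas' public accessor API, pd.api.extensions.register_dataframe_accessor. The following is a simplified sketch of that idea, not pandas_flavor's actual source: the decorator name `register_method` and the `_Accessor` class are hypothetical. Accessing `df.by` creates a small object holding the DataFrame, and calling that object forwards the DataFrame as the first argument.

```python
import functools

import pandas as pd

def register_method(func):
    """Hypothetical, simplified stand-in for pf.register_dataframe_method."""
    class _Accessor:
        def __init__(self, df):
            self._df = df  # the DataFrame the "method" was accessed on

        @functools.wraps(func)
        def __call__(self, *args, **kwargs):
            # Forward the DataFrame as the function's first argument.
            return func(self._df, *args, **kwargs)

    # Attach the accessor to pd.DataFrame under the function's own name.
    pd.api.extensions.register_dataframe_accessor(func.__name__)(_Accessor)
    return func  # the plain function stays usable too

@register_method
def by(df, **args):
    for key, value in args.items():
        df = df.loc[df[key] == value, :]
    return df

demo = pd.DataFrame({'time': ['pre', 'pre', 'post'],
                     'cond': ['a', 'b', 'a'],
                     'score': [1, 2, 3]})
result = demo.by(time='pre', cond='a')
print(result.score.tolist())  # [1]
```

One consequence of this design is that the registration is global: once the decorator runs, every DataFrame in the process gains the new method.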
In this example, declaring the parameter as **args collects the keyword arguments into a dictionary, and the function keeps only the rows where each specified column equals the given value.
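Filtering one keyword at a time is equivalent to AND-ing all the conditions together, as the hand-written boolean Series earlier did. The sketch below compares the loop with a single-mask variant; both function names (`by_loop`, `by_mask`) are hypothetical and exist only for this comparison.

```python
import operator
from functools import reduce

import pandas as pd

def by_loop(df, **args):
    """Same logic as by(): narrow the frame one condition at a time."""
    for key, value in args.items():
        df = df.loc[df[key] == value, :]
    return df

def by_mask(df, **args):
    """Hypothetical variant: AND all conditions into one mask, then index once."""
    if not args:
        return df  # no conditions: keep every row, like the loop version
    masks = [df[key] == value for key, value in args.items()]
    return df.loc[reduce(operator.and_, masks), :]

demo = pd.DataFrame({
    'time': ['pre', 'pre', 'post', 'pre'],
    'group': ['exp', 'ctrl', 'exp', 'exp'],
    'cond': ['a', 'a', 'a', 'b'],
})
print(by_loop(demo, time='pre', group='exp').index.tolist())  # [0, 3]
print(by_mask(demo, time='pre', group='exp').index.tolist())  # [0, 3]
```

Both return the same rows; the loop compares each later condition only against the rows that survived the earlier ones, while the single-mask version evaluates every comparison on the full frame.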
It is also handy to turn various seaborn functions into methods in the same way.
import seaborn as sns

@pf.register_dataframe_method
def regplot(self, **args):
    # Pass the DataFrame itself as seaborn's data= argument
    return sns.regplot(data=self, **args)
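With both methods in place, the whole thing becomes:
dataframe.by(time='pre', group='exp', cond='a').regplot(x='mood', y='score')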
If you want to add a method to pandas.Series instead, you can do the same thing with the @pf.register_series_method decorator.
For this particular example, .query() would probably be fine, but the technique seems applicable in many other ways.
[^1]: Needless to say, this is an entirely fictitious psychological experiment. The numbers were generated with the random module.