Functions that are unexpectedly necessary when creating tables and graphs. I tried to summarize such a pivot table function that there is a difference between heaven and earth if you do not know it. By the way, when I was a student, I didn't even know the pivot table function of Excel, and I remember crying and shaping the experimental data ... Since it was posted to Qiita, I tried it with the pandas data frame this time.
One of the functions in Excel, it refers to a tool that extracts necessary data from a list registered in advance and aggregates and analyzes it from all directions. The "aggregate" function is especially convenient, and you can return the result shown on the right from the data format shown on the left in the figure below.
You will need it when you want to cross-tabulate or when you want to create a slightly complicated graph. This time I'll try something similar on a pandas dataframe instead of Excel.
Borrowed train data from kaggle Titanic Competition. Please refer to here for the meaning of each column.
It looks like this as it is ... Difficult to interpret!
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Path
input_path = "../input_data/"
# Set Display Max Columns
pd.set_option("display.max_columns", 50)
train = pd.read_csv(input_path + "train.csv", sep=",", header=0, quotechar="\"")
# test = pd.read_csv(input_path + "bank/test.csv", sep=",", header=0, quotechar="\"")
#Missing value processing because it is difficult to handle
train["Embarked"] = train.Embarked.fillna("S")
train["Age"]=train["Age"].fillna(train.Age.mean())
#Age by age
labels = [ "{0} - {1}".format(i, i + 9) for i in range(0, 100, 10) ]
train["Age"] = pd.cut(train["Age"], np.arange(0, 101, 10),
include_lowest=True, right=False,
labels=labels)
#Focus on what seems to be easy to handle for this purpose
df = train[["PassengerId", "Survived", "Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked"]]
#Data confirmation
df.head()
The names of passengers who seem to be difficult to handle are deleted.
At a minimum, the following three arguments are required
-Data (first argument): Specify the pandas.DataFrame object of the original data. -Index: Specify the column name of the original data. It becomes the result line heading. -Columns: Specify the column name of the original data. The resulting column heading.
The average value of the columns not specified in the arguments index and columns is calculated as a result, but the columns whose type is not numeric are excluded.
df.pivot_table(index="Age", columns= "Sex", values="Fare")
By default, the average value of the aggregation is returned. (Other results can be calculated by adding aggfunc to the argument.)
By default, the average value is calculated, but it is possible to calculate the value by other methods by specifying a function in the argument aggfunc. By default (when the argument aggfunc is omitted), numpy.mean () is specified.
df.pivot_table(index="Age", columns= "Sex", values="Fare", aggfunc='count')
By specifying aggfunc ='count', the number of applicable data is counted.
df.pivot_table(index="Age", columns= ["Pclass","Sex"], values="Survived", aggfunc='mean')
df.pivot_table(index="Age", columns= ["Pclass","Sex"], values="Survived", aggfunc=["mean","count"])
df.pivot_table(index="Age", columns= ["Pclass","Sex"], values="Survived", aggfunc=["mean","count"],margins=True, margins_name="total")
By specifying the argument normalize of the cross_tab function, it can be standardized for the whole, for each row, and for each column.
pd.crosstab(index=df.Age, columns= df.Sex, values=df.Fare, aggfunc='count', normalize=True)
-If normalize = True or normalize ='all', the total is standardized to 1. -If normalize ='index' or'columns', it is standardized so that the total is 1 for each row or column.
that's all.
After operating the pivot table, the columns are multi-layered, but I cannot extract them by specifying the columns. If anyone knows, please comment ...
・ Http://yaginogogo.hatenablog.jp/entry/2016/04/22/011327 ・ Https://note.nkmk.me/python-pandas-pivot-table/ ・ Https://deepage.net/features/pandas-pivot.html ・ Https://boxil.jp/mag/a2149/ ・ Https://deepage.net/features/pandas-pivot.html ・ Https://qiita.com/kshigeru/items/bfa8c11d1e6487c791d3 ・ Https://qiita.com/hoto17296/items/3442af64c7acb682de6a
Recommended Posts