When working with data in Python, I rely heavily on the pandas library.
However, beginners in data analysis often find themselves unsure which method to use to pull out the data they want.
In this article, I summarize the methods that come up most often when extracting data.
pandas is one of the Python libraries for efficient data analysis.
This time, we will use the "iris" dataset, which can be loaded through seaborn.
import seaborn as sns
iris = sns.load_dataset('iris')
iris.head()
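With iloc, you can retrieve any part of the data by specifying row and column positions, which are counted from 0.
# Row at position 3 (the fourth row)
iris.iloc[3]
# Rows at positions 0 through 2 (the end of the slice is excluded)
iris.iloc[:3]
# Value at row position 3, column position 0
iris.iloc[3, 0]
# Rows at positions 0 through 2 and columns at positions 2 through 3
iris.iloc[:3, 2:4]
# With loc, select by label instead: rows labeled 2, 4, 6 and the two named columns
iris.loc[[2,4,6],['petal_length', 'petal_width']]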
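The difference between iloc (position) and loc (label) is easy to miss on the iris data, because its default index labels happen to equal the row positions. The following is a minimal sketch on a made-up three-row DataFrame (the values and the index labels 10, 20, 30 are assumptions for illustration, not part of the iris dataset) where the two diverge.

```python
import pandas as pd

# Hypothetical toy data; the index labels deliberately differ from the positions.
df = pd.DataFrame(
    {"petal_length": [1.4, 4.7, 6.0], "petal_width": [0.2, 1.4, 2.5]},
    index=[10, 20, 30],
)

by_position = df.iloc[0]      # first row, whatever its label is
by_label = df.loc[10]         # row whose index label is 10
pos_slice = df.iloc[0:2]      # positions 0 and 1 (end excluded)
label_slice = df.loc[10:20]   # labels 10 through 20 (end included)

print(pos_slice)
print(label_slice)
```

Note that a positional slice excludes its end point while a label slice includes it, so both slices above return the same two rows.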
The method for extracting data by specifying conditions is as follows.
For example, count the rows whose species column equals setosa.
len(iris[iris['species'] == 'setosa'])
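As a small sketch of the same counting idea, the boolean mask itself can be summed, and value_counts gives the count for every value at once. The three-row DataFrame below is made up for illustration so that the example runs without downloading the iris data.

```python
import pandas as pd

# Hypothetical toy data standing in for the species column of iris.
df = pd.DataFrame({"species": ["setosa", "setosa", "virginica"]})

mask = df["species"] == "setosa"       # boolean Series, True where the row matches
count_by_len = len(df[mask])           # filter first, then count the rows
count_by_sum = int(mask.sum())         # True counts as 1, so the sum is the match count
counts = df["species"].value_counts()  # counts for every distinct species

print(count_by_len, count_by_sum)
print(counts)
```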
To narrow the data down by multiple conditions, combine the conditions, wrapping each one in parentheses.
# AND is written (cond1) & (cond2); OR is written (cond1) | (cond2)
iris[(iris['species'] == 'setosa') & (iris['petal_width'] > 0.5)]
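The following sketch shows the AND form next to OR, negation with ~, and isin for matching any of several values. The four-row DataFrame is made up for illustration; only the column names are taken from iris.

```python
import pandas as pd

# Hypothetical toy data using two of the iris column names.
df = pd.DataFrame({
    "species": ["setosa", "setosa", "versicolor", "virginica"],
    "petal_width": [0.2, 0.6, 1.3, 2.1],
})

# Each condition needs its own parentheses because & and | bind tighter than == and >.
and_rows = df[(df["species"] == "setosa") & (df["petal_width"] > 0.5)]
or_rows = df[(df["species"] == "setosa") | (df["petal_width"] > 2.0)]
not_rows = df[~(df["species"] == "setosa")]                    # ~ negates a condition
isin_rows = df[df["species"].isin(["setosa", "virginica"])]    # match any listed value

print(and_rows)
print(or_rows)
```

Python's own and / or keywords do not work here, because they try to reduce a whole Series to a single True or False.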
Sometimes you want to extract not only exact matches but also partial matches. In such cases, str.contains can be used.
# Partial match: keep only the rows whose species contains 'se'
iris[iris.species.str.contains('se')]
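str.contains treats its argument as a regular expression by default and is case sensitive. The sketch below, on a made-up Series that includes a missing value, shows the case, regex, and na parameters; the sample strings are assumptions chosen to make the differences visible.

```python
import pandas as pd

# Hypothetical values; None stands in for a missing species name.
species = pd.Series(["setosa", "Versicolor", None, "virginica"])

# na=False makes missing values count as "no match" instead of propagating as missing.
plain = species.str.contains("se", na=False)
ignore_case = species.str.contains("VER", case=False, na=False)  # case-insensitive
literal = species.str.contains("v.r", regex=False, na=False)     # "." is a literal dot
pattern = species.str.contains("^v", na=False)                   # regex: starts with "v"

print(species[plain])
print(species[ignore_case])
```

Passing na=False matters when the result is used as a filter, since a mask that contains missing values cannot be used for boolean indexing.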
For aggregation, the DataFrame is first converted to a DataFrameGroupBy object with groupby, and the aggregation is then run on that object.
iris_group = iris.groupby('species')
type(iris_group)
The output result is as follows.
pandas.core.groupby.generic.DataFrameGroupBy
iris_group.mean()
The result is a table with one row per species, holding the mean of each numeric column.
In addition, the minimum value, maximum value, standard deviation, etc. can be calculated.
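As a minimal sketch of those other statistics, the example below calls min, max, and std on a grouped object and then computes several statistics in one call with agg. The four-row DataFrame is made up for illustration.

```python
import pandas as pd

# Hypothetical toy data: two measurements per species.
df = pd.DataFrame({
    "species": ["setosa", "setosa", "virginica", "virginica"],
    "petal_length": [1.0, 2.0, 5.0, 7.0],
})

grouped = df.groupby("species")
minimum = grouped.min()   # smallest value per group
maximum = grouped.max()   # largest value per group
spread = grouped.std()    # sample standard deviation per group

# agg accepts a list of statistic names and returns one column per statistic.
summary = grouped["petal_length"].agg(["min", "max", "mean", "std"])
print(summary)
```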
Aggregation is also possible with multiple grouping keys.
iris_group2 = iris.groupby(['species', 'petal_width'])
iris_group2.mean()
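Grouping by several keys produces a result whose index has one level per key (a MultiIndex). The sketch below, on made-up data, shows how to look up one group in that result and how reset_index turns the keys back into ordinary columns, which is often easier to work with afterwards.

```python
import pandas as pd

# Hypothetical toy data using three of the iris column names.
df = pd.DataFrame({
    "species": ["setosa", "setosa", "setosa", "virginica"],
    "petal_width": [0.2, 0.2, 0.4, 2.0],
    "petal_length": [1.0, 2.0, 1.5, 6.0],
})

multi = df.groupby(["species", "petal_width"]).mean()

# One group is addressed with a tuple holding one value per key.
setosa_02 = multi.loc[("setosa", 0.2), "petal_length"]

# reset_index moves both keys from the index back into regular columns.
flat = multi.reset_index()
print(multi)
print(flat)
```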
To combine data that have the same column structure, use the concat function. (Older articles also mention the DataFrame.append method, but it was deprecated in pandas 1.4 and removed in pandas 2.0.)
This time, we will focus on the pandas function and combine the data with pd.concat.
import pandas as pd
iris_master = pd.DataFrame([['0', 'setosa'], ['1', 'versicolor'], ['2', 'virginica']], columns=['id', 'name'])
iris_master
add_iris = pd.DataFrame([['3', 'hoge']], columns=['id', 'name'])
add_iris
pd.concat([iris_master, add_iris])
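By default, concat keeps each input's original index labels, so the combined result here contains the label 0 twice. A small sketch of the difference, reusing the two DataFrames defined above and the ignore_index parameter of pd.concat:

```python
import pandas as pd

iris_master = pd.DataFrame(
    [['0', 'setosa'], ['1', 'versicolor'], ['2', 'virginica']],
    columns=['id', 'name'],
)
add_iris = pd.DataFrame([['3', 'hoge']], columns=['id', 'name'])

kept = pd.concat([iris_master, add_iris])                          # index: 0, 1, 2, 0
renumbered = pd.concat([iris_master, add_iris], ignore_index=True)  # index: 0, 1, 2, 3

print(kept)
print(renumbered)
```

A duplicated index label makes loc return more than one row for that label, so renumbering is usually the safer choice when the old labels carry no meaning.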
Use the merge method to join multiple data sets whose column structures differ. (The join method can also do this, but it joins on the index, so the key column has to be set as the index first, which is a little troublesome. I think it is fine to reach for merge first.)
When joining, specify the key columns, and rows whose key values match are joined.
pd.merge(iris_group2.mean(), iris_master, left_on='species', right_on='name')
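merge performs an inner join by default, so rows without a matching key are silently dropped. The sketch below compares that with how="left" and uses indicator=True to show where each row came from. The measurements DataFrame, including the "unknown" species, is made up for illustration.

```python
import pandas as pd

# Hypothetical left table; "unknown" has no counterpart in the master table.
measurements = pd.DataFrame({
    "species": ["setosa", "virginica", "unknown"],
    "petal_length": [1.5, 5.5, 3.0],
})
iris_master = pd.DataFrame({
    "id": ["0", "1", "2"],
    "name": ["setosa", "versicolor", "virginica"],
})

# Default inner join: only rows whose key exists on both sides survive.
inner = pd.merge(measurements, iris_master, left_on="species", right_on="name")

# Left join: every row of the left table is kept; unmatched columns become missing.
left = pd.merge(
    measurements, iris_master,
    left_on="species", right_on="name",
    how="left", indicator=True,  # adds a _merge column: both / left_only / right_only
)

print(inner)
print(left)
```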
In the future, we plan to enhance the following contents.
The above contents are summarized based on the following sites.
It is explained in more detail here, so if you have any questions, please refer to it.