Introduction

When extracting data in Python, I rely heavily on the pandas library.

However, beginners in data analysis often run into situations like these:

I can't picture how to get the data I want
I'm not sure which method to use

In this article, I summarize the methods that come up most often when extracting data.

This article is mainly for those who are just starting to learn Python.
For details, please refer to the official documentation.

Environment

Python 3.7.6
Pandas 1.0.0

What is Pandas?

pandas is a Python library for efficient data analysis.

Implementation

Load necessary data

This time, we will use the "iris" dataset, which seaborn provides as a sample dataset. load_dataset downloads it on first use, so an internet connection is needed.

import seaborn as sns
iris = sns.load_dataset('iris')
iris.head()

[Screenshot: output of iris.head()]

Extract data by specifying rows and columns

You can pick out any part of the data by specifying row and column positions.

Data extraction by row number

# Row at position 3 (the 4th row, since positions start at 0)
iris.iloc[3]

# Rows at positions 0 through 2
iris.iloc[:3]

# Value at row position 3, column position 0
iris.iloc[3, 0]

# Rows at positions 0 through 2, columns at positions 2 and 3
iris.iloc[:3, 2:4]

Extract data by specifying row labels and column names

iris.loc[[2,4,6],['petal_length', 'petal_width']]
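The difference between iloc (positions) and loc (labels) is easy to miss with iris, because its default index labels happen to equal the row positions. The following is a minimal sketch on a small hypothetical frame (not the iris data) whose index labels differ from the positions.

```python
import pandas as pd

# Hypothetical stand-in for iris: a tiny frame with a non-default index
df = pd.DataFrame(
    {"sepal_length": [5.1, 4.9, 4.7, 4.6], "species": ["setosa"] * 4},
    index=[10, 20, 30, 40],
)

by_position = df.iloc[0]   # first row, whatever its label is
by_label = df.loc[10]      # row whose index label is 10

# iloc slices exclude the end position; loc slices include the end label
iloc_slice = df.iloc[0:2]  # positions 0 and 1
loc_slice = df.loc[10:30]  # labels 10, 20 and 30

print(len(iloc_slice), len(loc_slice))
```

With the default index 0, 1, 2, ... both accessors take the same numbers, but a loc slice still includes its end label while an iloc slice does not.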

Extract data under specific conditions

The method for extracting data by specifying conditions is as follows.

Data extraction based on exact match conditions

Count the rows whose species column is exactly setosa.

len(iris[iris['species'] == 'setosa'])
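As a rough sketch of what happens inside that expression, the comparison produces a boolean Series, and the count can be taken in more than one way. The small frame below is a hypothetical stand-in for iris.

```python
import pandas as pd

# Hypothetical stand-in for the species column of iris
df = pd.DataFrame({"species": ["setosa", "setosa", "versicolor", "virginica"]})

mask = df["species"] == "setosa"        # boolean Series, True where the value matches
count_len = len(df[mask])               # filter the rows, then count them
count_sum = int(mask.sum())             # True counts as 1, so the sum is the match count
counts = df["species"].value_counts()   # counts for every species at once

print(count_len, count_sum, counts["setosa"])
```

value_counts is handy when you want the count for every category rather than just one.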

Data extraction using multiple conditions

To narrow the results by more than one condition, combine the conditions.

# Use (...) & (...) for AND, and (...) | (...) for OR
iris[(iris['species'] == 'setosa') & (iris['petal_width'] > 0.5)]
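The parentheses matter because & and | bind more tightly than == and > in Python, and the keywords and / or raise a ValueError when applied to a Series. Here is a small sketch of AND, OR, and NOT on a hypothetical four-row frame, with each condition named first for readability.

```python
import pandas as pd

# Hypothetical stand-in for iris with just the two columns used here
df = pd.DataFrame({
    "species": ["setosa", "setosa", "versicolor", "virginica"],
    "petal_width": [0.2, 0.6, 1.3, 2.5],
})

is_setosa = df["species"] == "setosa"
is_wide = df["petal_width"] > 0.5

both = df[is_setosa & is_wide]     # AND: setosa rows wider than 0.5
either = df[is_setosa | is_wide]   # OR: setosa rows, or any row wider than 0.5
not_setosa = df[~is_setosa]        # NOT: everything except setosa

print(len(both), len(either), len(not_setosa))
```

Naming the masks is optional, but it tends to make long filters easier to read than one deeply nested expression.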

Data extraction by partial match condition

Sometimes you want rows that contain a given substring, not only exact matches. In that case, use str.contains.

# Partial match (extract only rows whose species contains 'se')
iris[iris.species.str.contains('se')]
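Two details of str.contains are worth knowing: by default the pattern is treated as a regular expression, and missing values produce missing results in the mask. The sketch below uses a hypothetical column containing one missing value to show the na and regex parameters.

```python
import pandas as pd

# Hypothetical column with one missing value
df = pd.DataFrame({"species": ["setosa", "versicolor", None, "virginica"]})

# na=False treats missing values as non-matches, so the mask is purely boolean
# regex=False matches 'se' as a literal substring rather than a regular expression
mask = df["species"].str.contains("se", na=False, regex=False)
matched = df[mask]

print(matched["species"].tolist())
```

The iris data has no missing species, so the original one-liner works as written; the extra arguments only matter for messier data.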

Aggregate data

To aggregate, first call groupby, which returns a DataFrameGroupBy object. Aggregation methods are then called on that object.

iris_group = iris.groupby('species')
type(iris_group)

The output result is as follows.

pandas.core.groupby.generic.DataFrameGroupBy

Average value

iris_group.mean()

The output is as follows.

[Screenshot: mean of each numeric column for each species]

Other statistics, such as the minimum (min), maximum (max), and standard deviation (std), can be calculated in the same way.
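If several statistics are needed at once, one option is the agg method, which accepts a list of aggregation names. This is a minimal sketch on a hypothetical four-row frame rather than the full iris data.

```python
import pandas as pd

# Hypothetical stand-in for iris with one numeric column
df = pd.DataFrame({
    "species": ["setosa", "setosa", "virginica", "virginica"],
    "petal_length": [1.4, 1.6, 5.0, 6.0],
})

grouped = df.groupby("species")

# One row per species, one column per statistic
summary = grouped["petal_length"].agg(["min", "max", "mean", "std"])

print(summary)
```

Selecting a column before aggregating keeps the result small; calling agg on the whole group object would compute the statistics for every column.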

You can also group by multiple columns.

iris_group2 = iris.groupby(['species', 'petal_width'])
iris_group2.mean()
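When grouping by more than one column, the grouping keys become a MultiIndex on the result. If ordinary columns are easier to work with, reset_index converts them back. The sketch below shows this on a hypothetical four-row frame.

```python
import pandas as pd

# Hypothetical stand-in for iris
df = pd.DataFrame({
    "species": ["setosa", "setosa", "setosa", "virginica"],
    "petal_width": [0.2, 0.2, 0.4, 2.5],
    "petal_length": [1.4, 1.6, 1.5, 6.0],
})

multi = df.groupby(["species", "petal_width"]).mean()  # keys become a MultiIndex
flat = multi.reset_index()                             # keys become ordinary columns

print(multi.index.nlevels)
print(list(flat.columns))
```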

[Screenshot: output of iris_group2.mean()]

Combine data

Combine data with the same column structure

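To combine data that have the same column structure, use the append method or the concat function. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so it only works on older versions such as the 1.0.0 used here.

This time, we will use concat, which is a function of pandas itself (pd.concat).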

import pandas as pd
iris_master = pd.DataFrame([['0', 'setosa'], ['1', 'versicolor'], ['2', 'virginica']], columns=['id', 'name'])
iris_master

add_iris = pd.DataFrame([['3', 'hoge']], columns=['id', 'name'])
add_iris

pd.concat([iris_master, add_iris])
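One thing to watch for is that concat keeps each frame's original index, so the combined result here contains the label 0 twice. Passing ignore_index=True renumbers the rows. A short sketch using the same two frames as above:

```python
import pandas as pd

iris_master = pd.DataFrame(
    [['0', 'setosa'], ['1', 'versicolor'], ['2', 'virginica']],
    columns=['id', 'name'],
)
add_iris = pd.DataFrame([['3', 'hoge']], columns=['id', 'name'])

kept = pd.concat([iris_master, add_iris])                           # index: 0, 1, 2, 0
renumbered = pd.concat([iris_master, add_iris], ignore_index=True)  # index: 0, 1, 2, 3

print(list(kept.index), list(renumbered.index))
```

Duplicate index labels are legal in pandas, but they make loc return several rows for one label, so renumbering is usually the safer default.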

Combine data with different column configurations

Use the merge function to combine data whose column structures differ. (The join method can also do this, but it requires the key column to be set as the index, which is a little tedious. In most cases it is fine to reach for merge first.)

Specify the key columns, and rows whose key values match are joined.

pd.merge(iris_group2.mean(), iris_master, left_on='species', right_on='name')
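By default merge performs an inner join, so rows without a matching key on the other side are dropped. The how parameter changes this. The following sketch uses two small hypothetical frames (including a made-up "unknown" species) to compare an inner join with a left join.

```python
import pandas as pd

# Hypothetical aggregated values; "unknown" has no match in the master table
means = pd.DataFrame({
    "species": ["setosa", "virginica", "unknown"],
    "petal_length": [1.5, 5.5, 3.0],
})
master = pd.DataFrame({
    "id": ["0", "1", "2"],
    "name": ["setosa", "versicolor", "virginica"],
})

# Default how="inner": only keys present on both sides survive
inner = pd.merge(means, master, left_on="species", right_on="name")

# how="left": every row of the left frame is kept, unmatched columns become NaN
left = pd.merge(means, master, left_on="species", right_on="name", how="left")

print(len(inner), len(left))
```

Comparing row counts before and after a merge is a quick way to notice keys that failed to match.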

[Screenshot: output of the merge]

Finally

I plan to add the following topics in the future.

Read data (read_csv)

Reference information

This article is a summary based on the following page.

It explains the topics in more detail, so please refer to it if you have any questions.

Summary of Pandas methods used when extracting data [Python]