Pandas Extraction Basics and How to Pass the Results to matplotlib

Overview

This post summarizes the basic syntax for extracting data with pandas and the form in which each result is returned. Throughout, data is a DataFrame object.

Basics

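Get the row at a specified index

data.loc[<index>]

# Get the row whose index label is 1
# (The default index starts from 0, so this is the second row.)
data.loc[1]

Note that plain data[1] does not return a row: a single label inside [] selects a column, so it looks for a column named 1.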
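A minimal sketch contrasting the two row accessors, using a small hypothetical frame (the names and ages are made up for illustration): .loc looks a row up by its index label, .iloc by its position, and a slice inside plain [] also selects rows.

```python
import pandas as pd

# Hypothetical sample data; the names and ages are made up for illustration
data = pd.DataFrame({"name": ["Taro", "Jiro", "Saburo"], "age": [25, 18, 31]})

row_by_label = data.loc[1]      # row whose index label is 1
row_by_position = data.iloc[1]  # second row, whatever its label
first_two = data[0:2]           # a slice inside [] selects rows by position

print(row_by_label["name"])     # prints Jiro
```

With the default 0-based index, .loc and .iloc agree. After filtering or sorting, labels and positions drift apart, and .iloc[1] is still simply the second remaining row.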

Get the column with a specified name

data[<column_name>]

# Get the column named 'name'
data['name']
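A short sketch, again on a hypothetical frame, showing that a single column name returns a Series while a list of column names returns a DataFrame containing just those columns.

```python
import pandas as pd

# Hypothetical sample data with assumed columns 'name' and 'age'
data = pd.DataFrame({"name": ["Taro", "Jiro"], "age": [25, 18]})

names = data["name"]            # one label -> pandas Series
subset = data[["name", "age"]]  # list of labels -> DataFrame with those columns
```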

Get only the rows that meet a condition

data[<boolean for each index>]

Pass a boolean for each index value; only the rows marked True are kept. In this example, only the rows with indexes 0 and 2 are extracted.

0     True
1    False
2     True
3    False
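The same True/False pattern can be written out by hand as a boolean Series. This is a minimal sketch on a hypothetical four-row frame; in practice the mask usually comes from a comparison rather than a literal list.

```python
import pandas as pd

# Hypothetical four-row frame
data = pd.DataFrame({"name": ["Taro", "Jiro", "Saburo", "Shiro"]})

# Boolean Series aligned with data's index: True means "keep this row"
mask = pd.Series([True, False, True, False], index=data.index)

picked = data[mask]
print(picked.index.tolist())  # prints [0, 2]
```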

Examples of generating a boolean for each index are shown below. Ordinary Python comparison expressions can be used as they are.

# True where the value of the age column is 20 or more
data['age'] >= 20

# True where the name column contains the string 'Ro' (case-sensitive)
data['name'].str.contains('Ro')

# Filter directly: keep the first row for each name and drop later duplicates
data[~data['name'].duplicated()]
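Conditions like these can be combined with & (and), | (or), and ~ (not). The sketch below uses hypothetical data chosen so that each mask picks different rows; it also shows the case=False option of str.contains and why inline comparisons need parentheses.

```python
import pandas as pd

# Hypothetical data chosen so each condition picks different rows
data = pd.DataFrame({
    "name": ["Taro", "Rosa", "Ken", "Taro"],
    "age": [25, 18, 31, 40],
})

adult = data["age"] >= 20                                      # True, False, True, True
has_ro = data["name"].str.contains("Ro")                       # case-sensitive: only 'Rosa'
has_ro_any_case = data["name"].str.contains("ro", case=False)  # 'Taro', 'Rosa', 'Taro'
first_seen = ~data["name"].duplicated()                        # the second 'Taro' is False

# Combine masks with & (and), | (or), ~ (not)
adult_first_seen = data[adult & first_seen]

# Inline comparisons need parentheses, because & binds tighter than >= and !=
inline = data[(data["age"] >= 20) & (data["name"] != "Ken")]
```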

Advanced

Building on the condition-based row extraction covered at the end of the basic section, let's produce the result in several different formats.

Here, data holds the following sample data.

index      height  class  grade     weight
0      178.820262      a      2  65.649452
1      172.000786      b      5  55.956723
2      179.337790      a      4  56.480630
3      181.204466      b      1  62.908190
4      169.483906      a      4  65.768826
5      174.893690      b      4  56.188545
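The post does not show how this table was generated, so the sketch below simply types it in from the displayed (rounded) values. Outputs produced from this frame will therefore show fewer decimal places than the ones quoted later in the post.

```python
import pandas as pd

# The sample table above, typed in by hand from the displayed (rounded) values
data = pd.DataFrame({
    "height": [178.820262, 172.000786, 179.337790, 181.204466, 169.483906, 174.893690],
    "class": ["a", "b", "a", "b", "a", "b"],
    "grade": [2, 5, 4, 1, 4, 4],
    "weight": [65.649452, 55.956723, 56.480630, 62.908190, 65.768826, 56.188545],
})
```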

Get only the rows that meet a condition

First, let's review the basics.

data[data['class'] == 'a']

       height class  grade     weight
0  178.820262     a      2  65.649452
2  179.337790     a      4  56.480630
4  169.483906     a      4  65.768826
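The same pattern extends to several columns at once. A minimal sketch, keeping only the class and grade columns of the sample table (an assumption to keep it short), that combines two conditions with & and with |:

```python
import pandas as pd

# Only the columns needed here, copied from the sample table
data = pd.DataFrame({
    "class": ["a", "b", "a", "b", "a", "b"],
    "grade": [2, 5, 4, 1, 4, 4],
})

# Class 'a' AND grade 4 -> rows 2 and 4
a_grade4 = data[(data["class"] == "a") & (data["grade"] == 4)]

# Class 'b' OR grade 2 -> rows 0, 1, 3 and 5
b_or_grade2 = data[(data["class"] == "b") | (data["grade"] == 2)]
```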

Get each value of the matching rows as a 2-D array

Appending .values returns a 2-D NumPy array with one inner row per record, so each record's values are grouped together. Because the columns have mixed types, the array's dtype is object.

data[data['class'] == 'a'].values

[[178.8202617298383 'a' 2 65.64945209116877]
 [179.33778995074982 'a' 4 56.48062978465752]
 [169.4839057410322 'a' 4 65.76882607944115]]
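Strictly speaking, .values gives a NumPy array rather than a Python list. The sketch below, built on the sample table typed in from its rounded values, shows the equivalent .to_numpy() call and .tolist() for genuine nested lists.

```python
import pandas as pd

# Sample table typed in from the displayed (rounded) values
data = pd.DataFrame({
    "height": [178.820262, 172.000786, 179.337790, 181.204466, 169.483906, 174.893690],
    "class": ["a", "b", "a", "b", "a", "b"],
    "grade": [2, 5, 4, 1, 4, 4],
    "weight": [65.649452, 55.956723, 56.480630, 62.908190, 65.768826, 56.188545],
})

a_rows = data[data["class"] == "a"]

arr = a_rows.to_numpy()          # same content as a_rows.values
nested = a_rows.values.tolist()  # plain Python lists, one per record

print(arr.dtype)  # object, because the columns have mixed types
```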

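Get the index and one column's values for matching rows

Passing the condition and a column name to .loc returns a Series that pairs each index with that column's value. A Series in this form can be passed directly to matplotlib's plot, which draws the values against the index.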

data.loc[data['class'] == 'a', 'height']

0     178.820262
2     179.337790
4     169.483906

import matplotlib.pyplot as plt

# Plot the heights of class 'a' against their index
heights = data.loc[data['class'] == 'a', 'height']
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(heights)
plt.show()

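A variant that passes the index and the values explicitly, so it is obvious which goes on which axis. This is a sketch: the Agg backend is an assumption that lets it run without a display (in a notebook you would drop it and call plt.show()), and the data is the height and class columns typed in from the sample table. Because the index has gaps (0, 2, 4), the points are spaced accordingly along the x-axis.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# height and class columns typed in from the sample table
data = pd.DataFrame({
    "height": [178.820262, 172.000786, 179.337790, 181.204466, 169.483906, 174.893690],
    "class": ["a", "b", "a", "b", "a", "b"],
})

heights = data.loc[data["class"] == "a", "height"]

fig, ax = plt.subplots()
# Index on the x-axis, heights on the y-axis; markers make the three points visible
ax.plot(heights.index, heights.values, marker="o")
ax.set_xlabel("index")
ax.set_ylabel("height")
```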

Get only the column values of matching rows as an array

Appending .values to the .loc result drops the index and returns only the column values as a 1-D NumPy array. This can be passed to matplotlib's hist.

data.loc[data['class'] == 'a', 'height'].values

[178.82026173 179.33778995 169.48390574]

import matplotlib.pyplot as plt

# Histogram of the heights in class 'a'
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.hist(data.loc[data['class'] == 'a', 'height'].values)
plt.show()


With only three records the histogram is sparse; the more records you have, the more informative it becomes!
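ax.hist also returns the numbers it drew. With only three heights, the default of 10 bins leaves most bins empty, so this sketch sets bins=3 (an arbitrary choice for this tiny sample) and reads back the counts. The Agg backend and the typed-in data are assumptions, as before.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# height and class columns typed in from the sample table
data = pd.DataFrame({
    "height": [178.820262, 172.000786, 179.337790, 181.204466, 169.483906, 174.893690],
    "class": ["a", "b", "a", "b", "a", "b"],
})

heights = data.loc[data["class"] == "a", "height"].values

fig, ax = plt.subplots()
# hist returns the count per bin, the bin edges, and the drawn bar artists
counts, bin_edges, patches = ax.hist(heights, bins=3)
```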
