Overview
This post summarizes the basic syntax for selecting data and retrieving results in Pandas.
Throughout, data refers to a DataFrame object.
data.iloc[<position>]
# Get the row at position 1
# (Positions start from 0, so this is the second row.)
# Note: on a DataFrame, data[1] looks up a column labeled 1, not a row.
data.iloc[1]
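As a rough illustration of row access on a DataFrame, the sketch below uses a small made-up table (the names, ages, and index labels 10, 20, 30 are assumptions, not data from this post). It contrasts positional access with .iloc against label-based access with .loc. Plain data[1] on a DataFrame is treated as a column lookup, so it is not used here.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
data = pd.DataFrame(
    {"name": ["Taro", "Jiro", "Hanako"], "age": [25, 17, 31]},
    index=[10, 20, 30],
)

# Positional access: the second row, whatever the index labels are.
second_row = data.iloc[1]

# Label-based access: the row whose index label is 20.
row_20 = data.loc[20]

print(second_row["name"])         # Jiro
print(second_row.equals(row_20))  # True: both refer to the same row
```

With the default index (0, 1, 2, ...), positions and labels coincide, which is why the two styles are easy to confuse.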
data[<column_name>]
# Get the column named 'name'
data['name']
data[<Boolean per index>]
A boolean Series with one value per index looks like the following. In this example, only the rows at indexes 0 and 2 are extracted.
0 True
1 False
2 True
3 False
Per-index booleans can be generated as follows. Ordinary Python comparison expressions work as they are.
# The value of the age column is 20 or more
data['age'] >= 20
# The name column contains the string 'Ro'
data['name'].str.contains('Ro')
# The name value has not appeared in an earlier row (first occurrences only)
~data['name'].duplicated()
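To make the boolean selection concrete, here is a minimal sketch on a made-up four-row table (the names and ages are assumptions chosen for illustration). It builds the three kinds of masks described above, applies one with data[...], and combines two with the & operator.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
data = pd.DataFrame({
    "name": ["Taro", "Jiro", "Hanako", "Taro"],
    "age": [25, 17, 31, 19],
})

# Each expression yields a boolean Series aligned with the row index.
is_adult = data["age"] >= 20               # True, False, True, False
has_ro = data["name"].str.contains("ro")   # True, True, False, True
first_seen = ~data["name"].duplicated()    # True, True, True, False

# Passing a mask to data[...] keeps only the rows marked True.
adults = data[is_adult]
print(adults.index.tolist())        # [0, 2]

# Masks combine element-wise with & (and) and | (or).
adult_ro = data[is_adult & has_ro]
print(adult_ro["name"].tolist())    # ['Taro']
```

When the conditions are written inline rather than stored in variables, each one needs its own parentheses, for example data[(data["age"] >= 20) & (data["name"].str.contains("ro"))].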
Building on "get only the rows that meet a condition" from the end of the basics above, let's look at the different formats in which the result can be returned.
Here, the statistical data stored in data is as follows.
index | height | class | grade | weight |
---|---|---|---|---|
0 | 178.820262 | a | 2 | 65.649452 |
1 | 172.000786 | b | 5 | 55.956723 |
2 | 179.337790 | a | 4 | 56.480630 |
3 | 181.204466 | b | 1 | 62.908190 |
4 | 169.483906 | a | 4 | 65.768826 |
5 | 174.893690 | b | 4 | 56.188545 |
First, let's review the basics.
data[data['class'] == 'a']
height class grade weight
0 178.820262 a 2 65.649452
2 179.337790 a 4 56.480630
4 169.483906 a 4 65.768826
Accessing .values returns a two-dimensional NumPy array: the values of each record form one row, and the array holds one such row per record.
data[data['class'] == 'a'].values
[[178.8202617298383 'a' 2 65.64945209116877]
[179.33778995074982 'a' 4 56.48062978465752]
[169.4839057410322 'a' 4 65.76882607944115]]
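The following sketch inspects what .values returns for the filtered rows. It rebuilds the six-row table from this post with heights and weights rounded to one decimal place (the rounding is an assumption made for brevity). It also shows to_numpy(), which the pandas documentation recommends over .values, and tolist() for cases where real nested Python lists are needed.

```python
import pandas as pd

# The table from this post, with values rounded for brevity (an assumption).
data = pd.DataFrame({
    "height": [178.8, 172.0, 179.3, 181.2, 169.5, 174.9],
    "class": ["a", "b", "a", "b", "a", "b"],
    "grade": [2, 5, 4, 1, 4, 4],
    "weight": [65.6, 56.0, 56.5, 62.9, 65.8, 56.2],
})

arr = data[data["class"] == "a"].values

print(type(arr).__name__)  # ndarray
print(arr.shape)           # (3, 4): three records, four columns

# to_numpy() returns the same array contents as .values.
same = data[data["class"] == "a"].to_numpy()

# tolist() converts the array into nested Python lists, one per record.
rows = arr.tolist()
print(rows[0])             # [178.8, 'a', 2, 65.6]
```

Because the columns mix floats, strings, and integers, the resulting array holds generic Python objects rather than a single numeric type, so numeric operations on it are slower than on a single-column array.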
Use .loc with a row condition and a column name to get a Series that holds the index together with the values of the specified column.
A Series in this format can be passed directly to matplotlib's plot.
data.loc[data['class'] == 'a', 'height']
0 178.820262
2 179.337790
4 169.483906
import matplotlib.pyplot as plt

# Draw a line plot from the Series stored in args
args = data.loc[data['class'] == 'a', 'height']
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(args)
plt.show()
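As a small check of what .loc hands back, the sketch below uses the same table with rounded values (the rounding is an assumption). It shows that a single column name yields a Series that keeps the original index labels, while a list of column names yields a DataFrame. The reset_index call is an optional extra, not something the original post uses.

```python
import pandas as pd

# The table from this post, with values rounded for brevity (an assumption).
data = pd.DataFrame({
    "height": [178.8, 172.0, 179.3, 181.2, 169.5, 174.9],
    "class": ["a", "b", "a", "b", "a", "b"],
    "grade": [2, 5, 4, 1, 4, 4],
    "weight": [65.6, 56.0, 56.5, 62.9, 65.8, 56.2],
})

# One column name: the result is a Series that keeps the original index.
heights = data.loc[data["class"] == "a", "height"]
print(type(heights).__name__)   # Series
print(heights.index.tolist())   # [0, 2, 4]

# A list of column names: the result is a DataFrame.
subset = data.loc[data["class"] == "a", ["height", "weight"]]
print(subset.shape)             # (3, 2)

# Optional: renumber the index from 0 if the original labels are not wanted.
renumbered = heights.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1, 2]
```

When a Series is plotted, matplotlib should use its index for the x-axis, so the three points here would sit at x = 0, 2, 4 unless the index is renumbered first.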
Applying .values to the result of .loc returns only the values of the specified column, as a one-dimensional NumPy array.
This array can be passed to matplotlib's hist.
data.loc[data['class'] == 'a', 'height'].values
[178.82026173 179.33778995 169.48390574]
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.hist(data.loc[data['class'] == 'a', 'height'].values)
plt.show()
The more records you have, the more impressive the graph becomes!
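To see what a histogram does with the one-dimensional array from .loc[...].values without opening a plot window, the sketch below counts the values per bin with numpy.histogram. The table uses rounded values and bins=3 is chosen arbitrarily; both are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# The table from this post, with values rounded for brevity (an assumption).
data = pd.DataFrame({
    "height": [178.8, 172.0, 179.3, 181.2, 169.5, 174.9],
    "class": ["a", "b", "a", "b", "a", "b"],
    "grade": [2, 5, 4, 1, 4, 4],
    "weight": [65.6, 56.0, 56.5, 62.9, 65.8, 56.2],
})

values = data.loc[data["class"] == "a", "height"].values
print(values.ndim)    # 1
print(values.dtype)   # float64

# Split the range from min to max into 3 equal-width bins and count the values.
counts, edges = np.histogram(values, bins=3)
print(counts.tolist())  # [1, 0, 2]
print(len(edges))       # 4 bin edges for 3 bins
```

With only three records the histogram is sparse, which matches the remark above that more records give a more meaningful picture.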