[For recording] Pandas memorandum

This time, I will summarize the events that I did not immediately understand when using Pandas. How do I operate Pandas? I often get lost, so I hope it will be helpful as much as possible.

Create dataset to use

--First, create a data set appropriately. This article will use this simple data as an example.

python



index = ['Product A', 'Product B', 'Product C']
columns = ['spring', 'summer', 'autumn', 'winter']
data = np.array([
    [0,10,20,30],
    [0,0,100,20],
    [50,100,20,40],
])

df = pd.DataFrame(data, index=index, columns=columns)
df.head()
spring summer autumn winter
Product A 0 10 20 30
Product B 0 0 100 20
Product C 50 100 20 40

Case (1) Extract and add the column names corresponding to the maximum and minimum values of each row, and their values.

--In other words, for product A, winter is the largest, and for product C, summer is the largest, so insert that season into the new column "Selling season".

python


for index, row in df.iterrows():
    
#     df.ix[index, 'Selling season'] = row.argmax()
    df.ix[index, 'Selling season'] = row.idxmax() #You can do either

#     df.ix[index, 'Unsellable season'] = row.argmin()
    df.ix[index, 'Unsellable season'] = row.idxmin() #You can do either
    
    df.ix[index, 'MAX'] = row.max()
    df.ix[index, 'MIN'] = row.min()

df.head()
spring summer autumn winter Selling season Unsellable season MAX MIN
Product A 0 10 20 30 winter spring 30 0
Product B 0 0 100 20 autumn spring 100 0
Product C 50 100 20 40 summer autumn 100 20

Case (2) Extract and add the column name that becomes a positive integer first in each row

――In other words, I want to put out the season when each product starts to sell for the first time. A is summer, B is autumn

python


for index, row in df.iterrows():
#index[0]Take out the first one that meets the conditions with
    df.ix[index, 'Season to start selling'] = row[row > 0].index[0] 
    
df.head()
spring summer autumn winter Season to start selling
Product A 0 10 20 30 summer
Product B 0 0 100 20 autumn
Product C 50 100 20 40 spring

Case ③ I want to extract a line that does not contain any 0

――In other words, I want to extract the line of product C that sells all year round.

python


#not_zero_df = df.query('spring>0 and summer>0 and autumn>0 and winter> 0')
#not_zero_df = df.query('spring!=0 and summer!=0 and autumn!=0 and winter!= 0')

#↑ It's okay, but it can be outside as a variable. Below "@Just add
hoge1,hoge2,hoge3,hoge4 = 0,0,0,0 
not_zero_df = df.query('spring> @hoge1 and summer> @hoge2 and autumn> @hoge3 and winter> @hoge4')
    
not_zero_df.head()
spring summer autumn winter
Product C 50 100 20 40

--By the way, if you want to extract something other than 0 by focusing only on summer, you can also write like this (extract lines where summer is not 0).

python


hoge = 0 
not_zero_df = df.query('summer!= @hoge')
    
not_zero_df.head()
spring summer autumn winter
Product A 0 10 20 30
Product C 50 100 20 30

** We will continue to add it as needed **

Recommended Posts

[For recording] Pandas memorandum
Pandas memorandum
pandas memorandum
Pandas operation memorandum
Freecad memorandum (for myself)
Memorandum (pseudo Vlookup by pandas)
Learning memorandum for me w
Memorandum @ Python OR Seminar: Pandas
100 Pandas knocks for Python beginners
Linux command memorandum [for beginners]
Pandas basics for beginners ① Reading & processing
Pandas / DataFrame Tips for practical use
Development memorandum ~ pandas, forecast, data structure ~
Pandas basics for beginners ⑧ Digit processing
Pandas
Pandas basics summary link for beginners
For Pandas users to practice SQL easily
Precautions when using for statements in pandas
TFRecord file creation memorandum for object detection
Tips for plotting multiple lines with pandas
Pandas of the beginner, by the beginner, for the beginner [Python]
Frequently used syntax memorandum for each language
Best practices for messing with data with pandas
A memorandum of method often used when analyzing data with pandas (for beginners)