This is a record of my struggle through the data science 100 knocks, written by someone who isn't even a fledgling data scientist. Whether I can make it to the finish line is a mystery. ~~Even if this series disappears along the way, please think of it as never having been given to Qiita.~~
Links: 100 Knocks articles / 100 Knocks Guide
**This post contains spoilers, so be careful if you plan to work through the problems yourself.**
The only reason I'm writing this part is to pad out about one screen of text so the spoilers stay out of sight.
I got bored partway through, so I went digging through the contents of the Docker environment.
If you think "this is hard to read!" or "this way of writing is dangerous!", please let me know. ~~I'll take it as nourishment while my heart takes the damage.~~
This time covers problems 23 to 28. [Last time] 19-22 [First time with table of contents]
mine23.py
df=df_receipt
df=df.groupby('store_cd').agg({'amount':'sum','quantity':'sum'}).reset_index()
df.head(10)
Yes, a new way of writing suddenly appeared. According to the reference page, `agg` can be used to aggregate data.
sum, min, and max are the same as in Excel, so those make sense, but for the average I keep typing `ave` by accident ... As for std, I haven't used it since it helped me out as an exam student, but I wonder if I'll be relying on it again from now on ...
'min': minimum value, 'max': maximum value, 'mean': mean, 'median': median, 'std': standard deviation
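As a small sketch of those aggregation names, the following passes several of them to `agg` at once. The toy table and its values are invented for illustration and are not taken from `df_receipt`. Note that the average is spelled `'mean'`; there is no `'ave'`.

```python
import pandas as pd

# Toy data invented for illustration; the real df_receipt is much larger.
toy = pd.DataFrame({
    'store_cd': ['S1', 'S1', 'S1', 'S2', 'S2'],
    'amount':   [100, 200, 600, 50, 150],
})

# Each string names a built-in aggregation applied per store_cd group.
stats = toy.groupby('store_cd')['amount'].agg(['min', 'max', 'mean', 'median', 'std'])
print(stats)
```

Passing a list of names to a single column gives one flat column per aggregation, which is handy for a quick look at a distribution.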
Also, although it's hard to follow at first glance, the line
df.groupby('A').agg({'B': ['min', 'max'], 'C': 'sum'})
shows how to get the maximum and minimum of "B" (and the sum of "C") for each "A". However, writing it this way creates hierarchical column labels. ~~The reference book says it's convenient, but this is very annoying.~~
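To make the hierarchy concrete, here is a minimal sketch that runs the same `agg` call on a tiny made-up frame (the columns A, B, C and their values are assumptions for illustration) and prints the resulting column labels.

```python
import pandas as pd

# Toy frame with hypothetical columns A, B, C matching the snippet above.
toy = pd.DataFrame({
    'A': ['x', 'x', 'y'],
    'B': [1, 5, 3],
    'C': [10, 20, 30],
})

result = toy.groupby('A').agg({'B': ['min', 'max'], 'C': 'sum'})

# The column labels are now (column, aggregation) tuples: a two-level hierarchy.
print(result.columns.tolist())
print(result.columns.nlevels)
```

Each label is a pair such as `('B', 'min')`, which is why a plain `result['min']` no longer works.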
yodan.py
df=df_receipt
df=df.groupby('customer_id').agg({'sales_ymd':['max','min']})
df['sales_ymd']  # the 'sales_ymd' level is dropped and the 'max' and 'min' columns are selected
df['sales_ymd'][['max']]  # selects only the 'max' column
It's very annoying to have to do this every time you want to refer to a level of the hierarchy. I struggled with this all through 23-27, so I'm posting it here for the time being.
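For what it's worth, a hierarchical column can also be picked out in one step with a tuple key instead of chained brackets. The sketch below assumes a toy stand-in for `df_receipt` with invented IDs and dates.

```python
import pandas as pd

# Toy stand-in for df_receipt; the IDs and dates are invented.
toy = pd.DataFrame({
    'customer_id': ['C1', 'C1', 'C2'],
    'sales_ymd':   [20180101, 20190505, 20170707],
})

agg = toy.groupby('customer_id').agg({'sales_ymd': ['max', 'min']})

# A (top level, second level) tuple selects a single column as a Series.
newest = agg[('sales_ymd', 'max')]

# .loc with a tuple in the column position does the same thing.
oldest = agg.loc[:, ('sales_ymd', 'min')]

print(newest)
print(oldest)
```

Whether this reads better than `df['sales_ymd']['max']` is a matter of taste, but the tuple form does the lookup in a single indexing operation.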
mine24.py
df=df_receipt
df.groupby('customer_id').agg({'sales_ymd':'max'}).reset_index().head(10)
This is the max version of 23 (or rather, the digression above is a rewrite of this one).
mine25.py
'''Model answer'''
df_receipt.groupby('customer_id').agg({'sales_ymd':'min'}).head(10)
This one is the model answer. The model answers had been quiet lately because so many of the problems were either simple or ones I couldn't solve without cheating.
P-026: For the receipt detail data frame (df_receipt), find the newest sales date (sales_ymd) and the oldest sales date for each customer ID (customer_id), and display 10 rows where the two differ.
mine26.py
df=df_receipt
df=df.groupby('customer_id').agg({'sales_ymd':['max','min']}).reset_index()
df=df[df['sales_ymd']['max'] != df['sales_ymd']['min']]
df.head(10)
'''Model answer'''
df_tmp = df_receipt.groupby('customer_id').agg({'sales_ymd':['max','min']}).reset_index()
df_tmp.columns = ["_".join(pair) for pair in df_tmp.columns]
df_tmp.query('sales_ymd_max != sales_ymd_min').head(10)
What on earth is the second line of this model answer ...? I do at least understand that it erases the hierarchy.
The output of my own code looks like this:

| customer_id | sales_ymd | |
|---|---|---|
| | max | min |

while the model answer's output looks like this:

| customer_id_ | sales_ymd_max | sales_ymd_min |
|---|---|---|

I can see that this one looks cleaner. ~~After all, the hierarchy is just in the way.~~
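Here is a rough walk-through of what that `"_".join(pair)` line does, again on a toy stand-in for `df_receipt` (the IDs and dates are invented). Each column label is a tuple of strings, and joining the tuple with an underscore produces a single flat name.

```python
import pandas as pd

# Toy stand-in for df_receipt; the IDs and dates are invented.
toy = pd.DataFrame({
    'customer_id': ['C1', 'C1', 'C2'],
    'sales_ymd':   [20180101, 20190505, 20170707],
})

df_tmp = toy.groupby('customer_id').agg({'sales_ymd': ['max', 'min']}).reset_index()

# Before: each column label is a tuple such as ('sales_ymd', 'max').
# After reset_index, customer_id has an empty second level: ('customer_id', '').
print(df_tmp.columns.tolist())

# "_".join(('sales_ymd', 'max')) -> 'sales_ymd_max'
# "_".join(('customer_id', ''))  -> 'customer_id_'  (hence the trailing underscore)
df_tmp.columns = ["_".join(pair) for pair in df_tmp.columns]
print(df_tmp.columns.tolist())

# With flat names, query() can refer to the columns directly.
print(df_tmp.query('sales_ymd_max != sales_ymd_min'))
```

The empty second level on `customer_id` is what produces the slightly odd `customer_id_` name in the flattened result.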
mine27.py
df=df_receipt
df=df.groupby('store_cd').agg({'amount':['mean']}).reset_index()
df.columns=['store_id','amount_mean']
df=df.sort_values('amount_mean',ascending= False)
df.head(5)
'''Model answer'''
df_receipt.groupby('store_cd').agg({'amount':'mean'}).reset_index().sort_values('amount', ascending=False).head(5)
mine28.py
df=df_receipt
df=df.groupby('store_cd').agg({'amount':['median']}).reset_index()
df.columns=['store_id','amount_median']
df=df.sort_values('amount_median',ascending= False)
df.head(5)
'''Model answer'''
df_receipt.groupby('store_cd').agg({'amount':'median'}).reset_index().sort_values('amount', ascending=False).head(5)
27 and 28 add sorting. ~~It's a secret that I wrote `ave`.~~ Everything up to this point was covered on the reference site, so it went fairly smoothly. The trouble starts next time.
Next time, mathematical violence descends on an author who was made a fool of by Math IIB! ~~Mock exams? I earned my points with programming!~~
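As an aside that is not part of the original answers: pandas also offers named aggregation, which names the output column directly and so avoids both the hierarchy and the manual `df.columns = [...]` step. A minimal sketch on invented toy data:

```python
import pandas as pd

# Toy data invented for illustration; store codes and amounts are made up.
toy = pd.DataFrame({
    'store_cd': ['S1', 'S1', 'S2', 'S2', 'S3'],
    'amount':   [100, 300, 500, 700, 50],
})

# Named aggregation: new_column_name=(source_column, aggregation).
# The result has flat column names, so no renaming step is needed.
ranked = (
    toy.groupby('store_cd')
       .agg(amount_mean=('amount', 'mean'))
       .reset_index()
       .sort_values('amount_mean', ascending=False)
)
print(ranked.head(5))
```

Swapping `'mean'` for `'median'` gives the equivalent of problem 28.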