This is a record of my struggle through the Data Science 100 Knocks, written by someone who isn't even a data-scientist egg yet. Whether I can make it to the finish line is a mystery. ~~If the series vanishes partway through, please assume I just stopped posting it to Qiita.~~

**Spoiler warning: this post contains answers, so be careful if you plan to solve the problems yourself.**

I'm writing this paragraph mainly to pad out the top of the page so the answers don't spoil anything at a glance (lol)

I got bored partway through, so I went digging through the contents of the Docker environment.

If you spot anything like "this is hard to read!" or "this way of writing is dangerous!", please let me know. ~~I'll take it as nourishment while my heart takes the damage.~~

This time covers 23 to 28. [Last time] 19-22 [First time, with table of contents]

23rd

`mine23.py`


df = df_receipt
# Total sales amount and total quantity for each store
df = df.groupby('store_cd').agg({'amount': 'sum', 'quantity': 'sum'}).reset_index()
df.head(10)

And suddenly, a new way of writing shows up. According to the reference page, `agg` is what you use for aggregating data.

The names sum, min and max are the same as in Excel, so those are easy, but for the average I keep typing `ave` without thinking... I haven't touched std (standard deviation) since my exam-prep days, but I suspect I'll be relying on it from now on...

- 'min': minimum value
- 'max': maximum value
- 'mean': mean (average)
- 'median': median
- 'std': standard deviation
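To see these names side by side, here is a minimal sketch on a tiny hypothetical frame (the `store_cd`/`amount` values are made up, not taken from `df_receipt`). Selecting a single column before `agg` and passing a list of names gives one flat column per function. Note that the average is spelled `mean`, not `ave`, and that pandas' `std` is the sample standard deviation (`ddof=1`) by default.

```python
import pandas as pd

# Hypothetical toy data; NOT the real df_receipt
df = pd.DataFrame({
    "store_cd": ["S1", "S1", "S1", "S2", "S2"],
    "amount": [100, 200, 600, 50, 150],
})

# One output column per aggregation name; the average is 'mean', not 'ave'
summary = df.groupby("store_cd")["amount"].agg(
    ["sum", "min", "max", "mean", "median", "std"]
)
print(summary)
```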

Also, here is something that's hard to understand at first glance:

df.groupby('A').agg({'B': ['min', 'max'], 'C': 'sum'})

This computes the minimum and maximum of "B" (plus the sum of "C") for each value of "A". However, writing it this way creates a hierarchy in the column header (a MultiIndex). ~~The reference says it's convenient, but this is really annoying~~
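A quick way to see the hierarchy is to look at the columns. This sketch uses a made-up three-row frame with the same column names `A`, `B`, `C` as the snippet above; each output column is labelled by a `(column, function)` tuple, and that is what makes the header two levels deep.

```python
import pandas as pd

# Hypothetical toy frame using the same column names as the snippet above
df = pd.DataFrame({
    "A": ["x", "x", "y"],
    "B": [1, 5, 3],
    "C": [10, 20, 30],
})

result = df.groupby("A").agg({"B": ["min", "max"], "C": "sum"})

# Every column is now a (column, function) pair in a two-level MultiIndex
print(result.columns.tolist())
# [('B', 'min'), ('B', 'max'), ('C', 'sum')]
print(result.columns.nlevels)
# 2
```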

Digression

`yodan.py`


df=df_receipt
df=df.groupby('customer_id').agg({'sales_ymd':['max','min']})

df['sales_ymd']  # drops the 'sales_ymd' level and shows the 'max' and 'min' columns

df['sales_ymd'][['max']]  # shows only the 'max' column

Having to do this every time you want to reference something inside the hierarchy is a real pain. I struggled with it all the way through 23-27, so I'm posting it here for now.
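For reference, the two-step selection above can also be written as a single lookup with a `(top level, second level)` tuple. A minimal sketch, assuming a tiny hypothetical `df_receipt` with only `customer_id` and `sales_ymd` (dates stored as `yyyymmdd` integers, values made up):

```python
import pandas as pd

# Hypothetical toy data standing in for df_receipt
df_receipt = pd.DataFrame({
    "customer_id": ["C1", "C1", "C2"],
    "sales_ymd": [20190101, 20190305, 20180715],
})

df = df_receipt.groupby("customer_id").agg({"sales_ymd": ["max", "min"]})

# Two-step selection, as in yodan.py: sub-frame first, then the 'max' column
chained = df["sales_ymd"]["max"]

# Same values in one step: pass the full (level0, level1) tuple
direct = df[("sales_ymd", "max")]

print(direct)
```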

24th and 25th

`mine24.py`


df=df_receipt
df.groupby('customer_id').agg({'sales_ymd':'max'}).reset_index().head(10)

It's the `max` version of 23 (or rather, the digression above was rewritten from this one).

`mine25.py`


'''Model answer'''
df_receipt.groupby('customer_id').agg({'sales_ymd':'min'}).head(10)

This is the model answer. Lately there have been many problems I couldn't solve without peeking, so it was a relief that this model answer was so simple.

26th

P-026: For the receipt detail data frame (df_receipt), find the newest sales date (sales_ymd) and the oldest sales date for each customer ID (customer_id), and display 10 records where the two dates differ.

`mine26.py`


df=df_receipt
df=df.groupby('customer_id').agg({'sales_ymd':['max','min']}).reset_index()
df=df[df['sales_ymd']['max'] != df['sales_ymd']['min']]
df.head(10)

'''Model answer'''
df_tmp = df_receipt.groupby('customer_id').agg({'sales_ymd':['max','min']}).reset_index()
df_tmp.columns = ["_".join(pair) for pair in df_tmp.columns]
df_tmp.query('sales_ymd_max != sales_ymd_min').head(10)

What is the second line of this model answer doing? ... I gather that it flattens the hierarchy.
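To see what that second line does, here is a sketch on a tiny hypothetical `df_receipt` (made-up values). After `reset_index()`, `customer_id` becomes a column whose second header level is an empty string, so joining its pair with `_` yields a name ending in an underscore.

```python
import pandas as pd

# Hypothetical toy data: C1 bought on two dates, C2 only once
df_receipt = pd.DataFrame({
    "customer_id": ["C1", "C1", "C2"],
    "sales_ymd": [20190101, 20190305, 20180715],
})

df_tmp = df_receipt.groupby("customer_id").agg({"sales_ymd": ["max", "min"]}).reset_index()

# Before: a two-level header; customer_id's second level is an empty string
print(df_tmp.columns.tolist())
# [('customer_id', ''), ('sales_ymd', 'max'), ('sales_ymd', 'min')]

# The model answer's second line: join each (level0, level1) pair with "_"
df_tmp.columns = ["_".join(pair) for pair in df_tmp.columns]
print(df_tmp.columns.tolist())
# ['customer_id_', 'sales_ymd_max', 'sales_ymd_min']

# With a single level, query() can refer to the columns by plain names
result = df_tmp.query("sales_ymd_max != sales_ymd_min")
print(result)
```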

My output has a two-level header: `max` and `min` sit under `sales_ymd`, with an empty slot under `customer_id`.

The model answer's header is a single level instead: `customer_id_`, `sales_ymd_max`, `sales_ymd_min`.

I understand that it looks cleaner this way ~~but in the end the hierarchy just gets in the way~~
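If the hierarchy is the obstacle, another option (not the model answer's approach) is pandas' named aggregation, available since pandas 0.25: each keyword argument becomes one flat output column, given as a `(column, function)` pair. A sketch on the same kind of hypothetical toy data; the output names `sales_ymd_max`/`sales_ymd_min` are my own choice, picked to match the model answer's query.

```python
import pandas as pd

# Hypothetical toy data standing in for df_receipt
df_receipt = pd.DataFrame({
    "customer_id": ["C1", "C1", "C2"],
    "sales_ymd": [20190101, 20190305, 20180715],
})

# Named aggregation: output_name=(input_column, function) gives a flat header
df_tmp = (
    df_receipt.groupby("customer_id")
    .agg(sales_ymd_max=("sales_ymd", "max"), sales_ymd_min=("sales_ymd", "min"))
    .reset_index()
)

# No join step needed before filtering
result = df_tmp.query("sales_ymd_max != sales_ymd_min")
print(result.head(10))
```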

27th and 28th

`mine27.py`


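df = df_receipt
# Mean sales amount per store; the list ['mean'] creates a two-level header
df = df.groupby('store_cd').agg({'amount': ['mean']}).reset_index()
# Flatten the header by assigning new names (store_cd is renamed to store_id here)
df.columns = ['store_id', 'amount_mean']
df = df.sort_values('amount_mean', ascending=False)
df.head(5)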

'''Model answer'''
df_receipt.groupby('store_cd').agg({'amount':'mean'}).reset_index().sort_values('amount', ascending=False).head(5)

`mine28.py`


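df = df_receipt
# Median sales amount per store; the list ['median'] creates a two-level header
df = df.groupby('store_cd').agg({'amount': ['median']}).reset_index()
# Flatten the header by assigning new names (store_cd is renamed to store_id here)
df.columns = ['store_id', 'amount_median']
df = df.sort_values('amount_median', ascending=False)
df.head(5)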

'''Model answer'''
df_receipt.groupby('store_cd').agg({'amount':'median'}).reset_index().sort_values('amount', ascending=False).head(5)

27 and 28 add sorting. It's a secret that I ~~wrote `ave`~~ again. Everything up to this point was covered on the reference site, so it went fairly smoothly. The trouble starts next time.
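The gap between my answers and the model answers for 27 and 28 comes down to the list: `['mean']` creates an `('amount', 'mean')` header that I then had to rename, while the plain string `'mean'` keeps a single column called `amount`. A minimal sketch with made-up store data (not the real `df_receipt`); the `nlargest` line is an alternative I'm adding, not something from the model answer. Swapping `'mean'` for `'median'` gives the 28 version.

```python
import pandas as pd

# Hypothetical toy data; NOT the real df_receipt
df_receipt = pd.DataFrame({
    "store_cd": ["S1", "S1", "S2", "S3", "S3"],
    "amount": [100, 300, 50, 400, 600],
})

# A list ['mean'] gives a MultiIndex header ('amount', 'mean') ...
as_list = df_receipt.groupby("store_cd").agg({"amount": ["mean"]})
# ... while a plain string 'mean' keeps the single column name 'amount'
as_str = df_receipt.groupby("store_cd").agg({"amount": "mean"})

# Sort the flat result in descending order, as in the model answer
top = as_str.reset_index().sort_values("amount", ascending=False).head(5)

# Alternative: Series.nlargest returns the top-n values directly
top_alt = df_receipt.groupby("store_cd")["amount"].mean().nlargest(5)
print(top)
print(top_alt)
```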

That's all for this time.

Next time, mathematical violence assails the author, whose Math IIB has long since gathered moss! ~~Mock exams? I got my points from the programming question!~~

Data Science 100 Knock ~ Battle for less than beginners part4
