This is a struggle record of knocking 100 eggs without knowing the data scientist's egg. It is a mystery whether I can finish the race. ~~ Even if it disappears on the way, please think that it is not given to Qiita. ~~

** Be careful if you are trying to do it as it includes spoilers **

At the time of writing the article, it was finished up to the 27th, but there were many writing styles I did not know, and there were many saying "I wrote this, but the answer was this", so I will leave it as a memo.

This is hard to see! This way of writing is dangerous! If you have any questions, please let me know. ~~ I will use it as food while suffering damage to my heart.

This time from 19 to 22. [Last time] 10-18 First time with table of contents

The battle begins here

19th

P-19: Give ranks to the receipt detail data frame (df_receipt) in descending order of sales amount (amount) per item, and extract the first 10 items. Items should display customer ID (customer_id), sales amount (amount), and assigned rank. If the sales amount (amount) is the same, the same ranking shall be given.

The requirement this time is

id	amo	rank
A00	100	1
B00	70	2
C00	80	2
D00	70	4

It seems that it is to make a table shaped like In SQL

`rei.sql`


SELECT　id,amo amo,
(SELECT count(amo) FROM df b WHERE a.amo < b.amo)+1 rank
FROM df a ORDER BY rank

(I hit SQL after a long time) For the time being, count the number of sales b.amo that are higher than the sales that ʻamo and ʻa.amo are looking at now (because the largest is 0) + 1 The content is to create a rank and add that column. ** Caution **: This is a writing style that takes a lot of time to calculate. I tried it with 10 knocks SQL, but it took a few minutes.

** Same for Dataframe ** df_receipt[['customer_id', 'amount']] And a rank column and join (horizontally) to create a table

How to make a rank sequence df_receipt['amount'].rank(method='min', ascending=False)

[Reference]

--ʻAscending = False, descending order (more is prioritized) data. --method ='min'` returns a small number (minimum value) as rank data.

** To join horizontally ** pd.concat ([['Column A'], ['Column B']], axis = 1) pd.concat seems to be vertically joined (the table sticks to the bottom of the table) when axis is the default, but horizontally joined when ʻaxis = 1` is added.

Finally, if you change the column name and sort in ascending rank

`mine19.py`


df_tmp = pd.concat([df_receipt[['customer_id', 'amount']] ,df_receipt['amount'].rank(method='min', ascending=False)], axis=1)
df_tmp.columns = ['customer_id', 'amount', 'ranking']
df_tmp.sort_values('ranking', ascending=True).head(10)

Will be like this

** Of course ** I don't know and cheat

20th

`mine20.py`


df=df_receipt
df=pd.concat([df[['customer_id','amount']] ,df['amount'].rank(method='first',ascending=False) ],axis=1)
df.columns=['customer_id','amount','ranking']
df.sort_values('ranking',ascending=True).head(10)

I will omit it because the 19th and some contents will be covered The part that changes

In addition, even if the sales amount (amount) is the same, give a different ranking.

Part of. That is, as a display

id	amo	rank
A00	100	1
B00	70	2
C00	80	3
D00	70	4

I want to take the form.

-method ='min' returns a small number (minimum value) as rank data.

You can change this, and after making the data in descending order, change 'min' to'first' to number them in the order of appearance.

21,22th

`mine21.py`


len(df_receipt)

`mine22.py`


len(df_receipt['customer_id'].unique())

~~ Do you have a commentary? ~~ If you convert the 22nd to SQL SELECT COUNT(DISTINCT customer_id) FROM receipt I can go. ~~ Which is easier? ~~

Up to here for this time

I actually felt that Python was faster than SQL when comparing SQL and Python, so I finally started to find meaning in using pandas. ~~ LIKE and SQL are easy ~~ If you just solve the problem, it doesn't come to your mind so much, so I would like to sort it out after solving the problem again.

Data Science 100 Knock ~ Battle for less than beginners part3

The battle begins here

19th

rei.sql

mine19.py