This is a struggle record of knocking 100 eggs without knowing the data scientist's egg. It is a mystery whether I can finish the race. ~~ Even if it disappears on the way, please think that it is not given to Qiita. ~~
100 knock articles 100 Knock Guide
** Be careful if you are trying to do it as it includes spoilers **
At the time of writing the article, it was finished up to the 27th, but there were many writing styles I did not know, and there were many saying "I wrote this, but the answer was this", so I will leave it as a memo.
This is hard to see! This way of writing is dangerous! If you have any questions, please let me know. ~~ I will use it as food while suffering damage to my heart.
This time from 19 to 22. [Last time] 10-18 First time with table of contents
P-19: Give ranks to the receipt detail data frame (df_receipt) in descending order of sales amount (amount) per item, and extract the first 10 items. Items should display customer ID (customer_id), sales amount (amount), and assigned rank. If the sales amount (amount) is the same, the same ranking shall be given.
The requirement this time is
id | amo | rank |
---|---|---|
A00 | 100 | 1 |
B00 | 70 | 2 |
C00 | 80 | 2 |
D00 | 70 | 4 |
It seems that it is to make a table shaped like In SQL
rei.sql
SELECT id,amo amo,
(SELECT count(amo) FROM df b WHERE a.amo < b.amo)+1 rank
FROM df a ORDER BY rank
(I hit SQL after a long time)
For the time being, count the number of sales b.amo
that are higher than the sales that ʻamo and ʻa.amo
are looking at now (because the largest is 0) + 1 The content is to create a rank and add that column.
** Caution **: This is a writing style that takes a lot of time to calculate. I tried it with 10 knocks SQL, but it took a few minutes.
** Same for Dataframe **
df_receipt[['customer_id', 'amount']]
And a rank column
and join (horizontally) to create a table
How to make a rank sequence
df_receipt['amount'].rank(method='min', ascending=False)
--ʻAscending = False, descending order (more is prioritized) data. --
method ='min'` returns a small number (minimum value) as rank data.
** To join horizontally **
pd.concat ([['Column A'], ['Column B']], axis = 1)
pd.concat
seems to be vertically joined (the table sticks to the bottom of the table) when axis is the default, but horizontally joined when ʻaxis = 1` is added.
Finally, if you change the column name and sort in ascending rank
mine19.py
df_tmp = pd.concat([df_receipt[['customer_id', 'amount']] ,df_receipt['amount'].rank(method='min', ascending=False)], axis=1)
df_tmp.columns = ['customer_id', 'amount', 'ranking']
df_tmp.sort_values('ranking', ascending=True).head(10)
Will be like this
** Of course ** I don't know and cheat
mine20.py
df=df_receipt
df=pd.concat([df[['customer_id','amount']] ,df['amount'].rank(method='first',ascending=False) ],axis=1)
df.columns=['customer_id','amount','ranking']
df.sort_values('ranking',ascending=True).head(10)
I will omit it because the 19th and some contents will be covered The part that changes
In addition, even if the sales amount (amount) is the same, give a different ranking.
Part of. That is, as a display
id | amo | rank |
---|---|---|
A00 | 100 | 1 |
B00 | 70 | 2 |
C00 | 80 | 3 |
D00 | 70 | 4 |
I want to take the form.
-
method ='min'
returns a small number (minimum value) as rank data.
You can change this, and after making the data in descending order, change 'min'
to'first'
to number them in the order of appearance.
mine21.py
len(df_receipt)
mine22.py
len(df_receipt['customer_id'].unique())
~~ Do you have a commentary? ~~
If you convert the 22nd to SQL
SELECT COUNT(DISTINCT customer_id) FROM receipt
I can go. ~~ Which is easier? ~~
I actually felt that Python was faster than SQL when comparing SQL and Python, so I finally started to find meaning in using pandas. ~~ LIKE and SQL are easy ~~ If you just solve the problem, it doesn't come to your mind so much, so I would like to sort it out after solving the problem again.