Data Science 100 Knock ~ Battle for less than beginners part3

This is a struggle record of knocking 100 eggs without knowing the data scientist's egg. It is a mystery whether I can finish the race. ~~ Even if it disappears on the way, please think that it is not given to Qiita. ~~

100 knock articles 100 Knock Guide

** Be careful if you are trying to do it as it includes spoilers **

At the time of writing the article, it was finished up to the 27th, but there were many writing styles I did not know, and there were many saying "I wrote this, but the answer was this", so I will leave it as a memo.

This is hard to see! This way of writing is dangerous! If you have any questions, please let me know. ~~ I will use it as food while suffering damage to my heart.

This time from 19 to 22. [Last time] 10-18 First time with table of contents

The battle begins here

19th

P-19: Give ranks to the receipt detail data frame (df_receipt) in descending order of sales amount (amount) per item, and extract the first 10 items. Items should display customer ID (customer_id), sales amount (amount), and assigned rank. If the sales amount (amount) is the same, the same ranking shall be given.

The requirement this time is

id amo rank
A00 100 1
B00 70 2
C00 80 2
D00 70 4

It seems that it is to make a table shaped like In SQL

rei.sql


SELECT id,amo amo,
(SELECT count(amo) FROM df b WHERE a.amo < b.amo)+1 rank
FROM df a ORDER BY rank 

(I hit SQL after a long time) For the time being, count the number of sales b.amo that are higher than the sales that ʻamo and ʻa.amo are looking at now (because the largest is 0) + 1 The content is to create a rank and add that column. ** Caution **: This is a writing style that takes a lot of time to calculate. I tried it with 10 knocks SQL, but it took a few minutes.

** Same for Dataframe ** df_receipt[['customer_id', 'amount']] And a rank column and join (horizontally) to create a table

How to make a rank sequence df_receipt['amount'].rank(method='min', ascending=False)

--ʻAscending = False, descending order (more is prioritized) data. --method ='min'` returns a small number (minimum value) as rank data.

** To join horizontally ** pd.concat ([['Column A'], ['Column B']], axis = 1) pd.concat seems to be vertically joined (the table sticks to the bottom of the table) when axis is the default, but horizontally joined when ʻaxis = 1` is added.

Finally, if you change the column name and sort in ascending rank

mine19.py


df_tmp = pd.concat([df_receipt[['customer_id', 'amount']] ,df_receipt['amount'].rank(method='min', ascending=False)], axis=1)
df_tmp.columns = ['customer_id', 'amount', 'ranking']
df_tmp.sort_values('ranking', ascending=True).head(10)

Will be like this

** Of course ** I don't know and cheat

20th

mine20.py


df=df_receipt
df=pd.concat([df[['customer_id','amount']] ,df['amount'].rank(method='first',ascending=False) ],axis=1)
df.columns=['customer_id','amount','ranking']
df.sort_values('ranking',ascending=True).head(10)

I will omit it because the 19th and some contents will be covered The part that changes

In addition, even if the sales amount (amount) is the same, give a different ranking.

Part of. That is, as a display

id amo rank
A00 100 1
B00 70 2
C00 80 3
D00 70 4

I want to take the form.

-method ='min' returns a small number (minimum value) as rank data.

You can change this, and after making the data in descending order, change 'min' to'first' to number them in the order of appearance.

21,22th

mine21.py


len(df_receipt)

mine22.py


len(df_receipt['customer_id'].unique())

~~ Do you have a commentary? ~~ If you convert the 22nd to SQL SELECT COUNT(DISTINCT customer_id) FROM receipt I can go. ~~ Which is easier? ~~

Up to here for this time

I actually felt that Python was faster than SQL when comparing SQL and Python, so I finally started to find meaning in using pandas. ~~ LIKE and SQL are easy ~~ If you just solve the problem, it doesn't come to your mind so much, so I would like to sort it out after solving the problem again.

Recommended Posts

Data Science 100 Knock ~ Battle for less than beginners part3
Data Science 100 Knock ~ Battle for less than beginners part6
Data Science 100 Knock ~ Battle for less than beginners part2
Data Science 100 Knock ~ Battle for less than beginners part1
Data Science 100 Knock ~ Battle for less than beginners part9
Data Science 100 Knock ~ Battle for less than beginners part7
Data Science 100 Knock ~ Battle for less than beginners part4
Data Science 100 Knock ~ Battle for less than beginners part11
Data science 100 knocks ~ Battle for less than beginners part5
Data science 100 knocks ~ Battle for less than beginners part10
Data science 100 knocks ~ Battle for less than beginners part8
Data science 100 knock commentary (P021 ~ 040)
Data science 100 knock commentary (P061 ~ 080)
Data science 100 knock commentary (P041 ~ 060)
Data science 100 knock commentary (P081 ~ 100)
How to implement 100 data science knocks for data science beginners (for windows10 Home)
"Data Science 100 Knock (Structured Data Processing)" Python-007 Explanation
"Data Science 100 Knock (Structured Data Processing)" Python-006 Explanation
"Data Science 100 Knock (Structured Data Processing)" Python-001 Explanation
Time series data anomaly detection for beginners
"Data Science 100 Knock (Structured Data Processing)" Python-002 Explanation
[Python] Data Science 100 Knock (Structured Data Processing) 021 Explanation
"Data Science 100 Knock (Structured Data Processing)" Python-005 Explanation
"Data Science 100 Knock (Structured Data Processing)" Python-004 Explanation
[Python] Data Science 100 Knock (Structured Data Processing) 020 Explanation
[Python] Data Science 100 Knock (Structured Data Processing) 025 Explanation
"Data Science 100 Knock (Structured Data Processing)" Python-003 Explanation
[Python] Data Science 100 Knock (Structured Data Processing) 019 Explanation
[Hands-on for beginners] Read kaggle's "Forecasting Home Prices" line by line (Part 1: Reading data)
[Linux command] less command option list [Must-see for beginners]
For new students (Recommended efforts for Python beginners Part 1)
How to use data analysis tools for beginners
Preparing to try "Data Science 100 Knock (Structured Data Processing)"
Data science 100 knock (structured data processing) environment construction (Windows10)
Basics of pandas for beginners ② Understanding data overview