This is a record of a fledgling data scientist (a freshly laid egg) struggling through the 100 knocks without really understanding what is going on. Whether I can even finish is a mystery. ~~If the series vanishes partway through, please assume I simply haven't posted the rest to Qiita.~~
100 knock articles 100 Knock Guide
Page used as a reference for building the environment last time
**Be careful if you plan to solve the problems yourself: this article contains spoilers.**
There were many ways of writing things that I didn't know, and many cases of "I wrote this, but the model answer was that", so I'm recording them here as a memo.
If something here is hard to read or a way of writing is dangerous, please let me know. ~~I will take the hit to my heart and use it as nourishment.~~
If a solution is wrong or I have misread a problem, please leave a comment.
Table of contents | Problems |
---|---|
part1 | 1~9 |
part2 | 10~18 |
part3 | 19~22 |
part4 | 23~28 |
part5 | 29~32 |
part6 | 33~35 |
part7 | Not posted |
As expected, I could write this one. Even if my preparation hasn't really sunk in, not being able to write this would be a real problem.
mine01.py
df_receipt.head(10)
~~I got an error right away.~~
Oh, right: when selecting multiple columns, you write `[['column A','column B']]`. Throwing errors is a daily occurrence for me.
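A minimal sketch of the bracket difference, on a small made-up frame (the values are placeholders, not the real `df_receipt` data). It assumes the error came from using a single pair of brackets, which the article does not actually say.

```python
import pandas as pd

# Hypothetical stand-in for df_receipt; the values are made up.
df_receipt = pd.DataFrame({
    'sales_ymd': [20181103, 20181118],
    'customer_id': ['CS006214000001', 'CS008415000097'],
    'amount': [158, 81],
})

# One pair of brackets: the two names form a single tuple key,
# and no column has that name, so pandas raises KeyError.
try:
    df_receipt['sales_ymd', 'amount']
    single_brackets_ok = True
except KeyError:
    single_brackets_ok = False

# A list inside the brackets selects several columns as a DataFrame.
selected = df_receipt[['sales_ymd', 'amount']]

print(single_brackets_ok)
print(list(selected.columns))
```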
mine02.py
df_receipt[['sales_ymd','customer_id','product_cd','amount']].head(10)
Right away, my answer differs from the model answer. I wrote:
mine03.py
df=df_receipt[['sales_ymd','customer_id','product_cd','amount']]
df.columns=['sales_date','customer_id','product_cd','amount']
df.head(10)
Using three lines for something this simple may look silly, but that is what happened because I was sorting out my thoughts while solving it.
Or rather, I had not used `{}` since I started Python, so I simply didn't know about `rename`.
**Model answer**
df_receipt[['sales_ymd', 'customer_id', 'product_cd', 'amount']].rename(columns={'sales_ymd': 'sales_date'}).head(10)
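For reference, a small sketch of how `rename` takes a dict mapping old names to new ones. The frame below is a made-up stand-in for `df_receipt`. Only the listed column changes, and the original frame keeps its names.

```python
import pandas as pd

# Hypothetical stand-in for df_receipt; the values are made up.
df_receipt = pd.DataFrame({
    'sales_ymd': [20181103, 20181118],
    'customer_id': ['CS006214000001', 'CS008415000097'],
    'amount': [158, 81],
})

# {old_name: new_name}; columns not in the dict are left alone.
renamed = df_receipt[['sales_ymd', 'customer_id', 'amount']].rename(
    columns={'sales_ymd': 'sales_date'}
)

print(list(renamed.columns))
print(list(df_receipt.columns))  # rename returned a new frame
```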
I gave a different answer here as well.
mine04.py
df=df_receipt[['sales_ymd','customer_id','product_cd','amount']]
df=df[df['customer_id']=='CS018205000001']
df
Model answer
df_receipt[['sales_ymd', 'customer_id', 'product_cd', 'amount']].query('customer_id == "CS018205000001"')
Am I the only one who feels uneasy about writing the `query` condition as a string? ~~Strings …… WAF …… regular-expression matching …… my head is~~
With string input, a typo inside the string is not flagged until the expression is actually evaluated. Is `query` faster?
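A small sketch comparing the two styles on made-up data. It only shows that both select the same rows and that a misspelled column name in the string fails at run time. It does not measure speed, and `custmer_id` is a deliberate typo used for illustration.

```python
import pandas as pd

# Hypothetical stand-in for df_receipt; the values are made up.
df_receipt = pd.DataFrame({
    'customer_id': ['CS018205000001', 'CS006214000001', 'CS018205000001'],
    'amount': [1100, 158, 90],
})

# Boolean indexing and query select the same rows.
by_mask = df_receipt[df_receipt['customer_id'] == 'CS018205000001']
by_query = df_receipt.query('customer_id == "CS018205000001"')
same_rows = by_mask.equals(by_query)

# A typo in the string is only noticed when query evaluates it.
try:
    df_receipt.query('custmer_id == "CS018205000001"')
    typo_raised = False
except NameError:  # pandas' UndefinedVariableError is a NameError subclass
    typo_raised = True

print(same_rows, typo_raised)
```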
mine05.py
#df=df_receipt[['sales_ymd','customer_id','product_cd','amount']]
#df=df[df['customer_id']=='CS018205000001']
df[df['amount']>=1000]
Model answer
df_receipt[['sales_ymd', 'customer_id', 'product_cd', 'amount']].query('customer_id == "CS018205000001" & amount >= 1000')
Since the preconditions were the same as in problem 4, I reused `df` as it was.
Again, the model answer uses `query`. While writing, I started to wonder: is my way of writing deprecated, like `ix`?
Joining the two lines
df=df[df['customer_id']=='CS018205000001']
df[df['amount']>=1000]
into one,
df=df[df['customer_id']=='CS018205000001'][df['amount']>=1000]
gives the same result, but produces a warning:
/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:2: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
Since `df = df[condition]` gives back a DataFrame, I assumed `df[condition 1][condition 2]` would work too, and got scolded for it. The second condition is computed from `df` before filtering, so its index no longer matches the already filtered frame.
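Two ways to get the same rows without tripping that warning, sketched on made-up data: build one mask with `&` on the unfiltered frame, or chain with `.loc` and a callable so the second condition is evaluated on the filtered frame. The names `combined` and `chained` are only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for df_receipt; the values are made up.
df_receipt = pd.DataFrame({
    'customer_id': ['CS018205000001', 'CS006214000001', 'CS018205000001'],
    'amount': [1100, 2000, 90],
})

# One mask: both conditions are computed on the same, unfiltered index.
mask = (df_receipt['customer_id'] == 'CS018205000001') & (df_receipt['amount'] >= 1000)
combined = df_receipt[mask]

# Chained: the lambda receives the already filtered frame.
chained = (
    df_receipt[df_receipt['customer_id'] == 'CS018205000001']
    .loc[lambda d: d['amount'] >= 1000]
)

print(combined.equals(chained))
```

Plain boolean indexing like this is still a supported way to filter in pandas, so it is a matter of taste rather than a replacement for `query`.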
mine06.py
df=df_receipt[['sales_ymd','customer_id','product_cd','quantity','amount']]
df=df[df['customer_id']=='CS018205000001']
df=df[(1000<=df['amount'])|(5<=df['quantity'])]
df
Model answer
df_receipt[['sales_ymd', 'customer_id', 'product_cd', 'quantity', 'amount']].query('customer_id == "CS018205000001" & (amount >= 1000 | quantity >=5)')
Since the conditions are getting longer, I split them up further. I find it easier to follow when the AND conditions in particular are split apart. Does splitting make it slower?
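One detail worth sketching here: inside `query`, `&` and `|` are evaluated after the comparisons, while in plain Python they bind more tightly than `>=`, which is why each comparison in my version needs its own parentheses. The data below is made up, and the sketch says nothing about speed.

```python
import pandas as pd

# Hypothetical stand-in for df_receipt; the values are made up.
df = pd.DataFrame({
    'customer_id': ['CS018205000001'] * 3 + ['CS006214000001'],
    'quantity': [1, 6, 1, 9],
    'amount': [1100, 200, 90, 5000],
})

# Inside query, | is applied after the comparisons on either side.
by_query = df.query('customer_id == "CS018205000001" & (amount >= 1000 | quantity >= 5)')

# In plain Python, every comparison has to be wrapped in parentheses.
by_mask = df[(df['customer_id'] == 'CS018205000001')
             & ((df['amount'] >= 1000) | (df['quantity'] >= 5))]

# Without parentheses, | grabs 1000 and df['quantity'] first and the
# expression fails instead of filtering.
try:
    df[df['amount'] >= 1000 | df['quantity'] >= 5]
    unparenthesized_failed = False
except (ValueError, TypeError):
    unparenthesized_failed = True

print(by_query.equals(by_mask), unparenthesized_failed)
```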
mine07.py
df=df_receipt[['sales_ymd','customer_id','product_cd','amount']]
df=df[df['customer_id']=='CS018205000001']
df=df[(1000<=df['amount'])&(df['amount']<=2000)]
df
Model answer
df_receipt[['sales_ymd', 'customer_id', 'product_cd', 'amount']].query('customer_id == "CS018205000001" & 1000 <= amount <= 2000')
Being able to write a BETWEEN condition in a single expression is attractive. By the way, from around this point on, all I could think was "just let me write it in SQL ...".
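Outside `query`, `Series.between` gives a similar one-expression range check. A minimal sketch on made-up amounts, showing that both forms include the two endpoints:

```python
import pandas as pd

# Made-up amounts around the 1000 to 2000 boundaries.
df = pd.DataFrame({'amount': [999, 1000, 1500, 2000, 2001]})

# Chained comparison inside query.
in_range_query = df.query('1000 <= amount <= 2000')

# Series.between includes both endpoints by default.
in_range_between = df[df['amount'].between(1000, 2000)]

print(list(in_range_query['amount']))
print(in_range_query.equals(in_range_between))
```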
mine08.py
df=df_receipt[['sales_ymd','customer_id','product_cd','amount']]
df=df[df['customer_id']=='CS018205000001']
df=df[df['product_cd'] != 'P071401019']
df
Model answer
df_receipt[['sales_ymd', 'customer_id', 'product_cd', 'amount']].query('customer_id == "CS018205000001" & product_cd != "P071401019"')
I'm starting to wonder whether, for conditions like these, I should combine them into a single expression.
mine09.py
df_store.query('prefecture_cd != "13" & not (floor_area > 900)')
Model answer
df_store.query('prefecture_cd != "13" & floor_area <= 900')
I finally gave in to `query`. ~~Not because it was a rewrite problem.~~ I'm not sure it was necessary to use `not`. Or rather, it wasn't.
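A sketch comparing the two conditions on made-up store data. The two forms agree on ordinary numbers and only differ on a missing `floor_area`, because every comparison with NaN is False. Whether the real `df_store` contains missing values is not stated here, so the NaN row is purely an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_store; S4 has a missing floor_area on purpose.
df_store = pd.DataFrame({
    'store_cd': ['S1', 'S2', 'S3', 'S4'],
    'prefecture_cd': ['13', '14', '12', '14'],
    'floor_area': [800.0, 850.0, 1200.0, np.nan],
})

with_not = df_store.query('prefecture_cd != "13" & not (floor_area > 900)')
with_le = df_store.query('prefecture_cd != "13" & floor_area <= 900')

# NaN > 900 is False, so "not" keeps S4; NaN <= 900 is False, so S4 is dropped.
print(list(with_not['store_cd']))
print(list(with_le['store_cd']))
```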
To be honest, this part is as far as I could manage without looking things up. From next time on, I'll be showing myself going down in flames while trying to force out code I don't understand.