This is a record of the struggles of a newly hatched data scientist "egg" who, for some reason, decided to take on the 100 knocks. Whether I can finish the race is a mystery. ~~If the series disappears partway through, please assume I just stopped posting it to Qiita.~~

Page used as a reference for building the environment last time

**Spoiler warning: this post includes answers, so be careful if you plan to solve the problems yourself.**

There were many ways of writing things that I didn't know, and many cases of "I wrote this, but the model answer was that," so I'm posting this as a memo.

This is hard to read! This way of writing is dangerous! If you notice anything like that, please let me know. ~~I'll take it as nourishment while my heart takes damage.~~

This solution is wrong! This interpretation is off! Please comment if you spot anything like that.

| Table of contents | Problems |
| --- | --- |
| part1 | 1-9 |
| part2 | 10-18 |
| part3 | 19-22 |
| part4 | 23-28 |
| part5 | 29-32 |
| part6 | 33-35 |
| part7 | Not posted |

1st

Naturally, I could write this one. It barely counts as a warm-up, but if you can't write this, you'll be in trouble later.

`mine01.py`


```python
# df_receipt is loaded by the notebook's setup cell
df_receipt.head(10)
```

2nd

~~Got an error right off the bat.~~ To select multiple columns, you need double brackets: `[['column A','column B']]`. Throwing errors is an everyday occurrence.

`mine02.py`


```python
# Double brackets: the inner list of names selects several columns at once
df_receipt[['sales_ymd','customer_id','product_cd','amount']].head(10)
```
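A minimal sketch of why the double brackets matter, using a made-up frame standing in for df_receipt. The inner list is a list of column names. Two names inside single brackets are instead read as one tuple-shaped key, and since no column is named by that tuple, pandas raises `KeyError`.

```python
import pandas as pd

# Made-up stand-in for df_receipt (not real data)
df = pd.DataFrame({
    "sales_ymd": [20190101, 20190102],
    "customer_id": ["c1", "c2"],
    "amount": [100, 200],
})

# Double brackets: the inner list names the columns -> a DataFrame comes back
projected = df[["sales_ymd", "amount"]]

# Single brackets with two names: pandas looks up ONE column whose key is
# the tuple ("sales_ymd", "amount"); there is no such column
try:
    df["sales_ymd", "amount"]
    error = None
except KeyError as err:
    error = err

print(type(projected).__name__, list(projected.columns))
print("single brackets ->", repr(error))
```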

3rd

Right off the bat, I wrote an answer that differs from the model answer.

`mine03.py`


```python
df=df_receipt[['sales_ymd','customer_id','product_cd','amount']]
# Overwrite every column name at once (only sales_ymd actually changes)
df.columns=['sales_date','customer_id','product_cd','amount']
df.head(10)
```

Using three lines for something this simple may look silly, but it happened because I was sorting out my thoughts while solving it. Or rather, I only started using dicts (`{}`) and Python at all when I began these exercises, so I simply didn't know about rename.

**Model answer**

```python
df_receipt[['sales_ymd', 'customer_id', 'product_cd', 'amount']].rename(columns={'sales_ymd': 'sales_date'}).head(10)
```
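For comparison, a small sketch of the two approaches on a made-up one-row frame. `rename` takes a dict of `{old name: new name}` and leaves unlisted columns alone, while assigning `.columns` requires spelling out every column in order. `rename` also returns a new DataFrame, which is what lets the model answer chain `.head(10)` straight after it.

```python
import pandas as pd

# Made-up stand-in for the projected df_receipt columns
df = pd.DataFrame({
    "sales_ymd": [20190101],
    "customer_id": ["c1"],
    "product_cd": ["p1"],
    "amount": [100],
})

# rename: only the dict's keys are renamed; a new DataFrame is returned
renamed = df.rename(columns={"sales_ymd": "sales_date"})

# Assigning .columns: every name must be listed, in the original order
reassigned = df.copy()
reassigned.columns = ["sales_date", "customer_id", "product_cd", "amount"]

print(list(renamed.columns))  # ['sales_date', 'customer_id', 'product_cd', 'amount']
print(list(df.columns))       # ['sales_ymd', 'customer_id', 'product_cd', 'amount'] (original unchanged)
```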

4th

I gave a different answer here as well.

`mine04.py`


```python
df=df_receipt[['sales_ymd','customer_id','product_cd','amount']]
# Boolean indexing: keep only the rows where the condition is True
df=df[df['customer_id']=='CS018205000001']
df
```

Model answer

```python
df_receipt[['sales_ymd', 'customer_id', 'product_cd', 'amount']].query('customer_id == "CS018205000001"')
```

Am I the only one who feels uneasy about query taking its condition as a string? ~~Strings... WAF... regex matching... my head...~~ With string input, the editor can't catch a typo inside the string, so you only find the mistake when the line actually runs. Is query faster, though?
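A small sketch comparing the two styles on made-up rows. Both return the same rows. On the typo worry: neither style is checked by the editor, but a misspelled column name does not pass silently in either. `query` raises pandas' `UndefinedVariableError` (a `NameError` subclass) at run time, much as `df['amout']` raises `KeyError`. A typo in a value, such as the customer ID, simply matches nothing in both styles. Speed is not measured here, so that question stays open.

```python
import pandas as pd

# Made-up rows (not real receipt data)
df = pd.DataFrame({
    "customer_id": ["CS018205000001", "c2", "CS018205000001"],
    "amount": [1200, 300, 500],
})

# The same filter written as a boolean mask and as a query string
by_mask = df[df["customer_id"] == "CS018205000001"]
by_query = df.query('customer_id == "CS018205000001"')

# A misspelled column name inside the string is only detected at run time
try:
    df.query("amout >= 1000")  # deliberate typo: "amout"
    typo_error = None
except NameError as err:  # pandas' UndefinedVariableError subclasses NameError
    typo_error = err

print(by_mask.equals(by_query))  # True
print(repr(typo_error))
```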

5th

`mine05.py`


```python
# Same preconditions as the 4th problem, so df from there is reused as-is
#df=df_receipt[['sales_ymd','customer_id','product_cd','amount']]
#df=df[df['customer_id']=='CS018205000001']
df[df['amount']>=1000]
```

Model answer

```python
df_receipt[['sales_ymd', 'customer_id', 'product_cd', 'amount']] \
    .query('customer_id == "CS018205000001" & amount >= 1000')
```

Since the preconditions were the same as in the 4th problem, I reused df as-is. Once again, the model answer uses query. Something I wondered while writing: is my bracket style deprecated, like `ix`?
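On the `ix` question: boolean indexing like `df[mask]` is not deprecated. What was deprecated (in pandas 0.20, and removed in 1.0) is `.ix`, whose job is now split between label-based `.loc` and position-based `.iloc`. As a sketch on made-up rows, `.loc` can take a boolean mask for the rows and a column list at the same time, so the 5th problem fits in one step without query.

```python
import pandas as pd

# Made-up rows standing in for df_receipt
df = pd.DataFrame({
    "sales_ymd": [20190101, 20190102, 20190103],
    "customer_id": ["CS018205000001", "CS018205000001", "c2"],
    "product_cd": ["p1", "p2", "p3"],
    "quantity": [1, 2, 3],
    "amount": [1500, 800, 3000],
})

# Rows: customer AND amount condition; columns: the four requested ones
mask = (df["customer_id"] == "CS018205000001") & (df["amount"] >= 1000)
result = df.loc[mask, ["sales_ymd", "customer_id", "product_cd", "amount"]]
print(result)
```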

Digression

Connecting the two lines

```python
df=df[df['customer_id']=='CS018205000001']
df[df['amount']>=1000]
```

into one, `df=df[df['customer_id']=='CS018205000001'][df['amount']>=1000]`, gives the same result, but raises a warning:

```
/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:2: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
```

Since `df = df[condition]` gives back a DataFrame, I assumed `df[condition 1][condition 2]` would work too, and got scolded. The cause is that the second mask, `df['amount']>=1000`, is computed on the unfiltered df, so its index no longer matches the already filtered frame and pandas has to reindex it.
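A minimal reproduction on made-up rows, using only pandas and the standard `warnings` module. The chained version warns because the second mask carries the full index. Building one mask with `&` keeps both conditions on the same index, so no reindexing happens and the selected rows are the same.

```python
import warnings

import pandas as pd

# Made-up rows (not real receipt data)
df = pd.DataFrame({
    "customer_id": ["A", "A", "B", "A"],
    "amount": [500, 1500, 2000, 1200],
})

# Chained: df["amount"] >= 1000 has 4 rows, but it is applied to the
# 3-row result of the first filter, so pandas reindexes it (and warns)
with warnings.catch_warnings(record=True) as chained_warnings:
    warnings.simplefilter("always")
    chained = df[df["customer_id"] == "A"][df["amount"] >= 1000]

# Combined: both conditions are evaluated on df itself, then joined with &
with warnings.catch_warnings(record=True) as combined_warnings:
    warnings.simplefilter("always")
    combined = df[(df["customer_id"] == "A") & (df["amount"] >= 1000)]

print(chained.index.tolist(), combined.index.tolist())  # [1, 3] [1, 3]
print([str(w.message) for w in chained_warnings])
```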

6th

`mine06.py`


```python
df=df_receipt[['sales_ymd','customer_id','product_cd','quantity','amount']]
df=df[df['customer_id']=='CS018205000001']
# Each comparison needs its own parentheses: | binds tighter than <=
df=df[(1000<=df['amount'])|(5<=df['quantity'])]
df
```

Model answer

```python
df_receipt[['sales_ymd', 'customer_id', 'product_cd', 'quantity', 'amount']].query('customer_id == "CS018205000001" & (amount >= 1000 | quantity >=5)')
```

As the conditions get longer, I split them up even further. AND conditions in particular seem easier to follow when split into separate steps. Does this hurt performance, though?
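A small sketch on made-up rows showing that the split version and a single combined mask select the same rows. Outside `query`, the parentheses around each comparison are required, because in Python `&` and `|` bind more tightly than `==` and `>=`. Inside a `query` string pandas treats `&` and `|` like `and` and `or`, which is why the model answer gets away with fewer parentheses. Each split step does create an intermediate DataFrame; whether that is noticeable at df_receipt's size would have to be measured.

```python
import pandas as pd

# Made-up rows (not real receipt data)
df = pd.DataFrame({
    "customer_id": ["A", "A", "A", "B"],
    "quantity": [1, 6, 2, 10],
    "amount": [1200, 300, 400, 5000],
})

# Step by step, as in mine06.py: the AND is split into two filters
step = df[df["customer_id"] == "A"]
step = step[(step["amount"] >= 1000) | (step["quantity"] >= 5)]

# One boolean mask: every comparison is wrapped in parentheses
mask = (df["customer_id"] == "A") & ((df["amount"] >= 1000) | (df["quantity"] >= 5))
combined = df[mask]

print(step.index.tolist(), combined.index.tolist())  # [0, 1] [0, 1]
```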

7th

`mine07.py`


```python
df=df_receipt[['sales_ymd','customer_id','product_cd','amount']]
df=df[df['customer_id']=='CS018205000001']
# 1000 <= amount <= 2000, written as two comparisons joined with &
df=df[(1000<=df['amount'])&(df['amount']<=2000)]
df
```

Model answer

```python
df_receipt[['sales_ymd', 'customer_id', 'product_cd', 'amount']] \
    .query('customer_id == "CS018205000001" & 1000 <= amount <= 2000')
```

Being able to write the BETWEEN condition as a single expression is appealing. By the way, from around here on, all I could think was, "Just let me write this in SQL..."
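Outside of query, pandas also has `Series.between`, which by default includes both ends, like SQL's BETWEEN. A sketch on made-up amounts comparing three ways of writing the same range condition:

```python
import pandas as pd

# Made-up amounts just around the boundaries
df = pd.DataFrame({"amount": [999, 1000, 1500, 2000, 2001]})

via_query = df.query("1000 <= amount <= 2000")                # chained comparison
via_between = df[df["amount"].between(1000, 2000)]            # inclusive by default
via_mask = df[(df["amount"] >= 1000) & (df["amount"] <= 2000)]

print(via_mask["amount"].tolist())  # [1000, 1500, 2000]
```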

8th

`mine08.py`


```python
df=df_receipt[['sales_ymd','customer_id','product_cd','amount']]
df=df[df['customer_id']=='CS018205000001']
df=df[df['product_cd'] != 'P071401019']
df
```

Model answer

```python
df_receipt[['sales_ymd', 'customer_id', 'product_cd', 'amount']] \
    .query('customer_id == "CS018205000001" & product_cd != "P071401019"')
```

With conditions like these, I'm torn about whether to pack them all into a single expression.

9th

`mine09.py`


```python
df_store.query('prefecture_cd != "13" & not (floor_area > 900)')
```

Model answer

```python
df_store.query('prefecture_cd != "13" & floor_area <= 900')
```

I finally gave in to query. ~~It's not because this is a rewriting problem or anything.~~ I'm not sure using not was necessary. Or rather, it wasn't.
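One caveat, shown as a sketch on made-up values: `not (floor_area > 900)` and `floor_area <= 900` are only interchangeable when `floor_area` has no missing values. For a NaN, `> 900` is False, so `not` turns it into True and the row is kept, while `<= 900` is also False and the row is dropped. Whether df_store actually has missing floor areas is not checked here.

```python
import pandas as pd

# Made-up store rows; the last floor_area is missing
df = pd.DataFrame({
    "store_cd": ["s1", "s2", "s3"],
    "floor_area": [800.0, 950.0, float("nan")],
})

with_not = df.query("not (floor_area > 900)")  # keeps the NaN row
with_le = df.query("floor_area <= 900")        # drops the NaN row

print(with_not["store_cd"].tolist())  # ['s1', 's3']
print(with_le["store_cd"].tolist())   # ['s1']
```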

That's all for this time.

To be honest, up to this point there was nothing I couldn't figure out. From next time on, you'll get to watch me try to force my way through things I don't understand and go down in glorious defeat.

Data Science 100 Knock ~ Battle for less than beginners part1
