First, a brief self-introduction. I started studying data science in May 2020.
・ Until May 2020, I had never touched a programming language.
・ I often use Excel at work, so I can handle simple functions.
While studying data science, one thing stood out to me: there are few places to practice data preprocessing, even though it seems to be the most burdensome part of real-world work!
Then, around June, the Data Scientist Association published an ideal set of exercises on GitHub!
Source: General Incorporated Association Data Scientist Association, "Data Science 100 Knock (Structured Data Processing)" https://github.com/The-Japan-DataScientist-Society/100knocks-preprocess
As a first step, I would like to work through these 100 knocks in Python, SQL, and R without looking at the answer code. As mentioned above, I am a complete amateur when it comes to programming, so there may be a lot of ugly code, but please be kind.
P-001: Display the first 10 rows, with all columns, from the receipt details data frame (df_receipt), and visually check what kind of data it holds.
In
df_receipt.head(10)
Output result:
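The output above depends on the repository's data. For anyone who wants to try the call without it, here is a minimal sketch using a hypothetical stand-in for df_receipt. The values are invented, and only the four columns named in this post are included; the real data frame may have others.

```python
import pandas as pd

# Hypothetical stand-in for df_receipt. The real data comes from the
# 100knocks-preprocess repository; these values are invented for illustration.
df_receipt = pd.DataFrame({
    "sales_ymd": [20200601 + i for i in range(12)],
    "customer_id": [f"C{i:03d}" for i in range(12)],
    "product_cd": [f"P{i:03d}" for i in range(12)],
    "amount": [100 * (i + 1) for i in range(12)],
})

# head(10) returns the first 10 rows with every column.
# The original data frame is left unchanged.
first_rows = df_receipt.head(10)
print(first_rows.shape)
```

Calling head() with no argument returns 5 rows, so the 10 has to be passed explicitly here.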
P-002: From the receipt details data frame (df_receipt), select the columns sales date (sales_ymd), customer ID (customer_id), product code (product_cd), and sales amount (amount), in that order, and display 10 rows.
In
df_clms = df_receipt[["sales_ymd", "customer_id", "product_cd", "amount"]]
df_clms.head(10)
Output result:
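To see what the double brackets are doing, here is a small sketch on a hypothetical stand-in for df_receipt. The values and the extra_col column are invented, and the columns are deliberately stored in a different order than the one requested.

```python
import pandas as pd

# Hypothetical stand-in for df_receipt (values and extra_col are invented).
df_receipt = pd.DataFrame({
    "amount": [100, 200, 300],
    "extra_col": ["x", "y", "z"],
    "product_cd": ["P001", "P002", "P003"],
    "customer_id": ["C001", "C002", "C003"],
    "sales_ymd": [20200601, 20200602, 20200603],
})

cols = ["sales_ymd", "customer_id", "product_cd", "amount"]

# A list inside the brackets selects those columns, in the listed order,
# and returns a new DataFrame. Columns not in the list are dropped.
df_clms = df_receipt[cols]
print(list(df_clms.columns))

# A single string instead of a list returns a Series, not a DataFrame.
amount_only = df_receipt["amount"]
print(type(amount_only).__name__)
```

The order of the names in the list decides the order of the columns in the result, which is why the problem's "in the order of" requirement is met simply by listing them that way.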
I will update it when I have time.