The Data Scientist Association (https://www.datascientist.or.jp/) has released "Data Science 100 Knock (Structured Data Processing)". This post records my attempt to run the Python (Jupyter Notebook) version on a Windows 10 Home PC with Anaconda (2020.02) installed, without installing Docker.
It is a collection of exercises that comes with data and answers, in 3 languages, as announced here: https://digitalpr.jp/r/39499
My environment
- The PC runs Windows 10 Home. Docker can be used on it through Docker Toolbox.
- Memory is 8 GB.
- Anaconda is already installed, so Python and Jupyter Notebook work.
First of all, I wanted to see what would happen if I did not install Docker.
Go to the top page of the project on GitHub. For "Data Science 100 Knock", it is: https://github.com/The-Japan-DataScientist-Society/100knocks-preprocess
Click the green "Code" button on the right.
Then click Download ZIP.
100knocks-preprocess-master.zip is downloaded to the Downloads folder on the PC.
When you unzip the zip, the contents are as follows.
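The unzip step can also be done from Python instead of the Windows Explorer. The following is a minimal sketch using only the standard library; the function name extract_archive and the Downloads location are my own assumptions, not something the project provides.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Extract zip_path into dest_dir and return the sorted top-level entry names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
        # Member names inside a zip always use "/" as the separator.
        top_level = sorted({name.split("/")[0] for name in zf.namelist()})
    return top_level


if __name__ == "__main__":
    # Assumed location of the downloaded file; adjust to your own PC.
    archive = Path.home() / "Downloads" / "100knocks-preprocess-master.zip"
    if archive.exists():
        print(extract_archive(archive, archive.parent))
```

The returned list is just a quick way to see which folder the archive created.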
All I need are the notebook with the questions, the data set, and the answer code.
For example, the code for Jupyter Notebook is below.
For example, the data is below.
Move the entire folder under MyPython (the folder that contains the Python code).
Go to the folder MyPython → 100knocks-preprocess-master → docker → work.
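Before opening the notebook, it may help to confirm that the CSV files it reads are really in the data folder under work. A minimal sketch, assuming that MyPython sits directly under the home folder (that path and the helper name missing_data_files are my assumptions; the file names are the ones read with pd.read_csv in this post):

```python
from pathlib import Path

# File names taken from the pd.read_csv calls used in this post.
EXPECTED_CSVS = [
    "customer.csv", "category.csv", "product.csv",
    "receipt.csv", "store.csv", "geocode.csv",
]


def missing_data_files(work_dir):
    """Return the expected CSV names that are not present in work_dir/data."""
    data_dir = Path(work_dir) / "data"
    return [name for name in EXPECTED_CSVS if not (data_dir / name).is_file()]


if __name__ == "__main__":
    # Assumed location of the work folder; adjust to where MyPython really is.
    work = Path.home() / "MyPython" / "100knocks-preprocess-master" / "docker" / "work"
    print(missing_data_files(work))
```

An empty list means every file was found; any name in the list is a file that the relative path 'data/...' would fail to open.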
Click preprocess_knock_Python.ipynb to open it.
Click the first input cell and click Run.
The import fails at psycopg2.
Looking closely, several libraries are not installed. There is a decision to make here. (1) Should I install them? (I doubt that I will ever use them.) (2) Should I define the dataframes myself? (I have a feeling that I will be in trouble later if I cannot work from the CSV data.)
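To see which libraries are missing without running into the error one import at a time, the standard library can be asked directly. A minimal sketch; the helper name find_missing_modules is my own, and the candidate list only contains psycopg2 plus the top-level modules that appear in this post's imports, not the notebook's full list.

```python
import importlib.util


def find_missing_modules(module_names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Top-level module names only; dotted names would need their parent installed.
    candidates = ["pandas", "numpy", "dateutil", "sklearn", "psycopg2"]
    print(find_missing_modules(candidates))
```

Whatever appears in the printed list is a candidate for either installing or leaving out of the import cell.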
I kept the imports as they are, leaving out the libraries that are not installed for the time being. Because geocode.csv contains blank fields, I specified str for its text columns when reading it.
import os
import pandas as pd
import numpy as np
from datetime import datetime, date
from dateutil.relativedelta import relativedelta
import math
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
df_customer = pd.read_csv('data/customer.csv')
df_category = pd.read_csv('data/category.csv')
df_product = pd.read_csv('data/product.csv')
df_receipt = pd.read_csv('data/receipt.csv')
df_store = pd.read_csv('data/store.csv')
df_geocode = pd.read_csv('data/geocode.csv',
                         converters={'prefecture': str, 'city': str, 'town': str, 'street': str, 'address': str})
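The effect of passing str as a converter can be checked on a small in-memory sample. This is only a sketch: the two rows below are made up for illustration and are not taken from geocode.csv; only the column names come from the call above.

```python
import io

import pandas as pd

# Made-up sample with blank fields (not real geocode.csv data).
csv_text = "prefecture,city,town\nHokkaido,Sapporo,\nTokyo,,Marunouchi\n"

# Default behavior: blank fields are read as NaN.
default_df = pd.read_csv(io.StringIO(csv_text))

# With str converters: blank fields stay as empty strings.
converted_df = pd.read_csv(io.StringIO(csv_text),
                           converters={"prefecture": str, "city": str, "town": str})

print(default_df["town"].isna().sum())
print(converted_df["town"].tolist())
```

Keeping blanks as empty strings means string operations on these columns do not have to deal with NaN, which seems to be the point of specifying the type here.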
With this setup, the exercises can be worked through reasonably well.
Practical learning environment for data science beginners "Data Science 100 Knock (Structured Data Processing)" is released for free on GitHub: https://digitalpr.jp/r/39499