The Japan DataScientist Society has released **"Data Science 100 Knocks (Structured Data Processing)"**, a free, hands-on learning environment for practicing structured data processing, [on GitHub](https://github.com/The-Japan-DataScientist-Society/100knocks-preprocess). This article walks through the setup procedure in detail so that even beginners can build the learning environment at no cost. (The execution environment to be built is shown in the figure below.)
First, configure Git so that it does not convert line endings to CRLF on checkout.
> git config --global core.autocrlf input
Create a directory for the learning environment (dss in this article) and clone the 100 Knocks repository into it.
Then move into the cloned 100knocks-preprocess directory and build the containers with the docker-compose command. (This takes about 10 minutes.)
> mkdir dss
> cd dss
> git clone https://github.com/The-Japan-DataScientist-Society/100knocks-preprocess.git
> cd 100knocks-preprocess
> docker-compose up -d --build
List the running containers. If both **"dss-notebook"** and **"dss-postgres"** appear in the output, the environment was built successfully.
> docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b35f99d4148a dss-notebook "tini -g -- start-no…" 23 seconds ago Up 22 seconds 0.0.0.0:8888->8888/tcp dss-notebook
3cb559c7f66d dss-postgres "docker-entrypoint.s…" 27 seconds ago Up 26 seconds 0.0.0.0:5432->5432/tcp dss-postgres
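The same check can be scripted. The following is a minimal sketch, assuming Docker is on the PATH and that the container names match the `docker ps` output above. The helper names `missing_containers` and `running_container_names` are hypothetical and are not part of the 100 Knocks repository.

```python
import subprocess

# Container names taken from the `docker ps` output shown in this article.
EXPECTED = ("dss-notebook", "dss-postgres")


def missing_containers(names_output, expected=EXPECTED):
    """Return the expected container names that do not appear in names_output.

    names_output is the text printed by `docker ps --format "{{.Names}}"`,
    that is, one running container name per line.
    """
    running = {line.strip() for line in names_output.splitlines() if line.strip()}
    return [name for name in expected if name not in running]


def running_container_names():
    """Ask Docker for the names of running containers (requires Docker on PATH)."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `missing_containers(running_container_names())` should return an empty list when both containers are up. Any name it returns is a container that did not start.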
Open the following URL in a browser to access the Jupyter environment you built.
http://localhost:8888
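If the page does not load, it may help to confirm from a script whether anything answers at that address. The following is a rough sketch using only the Python standard library. The function name `is_reachable` and the 5-second default timeout are assumptions made for illustration.

```python
import urllib.error
import urllib.request


def is_reachable(url, timeout=5.0):
    """Return True if an HTTP server answers at url, False if nothing responds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or name resolution failure.
        return False
```

For example, `is_reachable("http://localhost:8888")` returning False right after startup usually just means the notebook container is still initializing, so retrying after a few seconds is reasonable.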
Under the work directory, there is an .ipynb file for the structured data processing exercises.
**The first cell already contains the imports of the required libraries and the loading of the data to be processed.**
Write the code for each exercise in its blank cell and run it to work through the exercises.
The answers to the exercises are in the .ipynb file in the work/answer directory, so you can check your own solutions against them as you work through the exercise file.
To stop the environment, run the following command.
> docker-compose stop
To start it again from the second time onward, run the following command.
> docker-compose start
To change the memory available to the containers, open Settings in Docker Desktop for Windows and adjust the Memory value under Resources. 4.00 GB or more is recommended.
If port 8888 on localhost is already used by another development environment (LAMP, etc.), change the ports value of notebook in docker-compose.yml as follows.
docker-compose.yml
notebook:
  ports:
    - "888:8888"
With this change, Jupyter is accessible at the following URL.
http://localhost:888
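Before editing docker-compose.yml, it can be useful to confirm which host ports are actually occupied. The following is a minimal sketch using Python's standard `socket` module. The helper names `port_in_use` and `first_free_port` are hypothetical, and the check only detects services that accept TCP connections on the given host.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def first_free_port(candidates, host="127.0.0.1"):
    """Return the first port in candidates that nothing is listening on, or None."""
    for port in candidates:
        if not port_in_use(port, host):
            return port
    return None
```

For example, `first_free_port([8888, 888])` returns 8888 when nothing else is listening there, in which case docker-compose.yml does not need to be changed at all.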
This article described how to build the environment for Data Science 100 Knocks (Structured Data Processing) on Windows 10. If you have any questions or concerns about the procedure above, please leave a comment.