Do you set up every machine learning project with a different structure? Do you spend time deciding which files go where?
If so, there is good news: you can create a machine learning project with **one command**. A project like the one below takes only seconds to generate.
**Directory structure**
├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── docs               <- A default Sphinx project; see sphinx-doc.org for details
│
├── models             <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module
│   │
│   ├── data           <- Scripts to download or generate data
│   │   └── make_dataset.py
│   │
│   ├── features       <- Scripts to turn raw data into features for modeling
│   │   └── build_features.py
│   │
│   ├── models         <- Scripts to train models and then use trained models to make
│   │   │                 predictions
│   │   ├── predict_model.py
│   │   └── train_model.py
│   │
│   └── visualization  <- Scripts to create exploratory and results oriented visualizations
│       └── visualize.py
│
└── tox.ini            <- tox file with settings for running tox; see tox.testrun.org
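The notebook naming convention in the tree (a number for ordering, the creator's initials, and a short `-` delimited description) can be sketched in Python. This is only an illustration: the helper names and the exact regular expression are assumptions of this sketch, not part of the template.

```python
import re

# Pattern for "<number>-<initials>-<dash-delimited-description>",
# e.g. "1.0-jqp-initial-data-exploration". The exact regex is an assumption.
NOTEBOOK_NAME = re.compile(r"^\d+(\.\d+)?-[a-z]+-[a-z0-9]+(-[a-z0-9]+)*$")


def notebook_name(number: str, initials: str, description: str) -> str:
    """Build a notebook name from an ordering number, initials, and a description."""
    slug = "-".join(description.lower().split())
    return f"{number}-{initials.lower()}-{slug}"


def is_valid_notebook_name(name: str) -> bool:
    """Return True if the name follows the convention described in the tree."""
    return NOTEBOOK_NAME.match(name) is not None


print(notebook_name("1.0", "JQP", "Initial data exploration"))
# prints: 1.0-jqp-initial-data-exploration
```

A check like `is_valid_notebook_name` could run in a pre-commit hook to keep the `notebooks` directory ordered, although the template itself does not enforce the convention.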
A standard project structure like this has several benefits. New team members can get up to speed quickly because they know where everything is. It also helps me: when I come back to a project a few months later, I don't have to hunt for things, so I can get to work right away.
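One practical consequence of a fixed layout is that code can locate data and outputs without per-project configuration. The following is a minimal sketch under that assumption; `project_paths` is a hypothetical helper, not something the template generates.

```python
from pathlib import Path
from typing import Dict


def project_paths(root: Path) -> Dict[str, Path]:
    """Map short names to the standard locations in the template layout."""
    return {
        "raw": root / "data" / "raw",
        "interim": root / "data" / "interim",
        "processed": root / "data" / "processed",
        "models": root / "models",
        "figures": root / "reports" / "figures",
    }


# "machine-learning" is the project name used later in this article.
paths = project_paths(Path("machine-learning"))
print(paths["raw"])
```

Because every project shares the same directories, a helper like this can be copied between projects unchanged.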
Creating this kind of project structure is as easy as creating a new Django or Rails project. This article shows you how.
Getting started
First, install **Cookiecutter**, a library for generating a project's directory structure. Then we will create an actual project.
Let's start with Cookiecutter itself.
Cookiecutter is **a Python library for creating projects from project templates**. With it, you can easily create a project from an existing template. Here we use a template for machine learning, but you can choose whichever template suits the project you want to create.
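To give a feel for what "creating a project from a template" means: a template contains placeholders such as `{{ cookiecutter.repo_name }}` in file names and file contents, and Cookiecutter fills them in with your answers. Cookiecutter itself relies on the Jinja2 template engine for this; the sketch below is a deliberately simplified stand-in that uses only a regular expression, and `render` is a hypothetical helper.

```python
import re

# Matches placeholders of the form {{ cookiecutter.some_name }}.
PLACEHOLDER = re.compile(r"\{\{\s*cookiecutter\.(\w+)\s*\}\}")


def render(text: str, context: dict) -> str:
    """Replace each placeholder with the matching value from context."""
    return PLACEHOLDER.sub(lambda m: str(context[m.group(1)]), text)


context = {"repo_name": "machine-learning", "author_name": "Hironsan"}
print(render("{{ cookiecutter.repo_name }}/README.md", context))
# prints: machine-learning/README.md
print(render("Author: {{cookiecutter.author_name}}", context))
# prints: Author: Hironsan
```

Jinja2 supports far more than plain substitution (filters, conditionals, loops), so this only illustrates the basic idea.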
You can install Cookiecutter using pip as follows:
$ pip install cookiecutter
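If you want to confirm from Python that the installation succeeded, a small check along these lines should work on Python 3.8 or later. `installed_version` is a hypothetical helper written for this sketch; it uses the standard library's `importlib.metadata`.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("cookiecutter")
print(version if version else "cookiecutter is not installed")
```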
Once the installation is complete, let's create a project.
Run the `cookiecutter` command you just installed to create a new project, passing an existing project template as the argument. Here we specify Cookiecutter Data Science, a template for machine learning projects. Run the following command:
$ cookiecutter https://github.com/drivendata/cookiecutter-data-science
After you run the command, you are asked for the project name, the author's name, and a few other settings. Once you have answered all the questions, your new project is created. A session looks like this:
project_name [project_name]: machine-learning
repo_name [machine-learning]:
author_name [Your name (or your organization/company/team)]: Hironsan
description [A short description of the project.]: Machine learning project
Select open_source_license:
1 - MIT
2 - BSD
3 - Not open source
Choose from 1, 2, 3 [1]: 1
s3_bucket [[OPTIONAL] your-bucket-for-syncing-data (do not include 's3://')]:
Select python_interpreter:
1 - python
2 - python3
Choose from 1, 2 [1]: 2
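The same answers can also be supplied from Python instead of being typed interactively, which is handy for scripts. The sketch below uses Cookiecutter's `cookiecutter()` function with `no_input` and `extra_context`. The keys are taken from the prompts shown above; a different version of the template may ask different questions, in which case the keys would need adjusting. `build_answers` and `create_project` are hypothetical helper names.

```python
def build_answers() -> dict:
    """Answers matching the interactive session shown in this article."""
    return {
        "project_name": "machine-learning",
        "author_name": "Hironsan",
        "description": "Machine learning project",
        "open_source_license": "MIT",
        "python_interpreter": "python3",
    }


def create_project(output_dir: str = ".") -> str:
    """Generate the project without prompting and return its path."""
    # Imported here so the module still loads when Cookiecutter is not installed.
    from cookiecutter.main import cookiecutter

    return cookiecutter(
        "https://github.com/drivendata/cookiecutter-data-science",
        no_input=True,                  # do not ask questions interactively
        extra_context=build_answers(),  # override the template defaults
        output_dir=output_dir,
    )


# create_project()  # uncomment to generate the project (needs network access)
```

Questions that are not listed in `extra_context`, such as `s3_bucket`, fall back to the template's defaults when `no_input=True`.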
Deciding on a project structure is a surprisingly time-consuming task. I hope this article helps you set up your own projects.
I also tweet about machine learning and natural language processing, so feel free to follow me: @Hironsan