In March 2020 I participated in the Kaggle competition Google Cloud & NCAA® ML Competition 2020-NCAAW. I introduced mlflow's tracking function there and found it easy to use, so I am leaving this post as a memorandum. It mainly describes how to introduce the tracking function of mlflow and the points I stumbled on when introducing it.
mlflow is an open source platform that manages the machine learning life cycle (preprocessing -> training -> deployment), and it has three main functions:

- Tracking: logging
- Projects: packaging
- Models: deployment support

This time I will mainly cover how to introduce Tracking. Please refer to the official documentation for details on Projects and Models.
Tracking is a function that logs parameters, evaluation metrics, results, output files, and so on each time you build a machine learning model. If the project is also under git, code versions can be managed as well, but covering that would expand the story into Projects, so I will omit it this time (I would like to handle it when I cover Projects next time).
mlflow can be installed with pip.
pip install mlflow
Set the URI for logging (by default, logs are written directly under the folder at runtime). Not only a local directory but also a database or an HTTP server can be specified as the URI (a sketch of this follows the code below). The directory name of the logging destination must be mlruns (the reason will be explained later).
import mlflow
mlflow.set_tracking_uri('./hoge/mlruns/')
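Besides a local directory, the tracking URI can also point at a database or a remote tracking server. The following is a minimal sketch of the idea; the SQLite file name and the server address are hypothetical examples, not values used in this project.

import mlflow

# Log to a SQLite database instead of the local file store
mlflow.set_tracking_uri('sqlite:///mlflow.db')

# Or log to a remote tracking server over HTTP
mlflow.set_tracking_uri('http://127.0.0.1:5000')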
An experiment is created by the analyst for each task in the machine learning project (for example, comparing features, machine learning methods, parameters, and so on).
# If the experiment does not exist, it will be created.
mlflow.set_experiment('compare_max_depth')
Let's actually log.
with mlflow.start_run():
    mlflow.log_param('param1', 1)  # parameters
    mlflow.log_metric('metric1', 0.1)  # scores
    mlflow.log_artifact('./model.pickle')  # other files such as models and data

mlflow.search_runs()  # get the logged contents of the experiment
This logs parameters, scores, models, and so on. Please refer to the official documentation for the detailed specification of each function.
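mlflow.search_runs() returns a pandas DataFrame, so the logged values can be inspected and filtered like any other DataFrame. A minimal sketch, assuming the run from the example above has been logged (parameter and metric columns follow mlflow's params./metrics. prefix convention):

import mlflow

runs = mlflow.search_runs()  # runs of the currently set experiment as a DataFrame
print(runs[['run_id', 'params.param1', 'metrics.metric1']])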
Move to the directory set as the URI. At this point, make sure that the mlruns directory is directly under it (if there is no mlruns directory, a new empty mlruns directory will be created). This is the reason mentioned earlier: mlflow ui looks for a directory named mlruns directly under the current directory. Start the local server with mlflow ui.
$ cd ./hoge/
$ ls
mlruns
$ mlflow ui
When you open http://127.0.0.1:5000 in your browser, the following screen will be displayed.
It is also possible to compare each parameter.
Tips
# Get the experiment id from the experiment name
tracking = mlflow.tracking.MlflowClient()
experiment = tracking.get_experiment_by_name('hoge')
print(experiment.experiment_id)
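One use of the retrieved experiment id is to query the runs of a specific experiment. A minimal sketch, reusing the placeholder experiment name 'hoge' from above and the experiment_ids argument of mlflow.search_runs:

import mlflow

tracking = mlflow.tracking.MlflowClient()
experiment = tracking.get_experiment_by_name('hoge')
# Restrict the search to this experiment via its id
runs = mlflow.search_runs(experiment_ids=[experiment.experiment_id])
print(len(runs))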
# Method 1: get the list of experiments
tracking.list_experiments()

# Method 2: get a single experiment by its id
tracking = mlflow.tracking.MlflowClient()
experiment = tracking.get_experiment('1')  # pass the experiment id
print(experiment.name)
# Delete the experiment with the specified id
tracking = mlflow.tracking.MlflowClient()
tracking.delete_experiment('1')
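delete_experiment is a soft delete, so the experiment can be brought back if needed. A minimal sketch using MlflowClient.restore_experiment with the same id as above:

import mlflow

tracking = mlflow.tracking.MlflowClient()
tracking.restore_experiment('1')  # restore the experiment deleted above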
with mlflow.start_run():
    run_id = mlflow.active_run().info.run_id  # id of the currently active run

If you pass the acquired run_id to the run_id argument of start_run(), the logs of that run will be overwritten.
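A minimal sketch of this, reusing the run_id captured above:

import mlflow

# Resume the existing run by passing its run_id
with mlflow.start_run(run_id=run_id):
    mlflow.log_metric('metric1', 0.2)  # logged onto the same run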
# Delete the run with the specified run_id
tracking = mlflow.tracking.MlflowClient()
tracking.delete_run(run_id)
# If you want to log multiple parameters or metrics at the same time, pass them as a dict.
params = {
    'test1': 1,
    'test2': 2
}
metrics = {
    'metric1': 0.1,
    'metric2': 0.2
}

with mlflow.start_run():
    mlflow.log_params(params)
    mlflow.log_metrics(metrics)
download artifacts
tracking = mlflow.tracking.MlflowClient()
print(tracking.list_artifacts(run_id=run_id))  # get a list of artifacts
# Output:
# [<FileInfo: file_size=23, is_dir=False, path='model.pickle'>]
tracking.download_artifacts(run_id=run_id, path='model.pickle', dst_path='./')
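download_artifacts returns the local path of the downloaded file, so the saved model can be loaded right away. A minimal sketch, assuming model.pickle was written with the standard pickle module:

import pickle

import mlflow

tracking = mlflow.tracking.MlflowClient()
# run_id is the id of the run whose artifact we want, obtained as shown earlier
local_path = tracking.download_artifacts(run_id=run_id, path='model.pickle', dst_path='./')
with open(local_path, 'rb') as f:
    model = pickle.load(f)  # the object that was logged with log_artifact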