Introduction to mlflow Tracking and tips

What is this

In March 2020 I took part in the Kaggle competition Google Cloud & NCAA® ML Competition 2020-NCAAW. I introduced the tracking function of mlflow there and found it easy to use, so I am posting this as a memorandum. The article mainly covers how to introduce the tracking function of mlflow and the points I stumbled on along the way.

What is mlflow

mlflow is an open source platform that manages the life cycle of machine learning (preprocessing -> learning -> deployment). It has three main functions:

- Tracking: Logging
- Projects: Packaging
- Models: Deployment support

This time, I will mainly touch on how to introduce Tracking. Please refer to the official documentation for details of Projects and Models.

What is Tracking

Tracking is a function that logs the parameters, evaluation metrics, results, output files, and so on produced while building a machine learning model. If the project is kept in git, the code version can also be managed, but covering that would expand the story into Projects, so I will omit it this time (I would like to cover it when I write about Projects).

Introducing mlflow

mlflow install

mlflow can be installed with pip.

pip install mlflow

URI settings

Set the URI of the logging destination (by default, an mlruns directory is created directly under the directory where the code is run). Not only a local directory but also a database or an HTTP server can be specified as the URI. The logging destination directory must be named mlruns (the reason is explained later).

import mlflow

mlflow.set_tracking_uri('./hoge/mlruns/')

Creating an experiment

An experiment is created by the analyst for each task in a machine learning project (for example, comparing features, machine learning methods, or parameters).

#If experiment does not exist, it will be created.
mlflow.set_experiment('compare_max_depth')

Run

Let's actually log.

with mlflow.start_run():
    mlflow.log_param('param1', 1) #Parameters
    mlflow.log_metric('metric1', 0.1) #Score
    mlflow.log_artifact('./model.pickle') #Other models, data, etc.
mlflow.search_runs() #You can get the logging contents in the experiment

This logs parameters, scores, models, and so on. Please refer to the official documentation for the detailed specifications of each function.

Start local server

Move to the directory that contains the logging destination set by the URI. At this time, make sure that the mlruns directory is directly under the current directory (if it does not exist, a new empty mlruns directory will be created). Then start the local server with mlflow ui.

$ cd ./hoge/
$ ls
mlruns

$ mlflow ui
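Since forgetting to cd is the easy mistake here, one alternative is to pass the logging directory explicitly with the --backend-store-uri option of mlflow ui. The sketch below only builds the command and checks that the directory exists. The helper name build_ui_command is hypothetical, and the temporary directory is an assumption for the example.

```python
import tempfile
from pathlib import Path


def build_ui_command(tracking_dir: Path, port: int = 5000) -> list:
    """Build an 'mlflow ui' command that points at tracking_dir (hypothetical helper)."""
    if not tracking_dir.is_dir():
        # Fail early instead of letting the UI start against an empty store.
        raise FileNotFoundError(f"logging directory not found: {tracking_dir}")
    return [
        "mlflow", "ui",
        "--backend-store-uri", str(tracking_dir),
        "--port", str(port),
    ]


# Assumption: a throwaway directory stands in for './hoge/mlruns'.
tracking_dir = Path(tempfile.mkdtemp()) / "mlruns"
tracking_dir.mkdir()

command = build_ui_command(tracking_dir)
print(" ".join(command))
# To actually start the server: subprocess.run(command)
```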

When you open http://127.0.0.1:5000 in your browser, the following screen will be displayed.
[Screenshot: 2020-03-15 14.36.33.png]

It is also possible to compare each parameter.
[Screenshot: 2020-03-15 14.37.56.png]

Tips

Get experiment id

tracking = mlflow.tracking.MlflowClient()
experiment = tracking.get_experiment_by_name('hoge')
print(experiment.experiment_id)

Get the experiment name

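tracking = mlflow.tracking.MlflowClient()

# Method 1: Get the list of experiments
# (list_experiments() was removed in MLflow 2.0; use search_experiments() there)
tracking.list_experiments()

# Method 2: Look up a single experiment by its id
experiment = tracking.get_experiment('1') # pass the experiment id
print(experiment.name)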

Delete experiment

tracking = mlflow.tracking.MlflowClient()
tracking.delete_experiment('1')

Get run id

with mlflow.start_run():
    run_id = mlflow.active_run().info.run_id

If you pass the acquired run_id to the run_id argument of start_run(), that run is resumed and subsequent logging is written to it instead of to a new run.

Delete run

tracking = mlflow.tracking.MlflowClient()
tracking.delete_run(run_id)

Logging with dict

# To log multiple parameters or metrics at once, pass them as a dict.
params = {
    'test1': 1,
    'test2': 2
}
metrics = {
    'metric1': 0.1,
    'metric2': 0.2
}

with mlflow.start_run():
    mlflow.log_params(params)
    mlflow.log_metrics(metrics)

Download artifacts

tracking = mlflow.tracking.MlflowClient()
print(tracking.list_artifacts(run_id=run_id)) # Get a list of artifacts
# Output: [<FileInfo: file_size=23, is_dir=False, path='model.pickle'>]

tracking.download_artifacts(run_id=run_id, path='model.pickle', dst_path='./')
