In March 2020 I participated in the Kaggle competition Google Cloud & NCAA® ML Competition 2020-NCAAW. I introduced mlflow's tracking function there and found it easy to use, so I am leaving this post as a memorandum. It mainly describes how to introduce the tracking function of mlflow and the points I stumbled on when introducing it.
mlflow is an open source platform that manages the machine learning life cycle (preprocessing -> training -> deployment), and it has three main functions:

- Tracking: logging
- Projects: packaging
- Models: deployment support

This time I will mainly cover how to introduce Tracking. Please refer to the official documentation for details on Projects and Models.
Tracking is a function that logs parameters, evaluation metrics, results, output files, and so on each time you build a machine learning model. If the project is also under git, code versions can be managed as well, but covering that would expand the story into Projects, so I will omit it this time (I would like to handle it when I cover Projects next time).
mlflow can be installed with pip.
pip install mlflow
Set the URI for logging (by default, logs are written directly under the folder at runtime). Not only a local directory but also a database or an HTTP server can be specified as the URI (a sketch of this follows the code below). The directory name of the logging destination must be mlruns (the reason will be explained later).
import mlflow
mlflow.set_tracking_uri('./hoge/mlruns/')
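Besides a local directory, the tracking URI can also point at a database or a remote tracking server. The following is a minimal sketch of the idea; the SQLite file name and the server address are hypothetical examples, not values used in this project.

import mlflow

# Log to a SQLite database instead of the local file store
mlflow.set_tracking_uri('sqlite:///mlflow.db')

# Or log to a remote tracking server over HTTP
mlflow.set_tracking_uri('http://127.0.0.1:5000')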
An experiment is created by the analyst for each task in the machine learning project (for example, comparing features, machine learning methods, parameters, and so on).
# If the experiment does not exist, it will be created.
mlflow.set_experiment('compare_max_depth')
Let's actually log.
with mlflow.start_run():
    mlflow.log_param('param1', 1)  # parameters
    mlflow.log_metric('metric1', 0.1)  # scores
    mlflow.log_artifact('./model.pickle')  # other files such as models and data

mlflow.search_runs()  # get the logged contents of the experiment
This logs parameters, scores, models, and so on. Please refer to the official documentation for the detailed specification of each function.
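mlflow.search_runs() returns a pandas DataFrame, so the logged values can be inspected and filtered like any other DataFrame. A minimal sketch, assuming the run from the example above has been logged (parameter and metric columns follow mlflow's params./metrics. prefix convention):

import mlflow

runs = mlflow.search_runs()  # runs of the currently set experiment as a DataFrame
print(runs[['run_id', 'params.param1', 'metrics.metric1']])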
Move to the directory set as the URI. At this point, make sure that the mlruns directory is directly under it (if there is no mlruns directory, a new empty mlruns directory will be created). This is the reason mentioned earlier: mlflow ui looks for a directory named mlruns directly under the current directory. Start the local server with mlflow ui.
$ cd ./hoge/
$ ls
mlruns
$ mlflow ui
When you open http://127.0.0.1:5000 in your browser, the following screen will be displayed.
It is also possible to compare each parameter.
Tips
# Get the experiment id from the experiment name
tracking = mlflow.tracking.MlflowClient()
experiment = tracking.get_experiment_by_name('hoge')
print(experiment.experiment_id)
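One use of the retrieved experiment id is to query the runs of a specific experiment. A minimal sketch, reusing the placeholder experiment name 'hoge' from above and the experiment_ids argument of mlflow.search_runs:

import mlflow

tracking = mlflow.tracking.MlflowClient()
experiment = tracking.get_experiment_by_name('hoge')
# Restrict the search to this experiment via its id
runs = mlflow.search_runs(experiment_ids=[experiment.experiment_id])
print(len(runs))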
# Method 1: get the list of experiments
tracking.list_experiments()

# Method 2: get a single experiment by its id
tracking = mlflow.tracking.MlflowClient()
experiment = tracking.get_experiment('1')  # pass the experiment id
print(experiment.name)
# Delete the experiment with the specified id
tracking = mlflow.tracking.MlflowClient()
tracking.delete_experiment('1')
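delete_experiment is a soft delete, so the experiment can be brought back if needed. A minimal sketch using MlflowClient.restore_experiment with the same id as above:

import mlflow

tracking = mlflow.tracking.MlflowClient()
tracking.restore_experiment('1')  # restore the experiment deleted above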
with mlflow.start_run():
    run_id = mlflow.active_run().info.run_id  # id of the currently active run

If you pass the acquired run_id to the run_id argument of start_run(), the logs of that run will be overwritten.
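A minimal sketch of this, reusing the run_id captured above:

import mlflow

# Resume the existing run by passing its run_id
with mlflow.start_run(run_id=run_id):
    mlflow.log_metric('metric1', 0.2)  # logged onto the same run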
# Delete the run with the specified run_id
tracking = mlflow.tracking.MlflowClient()
tracking.delete_run(run_id)
# If you want to log multiple parameters or metrics at the same time, pass them as a dict.
params = {
    'test1': 1,
    'test2': 2
}
metrics = {
    'metric1': 0.1,
    'metric2': 0.2
}

with mlflow.start_run():
    mlflow.log_params(params)
    mlflow.log_metrics(metrics)
download artifacts
tracking = mlflow.tracking.MlflowClient()
print(tracking.list_artifacts(run_id=run_id))  # get a list of artifacts
# Output:
# [<FileInfo: file_size=23, is_dir=False, path='model.pickle'>]
tracking.download_artifacts(run_id=run_id, path='model.pickle', dst_path='./')
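download_artifacts returns the local path of the downloaded file, so the saved model can be loaded right away. A minimal sketch, assuming model.pickle was written with the standard pickle module:

import pickle

import mlflow

tracking = mlflow.tracking.MlflowClient()
# run_id is the id of the run whose artifact we want, obtained as shown earlier
local_path = tracking.download_artifacts(run_id=run_id, path='model.pickle', dst_path='./')
with open(local_path, 'rb') as f:
    model = pickle.load(f)  # the object that was logged with log_artifact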