Run Azure ML on Jupyter Notebook

Assumptions

・ Run an Azure Machine Learning experiment from Jupyter Notebook
・ Creation of the Azure workspace is omitted
・ The operating environment is macOS

Procedure

1. Launch Jupyter Notebook

1-1. Create a virtual environment
conda create -n <virtual environment name>

1-2. Activate the virtual environment
conda activate <virtual environment name>

1-3. Start Jupyter Notebook
jupyter notebook
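The notebook server can end up running on a different interpreter than the environment you just activated, so it may be worth checking from a notebook cell which Python is in use. The following is a minimal sketch using only the standard library. `describe_environment` is a hypothetical helper name, and it assumes conda has set the `CONDA_DEFAULT_ENV` variable on activation.

```python
import os
import sys


def describe_environment():
    """Return the interpreter path, Python version, and active conda environment."""
    return {
        "python_executable": sys.executable,
        "python_version": sys.version.split()[0],
        # Assumption: conda sets CONDA_DEFAULT_ENV on "conda activate";
        # the value is None when the notebook was started outside conda.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


info = describe_environment()
for key, value in info.items():
    print(f"{key}: {value}")
```

If `conda_env` does not show the environment created in step 1-1, packages installed later may land in a different environment than the one the notebook imports from.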

2. Preparation

2-1. Install the Azure ML packages
pip install azureml-sdk azureml-train-automl scikit-learn

2-2. Load the Azure workspace
Beforehand, download the config file (config.json) from the Azure workspace and store it in the same folder as the .ipynb file.

from azureml.core.workspace import Workspace
ws = Workspace.from_config()
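If loading the workspace fails, a common cause is a missing or incomplete config file. The sketch below checks the file before `Workspace.from_config()` is called. `missing_config_keys` is a hypothetical helper, and the three key names are an assumption based on the config.json that the Azure portal offers for download.

```python
import json
from pathlib import Path

# Assumed keys of the config.json downloaded from the Azure portal.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def missing_config_keys(config_path="config.json"):
    """Return the required keys that are absent from the workspace config file."""
    path = Path(config_path)
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; download it from the Azure workspace first"
        )
    with path.open(encoding="utf-8") as f:
        config = json.load(f)
    return [key for key in REQUIRED_KEYS if key not in config]
```

Calling `missing_config_keys()` from the notebook folder returns an empty list when all three keys are present.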

%matplotlib inline
import matplotlib.pyplot as plt
import sklearn
from sklearn import preprocessing, metrics, model_selection
from sklearn.preprocessing import MinMaxScaler, StandardScaler, LabelEncoder, OneHotEncoder, LabelBinarizer 
from sklearn.model_selection import KFold, StratifiedKFold, GridSearchCV, train_test_split
from datetime import datetime, date, timezone, timedelta
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os, gc

3. Read data

PATH = '/Users/〇〇/Desktop/jupyter/〇〇/'

## Read the file
data = pd.read_csv(PATH+'〇〇.csv')

## Split into training, validation, and test data (in row order, no shuffling)
from sklearn.model_selection import train_test_split
train_data, test_data = train_test_split(data, test_size=0.4, shuffle=False)
train_data, validation_data = train_test_split(train_data, train_size=0.66, shuffle=False)

## Drop the unneeded column
no_label = "CO"
train_data = train_data.drop(no_label, axis=1)
test_data = test_data.drop(no_label, axis=1)
validation_data = validation_data.drop(no_label, axis=1)

## Select the target column
label = "NOX"

## Separate the target values from the test data for later evaluation
test_labels = test_data.pop(label).values
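Because shuffle=False is set, both calls cut the rows in file order. The two-stage split above therefore gives roughly 39.6% training data (0.6 × 0.66), 20.4% validation data, and the last 40% as test data. The sketch below reproduces this with plain list slicing so the boundaries are easy to inspect. `sequential_split` is a hypothetical helper, and its rounding (ceiling for the test share, floor for the training share) is an assumption that may differ from scikit-learn's by a row.

```python
import math


def sequential_split(rows, test_size=0.4, train_size=0.66):
    """Split rows in order into (train, validation, test) without shuffling."""
    n_test = math.ceil(len(rows) * test_size)   # the last block becomes test data
    remainder = rows[:len(rows) - n_test]       # everything before the test block
    n_train = math.floor(len(remainder) * train_size)
    return remainder[:n_train], remainder[n_train:], rows[len(rows) - n_test:]


train, validation, test = sequential_split(list(range(100)))
print(len(train), len(validation), len(test))
```

For time-ordered data this means the model is trained on the oldest rows and tested on the newest ones.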

4. Model creation

The code below creates the model. This is a regression task, so task='regression' is specified (use 'classification' for classification tasks and 'forecasting' for time-series forecasting). One attraction of the SDK is that you can change settings that cannot be changed from the Azure GUI. In my case, I wanted to control how the data is split, which is why I am running the experiment from Jupyter Notebook. (By default, AutoML chooses the validation method, such as the number of cross-validation folds, according to the number of rows.)

from azureml.train.automl import AutoMLConfig

automl_config = AutoMLConfig(task='regression',
                             primary_metric='r2_score',
                             experiment_timeout_minutes=60,
                             training_data=train_data,
                             label_column_name=label,
                             validation_data = validation_data,
                             debug_log='automated_ml_errors.log')
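Besides passing an explicit validation set, the cross-validation page listed under the reference sites describes the `n_cross_validations` and `validation_size` parameters. As a rough sketch, the settings can be collected in a dictionary and checked before they reach `AutoMLConfig`. `build_automl_settings` is a hypothetical helper, and the strategy labels it returns are my reading of that page, not output from the SDK.

```python
VALID_TASKS = ("classification", "regression", "forecasting")


def build_automl_settings(task, label_column_name, primary_metric,
                          n_cross_validations=None, validation_size=None):
    """Return (settings, strategy): kwargs for AutoMLConfig and a description."""
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {task!r}")

    settings = {
        "task": task,
        "label_column_name": label_column_name,
        "primary_metric": primary_metric,
    }
    if n_cross_validations is not None:
        settings["n_cross_validations"] = n_cross_validations
    if validation_size is not None:
        settings["validation_size"] = validation_size

    # Assumed mapping from parameter combination to validation strategy.
    if n_cross_validations is not None and validation_size is not None:
        strategy = "Monte Carlo cross-validation"
    elif n_cross_validations is not None:
        strategy = "k-fold cross-validation"
    elif validation_size is not None:
        strategy = "hold-out split of the training data"
    else:
        strategy = "AutoML default"
    return settings, strategy


settings, strategy = build_automl_settings(
    "regression", "NOX", "r2_score", n_cross_validations=5
)
# Assumed usage: AutoMLConfig(training_data=train_data, **settings)
print(strategy, settings)
```

Keeping the settings in a dictionary also makes it easy to log exactly which configuration produced a given run.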

Start the experiment (model creation and validation)

from azureml.core.experiment import Experiment
experiment = Experiment(ws, "〇〇")
local_run = experiment.submit(automl_config, show_output=True)

Replace 〇〇 with any experiment name you like.
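The test labels separated in step 3 are not used above. One way to use them is to score the best model on the test data once the run finishes. The following is a minimal sketch: `r2_manual`, `evaluate`, and `ConstantModel` are hypothetical names, R² is computed by hand so that the example runs without any Azure packages, and the commented lines assume that `local_run.get_output()` returns the best run together with its fitted model.

```python
def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot  # undefined when all labels are identical


def evaluate(model, features, labels):
    """Score any object that has a predict() method against held-out labels."""
    predictions = list(model.predict(features))
    return r2_manual(list(labels), predictions)


class ConstantModel:
    """Stand-in model for illustration only; always predicts one value."""

    def __init__(self, constant):
        self.constant = constant

    def predict(self, features):
        return [self.constant for _ in features]


# Assumed usage with the run above (not executed here):
#   best_run, fitted_model = local_run.get_output()
#   print(evaluate(fitted_model, test_data, test_labels))
score = evaluate(ConstantModel(2.0), [[0], [1], [2]], [1.0, 2.0, 3.0])
print(score)
```

A model that always predicts the mean of the labels scores 0, so a useful model should score clearly above that on the test data.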

Finally

Few sites explain the settings available when creating a model, and it took me a while to work them out, so I wrote this article with example code so that anyone who wants to do the same thing can get started right away. I also had some trouble with library versions and with setting the Python import path. There is still a lot I don't understand as an engineer, but I will keep studying little by little. ^ ^

Reference sites

Microsoft documentation
・ https://docs.microsoft.com/ja-jp/azure/machine-learning/how-to-configure-cross-validation-data-splits
・ https://docs.microsoft.com/ja-jp/azure/machine-learning/how-to-auto-train-forecast

Blog article
・ https://www.simpletraveler.jp/2019/12/08/tried-azuremachinelearning-on-local-jupyter/