Introduction

When I tried to use Azure Machine Learning from a local machine, I got stuck several times and it took a long time to get anything running, even though official documentation and other information exist. I am writing these notes so I do not have to work it out again.

This article covers everything from setting up the Ubuntu environment that runs Azure ML to actually executing Azure ML.

Preparing the environment

Set up an Ubuntu 18.04 environment using a Docker image on a Mac. I will omit how to obtain the Docker image and start the container.

In addition, you need an Azure account and a workspace to run Azure ML. I will omit those steps as well.

Ubuntu image setup

apt-get update
apt-get upgrade

The Docker image is missing some basic tools, so install the ones you need by referring to this article (https://qiita.com/manabuishiirb/items/26de8c9740a1d2c7cfdd).

apt-get install -y iputils-ping net-tools wget curl vim build-essential

Installation of Anaconda

This time I will install Anaconda from the command line. Referring to this article (https://www.virment.com/setup-anaconda-python-jupyter-ubuntu/), download the installer as follows.

wget https://repo.anaconda.com/archive/Anaconda3-2019.10-Linux-x86_64.sh

Install as follows.


bash Anaconda3-2019.10-Linux-x86_64.sh

Run conda init to enable the conda command. Because Anaconda was installed under /root/ this time, run the following commands:

/root/anaconda3/bin/conda init
source /root/.bashrc

Install Azure Python SDK

Install the Azure ML SDK by referring to the official documentation (https://docs.microsoft.com/ja-jp/azure/machine-learning/service/how-to-configure-environment#local). First, create an Anaconda virtual environment.

conda create -n myenv python=3.6.5
conda activate myenv
conda install notebook ipykernel
ipython kernel install --user --name myenv --display-name "Python (myenv)"
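
Before installing anything else, it can be worth confirming that the shell is really using myenv and not the base interpreter. The following is a minimal sketch, not part of the official procedure. It assumes that `conda activate` sets the CONDA_DEFAULT_ENV environment variable, and `describe_python_environment` is a name made up for this example.

```python
import os
import sys


def describe_python_environment():
    """Return basic facts about the interpreter that is currently running."""
    return {
        # Set by `conda activate`; None when no conda environment is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # Root directory of the running interpreter (the env folder for conda envs).
        "prefix": sys.prefix,
        # Version as a string such as "3.6.5".
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
    }


if __name__ == "__main__":
    for key, value in describe_python_environment().items():
        print("{}: {}".format(key, value))
```

If conda_env is not myenv, or python_version is not 3.6.5, the environment was probably not activated in the current shell.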

Next, install the Azure CLI, which is needed for authentication and other tasks. I referred to this page (https://docs.microsoft.com/ja-jp/cli/azure/install-azure-cli-apt?view=azure-cli-latest).

curl -sL https://aka.ms/InstallAzureCLIDeb | bash

Finally, install the Azure ML SDK.

pip install azureml-sdk[notebooks,automl]

The following error appears during installation, but it did not cause any problems in my case.

ERROR: azureml-automl-runtime 1.0.81 has requirement azureml-automl-core==1.0.81, but you'll have azureml-automl-core 1.0.81.1 which is incompatible.
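
To see which versions pip actually ended up installing, you can query the package metadata from Python. This is a rough sketch: `installed_version` is a hypothetical helper, and because `importlib.metadata` only exists on Python 3.8 and later, the code falls back to `pkg_resources` from setuptools for the Python 3.6 environment used here.

```python
try:
    from importlib import metadata as importlib_metadata  # Python 3.8+
except ImportError:
    importlib_metadata = None  # Older Python: use pkg_resources instead.


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    if importlib_metadata is not None:
        try:
            return importlib_metadata.version(package_name)
        except importlib_metadata.PackageNotFoundError:
            return None
    import pkg_resources  # Provided by setuptools.
    try:
        return pkg_resources.get_distribution(package_name).version
    except pkg_resources.DistributionNotFound:
        return None


# The two packages named in the pip error message above.
for name in ("azureml-automl-core", "azureml-automl-runtime"):
    print(name, installed_version(name))
```

A value of None means the package is not installed in the interpreter that ran the script, which usually points to the wrong environment being active.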

Running automated machine learning

Authentication with `az login`

First, authenticate with the `az login` command. After running the command, open the URL it prints in a web browser and enter the code.

az login

To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code GPVMUVTKF to authenticate.
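
Once the browser step is done, you can confirm from Python that the CLI really holds a login before moving on. The sketch below shells out to `az account show --output json`; `current_azure_account` is a name invented for this example, and reading the subscription ID from the "id" field is based on the CLI output I have seen, so treat it as an assumption for your CLI version.

```python
import json
import shutil
import subprocess


def current_azure_account():
    """Return the active Azure CLI account as a dict, or None if unavailable."""
    az_path = shutil.which("az")
    if az_path is None:
        return None  # Azure CLI is not installed or not on PATH.
    result = subprocess.run(
        [az_path, "account", "show", "--output", "json"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode != 0:
        return None  # Typically means `az login` has not been completed.
    return json.loads(result.stdout)


if __name__ == "__main__":
    account = current_azure_account()
    if account is None:
        print("No active Azure CLI login found")
    else:
        # Assumption: "id" holds the subscription ID, "name" its display name.
        print("Subscription:", account.get("name"), account.get("id"))
```

The subscription ID printed here is the value that goes into subscription_id in auth.py in the next step.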

Creating a file to connect to the workspace

Create a Python program (auth.py) that looks up the workspace and writes its connection information to a config file.

`auth.py`


from azureml.core import Workspace

subscription_id = '<Subscription id>'
resource_group  = '<Resource group name>'
workspace_name  = '<Workspace name>'

try:
    ws = Workspace(subscription_id = subscription_id, resource_group = resource_group, workspace_name = workspace_name)
    ws.write_config()
    print('Library configuration succeeded')
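except Exception as e:
    # Print the actual cause; authentication and network errors also end up here.
    print('Workspace not found or not accessible:', e)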

When executed, it creates a config file for connecting to the workspace at .azureml/config.json under the current directory.
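
If a later step complains that the workspace cannot be loaded, checking this file directly is a quick first diagnostic. The following is a small sketch using only the standard library. The three key names are the ones I have seen in files written by write_config(); `load_workspace_config` is a hypothetical helper, not an SDK function.

```python
import json
import os

# Keys expected in the file (an assumption based on files I have seen).
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def load_workspace_config(path=os.path.join(".azureml", "config.json")):
    """Read the workspace config file and check that the expected keys exist."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError("config.json is missing: " + ", ".join(missing))
    return config


if __name__ == "__main__":
    try:
        print(load_workspace_config())
    except FileNotFoundError:
        print("No .azureml/config.json here; run auth.py in this directory first")
```

Run it from the same directory where auth.py was executed, since the default path is relative to the current directory.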

Run

Create a Python program (run.py) to perform machine learning. For the data, I use the breast cancer dataset provided by scikit-learn. For details on the dataset, see here (https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html#sklearn.datasets.load_breast_cancer).

`run.py`


import logging

from azureml.core.workspace import Workspace
from azureml.train.automl import AutoMLConfig
from azureml.core.experiment import Experiment

import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer

# Load the workspace from .azureml/config.json
ws = Workspace.from_config()

# Load the data and hold out 20% of the rows
data = load_breast_cancer()
df_X = pd.DataFrame(data.data, columns=data.feature_names)
df_y = pd.DataFrame(data.target, columns=['target'])
x_train, x_test, y_train, y_test = train_test_split(df_X, df_y, test_size=0.2, random_state=100)

# Automated ML settings
automl_settings = {
    "iteration_timeout_minutes": 2,
    "experiment_timeout_minutes": 20,
    "enable_early_stopping": True,
    "primary_metric": 'AUC_weighted',
    "featurization": 'auto',
    "verbosity": logging.INFO,
    "n_cross_validations": 5
}


automl_config = AutoMLConfig(task='classification',
                             debug_log='automated_ml_errors.log',
                             X=x_train.values,
                             y=y_train.values.flatten(),
                             **automl_settings)

# Submit the experiment and stream progress to the console
experiment = Experiment(ws, "my-experiment")
local_run = experiment.submit(automl_config, show_output=True)

The values in `automl_settings` should be chosen to suit the data and the problem. Since this is a binary classification problem, the metric to optimize is set to AUC, and `task` in `AutoMLConfig` is set to classification. See here for details (https://docs.microsoft.com/ja-jp/azure/machine-learning/service/how-to-configure-auto-train).

When run.py is executed, it performs some simple feature engineering, builds a number of models, and finally builds ensembles of them.
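
For readers less familiar with AUC, the metric can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes plain binary AUC that way in pure Python. It is an illustration only: `binary_auc` is my own helper, and as far as I understand AUC_weighted is a class-frequency-weighted average of per-class AUCs, which this code does not attempt to reproduce.

```python
def binary_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Toy example: 3 of the 4 positive/negative pairs are ordered correctly.
print(binary_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A model that gives every sample the same score gets 0.5 under this definition, and a perfect ranking gets 1.0, which is why values close to 1.0 deserve a second look.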

python run.py 

(output abridged)

Current status: DatasetFeaturization. Beginning to featurize the dataset.
Current status: DatasetEvaluation. Gathering dataset statistics.
Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetFeaturizationCompleted. Completed featurizing the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Classes are balanced in the training data.

TYPE:         Missing values imputation
STATUS:       PASSED
DESCRIPTION:  There were no missing values found in the training data.

TYPE:         High cardinality feature detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and no high cardinality features were detected.

****************************************************************************************************
Current status: ModelSelection. Beginning model selection.

****************************************************************************************************
ITERATION: The iteration being evaluated.
PIPELINE: A summary description of the pipeline being evaluated.
DURATION: Time taken for the current iteration.
METRIC: The result of computing score on the fitted pipeline.
BEST: The best observed score thus far.
****************************************************************************************************

 ITERATION   PIPELINE                                       DURATION      METRIC      BEST
         0   StandardScalerWrapper SGD                      0:00:13       0.9940    0.9940
         1   StandardScalerWrapper SGD                      0:00:12       0.9958    0.9958
         2   MinMaxScaler LightGBM                          0:00:12       0.9888    0.9958
         3   StandardScalerWrapper SGD                      0:00:11       0.9936    0.9958
         4   StandardScalerWrapper ExtremeRandomTrees       0:00:14       0.9908    0.9958
         5   StandardScalerWrapper LightGBM                 0:00:11       0.9887    0.9958
         6   StandardScalerWrapper SGD                      0:00:11       0.9956    0.9958
         7   MinMaxScaler RandomForest                      0:00:13       0.9814    0.9958
         8   StandardScalerWrapper SGD                      0:00:11       0.9851    0.9958
         9   MinMaxScaler SGD                               0:00:11       0.9441    0.9958
        10   MinMaxScaler RandomForest                      0:00:11       0.9802    0.9958
        11   MaxAbsScaler LightGBM                          0:00:11       0.9780    0.9958
        12   MinMaxScaler LightGBM                          0:00:12       0.9886    0.9958
        13   MinMaxScaler ExtremeRandomTrees                0:00:11       0.9816    0.9958
        14   MinMaxScaler LightGBM                          0:00:11       0.9731    0.9958
        15   StandardScalerWrapper BernoulliNaiveBayes      0:00:11       0.9705    0.9958
        16   StandardScalerWrapper LogisticRegression       0:00:13       0.9959    0.9959
        17   MaxAbsScaler ExtremeRandomTrees                0:00:28       0.9906    0.9959
        18   RobustScaler LogisticRegression                0:00:13       0.9853    0.9959
        19   RobustScaler LightGBM                          0:00:12       0.9904    0.9959
        20   StandardScalerWrapper LogisticRegression       0:00:11       0.5000    0.9959
        21   MaxAbsScaler LinearSVM                         0:00:12       0.9871    0.9959
        22   StandardScalerWrapper SVM                      0:00:12       0.9873    0.9959
        23   RobustScaler LogisticRegression                0:00:14       0.9909    0.9959
        24   MaxAbsScaler LightGBM                          0:00:15       0.9901    0.9959
        25   RobustScaler LogisticRegression                0:00:29       0.9894    0.9959
        26   MaxAbsScaler LightGBM                          0:00:13       0.9897    0.9959
        27   MaxAbsScaler LightGBM                          0:00:15       0.9907    0.9959
        28   RobustScaler KNN                               0:00:12       0.9887    0.9959
        29   MaxAbsScaler LogisticRegression                0:00:13       0.9940    0.9959
        30   VotingEnsemble                                 0:00:31       0.9965    0.9965
        31   StackEnsemble                                  0:00:36       0.9960    0.9965
Stopping criteria reached at iteration 31. Ending experiment.
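
The last two iterations are ensembles built from the earlier models. I do not know exactly how Azure ML's VotingEnsemble combines its members, so the following is only a generic sketch of soft voting, a common ensemble technique in which the predicted class probabilities of several models are averaged, optionally with weights. The function name and the probabilities are made up for illustration.

```python
def soft_vote(probabilities_per_model, weights=None):
    """Combine class-probability vectors from several models by weighted average.

    probabilities_per_model: one list of class probabilities per model,
    all for the same sample and in the same class order.
    """
    if weights is None:
        weights = [1.0] * len(probabilities_per_model)
    if len(weights) != len(probabilities_per_model):
        raise ValueError("need exactly one weight per model")
    total_weight = float(sum(weights))
    n_classes = len(probabilities_per_model[0])
    combined = [0.0] * n_classes
    for model_probs, weight in zip(probabilities_per_model, weights):
        for class_index, p in enumerate(model_probs):
            combined[class_index] += weight * p / total_weight
    return combined


# Hypothetical output of three models for one sample: [P(class 0), P(class 1)].
votes = [[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]]
averaged = soft_vote(votes)
predicted_class = averaged.index(max(averaged))
print(averaged, predicted_class)
```

Averaging tends to smooth out the mistakes of individual models, which fits the small gain the ensemble shows over the best single pipeline in the log.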

The AUC is very high at over 0.99, which makes me suspect that something is leaking, but I will set that aside for now.
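
One simple follow-up would be to score the best model on the held-out split that run.py creates but never uses. The sketch below is a hedged illustration: `holdout_accuracy` and `ThresholdModel` are names I made up, it measures accuracy rather than AUC, and the commented-out call assumes that `local_run.get_output()` returns a (best_run, fitted_model) pair in your SDK version.

```python
def holdout_accuracy(model, features, labels):
    """Fraction of held-out rows for which model.predict matches the true label."""
    predictions = list(model.predict(features))
    labels = list(labels)
    if not labels:
        raise ValueError("labels must not be empty")
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels differ in length")
    matches = sum(1 for p, y in zip(predictions, labels) if p == y)
    return matches / len(labels)


class ThresholdModel:
    """Stand-in model for demonstration only; not related to Azure ML."""

    def predict(self, features):
        return [1 if row[0] > 0.5 else 0 for row in features]


print(holdout_accuracy(ThresholdModel(), [[0.9], [0.2], [0.7], [0.1]], [1, 0, 0, 0]))  # 0.75

# With the objects from run.py, the call would look roughly like this
# (assuming get_output() returns a (best_run, fitted_model) pair):
#   best_run, fitted_model = local_run.get_output()
#   holdout_accuracy(fitted_model, x_test.values, y_test.values.flatten())
```

A held-out score close to the cross-validated one would suggest the result is not an artifact of the validation procedure, although it cannot rule out leakage that is already present in the features themselves.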

Summary

I have summarized the steps for running Azure ML from a local environment. Azure ML is convenient, but I wish the official Azure documentation were a little easier to follow.

Notes on running Azure Machine Learning locally