When I tried to use Azure Machine Learning on a local machine, I got stuck several times and it took a long time to get things running, even though official documentation and other information exist. I am writing this down as a memo.
This post covers everything from setting up the Ubuntu environment that runs Azure ML to actually running Azure ML.
The Ubuntu 18.04 environment is set up from a Docker image on a Mac. I will omit how to pull the Docker image and start the container.
In addition, you need an Azure account and a workspace to run Azure ML. I will omit those steps as well.
apt-get update
apt-get upgrade
The Docker image is missing a few things, so install the necessary packages by referring to this article (https://qiita.com/manabuishiirb/items/26de8c9740a1d2c7cfdd).
apt-get install -y iputils-ping net-tools wget curl vim build-essential
This time I will install Anaconda from the command line. Referring to this article (https://www.virment.com/setup-anaconda-python-jupyter-ubuntu/), download the installer as follows.
wget https://repo.anaconda.com/archive/Anaconda3-2019.10-Linux-x86_64.sh
Install as follows.
bash Anaconda3-2019.10-Linux-x86_64.sh
Run conda init to enable the conda command. Since Anaconda was installed under /root/, run the following commands:
/root/anaconda3/bin/conda init
source /root/.bashrc
Install azure-ml by referring to the official document (https://docs.microsoft.com/ja-jp/azure/machine-learning/service/how-to-configure-environment#local). First, create an Anaconda virtual environment.
conda create -n myenv python=3.6.5
conda activate myenv
conda install notebook ipykernel
ipython kernel install --user --name myenv --display-name "Python (myenv)"
Next, install the Azure CLI, which is needed for authentication and other tasks. I referred to this page (https://docs.microsoft.com/ja-jp/cli/azure/install-azure-cli-apt?view=azure-cli-latest).
curl -sL https://aka.ms/InstallAzureCLIDeb | bash
Finally, install the Azure ML SDK.
pip install azureml-sdk[notebooks,automl]
The following error appears during installation, but it did not cause any problems.
ERROR: azureml-automl-runtime 1.0.81 has requirement azureml-automl-core==1.0.81, but you'll have azureml-automl-core 1.0.81.1 which is incompatible.
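To confirm that the SDK can be found by Python despite the message above, a quick check like the following may help. This is only a minimal sketch: `is_importable` is a helper name made up for illustration, and it only tests whether a top-level package can be located, not whether the installed versions are fully compatible.

```python
import importlib.util


def is_importable(package_name):
    """Return True if the top-level package can be located by the import system."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # 'azureml' is assumed to be the top-level package installed by azureml-sdk.
    for name in ("azureml", "sklearn", "pandas"):
        status = "OK" if is_importable(name) else "missing"
        print(f"{name}: {status}")
```

Run it inside the activated myenv environment so that the check uses the same interpreter as the later scripts.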
First, authenticate with the `az login` command. After running the command, open the displayed URL in a web browser and enter the code.
az login
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code GPVMUVTKF to authenticate.
Create a Python program (auth.py) that saves the workspace connection information.
auth.py
from azureml.core import Workspace
subscription_id = '<Subscription id>'
resource_group = '<Resource group name>'
workspace_name = '<Workspace name>'
try:
    ws = Workspace(subscription_id=subscription_id, resource_group=resource_group, workspace_name=workspace_name)
    # Save the connection settings to .azureml/config.json
    ws.write_config()
    print('Library configuration succeeded')
except Exception:
    print('Workspace not found')
When executed, a config file for connecting to the workspace is created at .azureml/config.json in the current directory.
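For reference, the generated file is a small JSON document. The sketch below writes and reads a file of the same shape using only the standard library. The three key names are an assumption about what `write_config()` produces (check your own file), the values are dummy placeholders, and `read_workspace_config` is a hypothetical helper. A temporary directory is used so that the example does not touch a real config.

```python
import json
import os
import tempfile

# Dummy values; the key names are assumed to mirror what write_config() produces.
example_config = {
    "subscription_id": "<Subscription id>",
    "resource_group": "<Resource group name>",
    "workspace_name": "<Workspace name>",
}


def read_workspace_config(base_dir):
    """Load .azureml/config.json under base_dir and return it as a dict."""
    path = os.path.join(base_dir, ".azureml", "config.json")
    with open(path, encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    # Recreate the directory layout described above inside a temp directory.
    os.makedirs(os.path.join(tmp, ".azureml"))
    with open(os.path.join(tmp, ".azureml", "config.json"), "w", encoding="utf-8") as f:
        json.dump(example_config, f, indent=4)
    loaded = read_workspace_config(tmp)

print(sorted(loaded))
```

Looking at the file this way can be handy when `Workspace.from_config()` fails and you want to see which workspace the script is actually pointing at.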
Create a Python program (run.py) to perform machine learning. For the data, we use the breast cancer dataset provided by scikit-learn. For more information on the dataset, see the scikit-learn documentation (https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html#sklearn.datasets.load_breast_cancer).
run.py
import logging
from azureml.core.workspace import Workspace
from azureml.train.automl import AutoMLConfig
from azureml.core.experiment import Experiment
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
# Read the workspace config
ws = Workspace.from_config()
# Load the data
data = load_breast_cancer()
df_X = pd.DataFrame(data.data, columns=data.feature_names)
df_y = pd.DataFrame(data.target, columns=['target'])
x_train, x_test, y_train, y_test = train_test_split(df_X, df_y, test_size=0.2, random_state=100)
# Machine learning settings
automl_settings = {
    "iteration_timeout_minutes": 2,
    "experiment_timeout_minutes": 20,
    "enable_early_stopping": True,
    "primary_metric": 'AUC_weighted',
    "featurization": 'auto',
    "verbosity": logging.INFO,
    "n_cross_validations": 5
}
automl_config = AutoMLConfig(task='classification',
                             debug_log='automated_ml_errors.log',
                             X=x_train.values,
                             y=y_train.values.flatten(),
                             **automl_settings)
# Run the experiment
experiment = Experiment(ws, "my-experiment")
local_run = experiment.submit(automl_config, show_output=True)
The values in `automl_settings` are chosen according to the data and the problem. Since this is a binary classification problem, the metric to optimize (primary_metric) is set to AUC, and task in `AutoMLConfig` is set to classification.
See the official documentation for details (https://docs.microsoft.com/ja-jp/azure/machine-learning/service/how-to-configure-auto-train).
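To make the AUC metric a little more concrete, here is a minimal pure-Python sketch of how AUC can be computed for a binary problem: it is the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. `binary_auc` and the toy labels and scores are made up for illustration, and this is not necessarily how Azure ML computes `AUC_weighted` internally.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy data: 1 = positive class, 0 = negative class.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(binary_auc(labels, scores))
```

Under this definition, a model that gives every example the same score gets 0.5, and a model that ranks every positive above every negative gets 1.0.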
When executed, it performs some simple feature engineering, builds a number of models, and finally builds ensembles of them.
python run.py
(omitted)
Current status: DatasetFeaturization. Beginning to featurize the dataset.
Current status: DatasetEvaluation. Gathering dataset statistics.
Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetFeaturizationCompleted. Completed featurizing the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.
****************************************************************************************************
DATA GUARDRAILS:
TYPE: Class balancing detection
STATUS: PASSED
DESCRIPTION: Classes are balanced in the training data.
TYPE: Missing values imputation
STATUS: PASSED
DESCRIPTION: There were no missing values found in the training data.
TYPE: High cardinality feature detection
STATUS: PASSED
DESCRIPTION: Your inputs were analyzed, and no high cardinality features were detected.
****************************************************************************************************
Current status: ModelSelection. Beginning model selection.
****************************************************************************************************
ITERATION: The iteration being evaluated.
PIPELINE: A summary description of the pipeline being evaluated.
DURATION: Time taken for the current iteration.
METRIC: The result of computing score on the fitted pipeline.
BEST: The best observed score thus far.
****************************************************************************************************
ITERATION PIPELINE DURATION METRIC BEST
0 StandardScalerWrapper SGD 0:00:13 0.9940 0.9940
1 StandardScalerWrapper SGD 0:00:12 0.9958 0.9958
2 MinMaxScaler LightGBM 0:00:12 0.9888 0.9958
3 StandardScalerWrapper SGD 0:00:11 0.9936 0.9958
4 StandardScalerWrapper ExtremeRandomTrees 0:00:14 0.9908 0.9958
5 StandardScalerWrapper LightGBM 0:00:11 0.9887 0.9958
6 StandardScalerWrapper SGD 0:00:11 0.9956 0.9958
7 MinMaxScaler RandomForest 0:00:13 0.9814 0.9958
8 StandardScalerWrapper SGD 0:00:11 0.9851 0.9958
9 MinMaxScaler SGD 0:00:11 0.9441 0.9958
10 MinMaxScaler RandomForest 0:00:11 0.9802 0.9958
11 MaxAbsScaler LightGBM 0:00:11 0.9780 0.9958
12 MinMaxScaler LightGBM 0:00:12 0.9886 0.9958
13 MinMaxScaler ExtremeRandomTrees 0:00:11 0.9816 0.9958
14 MinMaxScaler LightGBM 0:00:11 0.9731 0.9958
15 StandardScalerWrapper BernoulliNaiveBayes 0:00:11 0.9705 0.9958
16 StandardScalerWrapper LogisticRegression 0:00:13 0.9959 0.9959
17 MaxAbsScaler ExtremeRandomTrees 0:00:28 0.9906 0.9959
18 RobustScaler LogisticRegression 0:00:13 0.9853 0.9959
19 RobustScaler LightGBM 0:00:12 0.9904 0.9959
20 StandardScalerWrapper LogisticRegression 0:00:11 0.5000 0.9959
21 MaxAbsScaler LinearSVM 0:00:12 0.9871 0.9959
22 StandardScalerWrapper SVM 0:00:12 0.9873 0.9959
23 RobustScaler LogisticRegression 0:00:14 0.9909 0.9959
24 MaxAbsScaler LightGBM 0:00:15 0.9901 0.9959
25 RobustScaler LogisticRegression 0:00:29 0.9894 0.9959
26 MaxAbsScaler LightGBM 0:00:13 0.9897 0.9959
27 MaxAbsScaler LightGBM 0:00:15 0.9907 0.9959
28 RobustScaler KNN 0:00:12 0.9887 0.9959
29 MaxAbsScaler LogisticRegression 0:00:13 0.9940 0.9959
30 VotingEnsemble 0:00:31 0.9965 0.9965
31 StackEnsemble 0:00:36 0.9960 0.9965
Stopping criteria reached at iteration 31. Ending experiment.
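If you want to pull the best pipeline out of console output like this, the result lines can be parsed with a regular expression. The following is a rough sketch that assumes each result line has the form "iteration, pipeline name, duration, metric, best". `parse_iteration_line` is a hypothetical helper, and the sample lines are copied from the log above.

```python
import re

# iteration, pipeline name (may contain spaces), duration, metric, best-so-far
LINE_PATTERN = re.compile(
    r"^\s*(\d+)\s+(.+?)\s+(\d+:\d{2}:\d{2})\s+(\d+\.\d+)\s+(\d+\.\d+)\s*$"
)


def parse_iteration_line(line):
    """Return (iteration, pipeline, metric) or None if the line is not a result row."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2), float(match.group(4))


sample_log = """\
ITERATION PIPELINE DURATION METRIC BEST
16 StandardScalerWrapper LogisticRegression 0:00:13 0.9959 0.9959
20 StandardScalerWrapper LogisticRegression 0:00:11 0.5000 0.9959
30 VotingEnsemble 0:00:31 0.9965 0.9965
31 StackEnsemble 0:00:36 0.9960 0.9965
Stopping criteria reached at iteration 31. Ending experiment.
"""

# Keep only the lines that look like result rows, then pick the highest metric.
rows = [r for r in map(parse_iteration_line, sample_log.splitlines()) if r is not None]
best = max(rows, key=lambda row: row[2])
print(best)
```

Header and status lines do not match the pattern and are skipped, so the same function can be applied to the whole console output.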
The AUC is very high at about 0.99, which makes me suspect that something is leaking, but I will set that aside for now.
That is a summary of the steps for running Azure ML in a local environment. Azure ML is convenient, but I wish the official Azure documentation were a little easier to follow...