SageMaker Autopilot is AWS's AutoML offering on SageMaker: it automatically preprocesses the data, selects algorithms, and optimizes hyperparameters. A sample for Autopilot is available, so in this post I actually run it.

Autopilot sample

First, import the necessary libraries and create a session.
jupyter
import sagemaker
import boto3
from sagemaker import get_execution_role

# Resolve the current region and create a SageMaker session
region = boto3.Session().region_name
session = sagemaker.Session()

# Default S3 bucket and key prefix used throughout this sample
bucket = session.default_bucket()
prefix = 'sagemaker/autopilot-dm'

# Execution role and low-level SageMaker client
role = get_execution_role()
sm = boto3.Session().client(service_name='sagemaker', region_name=region)
Next, download the dataset. This sample uses the Bank Marketing Data Set from the UCI repository: records of a bank's direct marketing campaigns, where the target is whether the customer subscribed to a term deposit.
jupyter
# Download and extract the UCI Bank Marketing Data Set
!wget -N https://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank-additional.zip
!unzip -o bank-additional.zip
local_data_path = './bank-additional/bank-additional-full.csv'
Next, split the downloaded data into training and test sets, and drop the target column "y" from the test set.
jupyter
import pandas as pd

# The CSV in this dataset is ';'-delimited
data = pd.read_csv(local_data_path, sep=';')

# 80/20 train/test split with a fixed seed for reproducibility
train_data = data.sample(frac=0.8, random_state=200)
test_data = data.drop(train_data.index)

# Remove the target column 'y' from the test set
test_data_no_target = test_data.drop(columns=['y'])
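This is not part of the sample, but before training it can be worth a quick look at the target's class balance and the sizes of the two splits; a minimal sketch:

jupyter
# Sanity check (my addition, not in the sample):
# class balance of the target and sizes of the two splits
print(data['y'].value_counts(normalize=True))
print('train:', train_data.shape, 'test:', test_data.shape)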
Then upload each split to S3.
jupyter
# Write the training set (with header) and upload it to S3
train_file = 'train_data.csv'
train_data.to_csv(train_file, index=False, header=True)
train_data_s3_path = session.upload_data(path=train_file, key_prefix=prefix + "/train")
print('Train data uploaded to: ' + train_data_s3_path)

# Write the test set without a header; it is used for inference later
test_file = 'test_data.csv'
test_data_no_target.to_csv(test_file, index=False, header=False)
test_data_s3_path = session.upload_data(path=test_file, key_prefix=prefix + "/test")
print('Test data uploaded to: ' + test_data_s3_path)
Next, set up Autopilot. This sample uses only the minimal settings below, but many other options are available; they are described in the Autopilot documentation, so please have a look. (A sketch of a few optional settings follows the config block.)
jupyter
input_data_config = [{
    'DataSource': {
        'S3DataSource': {
            'S3DataType': 'S3Prefix',
            'S3Uri': 's3://{}/{}/train'.format(bucket, prefix)
        }
    },
    'TargetAttributeName': 'y'
}]

output_data_config = {
    'S3OutputPath': 's3://{}/{}/output'.format(bucket, prefix)
}
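For reference, here is a sketch of a few of those optional settings (not used in this run; the concrete values are illustrative). You can pin the problem type and objective metric instead of letting Autopilot infer them, and cap the number of candidates to shorten the job:

jupyter
# Optional settings (illustrative values; not used in this run)
problem_type = 'BinaryClassification'          # Autopilot infers this if omitted
auto_ml_job_objective = {'MetricName': 'F1'}   # metric candidates are ranked by

# Limit the number of candidate pipelines to shorten the job
auto_ml_job_config = {
    'CompletionCriteria': {
        'MaxCandidates': 50
    }
}
# These would be passed to create_auto_ml_job as ProblemType=...,
# AutoMLJobObjective=..., and AutoMLJobConfig=... respectively.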
Now that the settings are complete, let's actually run the job.
jupyter
from time import gmtime, strftime, sleep

# Give the job a unique, timestamped name
timestamp_suffix = strftime('%d-%H-%M-%S', gmtime())
auto_ml_job_name = 'automl-banking-' + timestamp_suffix
print('AutoMLJobName: ' + auto_ml_job_name)

# Launch the Autopilot job
sm.create_auto_ml_job(AutoMLJobName=auto_ml_job_name,
                      InputDataConfig=input_data_config,
                      OutputDataConfig=output_data_config,
                      RoleArn=role)
The following loop polls the job and prints its status every 30 seconds.
jupyter
print('JobStatus - Secondary Status')
print('------------------------------')

describe_response = sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)
print(describe_response['AutoMLJobStatus'] + " - " + describe_response['AutoMLJobSecondaryStatus'])
job_run_status = describe_response['AutoMLJobStatus']

# Poll every 30 seconds until the job reaches a terminal state
while job_run_status not in ('Failed', 'Completed', 'Stopped'):
    describe_response = sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)
    job_run_status = describe_response['AutoMLJobStatus']
    print(describe_response['AutoMLJobStatus'] + " - " + describe_response['AutoMLJobSecondaryStatus'])
    sleep(30)
Model creation is complete when the status reaches "Completed". In my case it took a little over two hours.
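Once the job has completed, you can look up the best candidate Autopilot found. A minimal sketch using the describe_auto_ml_job response (the BestCandidate fields below come from that API):

jupyter
# Retrieve the best candidate found by Autopilot
best_candidate = sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)['BestCandidate']
print('CandidateName: ' + best_candidate['CandidateName'])
print('Objective metric: ' + best_candidate['FinalAutoMLJobObjectiveMetric']['MetricName'])
print('Objective value: ' + str(best_candidate['FinalAutoMLJobObjectiveMetric']['Value']))

The best candidate's inference containers can then be registered with create_model and used for real-time or batch inference.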
In this post, I created a model automatically with SageMaker Autopilot. It struck me again how impressive AutoML is: you can build a model just by preparing the data. I hope this lowers the barrier to model building and helps ML spread more widely.