Hello, this is Ninomiya of LIFULL CO., LTD.
In a machine learning project, once the analysis has succeeded and the model's accuracy has been evaluated, the model still has to be put to use in existing systems. At that stage, our team struggled to divide the work among the engineers in charge of integration.
Aiming for a state where "if a data scientist builds the model in this format, it can be incorporated easily!", we wrapped Amazon SageMaker and prepared a development format and tools that are general-purpose to some extent.
Amazon SageMaker provides every developer and data scientist with the means to build, train, and deploy machine learning models. It is a fully managed service that covers the entire machine learning workflow: label and prepare your data, choose an algorithm, train the model, tune and optimize it for deployment, and make predictions. You can put your models into production with less effort and at lower cost.
Its key property for us: if you prepare a Docker image that meets certain specifications, you can use SageMaker's main functions with it, namely training jobs, inference endpoints, and batch transform jobs.
For the specifications your own Docker image must meet, see the official docs and the article by @taniyam (a member of the same team as me).
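As a quick orientation, the contract a "bring your own container" image has to satisfy can be sketched as a single entry point. This is a minimal sketch based on the documented SageMaker conventions (the image is run with the argument train for training jobs and serve for hosting, training data is mounted under /opt/ml/input/data/<channel>, and artifacts written to /opt/ml/model are uploaded to S3); it is not the actual entry point of our tool.

```python
# Minimal sketch of a SageMaker BYO-container entry point.
# SageMaker runs `docker run <image> train` for training jobs and
# `docker run <image> serve` for endpoints, with well-known paths
# mounted under /opt/ml (per the official container documentation).
import sys
from pathlib import Path

TRAINING_PATH = Path('/opt/ml/input/data/training')  # training channel data
MODEL_PATH = Path('/opt/ml/model')                   # artifacts here are uploaded to S3


def main(argv):
    command = argv[1] if len(argv) > 1 else 'train'
    if command == 'train':
        # here: model = train(TRAINING_PATH); model.save(MODEL_PATH)
        return 'train'
    elif command == 'serve':
        # here: start the inference HTTP server (/ping and /invocations)
        return 'serve'
    raise SystemExit(f'unknown command: {command}')


if __name__ == '__main__':
    main(sys.argv)
```

The format described below hides this dispatching from data scientists, so they only write train and predict.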
First, we asked data scientists to prepare the following directory structure.
```
.
├── README.md
├── Dockerfile
├── config.yml
├── pyproject.toml  (poetry config file)
├── script
│   └── __init__.py
└── tests
    └── __init__.py
```
The main processing is written in script/__init__.py, as shown below. simple_sagemaker_manager is the library we prepared.
```python
import pandas as pd
from typing import List
from pathlib import Path
from sklearn import tree
from simple_sagemaker_manager.image_utils import AbstractModel


def train(training_path: Path) -> AbstractModel:
    """Train the model.

    Args:
        training_path (Path): Directory containing CSV files

    Returns:
        Model: Model object that inherits from AbstractModel
    """
    train_data = pd.concat([pd.read_csv(fname, header=None) for fname in training_path.iterdir()])
    train_y = train_data.iloc[:, 0]
    train_X = train_data.iloc[:, 1:]

    # Use scikit-learn's decision tree classifier to train the model.
    clf = tree.DecisionTreeClassifier(max_leaf_nodes=None)
    clf = clf.fit(train_X, train_y)
    return Model(clf)


class Model(AbstractModel):
    """The serialization method is defined in AbstractModel."""

    def predict(self, matrix: List[List[float]]) -> List[List[str]]:
        """Inference processing.

        Args:
            matrix (List[List[float]]): Table data

        Returns:
            list: Inference results
        """
        # The value returned here becomes the response of the inference API.
        return [[x] for x in self.model.predict(pd.DataFrame(matrix))]
```
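Before building an image, the train/predict contract can be smoke-tested locally. The following is a self-contained sketch that stubs AbstractModel (the real class lives in the in-house simple_sagemaker_manager package and is shown later in this article) and feeds the train function a tiny hypothetical CSV with the label in column 0, matching the format above; it assumes pandas and scikit-learn are installed.

```python
# Local smoke test of the train/predict contract above.
# AbstractModel is stubbed because simple_sagemaker_manager is in-house;
# the tiny CSV data here is made up purely for illustration.
import csv
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import List

import pandas as pd
from sklearn import tree


@dataclass
class AbstractModel:  # stub of simple_sagemaker_manager.image_utils.AbstractModel
    model: object


def train(training_path: Path) -> AbstractModel:
    # Same logic as the article's train(): label in column 0, features after.
    train_data = pd.concat([pd.read_csv(f, header=None) for f in training_path.iterdir()])
    train_y = train_data.iloc[:, 0]
    train_X = train_data.iloc[:, 1:]
    clf = tree.DecisionTreeClassifier(max_leaf_nodes=None).fit(train_X, train_y)
    return Model(clf)


class Model(AbstractModel):
    def predict(self, matrix: List[List[float]]) -> List[List[str]]:
        return [[x] for x in self.model.predict(pd.DataFrame(matrix))]


# Write a tiny labeled CSV and run the train -> predict round trip.
with tempfile.TemporaryDirectory() as d:
    with open(Path(d) / 'train.csv', 'w', newline='') as f:
        csv.writer(f).writerows([['a', 1.0, 2.0], ['b', 5.0, 6.0], ['a', 1.1, 2.1]])
    model = train(Path(d))

result = model.predict([[1.0, 2.0]])
print(result)  # [['a']]
```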
AbstractModel has the following definition. The result of calling the save method (serialized with pickle) is stored when the training batch runs, and SageMaker uploads it to S3 as the model artifact. The serialization method can be switched by overriding save and load.
```python
import pickle
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class AbstractModel(ABC):
    model: object

    @classmethod
    def load(cls, model_path):
        # Load the model during inference
        with open(model_path / 'model.pkl', 'rb') as f:
            model = pickle.load(f)
        return cls(model)

    def save(self, model_path):
        # Save the model during the training batch
        with open(model_path / 'model.pkl', 'wb') as f:
            pickle.dump(self.model, f)

    @abstractmethod
    def predict(self, json):
        pass
```
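The save/load round trip can be verified with a trivial subclass. This sketch reproduces the AbstractModel definition above so it runs without the in-house package; the EchoModel "model" is a made-up placeholder.

```python
# Round-trip check of the save/load contract above, using a trivial model.
# AbstractModel is reproduced from the article so this snippet is
# self-contained; EchoModel is a made-up placeholder model.
import pickle
import tempfile
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path


@dataclass
class AbstractModel(ABC):
    model: object

    @classmethod
    def load(cls, model_path):
        # Load the model during inference
        with open(model_path / 'model.pkl', 'rb') as f:
            return cls(pickle.load(f))

    def save(self, model_path):
        # Save the model during the training batch
        with open(model_path / 'model.pkl', 'wb') as f:
            pickle.dump(self.model, f)

    @abstractmethod
    def predict(self, json):
        pass


class EchoModel(AbstractModel):
    def predict(self, json):
        return json  # trivial "model": echo the input


with tempfile.TemporaryDirectory() as d:
    EchoModel({'k': 1}).save(Path(d))   # what the training batch does
    restored = EchoModel.load(Path(d))  # what the inference container does

print(restored.model)  # {'k': 1}
```

Overriding save and load in a subclass (e.g. to use joblib instead of pickle) swaps the serialization format without touching the rest of the contract.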
Referring to projects such as Python's poetry, we made the tool operable from a CLI. The development flow for a SageMaker Docker image is as follows.

1) smcli new <project name>
2) smcli build
3) smcli push

Also, because some machine learning libraries can only be installed with Anaconda, we received a request to "replace the base image with something other than the official Python 3 image", so we made the Dockerfile editable.
Running boto3 directly is cumbersome, so we also prepared a library that wraps it. boto3 exposes a great many operations, but in most projects there are only three things we want to do: "train a model", "set up an inference API", and "run a batch transform job", so we gave the library an easy-to-understand interface for those.
```python
from simple_sagemaker_manager.executor import SageMakerExecutor
from simple_sagemaker_manager.executor.classes import TrainInstance, TrainSpotInstance, Image

client = SageMakerExecutor()

# When training with a normal instance
model = client.execute_batch_training(
    instance=TrainInstance(
        instance_type='ml.m4.xlarge',
        instance_count=1,
        volume_size_in_gb=10,
        max_run=100
    ),
    image=Image(
        name="decision-trees-sample",
        uri="xxxxxxxxxx.dkr.ecr.ap-northeast-1.amazonaws.com/decision-trees-sample:latest"
    ),
    input_path="s3://xxxxxxxxxx/DEMO-scikit-byo-iris",
    output_path="s3://xxxxxxxxxx/output",
    role="arn:aws:iam::xxxxxxxxxx"
)

# When training with spot instances
model = client.execute_batch_training(
    instance=TrainSpotInstance(
        instance_type='ml.m4.xlarge',
        instance_count=1,
        volume_size_in_gb=10,
        max_run=100,
        max_wait=1000
    ),
    image=Image(
        name="decision-trees-sample",
        uri="xxxxxxxxxx.dkr.ecr.ap-northeast-1.amazonaws.com/decision-trees-sample:latest"
    ),
    input_path="s3://xxxxxxxxxx/DEMO-scikit-byo-iris",
    output_path="s3://xxxxxxxxxxx/output",
    role="arn:aws:iam::xxxxxxxxxxxxx"
)
```
The inference API is created as follows; the points we devised are described below.
```python
from simple_sagemaker_manager.executor import SageMakerExecutor
from simple_sagemaker_manager.executor.classes import EndpointInstance, Model

client = SageMakerExecutor()

# When deploying a specific model.
# If you specify multiple models in models, a Pipeline model is created and used.
client.deploy_endpoint(
    instance=EndpointInstance(
        instance_type='ml.m4.xlarge',
        initial_count=1,
        initial_variant_wright=1
    ),
    models=[
        Model(
            name='decision-trees-sample-191028-111309-538454',
            model_arn='arn:aws:sagemaker:ap-northeast-1:xxxxxxxxxx',
            image_uri='xxxxxxxxxx.dkr.ecr.ap-northeast-1.amazonaws.com/decision-trees-sample:latest',
            model_data_url='s3://xxxxxxxxxx/model.tar.gz'
        )
    ],
    name='sample-endpoint',
    role="arn:aws:iam::xxxxxxxxxx"
)

# You can also pass the result of execute_batch_training
model = client.execute_batch_training(
    # arguments omitted
)
client.deploy_endpoint(
    instance=EndpointInstance(
        instance_type='ml.m4.xlarge',
        initial_count=1,
        initial_variant_wright=1
    ),
    models=[model],
    name='sample-endpoint',
    role="arn:aws:iam::xxxxxxxxxx"
)
```
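Once deployed, the endpoint can be called with plain boto3 through the sagemaker-runtime client. This is a hedged sketch, not part of our wrapper: the endpoint name reuses the example above, and the text/csv request body matches the tabular predict interface shown earlier, but your container may expect a different content type.

```python
# Calling a deployed endpoint with plain boto3 (sagemaker-runtime).
# Endpoint name and text/csv body are assumptions based on the examples
# in this article; adjust them to your container's input format.

def matrix_to_csv(matrix):
    # Serialize [[5.1, 3.5], ...] into a CSV request body.
    return '\n'.join(','.join(str(v) for v in row) for row in matrix)


def invoke(endpoint_name, matrix):
    import boto3  # imported here so matrix_to_csv works without boto3 installed
    runtime = boto3.client('sagemaker-runtime')
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType='text/csv',
        Body=matrix_to_csv(matrix),
    )
    return response['Body'].read().decode('utf-8')


# Usage (requires AWS credentials and a live endpoint):
# print(invoke('sample-endpoint', [[5.1, 3.5, 1.4, 0.2]]))
```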
Names other than endpoint names (training batch jobs, etc.) automatically have the current time appended as a string to avoid duplication. Only endpoints behave as "update if one with the same name already exists", to improve convenience. Also, although omitted here, the batch transform job method is implemented in the same way.
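The duplicate-avoidance naming can be sketched as follows. The exact suffix format is an assumption, inferred from the model name decision-trees-sample-191028-111309-538454 shown earlier (which parses as %y%m%d-%H%M%S-%f).

```python
# Sketch of appending a current-time suffix to job/model names so that
# repeated runs never collide. The %y%m%d-%H%M%S-%f format is an
# assumption inferred from the model name earlier in the article.
from datetime import datetime


def unique_name(base, now=None):
    now = now or datetime.now()
    return f"{base}-{now.strftime('%y%m%d-%H%M%S-%f')}"


print(unique_name('decision-trees-sample'))
# e.g. decision-trees-sample-191028-111309-538454
```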
With this implementation in place, we are now actually using the tool in several projects. However, some features are still unimplemented, and other issues remain within the team. Some parts have also proved awkward to use in practice, so we will keep addressing these problems to make our machine learning projects more efficient.