This article is a work in progress and will be updated as things progress.
*Update*: In January 2017, the prototype implementation was completed. I talked about this implementation at AWS Premier Night #3, so please see [the article here](http://blog.serverworks.co.jp/tech/2017/01/13/awspremier3-starting-serverless-ml/).
Basically, this is an article about cherry-picking the tasty parts of recent trends: an attempt to get easy operations and machine learning at the same time by putting ML logic on an as-a-Service platform.
In this article, I describe my investigation into putting "time series analysis" logic on FaaS-style serverless infrastructure, or on managed infrastructure such as PaaS where possible. Please note that the details of the analysis method itself will not be discussed.
For now, the goal is to get the analysis processing running somewhere that is (probably) easier to operate than a conventional physical or virtual server. The main candidates are FaaS and PaaS.
The analysis I am trying to do is time series analysis. In the context of infrastructure monitoring, outlier detection and change point detection fall into this category. Datadog recently shipped this as a beta feature, which I think is very nice. => http://docs.datadoghq.com/ja/guides/anomalies/
Outlier detection and change point detection use past data to determine whether the **current** value is one worth noting in light of past trends; they judge the present and cannot predict the future. In contrast, the approach I use this time is future value prediction, using methods such as _AR (Auto Regression)_ and _MA (Moving Average)_.
The forecast target is monthly sales. I import the aggregated monthly sales figures from a CRM (Salesforce, in my case) and use a time series analysis framework to forecast sales for the next N months.
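The core idea can be sketched in a few lines of plain NumPy (an illustrative toy, not the actual implementation; in practice a framework such as statsmodels handles model fitting for you): an AR(p) model predicts the next value as a linear combination of the previous p values.

```python
import numpy as np

def fit_ar(series, p):
    """Fit an AR(p) model by ordinary least squares.

    Models y[t] ~= intercept + sum(coefs[i] * y[t - 1 - i] for i in range(p)).
    """
    y = np.asarray(series, dtype=float)
    # Each row of X holds the p values preceding the target, newest first.
    X = np.column_stack([y[p - 1 - i : len(y) - 1 - i] for i in range(p)])
    X = np.column_stack([np.ones(len(X)), X])
    params, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return params[0], params[1:]

def forecast(series, p, horizon):
    """Forecast `horizon` future points, feeding each prediction back in."""
    intercept, coefs = fit_ar(series, p)
    history = list(series)
    preds = []
    for _ in range(horizon):
        lags = history[-1 : -p - 1 : -1]  # most recent p values, newest first
        preds.append(intercept + float(np.dot(coefs, lags)))
        history.append(preds[-1])
    return preds
```

For real monthly sales data, you would fit once on the imported history and roll the forecast forward N months.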
ML as a Service workshop. A spin-off event of ServerlessConf. => [Special Project] Serverless Machine Learning Workshop => SlideShare
The language used is basically Python: its machine learning libraries are extensive, the ecosystem around it is large, and I like it.
In this implementation, the following libraries are used.
I have listed the candidates I could think of. There are many I haven't examined or have never used, so I would welcome pointers to anything influential.
Starting from the "machine learning × cloud" angle. These three are the major players, after all.
Amazon ML does not support time series analysis, so it is out. I was expecting a release at re:Invent, but the evolution went in a different direction (Amazon Rekognition). It was a very hot announcement, but not the direction I was hoping for this time.
Azure Machine Learning's time series support covers only anomaly detection, so as-is it does not fit. However, you can embed Python/R code, and with that you can implement anything. The execution environment supports Anaconda, so library support is not a problem.
I have not yet investigated GCP's machine learning offering. Since it is described as being based on TensorFlow, the answer is probably "you can implement anything", but my understanding of it, including the surrounding services, is still lacking, so it remains unexamined for now.
Function as a Service. Here, too, there are three majors:
Azure and GCP are still under-researched. Lambda works if you bundle the Python packages (and shared libraries) that are not included in the Lambda runtime environment. I referred to this (nice) hack for the method; the difference in my case is that statsmodels is also required, but the bundle still fits within Lambda's deployment package size limit (50 MB). => Using Scikit-Learn in AWS Lambda -- Serverless Code
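The bundling itself boils down to "pip-install the dependencies into a build directory next to the handler, then zip everything with paths relative to that directory". A rough sketch (paths and names are placeholders, not from the actual project):

```python
import os
import zipfile

def build_deployment_package(build_dir, zip_path):
    """Zip the contents of build_dir (handler + pip-installed packages,
    e.g. from `pip install -t build_dir statsmodels`) into zip_path.
    Archive paths are relative to build_dir, as Lambda expects."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(build_dir):
            for name in files:
                full = os.path.join(root, name)
                zf.write(full, os.path.relpath(full, build_dir))
    return zip_path
```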
I suspect the other FaaS offerings have essentially the same constraint. As for Azure Functions, the execution platform is based on App Service, so the App Service Editor may solve this. The official document "Develop Node.js apps with App Service Editor" shows that packages can be added via npm, so my (as yet unverified) reasoning is that the same should hold for Python. If that is true, Azure becomes very attractive.
PaaS and managed container environments. Since "I want to go serverless" is one of my starting points, I am basically reluctant to adopt these.
Anaconda has all the libraries I need, so a PaaS whose execution environment supports Anaconda would be an attractive option.
I would love Heroku to offer Anaconda in its execution environment, but it does not seem to. Looking into buildpacks, there is a Conda buildpack, which is a promising line, but it does not seem to include statsmodels. That leaves building the environment myself; if that turns out to be painful, I will lean toward toughing it out with FaaS instead.
Docker Cloud is virtually the same as ECS when AWS is used as the backend, so I am excluding it for now. That said, its Terraform-style multi-platform support is interesting; it is a service I would like to try on another occasion.
GAE can use a user-defined Dockerfile-based runtime if you use the Flexible Environment (beta), so there seems to be some potential there.
Please see the company blog for the slides; the implementation story starts around slide p.30.
The implementation is still too rough, so the structure and Functions need a fair amount of refactoring. I want to split the pre-processing and post-processing out of the Function that drives the prediction logic and chain them together. Since the data source lives in Salesforce, I also want to work on that integration.
It is a pity that the only way to add packages is to "stuff them into the zip archive yourself". For something called serverless, having to struggle this much with the build step is rather deflating...
That said, deployment tools targeting FaaS are appearing these days, and I think much of this can be covered by the power of tooling. It is a hassle, but my current perception is that it is not a serious concern.
As for deployment tools, Lamvery looks useful at the moment. It apparently can collect and bundle not only Python packages but shared libraries as well, which should streamline the build work. Managing things per Function is also convenient. I did not use any deployment tool this time, so I plan to try Lamvery in future development.
I chose AWS Lambda because of my company's background, but honestly, as things stand I would rather use Azure.
The Azure Machine Learning _"Execute Python Script"_ execution environment covered all the packages I needed. Azure Machine Learning also has integration with data stores and applications in mind. If you want to play with the (analysis-side) logic and also integrate with external services, I think Azure Machine Learning is the best choice.
As a caveat, you need to confirm that the required packages are included in the Execute Python Script execution environment (Anaconda). It can apparently be worked around even when they are not, but doing so takes effort close to the Lambda case above, which kills the advantage of being easy to use. (Official Reference)
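For reference, the Execute Python Script module hands its input ports to a function named `azureml_main` as pandas DataFrames and expects a tuple of DataFrames back. A minimal sketch (the trailing moving average here is a placeholder, not my actual prediction logic):

```python
import pandas as pd

# Azure ML Studio's "Execute Python Script" module calls azureml_main,
# passing the module's input ports as DataFrames and expecting a tuple
# of DataFrames in return.
def azureml_main(dataframe1=None, dataframe2=None):
    df = dataframe1.copy()
    # Placeholder prediction: trailing 3-month moving average of sales.
    df["forecast"] = df["sales"].rolling(window=3, min_periods=1).mean()
    return df,
```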
On the AWS Lambda execution platform, I analyzed the "monthly sales data" uploaded to S3 and forecast the most recent months' sales.
Running the prediction logic required packages that are not included in the Lambda function execution environment, but I cleared this by bundling them in the deployment package and adding a .so loading step to the startup process at runtime.
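That startup step can be sketched roughly as follows (the directory layout and library names are placeholder assumptions; the actual set of `.so` files depends on what the bundled packages link against):

```python
import ctypes
import os
import sys

def load_bundled_libs(base_dir, lib_names):
    """At Lambda cold start, make the bundled native code visible:
    add the bundle to the Python path and force-load the shared
    libraries that numpy/scipy/statsmodels were linked against."""
    lib_dir = os.path.join(base_dir, "lib")
    sys.path.insert(0, base_dir)
    loaded = []
    for name in lib_names:
        path = os.path.join(lib_dir, name)
        if os.path.exists(path):
            # RTLD_GLOBAL so that later imports resolve symbols from it.
            ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)
            loaded.append(name)
    return loaded

# e.g. at the top of the handler module (library names are examples):
# load_bundled_libs(os.path.dirname(__file__),
#                   ["libgfortran.so.3", "libopenblas.so"])
```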
Since the implementation is still rough, there are many points to improve. As of January 14, 2017, the unfinished items and future plans are as follows:
- Labor saving for build and deploy work
    - Use of deployment tools; considering Lamvery