In this article, I serve a model trained with machine learning as an API server, send data to it from the browser as JSON, and get the predicted value back. The machine learning API server is built from three main programs. First, train a model with XGBoost and save it. Next, implement an API server around that model in Flask. Finally, write an HTML file whose form data is sent as JSON via Ajax in JavaScript. With these three programs, you can send data from the browser to the API server and have the predicted value returned.
Libraries such as Anaconda, XGBoost, joblib, Flask, and flask-cors need to be installed beforehand.
This machine learning API can be built by following the process below.

- Build a training model with machine learning
- Create an API server with Flask
- Communicate with the API from the browser
Here, we use XGBoost to generate the training model, with Kaggle's Titanic dataset as the training data.
Before training with XGBoost, we do some preprocessing. Kaggle's Titanic dataset is split into train and test data, so the two are concatenated with concat and preprocessed together. Preprocessing consists of handling missing values, replacing categorical data with numbers, and deleting unnecessary features.
Load the libraries required for preprocessing and read the datasets into pandas DataFrames.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
train_df = pd.read_csv('titanic/train.csv')
test_df = pd.read_csv('titanic/test.csv')
train_df.head()
 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S
test_df.head()
 | PassengerId | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
---|---|---|---|---|---|---|---|---|---|---|---
0 | 892 | 3 | Kelly, Mr. James | male | 34.5 | 0 | 0 | 330911 | 7.8292 | NaN | Q
1 | 893 | 3 | Wilkes, Mrs. James (Ellen Needs) | female | 47.0 | 1 | 0 | 363272 | 7.0000 | NaN | S
2 | 894 | 2 | Myles, Mr. Thomas Francis | male | 62.0 | 0 | 0 | 240276 | 9.6875 | NaN | Q
3 | 895 | 3 | Wirz, Mr. Albert | male | 27.0 | 0 | 0 | 315154 | 8.6625 | NaN | S
4 | 896 | 3 | Hirvonen, Mrs. Alexander (Helga E Lindqvist) | female | 22.0 | 1 | 1 | 3101298 | 12.2875 | NaN | S
Since we want to preprocess everything at once, the train data and test data are joined with concat.
all_df = pd.concat((train_df.loc[:, 'Pclass' : 'Embarked'], test_df.loc[:, 'Pclass' : 'Embarked']))
all_df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1309 entries, 0 to 417
Data columns (total 10 columns):
Pclass 1309 non-null int64
Name 1309 non-null object
Sex 1309 non-null object
Age 1046 non-null float64
SibSp 1309 non-null int64
Parch 1309 non-null int64
Ticket 1309 non-null object
Fare 1308 non-null float64
Cabin 295 non-null object
Embarked 1307 non-null object
dtypes: float64(2), int64(3), object(5)
memory usage: 112.5+ KB
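Before filling anything in, it can help to count the missing values per column directly. This quick check is not part of the original flow, but the counts should match the info() output above.

# Count missing values per column; Age, Fare, Embarked, and Cabin should show gaps.
all_df.isnull().sum()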
Age, Fare, and Embarked have missing values, so Age and Fare are filled with the mean and Embarked with the mode.
all_df['Age'] = all_df['Age'].fillna(all_df['Age'].mean())
all_df['Fare'] = all_df['Fare'].fillna(all_df['Fare'].mean())
all_df['Embarked'] = all_df['Embarked'].fillna(all_df['Embarked'].mode()[0])
all_df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1309 entries, 0 to 417
Data columns (total 10 columns):
Pclass 1309 non-null int64
Name 1309 non-null object
Sex 1309 non-null object
Age 1309 non-null float64
SibSp 1309 non-null int64
Parch 1309 non-null int64
Ticket 1309 non-null object
Fare 1309 non-null float64
Cabin 295 non-null object
Embarked 1309 non-null object
dtypes: float64(2), int64(3), object(5)
memory usage: 112.5+ KB
all_df.head()
 | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
---|---|---|---|---|---|---|---|---|---|---
0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S
1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C
2 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S
3 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S
4 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S
Sex and Embarked are categorical data, so LabelEncoder replaces them with numbers.
cat_features = ['Sex', 'Embarked']
for col in cat_features:
    lbl = LabelEncoder()
    all_df[col] = lbl.fit_transform(list(all_df[col].values))
all_df.head()
 | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
---|---|---|---|---|---|---|---|---|---|---
0 | 3 | Braund, Mr. Owen Harris | 1 | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | 2
1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | 0 | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | 0
2 | 3 | Heikkinen, Miss. Laina | 0 | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | 2
3 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | 0 | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | 2
4 | 3 | Allen, Mr. William Henry | 1 | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | 2
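Keep in mind which number each category was mapped to, because the form and the API later send Sex and Embarked as numbers. LabelEncoder assigns codes in sorted label order, so Sex becomes female = 0, male = 1, which matches the table above. As a quick check (a sketch, not part of the original flow), the encoder left in lbl after the loop is the one that was fitted on Embarked:

# `lbl` still holds the encoder fitted on the last column, 'Embarked';
# classes_ lists the original labels in the order of their numeric codes.
print(list(lbl.classes_))           # expected: ['C', 'Q', 'S'] -> codes 0, 1, 2
print(lbl.transform(lbl.classes_))  # expected: [0 1 2]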
Name and Ticket are categorical features whose values are nearly all unique, so we delete them. Cabin has many missing values, so we delete it as well.
all_df = all_df.drop(columns = ['Name', 'Ticket', 'Cabin'])
all_df.head()
 | Pclass | Sex | Age | SibSp | Parch | Fare | Embarked
---|---|---|---|---|---|---|---
0 | 3 | 1 | 22.0 | 1 | 0 | 7.2500 | 2
1 | 1 | 0 | 38.0 | 1 | 0 | 71.2833 | 0
2 | 3 | 0 | 26.0 | 0 | 0 | 7.9250 | 2
3 | 1 | 0 | 35.0 | 1 | 0 | 53.1000 | 2
4 | 3 | 1 | 35.0 | 0 | 0 | 8.0500 | 2
Since train and test were concatenated earlier, they now have to be split apart again to get the training data back. The number of rows in train_df (train_df.shape[0]) tells you where to split.
train = all_df[:train_df.shape[0]]
test = all_df[train_df.shape[0]:]
train.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 891 entries, 0 to 890
Data columns (total 7 columns):
Pclass 891 non-null int64
Sex 891 non-null int64
Age 891 non-null float64
SibSp 891 non-null int64
Parch 891 non-null int64
Fare 891 non-null float64
Embarked 891 non-null int64
dtypes: float64(2), int64(5)
memory usage: 55.7 KB
Now that preprocessing is complete, we continue with machine learning using XGBoost. This time the goal is not to improve the model's accuracy but to serve the trained model from an API server, so the parameters are left almost at their defaults.
y = train_df['Survived']
X_train, X_test, y_train, y_test = train_test_split(train, y, random_state = 0)
import xgboost as xgb
params = {
    "objective": "binary:logistic",
    "eval_metric": "auc",
    "eta": 0.1,
    "max_depth": 6,
    "subsample": 1,
    "colsample_bytree": 1,
    "silent": 1
}
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
model = xgb.train(params=params,
                  dtrain=dtrain,
                  num_boost_round=100,
                  early_stopping_rounds=10,
                  evals=[(dtest, 'test')])
[0] test-auc:0.886905
Will train until test-auc hasn't improved in 10 rounds.
[1] test-auc:0.89624
[2] test-auc:0.893243
[3] test-auc:0.889603
[4] test-auc:0.892857
[5] test-auc:0.886005
[6] test-auc:0.890673
[7] test-auc:0.894741
[8] test-auc:0.889603
[9] test-auc:0.888832
[10] test-auc:0.889431
[11] test-auc:0.89153
Stopping. Best iteration:
[1] test-auc:0.89624
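As a quick sanity check before saving the model (not part of the original flow), you can score the held-out split yourself and compute the AUC with scikit-learn; the value should be close to the test-auc logged above.

from sklearn.metrics import roc_auc_score

# Predicted survival probabilities on the held-out split and their AUC.
pred_proba = model.predict(dtest)
print(roc_auc_score(y_test, pred_proba))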
There are several ways to save a trained model; here we use joblib to save it as a pkl file. The pkl file is written to the specified location, so copy it into the API server's folder afterwards and use it there.
from sklearn.externals import joblib  # on recent scikit-learn (0.23+), use `import joblib` instead
joblib.dump(model, 'titanic_model.pkl')
['titanic_model.pkl']
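To confirm that the pkl file round-trips correctly, and as a preview of what the API server will do later, you can reload it and score a single passenger. This is a sketch, not part of the original flow; the sample values follow the same feature order used for training.

# Reload the saved model and score one passenger (values from the first train row).
loaded_model = joblib.load('titanic_model.pkl')
sample = pd.DataFrame([[3, 1, 22.0, 1, 0, 7.25, 2]],
                      columns=['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked'])
print(loaded_model.predict(xgb.DMatrix(sample)))  # a survival probability between 0 and 1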
Next, we serve the trained model as an API server. Flask, a Python microframework, is used for the API server development. The flow is: build a virtual environment with conda, test a simple API server, and then load the model trained with XGBoost into it.
The virtual environment is managed with Anaconda's conda. In the terminal, create a folder for the application (titanic_api here) and move into it. Then create the virtual environment with conda create and activate it with conda activate.
mkdir titanic_api
cd titanic_api
conda create -n titanictenv
conda activate titanictenv
To develop an API server in Flask, first create and test a simple API server. Create the folders and files below inside the folder you made earlier. Once you have written the code shown below into each file, started the API server, and confirmed you can reach it with curl, the simple API server test is complete.
Create folders and files so that they have the following hierarchy. If you want to create an empty file, it is convenient to use the touch command.
titanic_api
├── api
│ ├── __init__.py
│ └── views
│ └── user.py
├── titanic_app.py
└── titanic_model.pkl
Write the following code into the files you just created. Three files are needed to test the simple API server: api/views/user.py, api/__init__.py, and titanic_app.py. vim is convenient for editing in the terminal, and Atom for editing in a GUI.
api/views/user.py
from flask import Blueprint, request, make_response, jsonify

# Routing settings
user_router = Blueprint('user_router', __name__)

# Specify path and HTTP method
@user_router.route('/users', methods=['GET'])
def get_user_list():
    return make_response(jsonify({
        'users': [
            {
                'id': 1,
                'name': 'John'
            }
        ]
    }))
api/__init__.py
from flask import Flask, make_response, jsonify
from .views.user import user_router

def create_app():
    app = Flask(__name__)
    app.register_blueprint(user_router, url_prefix='/api')
    return app

app = create_app()
titanic_app.py
from api import app

if __name__ == '__main__':
    app.run()
After writing the code above, start the server with python titanic_app.py. If it starts successfully, open another terminal and test communication with a curl command like the one below. If communication succeeds, the following data is returned.
curl http://127.0.0.1:5000/api/users
{
"users": [
{
"id": 1,
"name": "John"
}
]
}
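If you prefer testing from Python instead of curl, an equivalent check with the requests library (assumed to be installed) looks like this:

import requests

# Call the test endpoint and show the status code and JSON body.
resp = requests.get('http://127.0.0.1:5000/api/users')
print(resp.status_code)  # 200 if the server is running
print(resp.json())       # {'users': [{'id': 1, 'name': 'John'}]}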
Next, rewrite titanic_app.py, the startup file of the simple API server, as follows. At this point, the trained model (titanic_model.pkl) must be placed directly under titanic_api.
titanic_app.py
import json
from flask import Flask
from flask import request
from flask import abort
import pandas as pd
from sklearn.externals import joblib
import xgboost as xgb
model = joblib.load("titanic_model.pkl")
app = Flask(__name__)
# Get headers for payload
headers = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked']
@app.route('/titanic', methods=['POST'])
def titanic():
    if not request.json:
        abort(400)
    payload = request.json['data']
    values = [float(i) for i in payload.split(',')]
    data1 = pd.DataFrame([values], columns=headers, dtype=float)
    predict = model.predict(xgb.DMatrix(data1))
    return json.dumps(str(predict[0]))

if __name__ == "__main__":
    app.run(debug=True, port=5000)
After rewriting the code, start the API server again with python titanic_app.py. Once it is running, test communication with the curl command shown below. If a decimal value between 0 and 1 (the predicted survival probability) comes back for the JSON data you sent, it worked. You now have the model generated by machine learning running as an API server.
curl http://localhost:5000/titanic -s -X POST -H "Content-Type: application/json" -d '{"data": "3, 1, 22.0, 1, 0, 7.2500, 2"}'
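The same request can also be sent from Python with the requests library (assumed installed); the payload mirrors the curl command above.

import requests

# POST the comma-separated feature string to the /titanic endpoint.
payload = {'data': '3, 1, 22.0, 1, 0, 7.2500, 2'}
resp = requests.post('http://localhost:5000/titanic', json=payload)
print(resp.json())  # the predicted survival probability, returned as a string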
Finally, in addition to calling the API with curl from the terminal, we will make it possible to enter values in the browser and get the predicted value back. What is needed is to let the API server created above accept Ajax requests, and to write an HTML file that sends requests from the browser to the API server.
To allow Ajax requests from the page's JavaScript, the Flask startup file needs the additions shown below: a library called flask-cors and some related code. flask-cors must be installed beforehand.
titanic_app.py
import json
from flask import Flask
from flask import request
from flask import abort
from flask_cors import CORS  # added
import pandas as pd
from sklearn.externals import joblib
import xgboost as xgb
model = joblib.load("titanic_model.pkl")
app = Flask(__name__)
# Added from here
@app.after_request
def after_request(response):
    response.headers.add('Access-Control-Allow-Origin', '*')
    response.headers.add('Access-Control-Allow-Headers', 'Content-Type,Authorization')
    response.headers.add('Access-Control-Allow-Methods', 'GET,PUT,POST,DELETE,OPTIONS')
    return response
# Added up to here
# Get headers for payload
headers = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked']
@app.route('/titanic', methods=['POST'])
def titanic():
    if not request.json:
        abort(400)
    payload = request.json['data']
    values = [float(i) for i in payload.split(',')]
    data1 = pd.DataFrame([values], columns=headers, dtype=float)
    predict = model.predict(xgb.DMatrix(data1))
    return json.dumps(str(predict[0]))

if __name__ == "__main__":
    app.run(debug=True, port=5000)
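Note that importing CORS alone does nothing here; the CORS headers are added by hand in the after_request hook. As an alternative (a sketch, assuming flask-cors is installed), flask-cors can attach equivalent headers to every route with a single call:

from flask import Flask
from flask_cors import CORS

app = Flask(__name__)
CORS(app)  # enable cross-origin requests on all routes instead of the manual after_request hook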
In the HTML file, the interface part consists of input tags and related elements inside the body tag, which make up the data entry form. The entered data is picked up by JavaScript, formatted, converted to JSON, and POSTed via Ajax. When the request succeeds, the predicted value returned by the API server is displayed in the textarea element.

index.html
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Send JSON data by POST from HTML file</title>
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/2.1.0/jquery.min.js"></script>
<script type="text/javascript">
$(function(){
    $("#response").html("Response Values");
    $("#button").click( function(){
        var url = $("#url_post").val();
        var feature1 =
            $("#value1").val() + "," +
            $("#value2").val() + "," +
            $("#value3").val() + "," +
            $("#value4").val() + "," +
            $("#value5").val() + "," +
            $("#value6").val() + "," +
            $("#value7").val();
        var JSONdata = {
            data: feature1
        };
        alert(JSON.stringify(JSONdata));
        $.ajax({
            type: 'POST',
            url: url,
            data: JSON.stringify(JSONdata),
            contentType: 'application/JSON',
            dataType: 'JSON',
            scriptCharset: 'utf-8',
            success : function(data) {
                // Success
                alert("success");
                alert(JSON.stringify(JSONdata));
                $("#response").html(JSON.stringify(data));
            },
            error : function(data) {
                // Error
                alert("error");
                alert(JSON.stringify(JSONdata));
                $("#response").html(JSON.stringify(data));
            }
        });
    })
})
</script>
</head>
<body>
<h1>Send JSON data by POST from HTML file</h1>
<p>URL: <input type="text" id="url_post" name="url" size="100" value="http://localhost:5000/titanic"></p>
<p>Pclass: <input type="number" id="value1" size="30" value=3></p>
<p>Sex: <input type="number" id="value2" size="30" value=1></p>
<p>Age: <input type="number" id="value3" size="30" value="22.0"></p>
<p>SibSp: <input type="number" id="value4" size="30" value="1"></p>
<p>Parch: <input type="number" id="value5" size="30" value="0"></p>
<p>Fare: <input type="number" id="value6" size="30" value="7.2500"></p>
<p>Embarked: <input type="number" id="value7" size="30" value="2"></p>
<p><button id="button" type="button">submit</button></p>
<textarea id="response" cols=120 rows=10 disabled></textarea>
</body>
</html>
References:
- Build a virtual environment with conda: https://code-graffiti.com/how-to-build-a-virtual-environment-with-conda/
- Develop an API with Flask: https://swallow-incubate.com/archives/blog/20190819
- Serve an XGBoost model as an API server: https://towardsdatascience.com/publishing-machine-learning-api-with-python-flask-98be46fb2440
- POST JSON to an API server via Ajax: https://www.hands-lab.com/tech/entry/3716.html
- Send JSON data by POST from an HTML file: https://qiita.com/kidatti/items/21cc5c5154dbbb1aa27f