[About this sample](#About this sample)
[Step 1. Environment construction](# step-1-Environment construction)
[Step 2. Create a predictive model using Python](# step-2-Create a predictive model using Python)
[Step 3. Deploying and using the predictive model on SQL Server](# step-3-Deploying and using the predictive model on sql-server)
This sample runs machine learning on SQL Server 2017. The scenario is a forecast of future rentals in the ski rental business. The past rental history dataset shown below is used.
Year | Month | Day | WeekDay | Holiday | Snow | RentalCount |
---|---|---|---|---|---|---|
Year | Month | Day | 曜Day | 祝Dayフラグ | Snowfall flag | Number of rentals |
** Software requirements: **
** Sample code: **
rental_prediction.py A Python script that generates a prediction model and uses it to predict the number of rentals.
rental_prediction.sql Deploy the processing of rent_prediction.py within SQL Server (create stored procedures and tables for training, save models, create stored procedures for prediction).
setup.sql Restore the backup file (replace the file path with the downloaded path).
** Sample data: **
** Source: **
This article is based on Build a predictive model using Python and SQL Server ML Services.
Build the environment to execute this sample.
I tried using SQL Server 2017 In-Database Python for SQL Server 2017 Install Database Engine Services and Machine Learning Services (In-Database).
Sample DB [Backup File (TutorialDB.bak)](https://github.com/gho9o9/sql-server-samples/raw/master/samples/features/machine-learning-services/python/getting-started/rental -Download (prediction / TutorialDB.bak) and restore it by running Setup.sql. Setup.sql assumes that you have downloaded the backup file to C: \ Program Files \ Microsoft SQL Server \ MSSQL13.MSSQLSERVER \ MSSQL \ Backup. Change the path as appropriate according to your environment.
Setup.sql
USE master;
GO
RESTORE DATABASE TutorialDB
FROM DISK = 'C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\Backup\TutorialDB.bak'
WITH
MOVE 'TutorialDB' TO 'C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\TutorialDB.mdf'
,MOVE 'TutorialDB_log' TO 'C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\TutorialDB.ldf';
GO
Check the data created by the restore.
T-SQL
USE TutorialDB;
SELECT * FROM [dbo].[rental_data];
To run Python (and R) in SQL Server 2017, you need to change the setting of external scripts enabled in sp_configure. Also, the external scripts enabled parameter requires a restart of SQL Server 2017 for the setting changes to take effect.
T-SQL
EXEC sp_configure 'external scripts enabled', 1;
cmd
net stop "SQL Server Launchpad (MSSQLSERVER)"
net stop "SQL Server (MSSQLSERVER)"
net start "SQL Server (MSSQLSERVER)"
net start "SQL Server Launchpad (MSSQLSERVER)"
(*) Change the instance name according to the environment. Also, if there is a service that depends on the SQL Server service, such as the SQL Server Agent service, restart it explicitly.
First, create a prediction model by Python in the Python IDE.
Import the required libraries.
rental_prediction.py
import os
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from revoscalepy import RxSqlServerData
from revoscalepy import rx_import
Library | Use |
---|---|
scikit-learn | Used for machine learning |
RevoScalePy | UsedtoaccessSQLServer(Alsoavailableformachinelearning) |
Connect to SQL Server & get data and load it into panas dataframe.
rental_prediction.py
sql_server = os.getenv('PYTEST_SQL_SERVER', '.')
conn_str = 'Driver=SQL Server;Server=' + sql_server + ';Database=TutorialDB;Trusted_Connection=True;'
column_info = {
"Year" : { "type" : "integer" },
"Month" : { "type" : "integer" },
"Day" : { "type" : "integer" },
"RentalCount" : { "type" : "integer" },
"WeekDay" : {
"type" : "factor",
"levels" : ["1", "2", "3", "4", "5", "6", "7"]
},
"Holiday" : {
"type" : "factor",
"levels" : ["1", "0"]
},
"Snow" : {
"type" : "factor",
"levels" : ["1", "0"]
}
}
data_source = RxSqlServerData(sql_query="SELECT RentalCount, Year, Month, Day, WeekDay, Snow, Holiday FROM dbo.rental_data",
connection_string=conn_str, column_info=column_info)
# import data source and convert to pandas dataframe
df = pd.DataFrame(rx_import(data_source))
print("Data frame:", df)
# Get all the columns from the dataframe.
columns = df.columns.tolist()
# Filter the columns to remove ones we don't want.
columns = [c for c in columns if c not in ["Year"]]
(*) Change the connection destination server (variable: sql_server) according to the environment.
Results
Rows Processed: 453
Data frame: RentalCount Year Month Day WeekDay Snow Holiday
0 445 2014 1 20 2 0 1
1 40 2014 2 13 5 0 0
2 456 2013 3 10 1 0 0
...
450 29 2015 3 24 3 1 0
451 50 2014 3 26 4 1 0
452 377 2015 12 6 1 1 0
[453 rows x 7 columns]
This sample uses a linear regression algorithm for model training. This model training is finding the function (model) that best describes the correlation of variables in the dataset.
rental_prediction.py
# Store the variable we'll be predicting on.
target = "RentalCount"
# Generate the training set. Set random_state to be able to replicate results.
train = df.sample(frac=0.8, random_state=1)
# Select anything not in the training set and put it in the testing set.
test = df.loc[~df.index.isin(train.index)]
# Print the shapes of both sets.
print("Training set shape:", train.shape)
print("Testing set shape:", test.shape)
# Initialize the model class.
lin_model = LinearRegression()
# Fit the model to the training data.
lin_model.fit(train[columns], train[target])
Results
Training set shape: (362, 7)
Testing set shape: (91, 7)
Predict with the model created earlier using the test data.
rental_prediction.py
# Generate our predictions for the test set.
lin_predictions = lin_model.predict(test[columns])
print("Predictions:", end="")
print(['{:.15f}'.format(n) for n in lin_predictions])
# Compute error between our test predictions and the actual values.
lin_mse = mean_squared_error(lin_predictions, test[target])
print("Computed error:", lin_mse)
Results
Predictions:['40.000000000000007', '38.000000000000007', '240.000000000000000', '39.000000000000000', '514.000000000000000', '48.000000000000007', '297.000000000000000', '24.999999999999993',
...
'432.000000000000000', '24.999999999999993', '39.000000000000007', '28.000000000000004', '325.000000000000000', '46.000000000000014', '36.000000000000014', '50.000000000000007', '63.000000000000007']
Computed error: 6.85182043392e-29
SQL Server Machine Learning Services allows you to train and test predictive models in the context of SQL Server. Create a T-SQL program that contains embedded Python scripts that the SQL Server database engine handles. These Python codes run in SQL Server, which simplifies interaction with the data stored in the database.
Define a table to store the model and forecast results.
rental_prediction.sql
-- 3-1.table definition
--Setup model table
DROP TABLE IF EXISTS rental_py_models;
GO
CREATE TABLE rental_py_models (
model_name VARCHAR(30) NOT NULL DEFAULT('default model') PRIMARY KEY,
model VARBINARY(MAX) NOT NULL
);
GO
--Create a table to store the predictions in
DROP TABLE IF EXISTS [dbo].[py_rental_predictions];
GO
CREATE TABLE [dbo].[py_rental_predictions](
[RentalCount_Predicted] [int] NULL,
[RentalCount_Actual] [int] NULL,
[Month] [int] NULL,
[Day] [int] NULL,
[WeekDay] [int] NULL,
[Snow] [int] NULL,
[Holiday] [int] NULL,
[Year] [int] NULL
) ON [PRIMARY]
GO
The model creation code created in Python is diverted and implemented in SQL Server. This stored procedure uses the data in SQL Server to create a linear regression model.
rental_prediction.sql
-- 3-2.Modeling stored procedure definition
-- Stored procedure that trains and generates an R model using the rental_data and a decision tree algorithm
DROP PROCEDURE IF EXISTS generate_rental_py_model;
go
CREATE PROCEDURE generate_rental_py_model (@trained_model varbinary(max) OUTPUT)
AS
BEGIN
EXECUTE sp_execute_external_script
@language = N'Python'
, @script = N'
df = rental_train_data
# Get all the columns from the dataframe.
columns = df.columns.tolist()
# Store the variable well be predicting on.
target = "RentalCount"
from sklearn.linear_model import LinearRegression
# Initialize the model class.
lin_model = LinearRegression()
# Fit the model to the training data.
lin_model.fit(df[columns], df[target])
import pickle
#Before saving the model to the DB table, we need to convert it to a binary object
trained_model = pickle.dumps(lin_model)
'
, @input_data_1 = N'select "RentalCount", "Year", "Month", "Day", "WeekDay", "Snow", "Holiday" from dbo.rental_data where Year < 2015'
, @input_data_1_name = N'rental_train_data'
, @params = N'@trained_model varbinary(max) OUTPUT'
, @trained_model = @trained_model OUTPUT;
END;
GO
Model Creation Registers the model created by executing the stored procedure in SQL Server as VARBINARY data.
rental_prediction.sql
-- 3-3.Model creation execution
TRUNCATE TABLE rental_py_models;
DECLARE @model VARBINARY(MAX);
EXEC generate_rental_py_model @model OUTPUT;
INSERT INTO rental_py_models (model_name, model) VALUES('linear_model', @model);
SELECT * FROM rental_py_models;
Results
model_name model
------------------------------ ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
linear_model 0x800363736B6C6561726E2E6C696E6561725F6D6F64656C2E626173650A4C696E65617252656772657373696F6E0A7100298171017D71022858060000006E5F6A6F627371034B01580500000072616E6B5F71044B0758100000005F736B6C6561726E5F76657273696F6E71055806000000302E31382E31710658090000006E
(One line processed)
Predict the test data registered in SQL Server with the model registered in SQL Server.
rental_prediction.sql
-- 3-4.Predictive stored procedure definition
DROP PROCEDURE IF EXISTS py_predict_rentalcount;
GO
CREATE PROCEDURE py_predict_rentalcount (@model varchar(100))
AS
BEGIN
DECLARE @py_model varbinary(max) = (select model from rental_py_models where model_name = @model);
EXEC sp_execute_external_script
@language = N'Python'
, @script = N'
import pickle
rental_model = pickle.loads(py_model)
df = rental_score_data
# Get all the columns from the dataframe.
columns = df.columns.tolist()
# Store the variable well be predicting on.
target = "RentalCount"
# Generate our predictions for the test set.
lin_predictions = rental_model.predict(df[columns])
# Import the scikit-learn function to compute error.
from sklearn.metrics import mean_squared_error
# Compute error between our test predictions and the actual values.
lin_mse = mean_squared_error(lin_predictions, df[target])
import pandas as pd
predictions_df = pd.DataFrame(lin_predictions)
OutputDataSet = pd.concat([predictions_df, df["RentalCount"], df["Month"], df["Day"], df["WeekDay"], df["Snow"], df["Holiday"], df["Year"]], axis=1)
'
, @input_data_1 = N'Select "RentalCount", "Year" ,"Month", "Day", "WeekDay", "Snow", "Holiday" from rental_data where Year = 2015'
, @input_data_1_name = N'rental_score_data'
, @params = N'@py_model varbinary(max)'
, @py_model = @py_model
with result sets (("RentalCount_Predicted" float, "RentalCount" float, "Month" float,"Day" float,"WeekDay" float,"Snow" float,"Holiday" float, "Year" float));
END;
GO
Execute a predictive stored procedure to make a prediction and register the result in SQL Server.
rental_prediction.sql
--3-5.Predictive execution
TRUNCATE TABLE py_rental_predictions;
--Insert the results of the predictions for test set into a table
INSERT INTO py_rental_predictions
EXEC py_predict_rentalcount 'linear_model';
-- Select contents of the table
SELECT * FROM py_rental_predictions;
Results
RentalCount_Predicted RentalCount_Actual Month Day WeekDay Snow Holiday Year
--------------------- ------------------ ----------- ----------- ----------- ----------- ----------- -----------
41 42 2 11 4 0 0 2015
360 360 3 29 1 0 0 2015
19 20 4 22 4 0 0 2015
...
25 26 3 18 4 0 0 2015
28 29 3 24 3 1 0 2015
377 377 12 6 1 1 0 2015
(151 lines processed)
Visualize the result registered in SQL Server with Power BI etc.
Recommended Posts