From the beginning of the tutorial

In-Database Python Analysis for SQL Developers

Previous step

Step 5: Training and saving a model using T-SQL

Step 6: Use the model

In this step you will learn how to use the model trained in the previous step. "Use" here means "deploying the model to production for scoring." This is easy to expand because the Python code is contained in the stored procedure. To make new observation predictions from your application, just call the stored procedure.

There are two ways to call a Python model from a stored procedure:

-** Batch scoring mode : Use a SELECT query to serve multiple rows of data. The stored procedure returns a table of observations corresponding to the input - Individual scoring mode **: Pass a set of individual parameter values as input. The stored procedure returns a single record or value.

Scoring using the scikit-learn model

The stored procedure PredictTipSciKitPy uses the scikit-learn model.

The stored procedure PredictTipSciKitPy is defined in SQL Server through Step 2: Import Data into SQL Server Using PowerShell (http://qiita.com/qio9o9/items/98df36982f1fbecdf5e7).

In Management Studio's Object Explorer, expand Programming> Stored Procedures.

Right-click on PredictTipSciKitPy and select Modify to open the Transact-SQL script in a new query window.

DROP PROCEDURE IF EXISTS PredictTipSciKitPy;
GO

CREATE PROCEDURE [dbo].[PredictTipSciKitPy](@model varchar(50), @inquery nvarchar(max))
AS
BEGIN
  DECLARE @lmodel2 varbinary(max) = (select model from nyc_taxi_models where name = @model);

  EXEC sp_execute_external_script 
	@language = N'Python',
    @script = N'
import pickle
import numpy
# import pandas
from sklearn import metrics

mod = pickle.loads(lmodel2)

X = InputDataSet[["passenger_count", "trip_distance", "trip_time_in_secs", "direct_distance"]]
y = numpy.ravel(InputDataSet[["tipped"]])

prob_array = mod.predict_proba(X)
prob_list = [item[1] for item in prob_array]

prob_array = numpy.asarray(prob_list)
fpr, tpr, thresholds = metrics.roc_curve(y, prob_array)
auc_result = metrics.auc(fpr, tpr)
print("AUC on testing data is:", auc_result)

OutputDataSet = pandas.DataFrame(data=prob_list, columns=["predictions"])
',	
	@input_data_1 = @inquery,
	@input_data_1_name = N'InputDataSet',
	@params = N'@lmodel2 varbinary(max)',
	@lmodel2 = @lmodel2
  WITH RESULT SETS ((Score float));

END
GO

--Specify the model name to use in the input parameters of the stored procedure. --Fetch the serialized model from the nyc_taxi_models table based on the specified model name. --The serialized model is stored in the Python variable mod. --The new scored case is taken from the Transact-SQL query specified by @ input_data_1. The result of this query is saved in the default data frame ʻInputDat a Set. --This data frame is passed to the function predict_proba of the logistic regression model modcreated using the scikit-learn model. --The functionpredict_proba returns a float value that indicates the probability of receiving an arbitrary amount of chips. --In addition, calculate the precision metric ʻAUC (area under curve). Precision metrics such as AUC are only generated if you specify the objective variable (ie the tapped column) in addition to the explanatory variables. The prediction does not require the objective variable (variable Y), but it is required to calculate the accuracy metric. Therefore, if the data to be scored does not have an objective variable, remove the AUC calculation block from the source code and modify the stored procedure to simply return the probability of receiving a chip by the explanatory variable (variable X).

Scoring using the revoscalepy model

The stored procedure PredictTipRxPy uses a model created using the revoscalepy library. This works much like the procedure PredictTipSciKitPy, but with some changes to the revoscalepy function.

The stored procedure PredictTipRxPy is defined in SQL Server through Step 2: Import Data into SQL Server Using PowerShell (http://qiita.com/qio9o9/items/98df36982f1fbecdf5e7).

In Management Studio's Object Explorer, expand Programming> Stored Procedures.

Right-click on PredictTipRxPy and select Modify to open the Transact-SQL script in a new query window.

DROP PROCEDURE IF EXISTS PredictTipRxPy;
GO

CREATE PROCEDURE [dbo].[PredictTipRxPy](@model varchar(50), @inquery nvarchar(max))
AS
BEGIN
  DECLARE @lmodel2 varbinary(max) = (select model from nyc_taxi_models where name = @model);

  EXEC sp_execute_external_script 
	@language = N'Python',
    @script = N'
import pickle
import numpy
# import pandas
from sklearn import metrics
from revoscalepy.functions.RxPredict import rx_predict

mod = pickle.loads(lmodel2)
X = InputDataSet[["passenger_count", "trip_distance", "trip_time_in_secs", "direct_distance"]]
y = numpy.ravel(InputDataSet[["tipped"]])

prob_array = rx_predict(mod, X)
prob_list = prob_array["tipped_Pred"].values

prob_array = numpy.asarray(prob_list)
fpr, tpr, thresholds = metrics.roc_curve(y, prob_array)
auc_result = metrics.auc(fpr, tpr)
print("AUC on testing data is:", auc_result)
OutputDataSet = pandas.DataFrame(data=prob_list, columns=["predictions"])
',	
	@input_data_1 = @inquery,
	@input_data_1_name = N'InputDataSet',
	@params = N'@lmodel2 varbinary(max)',
	@lmodel2 = @lmodel2
  WITH RESULT SETS ((Score float));

END
GO

Perform batch scoring

The stored procedures PredictTipSciKitPy and PredictTipRxPy require two input parameters.

--Query to extract data to be scored --Trained model identifier used for scoring

In this section, you'll learn how to pass these arguments to a stored procedure to easily modify both the model and the data used for scoring.

Define the input data and call the stored procedure for scoring as follows: This example uses the stored procedure PredictTipSciKitPy for scoring and passes the model name and query string.

DECLARE @query_string nvarchar(max) -- Specify input query
  SET @query_string='
  select tipped, fare_amount, passenger_count, trip_time_in_secs, trip_distance,
  dbo.fnCalculateDistance(pickup_latitude, pickup_longitude,  dropoff_latitude, dropoff_longitude) as direct_distance
  from nyctaxi_sample_testing'
EXEC [dbo].[PredictTipSciKitPy] 'SciKit_model', @query_string;

The stored procedure returns an estimated probability of receiving a tip for each operation record passed as part of an input query. The predicted values are displayed in the results pane of Management Studio. The accuracy metric ʻAUC (area under the curve)` is output in the message pane.

To use the revoscalepy model for scoring, call the stored procedure PredictTipRxPy.

DECLARE @query_string nvarchar(max) -- Specify input query
  SET @query_string='
  select tipped, fare_amount, passenger_count, trip_time_in_secs, trip_distance,
  dbo.fnCalculateDistance(pickup_latitude, pickup_longitude,  dropoff_latitude, dropoff_longitude) as direct_distance
  from nyctaxi_sample_testing'
EXEC [dbo].[PredictTipRxPy] 'revoscalepy_model', @query_string;

Performing individual scoring

In some cases, instead of batch scoring, you want to pass a single case and get a single result based on that value. For example, configure an Excel worksheet, web application, or Reporting Services report to call a stored procedure based on user input.

In this section, you will learn how to call the stored procedures PredictTipSingleModeSciKitPy and PredictTipSingleModeRxPy to create a single prediction.

The stored procedures PredictTipSingleModeSciKitPy and PredictTipSingleModeRxPy are defined in SQL Server through Step 2: Import data into SQL Server using PowerShell.

In Management Studio's Object Explorer, expand Programming> Stored Procedures.
Right-click PredictTipSingleModeSciKitPy or PredictTipSingleModeRxPy and select Modify to open the Transact-SQL script in a new query window.

These stored procedures use the scikit-learn and revoscalepy models to perform scoring as follows: --The model name and multiple single values are provided as input. These inputs include the number of passengers, driving distance, and so on. --The table-valued function fnEngineerFeatures takes latitude and longitude as input and converts them directly to distance. --When calling a stored procedure from an external application, make sure that the input data matches the required input functionality of the Python model. This includes casting the input data to a Python data type and validating the data type and data length. --The stored procedure creates a score based on the stored Python model.

Below is a stored procedure PredictTipSingleModeSciKitPy that performs scoring using the scikit-learn model.

```SQL:PredictTipSingleModeSciKitPy
CREATE PROCEDURE [dbo].[PredictTipSingleModeSciKitPy](@model varchar(50), @passenger_count int = 0,
@trip_distance float = 0,
@trip_time_in_secs int = 0,
@pickup_latitude float = 0,
@pickup_longitude float = 0,
@dropoff_latitude float = 0,
@dropoff_longitude float = 0)
AS
BEGIN
    DECLARE @inquery nvarchar(max) = N'
    SELECT * FROM [dbo].[fnEngineerFeatures]( 
    @passenger_count,
    @trip_distance,
    @trip_time_in_secs,
    @pickup_latitude,
    @pickup_longitude,
    @dropoff_latitude,
    @dropoff_longitude)
    '
    DECLARE @lmodel2 varbinary(max) = (select model from nyc_taxi_models where name = @model);
    EXEC sp_execute_external_script 
        @language = N'Python',
        @script = N'
import pickle
import numpy
# import pandas

# Load model and unserialize
mod = pickle.loads(model)

# Get features for scoring from input data
X = InputDataSet[["passenger_count", "trip_distance", "trip_time_in_secs", "direct_distance"]]

# Score data to get tip prediction probability as a list (of float)
prob = [mod.predict_proba(X)[0][1]]

# Create output data frame
OutputDataSet = pandas.DataFrame(data=prob, columns=["predictions"])
',
    @input_data_1 = @inquery,
    @params = N'@model varbinary(max),@passenger_count int,@trip_distance float,
        @trip_time_in_secs int ,
        @pickup_latitude float ,
        @pickup_longitude float ,
        @dropoff_latitude float ,
        @dropoff_longitude float',
    @model = @lmodel2,
        @passenger_count =@passenger_count ,
        @trip_distance=@trip_distance,
        @trip_time_in_secs=@trip_time_in_secs,
        @pickup_latitude=@pickup_latitude,
        @pickup_longitude=@pickup_longitude,
        @dropoff_latitude=@dropoff_latitude,
        @dropoff_longitude=@dropoff_longitude
    WITH RESULT SETS ((Score float));
END
GO
```

Below is a stored procedure PredictTipSingleModeRxPy that performs scoring using the revoscalepy model.

```SQL
CREATE PROCEDURE [dbo].[PredictTipSingleModeRxPy](@model varchar(50), @passenger_count int = 0,
@trip_distance float = 0,
@trip_time_in_secs int = 0,
@pickup_latitude float = 0,
@pickup_longitude float = 0,
@dropoff_latitude float = 0,
@dropoff_longitude float = 0)
AS
BEGIN
    DECLARE @inquery nvarchar(max) = N'
    SELECT * FROM [dbo].[fnEngineerFeatures]( 
    @passenger_count,
    @trip_distance,
    @trip_time_in_secs,
    @pickup_latitude,
    @pickup_longitude,
    @dropoff_latitude,
    @dropoff_longitude)
    '
    DECLARE @lmodel2 varbinary(max) = (select model from nyc_taxi_models where name = @model);
    EXEC sp_execute_external_script 
        @language = N'Python',
        @script = N'
import pickle
import numpy
# import pandas
from revoscalepy.functions.RxPredict import rx_predict

# Load model and unserialize
mod = pickle.loads(model)

# Get features for scoring from input data
x = InputDataSet[["passenger_count", "trip_distance", "trip_time_in_secs", "direct_distance"]]

# Score data to get tip prediction probability as a list (of float)

prob_array = rx_predict(mod, x)

prob_list = prob_array["tipped_Pred"].values

# Create output data frame
OutputDataSet = pandas.DataFrame(data=prob_list, columns=["predictions"])
',
    @input_data_1 = @inquery,
    @params = N'@model varbinary(max),@passenger_count int,@trip_distance float,
        @trip_time_in_secs int ,
        @pickup_latitude float ,
        @pickup_longitude float ,
        @dropoff_latitude float ,
        @dropoff_longitude float',
    @model = @lmodel2,
        @passenger_count =@passenger_count ,
        @trip_distance=@trip_distance,
        @trip_time_in_secs=@trip_time_in_secs,
        @pickup_latitude=@pickup_latitude,
        @pickup_longitude=@pickup_longitude,
        @dropoff_latitude=@dropoff_latitude,
        @dropoff_longitude=@dropoff_longitude
    WITH RESULT SETS ((Score float));
END
GO
```

Open a new query window in Management Studio and enter the explanatory variable column to call the stored procedure.

-- Call stored procedure PredictTipSingleModeSciKitPy to score using SciKit-Learn model
EXEC [dbo].[PredictTipSingleModeSciKitPy] 'SciKit_model', 1, 2.5, 631, 40.763958,-73.973373, 40.782139,-73.977303
-- Call stored procedure PredictTipSingleModeRxPy to score using revoscalepy model
EXEC [dbo].[PredictTipSingleModeRxPy] 'revoscalepy_model', 1, 2.5, 631, 40.763958,-73.973373, 40.782139,-73.977303

The seven explanatory variable values are in the following order: - passenger_count - trip_distance - trip_time_in_secs - pickup_latitude - pickup_longitude - dropoff_latitude - dropoff_longitude

As a result, the probability that a tip will be paid in an operation with the above parameters is returned.

![sqldev-python-step6-5-gho9o9.png](https://qiita-image-store.s3.amazonaws.com/0/195839/27bc40a6-07ae-595d-fec1-4c030e6c04ab.png)

Summary

In this tutorial, you learned how to work with Python code embedded in stored procedures. We found that integration with Transact-SQL makes it even easier to deploy Python models for forecasting and incorporate model retraining as part of an enterprise data workflow.

Link

[In-Database Python Analysis Tutorial with SQL Server 2017] Step 6: Using the model

From the beginning of the tutorial

Previous step

Step 6: Use the model

Scoring using the scikit-learn model

Scoring using the revoscalepy model

Perform batch scoring

Performing individual scoring

Summary

Link

Previous step

From the beginning of the tutorial

Source

Related item