The purpose of this tutorial is to provide SQL programmers with a hands-on experience of building machine learning solutions in SQL Server. In this tutorial, you will learn how to incorporate Python into your application by adding Python code to your stored procedure.
[!NOTE] Click here for the R version of a similar tutorial (http://qiita.com/qio9o9/items/4f020bb93dc07567e556). The R version works on both SQL Server 2017 and SQL Server 2016.
The life cycle of machine learning development generally consists of data acquisition and cleansing, data exploration and feature engineering, model training and tuning, and finally model deployment to production. For actual coding, debugging, and testing, it is best to use an integrated development environment for Python (Python Tools for Visual Studio, PyCharm, Spyder, etc.) such as:
After creating and testing the solution in the Python IDE, deploy the Python code to SQL Server as a Transact-SQL stored procedure. This tutorial will provide you with all the Python code you need.
Download the sample dataset and all script files to your local computer.
-Step 2: Import data into SQL Server using PowerShell
Run a PowerShell script that creates a database and table on the specified instance and loads the sample data into the table.
-Step 3: Data Search and Visualization
Run Python from a Transact-SQL stored procedure to perform basic data exploration and visualization.
-Step 4: Feature extraction of data using T-SQL
Data feature extraction is performed using a user-defined function.
-Step 5: Training and saving the model using T-SQL
Build and save a machine learning model with stored procedural Python code.
After saving the model in the database, use Transact-SQL to call the model for prediction.
[!NOTE] If there is a problem with the code embedded in the stored procedure, the information returned by the stored procedure is usually not enough to understand the cause of the error, so testing your Python code is an integrated development environment for Python (IDE). ) Is recommended.
This tutorial uses the well-known NYC Taxi dataset. Data will be sampled and used to make this tutorial quick and easy. Based on columns in this dataset, such as time, distance, and pick-up location, we will create a binary classification model that predicts whether a tip will be available for a particular ride.
Before you start the tutorial, you need to complete the following preparations: ::
--Install Database Engine Services and Machine Learning Services (In-Database) for SQL Server 2017.
--To run Python (and R) in SQL Server 2017, you need to change the setting of external scripts enabled in sp_configure. Also, the external scripts enabled parameter requires a restart of SQL Server 2017 for the setting changes to take effect.
―― 1. Enabling the external script execution function
```SQL:T-SQL
EXEC sp_configure 'external scripts enabled', 1;
```
―― 2. Restart SQL Server 2017
```cmd:cmd
net stop "SQL Server Launchpad (MSSQLSERVER)"
net stop "SQL Server (MSSQLSERVER)"
net start "SQL Server (MSSQLSERVER)"
net start "SQL Server Launchpad (MSSQLSERVER)"
```
Change the instance name passed to the net command according to your environment. Also, if there is a service that depends on the SQL Server service, such as the SQL Server Agent service, restart it explicitly.
--The SQL Server login used in this tutorial requires permissions to create databases and other objects, update data, browse data, and execute stored procedures.
In-Database Python Analytics for SQL Developers
Machine Learning Services with Python
Recommended Posts