In-Database Python Analysis for SQL Developers
Step 3: Data Search and Visualization
In this step, you will run the downloaded script RunSQL_SQL_Walkthrough.ps1
to create the database objects needed for the tutorial and import the sample data.
Execute the PowerShell script RunSQL_SQL_Walkthrough.ps1
in the downloaded files to prepare the tutorial environment. This script performs the following actions:
--Install the SQL Native Client and SQL command line utilities if they are not already installed. These are required for bulk loading data using bcp.
--Create a database and tables on your SQL Server instance and bulk load the data into them.
--Create more functions and stored procedures.
Open a PowerShell command prompt as an administrator and run the following command:
.\RunSQL_SQL_Walkthrough.ps1
You will be prompted to enter the following information:
--The name or address of the server where Machine Learning Services (Python) is installed.
--The name of the database to create
--The target SQL Server user name and password. This user must have permission to create databases, tables, stored procedures, functions, and load data into tables. If you omit the user name and password, you will be logged in as the current Windows user.
--The path of the sample data file nyctaxi1pct.csv
in the downloaded files. For example, C: \ tempPythonSQL \ nyctaxi1pct.csv
.
![sqldev-python-ps-1-gho9o9.png](https://qiita-image-store.s3.amazonaws.com/0/195839/db34c97d-167e-b897-1bdc-299d6feeed71.png)
![sqldev-python-ps-2-gho9o9.png](https://qiita-image-store.s3.amazonaws.com/0/195839/92d4987c-d3f6-ea5f-db2d-6e5c63b64346.png)
All T-SQL scripts have been modified to replace the database and user names specified in the steps above with placeholders.
Make sure that the stored procedures and functions created by the T-SQL script are created in the database.
T-SQL script file | Stored procedure / function |
---|---|
create-db-tb-upload-data.sql | Create a database and four tables. table nyctaxi_sample :The main NYC Taxi dataset is created. The data loaded is a 1% sample of the NYC Taxi dataset. The definition of the clustered column store index improves storage efficiency and query performance.table nyc_taxi_models :A trained advanced analytical model is registered.table nyctaxi_sample_training :The dataset used to train the model is created.table nyctaxi_sample_testing :The dataset used to test the model is registered. |
fnCalculateDistance.sql | Scalar value function that calculates the direct distance between the boarding position and the disembarking positionfnCalculateDistance Create a. |
fnEngineerFeatures.sql | A table-valued function that returns a feature-value set for model trainingfnEngineerFeatures Create a. |
TrainingTestingSplit.sql | nyctaxi_Data in the sample table, nyctaxi_sample_training and nyctaxi_sample_Procedure to divide into two of testingTrainingTestingSplit Create a. |
PredictTipSciKitPy.sql | For prediction using the model, scikit-Procedure to call the trained model created by learnPredictTipSciKitPy Create a. The procedure accepts a query as an input parameter and returns a column of numbers containing the score for the input row. |
PredictTipRxPy.sql | A procedure that calls a trained model created with RevoScalePy for prediction using the modelPredictTipRxPy Create a. The procedure accepts a query as an input parameter and returns a column of numbers containing the score for the input row. |
PredictTipSingleModeSciKitPy.sql | For prediction using the model, scikit-Procedure to call the trained model created by learnPredictTipSingleModeSciKitPy Create a. This stored procedure takes new observations as input, accepts individual feature values as inline parameters, and returns predictions for the new observations. |
PredictTipSingleModeRxPy.sql | For prediction using the model, scikit-Procedure to call the trained model created by learnPredictTipSingleModeRxPy Create a. This stored procedure takes new observations as input, accepts individual feature values as inline parameters, and returns predictions for the new observations. |
SerializePlots.sql | Procedure for data searchSerializePlots Create a. This stored procedure uses Python to create graphics and serialize graph objects. |
TrainTipPredictionModelSciKitPy.sql | scikit-Procedure for training logistic regression model by learnTrainTipPredictionModelSciKitPy Create a. This model is trained using 60% of randomly selected data to predict the tipped value (whether or not to tip). The output of the stored procedure is a trained model, a tablenyc_taxi_models Will be registered in. |
TrainTipPredictionModelRxPy.sql | Procedure for training logistic regression model with RevoScalePyTrainTipPredictionModelRxPy Create a. This model is trained using 60% of randomly selected data to predict the tipped value (whether or not to tip). The output of the stored procedure is a trained model, a tablenyc_taxi_models Will be registered in. |
[!NOTE]
The T-SQL script does not recreate the database object, so if it already exists, the data will be duplicated. Therefore, delete the existing object before executing the script again.
After performing the above steps, execute the following SQL to redefine some objects. ** The specifications of RevoScalePy changed when upgrading from SQL Server 2017 CTP to SQL Server 2017 RC, and the main reason for this redefinition is to accommodate the changes. ** **
Step 3: Data exploration and visualization
In-Database Python Analysis for SQL Developers
Step 2: Import Data to SQL Server using PowerShell
Machine Learning Services with Python
Recommended Posts