This article is from December 17th Article of "Oracle Cloud Infrastructure Advent Calendar 2019" It is written as 2019-12-17).
--Create VCN / Compute instance / Autonomous Database (ADW) in Always Free environment of Oracle Cloud Infrastructure --Set up a Jupyter Notebook environment on your Compute instance --Machine learning using data from Autonomous Database (decision tree / random forest)
Create a virtual cloud network
Create a Compute instance
Create Autonmous Database (ADW this time)![Image01.jpg](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/108635/5a0120bc-05c4-6f3a- e99a-3e873a044f1e.jpeg)
Set up a Jupyter Notebook environment on your Compute instance
Connect to Autonomous Database from Jupyter Notebook and perform machine learning
Create a Virtual Cloud Network (VCN) in the OCI Web Console
Virtual networking quickstart has been added since December 5, 2019
You can easily create a VCN with four input items: "VCN name", "VCN CIDR", "public subnet CIDR", and "private subnet CIDR".
(When creating a VCN using Quickstart, if you create it in Japanese display, the subnet name will be something like "Public subnet-
Create a Compute instance with the image of Oracle Linux 7 --Click "View Shape, Network and Storage Options" and in the networking configuration, 1. Select the virtual cloud network and public subnet created in --In addition, check "Assign public IP address" and create
Create a workload type from Create Autonomous Database by selecting Data Warehouse
Create a working user using SQL Developer Web etc.
CREATE USER scott IDENTIFIED BY password;
GRANT DWROLE TO scott;
GRANT UNLIMITED TABLESPACE TO scott;
GRANT
SELECT ANY TABLE,
UPDATE ANY TABLE,
DELETE ANY TABLE,
INSERT ANY TABLE
TO scott;
Connect to the Compute instance (Oracle Linux) created in 2. with ssh (opc user) and install Jupyter Notebook
4-1. Install Python 3 4-1-1. EPEL repository-Installation of Python3.6
sudo yum install -y oracle-epel-release-el7 oracle-release-el7
sudo yum install -y python36
sudo yum install -y libSM.x86_64
sudo yum install -y libXext.x86_64
sudo yum install -y libXrender.x86_64
4-1-2. mlevn (Python virtual environment setup and activation)
python3.6 -m venv mlenv
source mlenv/bin/activate
4-1-3. Install the library used for machine learning (library for executing this procedure)
pip3 install --upgrade pip
pip3 install pandas
pip3 install seaborn
pip3 install sklearn
4-1-4. Install Jupyter
python3 -m pip install jupyter
4-1-5. Oracle Instant Client Installation and Configuration
sudo yum -y install oraclelinux-developer-release-el7
sudo yum -y install python-cx_Oracle
pip3 install cx_Oracle
sudo sh -c "echo /usr/lib/oracle/18.3/client64/lib > /etc/ld.so.conf.d/oracle-instantclient.conf"
sudo ldconfig
4-1-6. Download Autonomous Database Client Credentials (Wallet) From the details screen of the Autonomous Database created in 3., click "DB Connection", and from the "Database Connection" dialog, click "Download Wallet" to download the Zip file. Transfer the downloaded Zip file to a Compute instance (such as scp) and unzip it (note the unzipped directory Example: / home / opc / wallet) Edit the sqlnet.ora file in the unzipped directory Change "~ / network / admin" specified in DIRECTORY on the "WALLET_LOCATION" line to the unzipped directory
sqlnet.ora
WALLET_LOCATION = (SOURCE = (METHOD = file) (METHOD_DATA = (DIRECTORY="/home/opc/wallet")))
SSL_SERVER_DN_MATCH=yes
4-1-7. Run the notebook Set environment variables (LD_LIBRARY_PATH is the path of the Instance Client, TNS_ADMIN is the directory where the wallet was unzipped)
export LD_LIBRARY_PATH=/usr/lib/oracle/18.3/client64/lib:$LD_LIBRARY_PATH
export TNS_ADMIN=/home/opc/wallet
source mlenv/bin/activate
jupyter notebook --ip=0.0.0.0 &
Make a note of the token key that appears when you run the jupyter notebook
To access the notebook, open this file in a browser:
file:///home/opc/.local/share/jupyter/runtime/nbserver-2210-open.html
Or copy and paste one of these URLs:
http://compute01:8888/?token=0e701120a4e7319ae8b970ac069fbfee53e7b07f072cc2dc
or http://127.0.0.1:8888/?token=0e701120a4e7319ae8b970ac069fbfee53e7b07f072cc2dc ```
5-1.Establish tunneling in another ssh session from client PC
$ ssh –i <private_key> opc@<public_IP_address> -L 8000:localhost:8888
5-2.Open a web browser on your local machine and access the URL below
http://localhost:8000
5-3.When prompted for a token, enter the token key that appears when you run jupyter Notebook
![image06.JPG](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/108635/bbf7a0a4-64f7-a9a5-0a5f-1ed216b319c3.jpeg)
![image07.JPG](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/108635/b1949454-7e7c-4cd5-4d61-fc0e5c6bfb37.jpeg)
5-4.Connect to Autonomous Database from Jupyter Notebook
Autonomous Database username, password and tnsnames in the unzipped wallet.Check the connection service name from ora and specify
Example of connecting to Autonomous Database and checking DB version
import cx_Oracle
con = cx_Oracle.connect('scott/
Example of connecting to Autonomous Database and checking the number of SALES tables in sh schema
import cx_Oracle
con = cx_Oracle.connect('scott/
CUST by gender from the CUSTOMERS table_Compare VALID
import pandas.io.sql as psql import cx_Oracle import seaborn as sns %matplotlib inline
con = cx_Oracle.connect('scott/
sql = 'select CUST_GENDER,CUST_VALID from sh.customers' df = psql.read_sql(sql,con) sns.countplot('CUST_GENDER',hue='CUST_VALID',data=df)
con.close()
![image08.JPG](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/108635/b0fd0ce8-514a-d38e-c19e-ca4228e4cd5a.jpeg)
####Machine learning
scikit-Library tree and scikit in learn-Using the random forest class fier in learn ensemble
CUST in the CUSTOMERS table_Create a machine learning model that predicts VALID
-Get features and objective variables with SELECT statement and store in DataFrame
sql = 'select cust_id,cust_year_of_birth,CUST_GENDER,cust_postal_code,cust_valid from sh.customers' df = psql.read_sql(sql,con)
-Missing value handling and categorical variables(letter)Conversion process
df['CUST_YEAR_OF_BIRTH'] = df['CUST_YEAR_OF_BIRTH'].fillna(df['CUST_YEAR_OF_BIRTH'].median()) df['CUST_POSTAL_CODE'] = df['CUST_POSTAL_CODE'].fillna(df['CUST_POSTAL_CODE'].median())
df['CUST_VALID']=df['CUST_VALID'].apply(lambda x:1 if x == 'I' else 0) df['CUST_GENDER']=df['CUST_GENDER'].apply(lambda x:1 if x == 'M' else 0)
-Divided into training data and test data(This time, 90% is used as learning data)
X = df.loc[:, ["CUST_YEAR_OF_BIRTH","CUST_GENDER","CUST_POSTAL_CODE"]] Y = df.loc[:, "CUST_VALID"] (train_X, test_X ,train_y, test_y) = train_test_split(X, Y, test_size = 0.1, random_state = 666)
-Decision tree model creation, verification with test data, confirmation of accuracy
clf = DecisionTreeClassifier(random_state=0) clf = clf.fit (train_X, train_y) #Create a model! pred = clf.predict(test_X) print("Prediction Score: {}".format(clf.score(test_X, test_y)))
-Random forest model creation and validation with test data
from sklearn.ensemble import RandomForestClassifier clf = RandomForestClassifier(random_state=0) clf = clf.fit(train_X, train_y) pred = clf.predict(test_X)
-Whole machine learning code
#### **`ml4py.py`**
```py
import pandas.io.sql as psql
import cx_Oracle
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import (roc_curve, auc, accuracy_score)
from sklearn.metrics import classification_report
con = cx_Oracle.connect('scott/<password>@orcl_medium')
sql = 'select cust_id,cust_year_of_birth,CUST_GENDER,cust_postal_code,cust_valid from sh.customers'
df = psql.read_sql(sql,con)
# Missing value processing
df['CUST_YEAR_OF_BIRTH'] = df['CUST_YEAR_OF_BIRTH'].fillna(df['CUST_YEAR_OF_BIRTH'].median())
df['CUST_POSTAL_CODE'] = df['CUST_POSTAL_CODE'].fillna(df['CUST_POSTAL_CODE'].median())
# Conversion of categorical variables
df['CUST_VALID']=df['CUST_VALID'].apply(lambda x:1 if x == 'I' else 0)
df['CUST_GENDER']=df['CUST_GENDER'].apply(lambda x:1 if x == 'M' else 0)
X = df.loc[:, ["CUST_YEAR_OF_BIRTH","CUST_GENDER","CUST_POSTAL_CODE"]]
Y = df.loc[:, "CUST_VALID"]
(train_X, test_X ,train_y, test_y) = train_test_split(X, Y, test_size = 0.1, random_state = 666)
clf = DecisionTreeClassifier(random_state=0)
clf = clf.fit (train_X, train_y) #Create a model!
pred = clf.predict(test_X)
fpr, tpr, thresholds = roc_curve(test_y, pred, pos_label=1)
auc(fpr, tpr)
accuracy_score(pred, test_y)
classification_report(pred, test_y, labels=None)
print("Prediction Score: {}".format(clf.score(test_X, test_y)))
print ("Decision tree model accuracy: {: .6f}". format (accuracy_score (pred, test_y)))
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(random_state=0)
clf = clf.fit(train_X, train_y)
pred = clf.predict(test_X)
fpr, tpr, thresholds = roc_curve(test_y, pred, pos_label=1)
auc(fpr, tpr)
print ("Random forest model accuracy: {: .6f}". format (accuracy_score (pred, test_y)))
con.close()
##Summary and for the future Always Free environment(Compute instance and Autonomous DB)Machine learning with Jupyter Notebook using(Decision tree / random forest)Was able to be carried out.
Dated December 5, 2019The ability to analyze machine learning, geographic data, and graph data in Oracle Database is no longer a paid option and is now available in Standard Edition. Oracle Machine Learning API for Python iscoming soonBecause it means We would like to be able to compare when Oralce Machine Learning for Python is released.
###Reference information