This article is from December 17th Article of "Oracle Cloud Infrastructure Advent Calendar 2019" It is written as 2019-12-17).

What you are doing in this article

--Create VCN / Compute instance / Autonomous Database (ADW) in Always Free environment of Oracle Cloud Infrastructure --Set up a Jupyter Notebook environment on your Compute instance --Machine learning using data from Autonomous Database (decision tree / random forest)

Environment construction procedure

Create a virtual cloud network
Create a Compute instance
Create Autonmous Database (ADW this time)![Image01.jpg](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/108635/5a0120bc-05c4-6f3a- e99a-3e873a044f1e.jpeg)
Set up a Jupyter Notebook environment on your Compute instance
Connect to Autonomous Database from Jupyter Notebook and perform machine learning

1. Create a virtual cloud network

Create a Virtual Cloud Network (VCN) in the OCI Web Console Virtual networking quickstart has been added since December 5, 2019 You can easily create a VCN with four input items: "VCN name", "VCN CIDR", "public subnet CIDR", and "private subnet CIDR". (When creating a VCN using Quickstart, if you create it in Japanese display, the subnet name will be something like "Public subnet- " It is recommended to change the display language to English)

Always Free environment can be created in the home region

2. Create a Compute instance

Create a Compute instance with the image of Oracle Linux 7 --Click "View Shape, Network and Storage Options" and in the networking configuration, 1. Select the virtual cloud network and public subnet created in --In addition, check "Assign public IP address" and create

3. Create an Autonmous Database (ADW this time)

Create a workload type from Create Autonomous Database by selecting Data Warehouse

Create a working user using SQL Developer Web etc.

CREATE USER scott IDENTIFIED BY password; 

GRANT DWROLE TO scott;
GRANT UNLIMITED TABLESPACE TO scott;

GRANT
    SELECT ANY TABLE,
    UPDATE ANY TABLE,
    DELETE ANY TABLE,
    INSERT ANY TABLE
TO scott;

4. Set up a Jupyter Notebook environment on your Compute instance

Connect to the Compute instance (Oracle Linux) created in 2. with ssh (opc user) and install Jupyter Notebook

4-1. Install Python 3 4-1-1. EPEL repository-Installation of Python3.6

sudo yum install -y oracle-epel-release-el7 oracle-release-el7
sudo yum install -y python36
sudo yum install -y libSM.x86_64
sudo yum install -y libXext.x86_64
sudo yum install -y libXrender.x86_64

4-1-2. mlevn (Python virtual environment setup and activation)

python3.6 -m venv mlenv 
source mlenv/bin/activate

4-1-3. Install the library used for machine learning (library for executing this procedure)

pip3 install --upgrade pip
pip3 install pandas
pip3 install seaborn
pip3 install sklearn

4-1-4. Install Jupyter

python3 -m pip install jupyter

4-1-5. Oracle Instant Client Installation and Configuration

sudo yum -y install oraclelinux-developer-release-el7
sudo yum -y install python-cx_Oracle
pip3 install cx_Oracle
sudo sh -c "echo /usr/lib/oracle/18.3/client64/lib > /etc/ld.so.conf.d/oracle-instantclient.conf"
sudo ldconfig

4-1-6. Download Autonomous Database Client Credentials (Wallet) From the details screen of the Autonomous Database created in 3., click "DB Connection", and from the "Database Connection" dialog, click "Download Wallet" to download the Zip file. Transfer the downloaded Zip file to a Compute instance (such as scp) and unzip it (note the unzipped directory Example: / home / opc / wallet) Edit the sqlnet.ora file in the unzipped directory Change "~ / network / admin" specified in DIRECTORY on the "WALLET_LOCATION" line to the unzipped directory

`sqlnet.ora`


WALLET_LOCATION = (SOURCE = (METHOD = file) (METHOD_DATA = (DIRECTORY="/home/opc/wallet")))
SSL_SERVER_DN_MATCH=yes

4-1-7. Run the notebook Set environment variables (LD_LIBRARY_PATH is the path of the Instance Client, TNS_ADMIN is the directory where the wallet was unzipped)

export LD_LIBRARY_PATH=/usr/lib/oracle/18.3/client64/lib:$LD_LIBRARY_PATH
export TNS_ADMIN=/home/opc/wallet
source mlenv/bin/activate
jupyter notebook --ip=0.0.0.0 &

Make a note of the token key that appears when you run the jupyter notebook

    To access the notebook, open this file in a browser:
        file:///home/opc/.local/share/jupyter/runtime/nbserver-2210-open.html
    Or copy and paste one of these URLs:
        http://compute01:8888/?token=0e701120a4e7319ae8b970ac069fbfee53e7b07f072cc2dc
     or http://127.0.0.1:8888/?token=0e701120a4e7319ae8b970ac069fbfee53e7b07f072cc2dc     ```

5.Connect to Autonomous Database from Jupyter Notebook and perform machine learning

5-1.Establish tunneling in another ssh session from client PC

`$ ssh –i <private_key> opc@<public_IP_address> -L 8000:localhost:8888`


5-2.Open a web browser on your local machine and access the URL below
http://localhost:8000 
5-3.When prompted for a token, enter the token key that appears when you run jupyter Notebook
![image06.JPG](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/108635/bbf7a0a4-64f7-a9a5-0a5f-1ed216b319c3.jpeg)

![image07.JPG](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/108635/b1949454-7e7c-4cd5-4d61-fc0e5c6bfb37.jpeg)

5-4.Connect to Autonomous Database from Jupyter Notebook
Autonomous Database username, password and tnsnames in the unzipped wallet.Check the connection service name from ora and specify

Example of connecting to Autonomous Database and checking DB version

import cx_Oracle con = cx_Oracle.connect('scott/@orcl_medium') print(con.version) con.close()


Example of connecting to Autonomous Database and checking the number of SALES tables in sh schema

import cx_Oracle con = cx_Oracle.connect('scott/@orcl_medium') cur = con.cursor() # opens cursor for statements to use cur.execute('select count(*) from sh.sales') for result in cur: # prints all data print(result) con.close()


CUST by gender from the CUSTOMERS table_Compare VALID

import pandas.io.sql as psql import cx_Oracle import seaborn as sns %matplotlib inline

con = cx_Oracle.connect('scott/@orcl_medium')

sql = 'select CUST_GENDER,CUST_VALID from sh.customers' df = psql.read_sql(sql,con) sns.countplot('CUST_GENDER',hue='CUST_VALID',data=df)

Column names are written in uppercase

con.close()


![image08.JPG](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/108635/b0fd0ce8-514a-d38e-c19e-ca4228e4cd5a.jpeg)



####Machine learning
scikit-Library tree and scikit in learn-Using the random forest class fier in learn ensemble
CUST in the CUSTOMERS table_Create a machine learning model that predicts VALID

-Get features and objective variables with SELECT statement and store in DataFrame

sql = 'select cust_id,cust_year_of_birth,CUST_GENDER,cust_postal_code,cust_valid from sh.customers' df = psql.read_sql(sql,con)


-Missing value handling and categorical variables(letter)Conversion process

Missing value processing

df['CUST_YEAR_OF_BIRTH'] = df['CUST_YEAR_OF_BIRTH'].fillna(df['CUST_YEAR_OF_BIRTH'].median()) df['CUST_POSTAL_CODE'] = df['CUST_POSTAL_CODE'].fillna(df['CUST_POSTAL_CODE'].median())

Conversion of categorical variables

df['CUST_VALID']=df['CUST_VALID'].apply(lambda x:1 if x == 'I' else 0) df['CUST_GENDER']=df['CUST_GENDER'].apply(lambda x:1 if x == 'M' else 0)


-Divided into training data and test data(This time, 90% is used as learning data)

X = df.loc[:, ["CUST_YEAR_OF_BIRTH","CUST_GENDER","CUST_POSTAL_CODE"]] Y = df.loc[:, "CUST_VALID"] (train_X, test_X ,train_y, test_y) = train_test_split(X, Y, test_size = 0.1, random_state = 666)


-Decision tree model creation, verification with test data, confirmation of accuracy

clf = DecisionTreeClassifier(random_state=0) clf = clf.fit (train_X, train_y) #Create a model! pred = clf.predict(test_X) print("Prediction Score: {}".format(clf.score(test_X, test_y)))


-Random forest model creation and validation with test data

from sklearn.ensemble import RandomForestClassifier clf = RandomForestClassifier(random_state=0) clf = clf.fit(train_X, train_y) pred = clf.predict(test_X)


-Whole machine learning code


#### **`ml4py.py`**
```py

import pandas.io.sql as psql
import cx_Oracle
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import (roc_curve, auc, accuracy_score)
from sklearn.metrics import classification_report
con = cx_Oracle.connect('scott/<password>@orcl_medium')

sql = 'select cust_id,cust_year_of_birth,CUST_GENDER,cust_postal_code,cust_valid from sh.customers'
df = psql.read_sql(sql,con)
# Missing value processing
df['CUST_YEAR_OF_BIRTH'] = df['CUST_YEAR_OF_BIRTH'].fillna(df['CUST_YEAR_OF_BIRTH'].median())
df['CUST_POSTAL_CODE'] = df['CUST_POSTAL_CODE'].fillna(df['CUST_POSTAL_CODE'].median())

# Conversion of categorical variables
df['CUST_VALID']=df['CUST_VALID'].apply(lambda x:1 if x == 'I' else 0)
df['CUST_GENDER']=df['CUST_GENDER'].apply(lambda x:1 if x == 'M' else 0)

X = df.loc[:, ["CUST_YEAR_OF_BIRTH","CUST_GENDER","CUST_POSTAL_CODE"]]
Y = df.loc[:, "CUST_VALID"]
(train_X, test_X ,train_y, test_y) = train_test_split(X, Y, test_size = 0.1, random_state = 666)

clf = DecisionTreeClassifier(random_state=0)
 clf = clf.fit (train_X, train_y) #Create a model!
pred = clf.predict(test_X)
fpr, tpr, thresholds = roc_curve(test_y, pred, pos_label=1)
auc(fpr, tpr)
accuracy_score(pred, test_y)
classification_report(pred, test_y, labels=None)
print("Prediction Score: {}".format(clf.score(test_X, test_y)))
 print ("Decision tree model accuracy: {: .6f}". format (accuracy_score (pred, test_y)))

from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(random_state=0)
clf = clf.fit(train_X, train_y)
pred = clf.predict(test_X)
fpr, tpr, thresholds = roc_curve(test_y, pred, pos_label=1)
auc(fpr, tpr)
 print ("Random forest model accuracy: {: .6f}". format (accuracy_score (pred, test_y)))
con.close()

##Summary and for the future Always Free environment(Compute instance and Autonomous DB)Machine learning with Jupyter Notebook using(Decision tree / random forest)Was able to be carried out.

Dated December 5, 2019The ability to analyze machine learning, geographic data, and graph data in Oracle Database is no longer a paid option and is now available in Standard Edition. Oracle Machine Learning API for Python iscoming soonBecause it means We would like to be able to compare when Oralce Machine Learning for Python is released.

###Reference information

Machine learning with Jupyter Notebook in OCI Always Free environment (2019/12/17)