To understand the essence of deep learning, it is important to implement it from scratch, but with MNIST it is hard to implement a CNN and training takes a long time. So this time I used the Iris dataset to implement, very simply, a three-layer (one layer if you count only the hidden layers) "non-deep" deep learning model, in other words just a plain neural network. It uses batch learning rather than mini-batches (a mini-batch variant is sketched at the end of this article), but it does include gradient descent and error backpropagation (just not the stochastic kind). For the theory of deep learning, please read my favorite book, [Deep Learning from scratch](https://www.amazon.co.jp/dp/4873117585). It is a really easy-to-understand book.
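For reference, the network implemented below can be written as follows (my own notation, treating each sample $x$ as a row vector of the 4 features, as in the NumPy code; $t$ is the one-hot label over the 3 classes and $N$ is the number of training samples):

$$
y = \mathrm{softmax}\bigl(\mathrm{ReLU}(x W_1 + b_1)\, W_2 + b_2\bigr), \qquad
L = -\frac{1}{N}\sum_{n=1}^{N}\sum_{k=1}^{3} t_{nk} \log y_{nk}
$$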
I'm not good at drawing diagrams on a computer, so please excuse the hand-drawn ones; the main content here is the source code. The Iris data is the table from English Wikipedia, randomly shuffled.
The source is available on GitHub and is written in Python 3.
Only the Python code is posted here as well; download the Iris data (iris.tsv) from GitHub.
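The loading code below expects a tab-separated file with the four feature columns (sepal length, sepal width, petal length, petal width) followed by an integer class label 0–2, where lines starting with '#' are comments. Roughly like this, with tabs between the columns (the header and values are only meant to illustrate the format):

```text
# sepal_length  sepal_width  petal_length  petal_width  class
5.1  3.5  1.4  0.2  0
6.4  3.2  4.5  1.5  1
```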
iris.py
# coding: utf-8
import numpy as np
# Hyperparameters
TRAIN_DATA_SIZE = 50 # Of the 150 samples, TRAIN_DATA_SIZE are used as training data; the rest are used as test data
HIDDEN_LAYER_SIZE = 6 # Size of the middle (hidden) layer (a scalar here, since there is only one hidden layer)
LEARNING_RATE = 0.1 # Learning rate
ITERS_NUM = 1000 # Number of iterations
# Read the data
# np.loadtxt skips lines beginning with '#' by default
x = np.loadtxt('iris.tsv', delimiter='\t', usecols=(0, 1, 2, 3))
raw_t = np.loadtxt('iris.tsv', dtype=int, delimiter='\t', usecols=(4,))
# Convert the integer labels (0-2) into one-hot vectors
onehot_t = np.zeros([150, 3])
for i in range(150):
    onehot_t[i][raw_t[i]] = 1
train_x = x[:TRAIN_DATA_SIZE]
train_t = onehot_t[:TRAIN_DATA_SIZE]
test_x = x[TRAIN_DATA_SIZE:]
test_t = onehot_t[TRAIN_DATA_SIZE:]
# Weight / bias initialization
W1 = np.random.randn(4, HIDDEN_LAYER_SIZE) * np.sqrt(2 / 4) # He initialization (suited to ReLU)
W2 = np.random.randn(HIDDEN_LAYER_SIZE, 3) * np.sqrt(2 / HIDDEN_LAYER_SIZE)
b1 = np.zeros(HIDDEN_LAYER_SIZE) # Initialized to zero, following "Deep Learning from scratch" (I don't know the deeper reason)
b2 = np.zeros(3)
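# (Shapes, for reference: W1 is 4 x HIDDEN_LAYER_SIZE and W2 is HIDDEN_LAYER_SIZE x 3,
#  matching the 4 input features and 3 classes. He initialization scales the standard
#  deviation by sqrt(2 / fan_in), where fan_in is the number of inputs to the layer.)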
# ReLU function
def relu(x):
    return np.maximum(x, 0)
# Softmax function (I only know this implementation from seeing it online)
def softmax(x):
    e = np.exp(x - np.max(x)) # subtracting the max avoids overflow and does not change the result
    if e.ndim == 1:
        return e / np.sum(e, axis=0)
    elif e.ndim == 2:
        return e / np.array([np.sum(e, axis=1)]).T
    else:
        raise ValueError
# Cross-entropy error
def cross_entropy_error(y, t):
    if y.shape != t.shape:
        raise ValueError
    if y.ndim == 1:
        return - (t * np.log(y)).sum()
    elif y.ndim == 2:
        return - (t * np.log(y)).sum() / y.shape[0] # average over the batch
    else:
        raise ValueError
# Forward propagation
def forward(x):
    global W1, W2, b1, b2
    return softmax(np.dot(relu(np.dot(x, W1) + b1), W2) + b2)
# Test-data results before training (the weights are still random at this point)
test_y = forward(test_x)
print((test_y.argmax(axis=1) == test_t.argmax(axis=1)).sum(), '/', 150 - TRAIN_DATA_SIZE)
# Training loop
for i in range(ITERS_NUM):
    # Forward propagation, keeping the intermediate values for backpropagation
    y1 = np.dot(train_x, W1) + b1
    y2 = relu(y1)
    train_y = softmax(np.dot(y2, W2) + b2)
    # Loss calculation
    L = cross_entropy_error(train_y, train_t)
    if i % 100 == 0:
        print(L)
    # Gradient calculation (error backpropagation)
    # using the formulas derived from the computational graph
    a1 = (train_y - train_t) / TRAIN_DATA_SIZE # gradient at the output layer for softmax + cross-entropy
    b2_gradient = a1.sum(axis=0)
    W2_gradient = np.dot(y2.T, a1)
    a2 = np.dot(a1, W2.T)
    a2[y1 <= 0.0] = 0 # backpropagate through ReLU: zero wherever its input was non-positive
    b1_gradient = a2.sum(axis=0)
    W1_gradient = np.dot(train_x.T, a2)
    # Parameter update (gradient descent)
    W1 = W1 - LEARNING_RATE * W1_gradient
    W2 = W2 - LEARNING_RATE * W2_gradient
    b1 = b1 - LEARNING_RATE * b1_gradient
    b2 = b2 - LEARNING_RATE * b2_gradient
# Display the results
# Final loss on the training data
L = cross_entropy_error(forward(train_x), train_t)
print(L)
# Test-data results after training
test_y = forward(test_x)
print((test_y.argmax(axis=1) == test_t.argmax(axis=1)).sum(), '/', 150 - TRAIN_DATA_SIZE)
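As mentioned at the top, the loop above is plain full-batch gradient descent. For reference, here is a minimal sketch of how it could be turned into mini-batch (stochastic) learning, assuming the same variables as in the script above; BATCH_SIZE is a hypothetical new hyperparameter, only the sampling lines are new, and the gradient formulas stay the same:

```python
BATCH_SIZE = 10  # hypothetical mini-batch size, not part of the original script

for i in range(ITERS_NUM):
    # Draw BATCH_SIZE random samples from the training data
    batch_idx = np.random.choice(TRAIN_DATA_SIZE, BATCH_SIZE, replace=False)
    batch_x = train_x[batch_idx]
    batch_t = train_t[batch_idx]

    # Forward propagation on the mini-batch
    y1 = np.dot(batch_x, W1) + b1
    y2 = relu(y1)
    batch_y = softmax(np.dot(y2, W2) + b2)

    # Backpropagation (same formulas as above, with the batch size as the divisor)
    a1 = (batch_y - batch_t) / BATCH_SIZE
    b2_gradient = a1.sum(axis=0)
    W2_gradient = np.dot(y2.T, a1)
    a2 = np.dot(a1, W2.T)
    a2[y1 <= 0.0] = 0
    b1_gradient = a2.sum(axis=0)
    W1_gradient = np.dot(batch_x.T, a2)

    # Gradient-descent update
    W1 = W1 - LEARNING_RATE * W1_gradient
    W2 = W2 - LEARNING_RATE * W2_gradient
    b1 = b1 - LEARNING_RATE * b1_gradient
    b2 = b2 - LEARNING_RATE * b2_gradient
```

Because each mini-batch gradient is only an estimate of the full-batch gradient, the loss will fluctuate more from iteration to iteration than in the batch version.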