This time, in order to understand forward propagation and backpropagation, I will implement a neural network that actually works from scratch. The dataset is MNIST.
The network has 28 x 28 = 784 units in the input layer, 10 units in the hidden layer, and 2 units in the output layer; the activation function is the sigmoid, the error function is the squared error, and the optimization method is gradient descent.
The task is binary classification: only the digits "1" and "7" are extracted from MNIST.
First, to calculate $a^0_0$, there are 784 inputs $x^0_0$ through $x^0_{783}$, and each is multiplied by its corresponding weight $w^0_{00}$ through $w^0_{783,0}$, so $a^0_0$ is the sum of these 784 weighted inputs plus the bias $b^0_0$.
Expressed as a matrix, all the calculations from $a^0_0$ through $a^0_9$ can be represented at once.
$x^1_0$ through $x^1_9$ are the results of passing $a^0_0$ through $a^0_9$ through the sigmoid activation function.
Next, to calculate $a^1_0$, there are 10 inputs $x^1_0$ through $x^1_9$, and each is multiplied by its corresponding weight $w^1_{00}$ through $w^1_{90}$, so $a^1_0$ is the sum of these 10 weighted inputs plus the bias $b^1_0$.
As before, $a^1_0$ and $a^1_1$ can be represented together as a matrix.
Finally, $y^0$ and $y^1$ are obtained by passing $a^1_0$ and $a^1_1$ through the sigmoid.
In this way, forward propagation can be carried out with nothing more than matrix products and additions.
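Putting the whole forward pass together, a sketch of the matrix form (consistent with the notation above and with the array shapes used in the code below) is:

$$a^0 = W^0 x^0 + b^0,\quad x^1 = \sigma(a^0),\quad a^1 = W^1 x^1 + b^1,\quad y = \sigma(a^1)$$

where $\sigma$ is the sigmoid applied element-wise, $x^0$ is the $784 \times 1$ input vector, $W^0$ is the $10 \times 784$ matrix whose $(j, k)$ entry is $w^0_{kj}$, $b^0$ is the $10 \times 1$ bias vector, $W^1$ is the $2 \times 10$ matrix whose $(j, k)$ entry is $w^1_{kj}$, and $b^1$ is the $2 \times 1$ bias vector.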
First, update the weights and biases from the hidden layer to the output layer.
The update formula for a weight $w$ can be written as $w = w - \eta \frac{\partial E}{\partial w}$, where $\eta$ is the learning rate and $\frac{\partial E}{\partial w}$ is the derivative of the error $E$ with respect to the weight $w$.
Let's work out $\frac{\partial E}{\partial w}$ on a concrete example and then express it as a general formula so that it can be implemented. First, the weights from the hidden layer to the output layer.
To update the weight $w^1_{00}$, we need $\frac{\partial E^0}{\partial w^1_{00}}$, which follows from the chain rule of differentiation. Expressed as a general formula with $k = 0$ to $9$ and $j = 0$ to $1$, this lets every weight $w^1_{kj}$ be updated. As for the bias, the input it multiplies is always 1, so replacing $x^1_k$ with 1 in the same formula gives the update for the bias $b^1_j$.
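A sketch of that general formula, assuming the squared error $E^j = \frac{1}{2}(y^j - t^j)^2$ (which is what the code below implements):

$$\frac{\partial E^j}{\partial w^1_{kj}} = \frac{\partial E^j}{\partial y^j}\,\frac{\partial y^j}{\partial a^1_j}\,\frac{\partial a^1_j}{\partial w^1_{kj}} = (y^j - t^j)\,\sigma'(a^1_j)\,x^1_k,\qquad \sigma'(a) = \sigma(a)\bigl(1 - \sigma(a)\bigr)$$

and for the bias, $\frac{\partial E^j}{\partial b^1_j} = (y^j - t^j)\,\sigma'(a^1_j)$. This is exactly what back(l, j) returns for the output layer.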
Next, update the weights and biases from the input layer to the hidden layer.
To update the weight $w^0_{00}$, we need both $\frac{\partial E^0}{\partial w^0_{00}}$ and $\frac{\partial E^1}{\partial w^0_{00}}$, which again follow from the chain rule of differentiation. Expressed as a general formula with $k = 0$ to $783$ and $j = 0$ to $9$, this lets every weight $w^0_{kj}$ be updated. As for the bias, the input it multiplies is always 1, so replacing $x^0_k$ with 1 in the same formula gives the update for the bias $b^0_j$.
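A sketch of that general formula, again assuming the squared error and writing the total error as $E = E^0 + E^1$:

$$\frac{\partial E}{\partial w^0_{kj}} = \sum_{i=0}^{1} (y^i - t^i)\,\sigma'(a^1_i)\,w^1_{ji}\,\sigma'(a^0_j)\,x^0_k,\qquad \frac{\partial E}{\partial b^0_j} = \sum_{i=0}^{1} (y^i - t^i)\,\sigma'(a^1_i)\,w^1_{ji}\,\sigma'(a^0_j)$$

This is the sum that the recursive branch of back(l, j) below computes, with the update loop multiplying by $x^0_k$ afterwards.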
Based on the general formulas obtained above, the forward propagation and backpropagation parts are implemented as follows.
# Sigmoid function
def sigmoid(a):
    return 1 / (1 + np.exp(-a))

# Derivative of the sigmoid function
def sigmoid_d(a):
    return (1 - sigmoid(a)) * sigmoid(a)

# Backpropagation of the error
def back(l, j):
    if l == max_layer - 1:
        # Output layer: (y - t) * sigmoid'(a)
        return (y[j] - t[j]) * sigmoid_d(A[l][j])
    else:
        # Hidden layer: sum the contributions propagated back from layer l+1
        output = 0
        m = A[l+1].shape[0]
        for i in range(m):
            output += back(l+1, i) * W[l+1][i, j] * sigmoid_d(A[l][j])
        return output
The concrete behavior of def back(l, j) is as follows.
When l = 1, it returns
(y[j] - t[j]) * sigmoid_d(A[1][j])
When l = 0, it returns
(y[0] - t[0]) * sigmoid_d(A[1][0]) * W[1][0, j] * sigmoid_d(A[0][j]) + (y[1] - t[1]) * sigmoid_d(A[1][1]) * W[1][1, j] * sigmoid_d(A[0][j])
# Weight W settings
np.random.seed(seed=7)
w0 = np.random.normal(0.0, 1.0, (10, 784))
w1 = np.random.normal(0.0, 1.0, (2, 10))
W = [w0, w1]

# Bias b settings
b0 = np.ones((10, 1))
b1 = np.ones((2, 1))
B = [b0, b1]

# Other settings
max_layer = 2  # number of layers
n = 0.5  # learning rate
These set the weights W, the biases b, and the other parameters.
Each entry of the weight matrices w0 and w1 is a random number drawn from a normal distribution with mean 0 and standard deviation 1 so that learning can start smoothly. Incidentally, if you change the seed value in np.random.seed(seed=7), the starting conditions of learning (whether it starts smoothly or a little sluggishly) will change. Each entry of the bias matrices b0 and b1 is 1.
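As a quick optional check that the initialization behaves as described, one can inspect the shapes and sample statistics of the weight matrices (a small sketch, not part of the original training script):

# Optional sanity check of the initialization (not part of the training script)
print(w0.shape, w1.shape)                        # (10, 784) (2, 10)
print(round(w0.mean(), 2), round(w0.std(), 2))   # roughly 0.0 and 1.0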
# Learning loop
count = 0
acc = []
for x, t in zip(xs, ts):
    # Forward propagation
    x0 = x.flatten().reshape(784, 1)
    a0 = W[0].dot(x0) + B[0]
    x1 = sigmoid(a0)
    a1 = W[1].dot(x1) + B[1]
    y = sigmoid(a1)
    # Put x and a into lists for the parameter update
    X = [x0, x1]
    A = [a0, a1]
    # Parameter update
    for l in range(len(X)):
        for j in range(W[l].shape[0]):
            for k in range(W[l].shape[1]):
                W[l][j, k] = W[l][j, k] - n * back(l, j) * X[l][k]
            B[l][j] = B[l][j] - n * back(l, j)
This is the learning loop. Forward propagation is just matrix products and additions. In the parameter update,
when l = 0, the ranges are j = 0 to 9 and k = 0 to 783, and
W[0][j, k] = W[0][j, k] - n * back(0, j) * X[0][k]
B[0][j] = B[0][j] - n * back(0, j)
are updated. When l = 1, the ranges are j = 0 to 1 and k = 0 to 9, and
W[1][j, k] = W[1][j, k] - n * back(1, j) * X[1][k]
B[1][j] = B[1][j] - n * back(1, j)
are updated.
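For reference, because back(l, j) does not depend on the index k (and the weights it reads are only updated at a later l), the same updates can be written without the explicit j/k loops by using broadcasting. This is only an equivalent sketch, not the original code, and delta is just an illustrative name:

# Vectorized form of the parameter update (sketch, equivalent to the nested loops above)
for l in range(len(X)):
    # delta[j] = back(l, j), collected as a column vector
    delta = np.array([back(l, j) for j in range(W[l].shape[0])]).reshape(-1, 1)
    W[l] -= n * delta * X[l].T   # outer product via broadcasting
    B[l] -= n * delta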
Read the MNIST dataset with Keras and extract only "1" and "7".
import numpy as np
from keras.datasets import mnist
from keras.utils import np_utils
import matplotlib.pyplot as plt
# Display the digit images
def show_mnist(x):
    fig = plt.figure(figsize=(7, 7))
    for i in range(100):
        ax = fig.add_subplot(10, 10, i+1, xticks=[], yticks=[])
        ax.imshow(x[i].reshape((28, 28)), cmap='gray')
    plt.show()

# Load the dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()
show_mnist(x_train)

# Extract only the digits 1 and 7
x_data, y_data = [], []
for i in range(len(x_train)):
    if y_train[i] == 1 or y_train[i] == 7:
        x_data.append(x_train[i])
        if y_train[i] == 1:
            y_data.append(0)
        if y_train[i] == 7:
            y_data.append(1)
show_mnist(x_data)

# Convert from list to numpy array
x_data = np.array(x_data)
y_data = np.array(y_data)

# Normalize x_data, one-hot encode y_data
x_data = x_data.astype('float32')/255
y_data = np_utils.to_categorical(y_data)

# Split into training and test data
xs = x_data[0:200]
ts = y_data[0:200]
xt = x_data[2000:3000]
tt = y_data[2000:3000]
The first 100 images of the full 0-9 data and of the extracted 1/7 data are displayed.
200 training samples (xs, ts) and 1,000 test samples (xt, tt) are prepared.
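A quick check of the resulting array shapes (a small sketch, assuming the code above has been run):

print(xs.shape, ts.shape)   # (200, 28, 28) (200, 2)
print(xt.shape, tt.shape)   # (1000, 28, 28) (1000, 2)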
Here is the whole implementation, with an accuracy check on the test data and a plot of the accuracy transition added at each learning step.
import numpy as np
from keras.datasets import mnist
from keras.utils import np_utils

# Load the dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Extract only the digits 1 and 7
x_data, y_data = [], []
for i in range(len(x_train)):
    if y_train[i] == 1 or y_train[i] == 7:
        x_data.append(x_train[i])
        if y_train[i] == 1:
            y_data.append(0)
        if y_train[i] == 7:
            y_data.append(1)

# Convert from list to numpy array
x_data = np.array(x_data)
y_data = np.array(y_data)

# Normalize x_data, one-hot encode y_data
x_data = x_data.astype('float32')/255
y_data = np_utils.to_categorical(y_data)

# Split into training and test data
xs = x_data[0:200]
ts = y_data[0:200]
xt = x_data[2000:3000]
tt = y_data[2000:3000]

# Sigmoid function
def sigmoid(a):
    return 1 / (1 + np.exp(-a))

# Derivative of the sigmoid function
def sigmoid_d(a):
    return (1 - sigmoid(a)) * sigmoid(a)

# Backpropagation of the error
def back(l, j):
    if l == max_layer - 1:
        # Output layer: (y - t) * sigmoid'(a)
        return (y[j] - t[j]) * sigmoid_d(A[l][j])
    else:
        # Hidden layer: sum the contributions propagated back from layer l+1
        output = 0
        m = A[l+1].shape[0]
        for i in range(m):
            output += back(l + 1, i) * W[l + 1][i, j] * sigmoid_d(A[l][j])
        return output

# Weight W settings
np.random.seed(seed=7)
w0 = np.random.normal(0.0, 1.0, (10, 784))
w1 = np.random.normal(0.0, 1.0, (2, 10))
W = [w0, w1]

# Bias b settings
b0 = np.ones((10, 1))
b1 = np.ones((2, 1))
B = [b0, b1]

# Other settings
max_layer = 2  # number of layers
n = 0.5  # learning rate

# Learning loop
count = 0
acc = []
for x, t in zip(xs, ts):
    # Forward propagation
    x0 = x.flatten().reshape(784, 1)
    a0 = W[0].dot(x0) + B[0]
    x1 = sigmoid(a0)
    a1 = W[1].dot(x1) + B[1]
    y = sigmoid(a1)

    # Put x and a into lists for the parameter update
    X = [x0, x1]
    A = [a0, a1]

    # Parameter update
    for l in range(len(X)):
        for j in range(W[l].shape[0]):
            for k in range(W[l].shape[1]):
                W[l][j, k] = W[l][j, k] - n * back(l, j) * X[l][k]
            B[l][j] = B[l][j] - n * back(l, j)

    # Accuracy check on the test data
    correct, error = 0, 0
    for i in range(1000):
        # Inference with the learned parameters
        x0 = xt[i].flatten().reshape(784, 1)
        a0 = W[0].dot(x0) + B[0]
        x1 = sigmoid(a0)
        a1 = W[1].dot(x1) + B[1]
        y = sigmoid(a1)
        if np.argmax(y) == np.argmax(tt[i]):
            correct += 1
        else:
            error += 1
    calc = correct/(correct+error)
    acc.append(calc)
    count += 1
    print("\r[%s] acc: %s" % (count, calc))

# Accuracy transition graph display
import matplotlib.pyplot as plt
plt.plot(acc, label='acc')
plt.legend()
plt.show()
After 200 steps, the classification accuracy reached 97.8%. It is nice to see that the neural network implemented from scratch works properly.