A neural network is a concept quite different from the perceptron of Chapter 2. It is made up of the following layers:
-- Input layer
-- Output layer
-- Intermediate layer (hidden layer)
Going from the input layer to the output layer, we call them the 0th layer, the 1st layer, and the 2nd layer in order.
Reviewing Chapter 2, the perceptron can be expressed by the following formulas:

y = h(b + w1x1 + w2x2)

h(x) = 0  (x <= 0)
       1  (x > 0)

Introducing a = b + w1x1 + w2x2, this can be written in two steps:

a = b + w1x1 + w2x2
y = h(a)

where h is the activation function. The step function used here is

h(x) = 1  (x > 0)
       0  (x <= 0)
When implementing the step function as Python code, a naive version written with an if statement only accepts a scalar argument; it cannot be called on an array like step_function(np.array([1.0, 2.0])). Instead, we can use NumPy operations:
import numpy as np
x = np.array([-1.0, 1.0, 2.0])
y = x > 0
y
array([False, True, True], dtype=bool)
y = y.astype(int)
y
array([0, 1, 1])
Summarizing the above, the step function can be written as a single function:

def step_function(x):
    return np.array(x > 0, dtype=int)
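A quick usage check (not from the original text), applying it to the same array as above:

x = np.array([-1.0, 1.0, 2.0])
step_function(x)   # array([0, 1, 1])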
The sigmoid function, another commonly used activation function, is expressed as

h(x) = 1 / (1 + exp(-x))
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.array([-1.0, 1.0, 2.0])
sigmoid(x)
> array([0.26894142, 0.73105858, 0.88079708])
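To compare the shapes of the step function and the sigmoid, here is a small plotting sketch (assuming matplotlib is installed; not part of the original text):

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(-5.0, 5.0, 0.1)
plt.plot(x, step_function(x), linestyle='--', label='step')  # staircase shape
plt.plot(x, sigmoid(x), label='sigmoid')                     # smooth S-curve
plt.ylim(-0.1, 1.1)
plt.legend()
plt.show()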
The perceptron passes only 0/1 signals between neurons, whereas a neural network passes continuous signals. The activation function must not be a linear function: with a linear activation, deepening the network is meaningless, because the whole network becomes equivalent to one without hidden layers and the advantage of multiple layers is lost. A small sketch of this point follows.
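As a minimal illustration (an assumed example, not code from the original text), two layers with a linear activation h(x) = c*x collapse into a single equivalent linear layer:

import numpy as np

W1 = np.array([[1.0, 2.0], [3.0, 4.0]])
W2 = np.array([[0.5, 1.0], [1.5, 2.0]])
c = 2.0                                    # linear "activation": h(x) = c * x
x = np.array([1.0, -1.0])

two_layers = c * np.dot(c * np.dot(x, W1), W2)    # h(h(x W1) W2)
one_layer = np.dot(x, (c * c) * np.dot(W1, W2))   # x (c^2 W1 W2): a single linear layer
print(np.allclose(two_layers, one_layer))         # True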
The sigmoid has long been common, but these days a function called ReLU (Rectified Linear Unit) is mainly used as the activation function. The ReLU function can be expressed as follows:

h(x) = x  (x > 0)
       0  (x <= 0)
Implementation:

def relu(x):
    return np.maximum(0, x)
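A quick usage check (the input array here is an arbitrary example):

relu(np.array([-2.0, 0.0, 3.0]))   # array([0., 0., 3.])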
Neural network computations can be written compactly as matrix products, which NumPy computes with np.dot:

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
np.dot(A, B)
>> array([[19, 22], [43, 50]])

A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.array([[1, 2], [3, 4], [5, 6]])
np.dot(A, B)
>> array([[22, 28], [49, 64]])
X = np.array([1, 2])
W = np.array([[1, 3, 5], [2, 4, 6]])
Y = np.dot(X, W)
print(Y)
>>> [ 5 11 17]
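For np.dot the inner dimensions must match: multiplying an (m, n) array by an (n, k) array gives an (m, k) result, and mismatched shapes raise an error. A quick check (an assumed example, not from the original text):

A = np.array([[1, 2, 3], [4, 5, 6]])    # shape (2, 3)
B = np.array([[1, 2], [3, 4], [5, 6]])  # shape (3, 2)
np.dot(A, B).shape                       # (2, 2)

C = np.array([[1, 2], [3, 4]])           # shape (2, 2)
# np.dot(A, C)  # raises ValueError because the inner dimensions (3 and 2) differ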
Each layer of a three-layer neural network can be expressed as a single matrix formula:

A = XW + B
W1.shape = (2, 3)
W2.shape = (3, 2)
W3.shape = (2, 2)
A1 = np.dot(X, W1) + B1
Z1 = sigmoid(A1)
A2 = np.dot(Z1, W2) + B2
Z2 = sigmoid(A2)
A3 = np.dot(Z2, W3) + B3
Implementation summary
def init_network():
    network = {}
    network['W1'] = np.array([[0.1, 0.3, 0.5], [0.2, 0.4, 0.6]])
    network['b1'] = np.array([0.1, 0.2, 0.3])
    network['W2'] = np.array([[0.1, 0.4], [0.2, 0.5], [0.3, 0.6]])
    network['b2'] = np.array([0.1, 0.2])
    network['W3'] = np.array([[0.1, 0.3], [0.2, 0.4]])
    network['b3'] = np.array([0.1, 0.2])
    return network
def identity_function(x):
    # output-layer activation used here: returns its input unchanged
    return x

def forward(network, x):
    W1, W2, W3 = network['W1'], network['W2'], network['W3']
    b1, b2, b3 = network['b1'], network['b2'], network['b3']
    a1 = np.dot(x, W1) + b1
    z1 = sigmoid(a1)
    a2 = np.dot(z1, W2) + b2
    z2 = sigmoid(a2)
    a3 = np.dot(z2, W3) + b3
    y = identity_function(a3)
    return y
network = init_network()
x = np.array([1.0, 0.5])
y = forward(network, x)
print(y) # [0.31682708 0.69627909]
The identity function outputs the input as is. The softmax function is expressed by the following formula:

yk = exp(ak) / Σ(i=1 to n) exp(ai)
a = np.array([0.3, 2.9, 4.0])
exp_a = np.exp(a)
sum_exp_a = np.sum(exp_a)
y = exp_a / sum_exp_a
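For this input the result comes out roughly as follows (values computed from the formula above); note that the elements sum to 1 and can be read as probabilities:

print(y)      # [0.01821127 0.24519181 0.73659691]
np.sum(y)     # 1.0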
In this implementation you have to be careful about overflow: exp() of a large value easily exceeds the range of floating-point numbers. Subtracting a constant (typically the maximum of the inputs) from every element does not change the result of softmax, so it is used as a countermeasure.
def softmax(a):
    c = np.max(a)
    exp_a = np.exp(a - c)  # overflow countermeasure: subtracting the max does not change the result
    sum_exp_a = np.sum(exp_a)
    y = exp_a / sum_exp_a
    return y
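A quick check of why the correction matters (the large input values here are an arbitrary example): exponentiating them directly overflows and yields nan, while the corrected version works.

a = np.array([1010, 1000, 990])
np.exp(a) / np.sum(np.exp(a))   # array([nan, nan, nan]) -- overflow
softmax(a)                      # roughly array([9.9995e-01, 4.5398e-05, 2.0611e-09])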
For a 10-class classification problem, set the number of neurons in the output layer to 10.
MNIST is an image dataset of handwritten digits and one of the most famous datasets; it consists of digit images from 0 to 9. It provides 60,000 training images and 10,000 test images, which are used for learning and inference. A common use of the MNIST dataset is to train a model with the training images and then measure how well the trained model can classify the test images.
import sys, os
sys.path.append(os.pardir)
from dataset.mnist import load_mnist
(x_train, t_train), (x_test, t_test) = \
load_mnist(flatten=True, normalize=False)
print(x_train.shape) # (60000, 784)
print(t_train.shape) # (60000,)
print(x_test.shape) # (10000, 784)
print(t_test.shape) # (10000,)
load_mnist returns the loaded MNIST data in the format ((training images, training labels), (test images, test labels)). The normalize argument controls whether pixel values are scaled into the 0.0-1.0 range, flatten controls whether each image is stored as a flat 784-element array, and one_hot_label controls whether labels are returned as one-hot arrays.
import sys, os
sys.path.append(os.pardir)
import numpy as np
from dataset.mnist import load_mnist
from PIL import Image
def img_show(img):
    pil_img = Image.fromarray(np.uint8(img))
    pil_img.show()
(x_train, t_train), (x_test, t_test) = \
load_mnist(flatten = True, normalize = False)
img = x_train[0]
label = t_train[0]
print(label) # 5
print(img.shape) # (784,)
img = img.reshape(28, 28)
print(img.shape) # (28, 28)
img_show(img) # the image of the digit 5 is displayed
An image read with flatten=True is stored as a one-dimensional NumPy array, so to display it, it must be reshaped to 28 x 28.
Since the images are classified into 10 digits, the output layer has 10 neurons. We also assume two hidden layers: the first hidden layer has 50 neurons and the second has 100. The numbers 50 and 100 can be set to any value. First, we define three functions.
import pickle

def get_data():
    (x_train, t_train), (x_test, t_test) = \
        load_mnist(normalize=True, flatten=True, one_hot_label=False)
    return x_test, t_test

def init_network():
    with open("sample_weight.pkl", 'rb') as f:
        network = pickle.load(f)
    return network
def predict(network, x):
    W1, W2, W3 = network['W1'], network['W2'], network['W3']
    b1, b2, b3 = network['b1'], network['b2'], network['b3']
    a1 = np.dot(x, W1) + b1
    z1 = sigmoid(a1)
    a2 = np.dot(z1, W2) + b2
    z2 = sigmoid(a2)
    a3 = np.dot(z2, W3) + b3
    y = softmax(a3)
    return y
x, t = get_data()
network = init_network()

accuracy_cnt = 0
for i in range(len(x)):
    y = predict(network, x[i])
    p = np.argmax(y)  # index of the element with the highest probability
    if p == t[i]:
        accuracy_cnt += 1

print("Accuracy:" + str(float(accuracy_cnt) / len(x)))
init_network loads the trained weight parameters stored in sample_weight.pkl; this file contains the learned weight and bias parameters. pkl files will be explained in the next chapter.