I had a chance to implement a multilayer perceptron from scratch in Python, so I'm leaving my notes here.
Below is a sample that trains a multilayer perceptron on exclusive OR (XOR). If you are unlucky the training will not converge, but if you rerun it a few times you will see that it works properly.
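For reference, the truth table the network has to learn is (0, 0) → 0, (0, 1) → 1, (1, 0) → 1, (1, 1) → 0; the teacher data y in the program below encodes these targets as one-hot vectors.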
The network taken up this time looks like this: two input units plus a bias unit feed a hidden layer of two neurons, which together with its own bias unit feeds two output neurons.
This is the actual program.
perceptron.py
import numpy as np

# Sigmoid function, vectorized so it is applied to each element of an array
@np.vectorize
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Convert a list of the form [0, 0, 0] or [[0, 0, 0]] into a 1-row matrix [[0, 0, 0]]
# so that it can be used directly in matrix products
def verticalize(row):
    return np.reshape(row, (1, len(row)))

# Learning rate
rho = 1

# Input data
# These rows exhaust all input patterns; the last column (-1) is the bias input
x = np.array([[0, 0, -1], [0, 1, -1], [1, 0, -1], [1, 1, -1]])

# Teacher data
# The outputs for 0 and 1 correspond to the indices:
# the position holding a 1 marks the correct label
y = np.array([[1, 0], [0, 1], [0, 1], [1, 0]])

# Initialize the weights randomly
w1 = np.random.randn(3, 2)
w2 = np.random.randn(3, 2)

# This time two output-layer neurons are prepared, reducing the task to a
# two-class classification problem of whether the output is 0 or 1.
# This lets the general multilayer perceptron calculations be applied as-is.

# 50,000 iterations seem to be enough for convergence, so train that many times
for i in range(50000):
    # x (the input data) is an m*n matrix:
    # each row is one sample and each column is a feature.
    # Take out one row at a time and update the weights (online learning)
    for p in range(len(x)):
        # Reshape x[p] into a 1x3 matrix for the matrix products,
        # i.e. [[x1, x2, b1]]
        xp = verticalize(x[p])
        yp = y[p]
        # Matrix product of the input and the input-to-hidden weights,
        # passed through the sigmoid function.
        # The result is the output value of each hidden-layer neuron
        g1 = sigmoid(xp @ w1)
        # Append the bias term to the result and reshape into a 1-row matrix,
        # i.e. [[h1, h2, b2]].
        # The bias unit always outputs -1
        g1 = verticalize(np.hstack((g1[0], [-1])))
        # Matrix product of the hidden-layer output (plus bias) and the
        # hidden-to-output weights, passed through the sigmoid function.
        # The result is the output value of each output-layer neuron
        g2 = sigmoid(g1 @ w2)
        # Error term for the hidden-to-output weights (backpropagation)
        eps_out = (g2 - yp) * g2 * (1 - g2)
        # Error term for the input-to-hidden weights (backpropagation).
        # The bias column gets mixed into the calculation, so remove it
        eps_hidden = np.delete(np.sum(eps_out * w2, axis=1) * g1 * (1 - g1), -1, 1)
        # Weight updates
        w2 -= rho * g1.T @ eps_out
        w1 -= rho * xp.T @ eps_hidden

# Check the result (prediction)
# Run only the forward pass and look at the output values
for p in range(len(x)):
    xp = verticalize(x[p])
    yp = y[p]
    g1 = sigmoid(xp @ w1)
    g1 = verticalize(np.hstack((g1[0], [-1])))
    g2 = sigmoid(g1 @ w2)
    # Output values of the output layer
    print(g2[0])
    # Index of the output neuron with the largest value.
    # The network was trained so that the index encodes the class,
    # so this determines the predicted class
    print(np.argmax(g2))
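For reference, the error terms above are the usual delta rule for sigmoid units trained with a squared-error loss. Written out (this is my reading of the code, with ⊙ denoting the element-wise product and the same symbols as the variables):

\delta_{out} = (g_2 - y) \odot g_2 \odot (1 - g_2)
\delta_{hidden} = (\delta_{out}\, w_2^{\top}) \odot g_1 \odot (1 - g_1) \quad \text{(bias column removed)}
w_2 \leftarrow w_2 - \rho\, g_1^{\top} \delta_{out}, \qquad w_1 \leftarrow w_1 - \rho\, x_p^{\top} \delta_{hidden}

The expression np.sum(eps_out * w2, axis=1) in the code computes the same thing as the product \delta_{out}\, w_2^{\top} written here.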
The key points are the vectorized sigmoid function and the @ operator, which is equivalent to np.dot; with them the code can be written cleanly without nested function calls.
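As a quick illustration of those two points (a standalone sketch, not part of the program above), both behave like this on small arrays:

import numpy as np

@np.vectorize
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

a = np.array([[1.0, 2.0]])    # 1x2 matrix
b = np.array([[3.0], [4.0]])  # 2x1 matrix

print(sigmoid(np.array([-1.0, 0.0, 1.0])))  # applied element by element
print(a @ b)                                # matrix product
print(np.dot(a, b))                         # same result as a @ b for 2-D arrays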
Also, if you change the weight matrices appropriately, I think this can be made more general (see the sketch below).
I think it turned out reasonably well.
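For example, a sketch of what I mean (the variable names here are hypothetical, not from the program above): the layer sizes could be made parameters and the weight shapes derived from them, with +1 accounting for the bias unit appended to each layer's input.

import numpy as np

n_in = 2       # number of input features (excluding the bias column)
n_hidden = 2   # hidden-layer size; could be increased for harder problems
n_out = 2      # number of classes

w1 = np.random.randn(n_in + 1, n_hidden)
w2 = np.random.randn(n_hidden + 1, n_out)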
With the comments removed, it looks like this.
perceptron.py
import numpy as np

@np.vectorize
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def verticalize(row):
    return np.reshape(row, (1, len(row)))

rho = 1
x = np.array([[0, 0, -1], [0, 1, -1], [1, 0, -1], [1, 1, -1]])
y = np.array([[1, 0], [0, 1], [0, 1], [1, 0]])
w1 = np.random.randn(3, 2)
w2 = np.random.randn(3, 2)

for i in range(50000):
    for p in range(len(x)):
        xp = verticalize(x[p])
        yp = y[p]
        g1 = sigmoid(xp @ w1)
        g1 = verticalize(np.hstack((g1[0], [-1])))
        g2 = sigmoid(g1 @ w2)
        eps_out = (g2 - yp) * g2 * (1 - g2)
        eps_hidden = np.delete(np.sum(eps_out * w2, axis=1) * g1 * (1 - g1), -1, 1)
        w2 -= rho * g1.T @ eps_out
        w1 -= rho * xp.T @ eps_hidden

for p in range(len(x)):
    xp = verticalize(x[p])
    yp = y[p]
    g1 = sigmoid(xp @ w1)
    g1 = verticalize(np.hstack((g1[0], [-1])))
    g2 = sigmoid(g1 @ w2)
    print(g2[0])
    print(np.argmax(g2))
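If training has converged, the four print(np.argmax(g2)) lines should output 0, 1, 1, 0 in that order, matching the XOR truth table; the raw g2[0] values printed just before them vary from run to run because the weights are initialized randomly, but each should be close to its one-hot teacher vector.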
"Just use scikit-learn!" (I can't, because this is a university assignment.)
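For reference, a rough sketch of what the scikit-learn version might look like (the parameter choices here are my own guesses; as with the hand-rolled version, such a small network does not always converge on XOR, so a different random_state may be needed):

import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

clf = MLPClassifier(hidden_layer_sizes=(2,), activation='logistic',
                    solver='lbfgs', max_iter=10000, random_state=0)
clf.fit(X, y)
print(clf.predict(X))  # ideally [0 1 1 0]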
I also wrote a more general version implemented as a class, so I will add that as well! My multilayer perceptron is a bit messy, though.