This is basically a memo of Deep Learning content from O'Reilly's "Deep Learning from Scratch". The post has a Python tag, but there isn't much Python code.
To put it simply, a perceptron works like a logic circuit. An AND gate, for example, is a gate that behaves as follows:
| $x_1$ | $x_2$ | $y$ |
|:---:|:---:|:---:|
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |
Choosing suitable weights $(w_1, w_2)$ and bias $b$ in $b + w_1x_1 + w_2x_2$ realizes the gate above. The same goes for the other gates.
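As a concrete illustration, here is a minimal sketch of an AND gate built this way; the particular weights and bias (0.5, 0.5, -0.7) are just one choice that happens to work.

```python
import numpy as np

def AND(x1, x2):
    x = np.array([x1, x2])
    w = np.array([0.5, 0.5])  # weights (one possible choice)
    b = -0.7                  # bias (one possible choice)
    # Output 1 if the weighted sum plus bias exceeds 0, otherwise 0
    return 1 if np.sum(w * x) + b > 0 else 0

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, AND(x1, x2))  # reproduces the truth table above
```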
An XOR gate, on the other hand, cannot be reproduced with a single-layer perceptron:
| $x_1$ | $x_2$ | $y$ |
|:---:|:---:|:---:|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |
I won't go into detail, but this cannot be achieved with a single perceptron because the XOR outputs are not linearly separable. It can be achieved by stacking layers (a multilayer perceptron), as sketched below. This is the basis of neural networks.
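A common way to realize XOR is to combine single-layer gates. The sketch below assumes the `AND` function from the earlier snippet and defines NAND and OR in the same style; the specific weight and bias values are again just one working choice.

```python
def NAND(x1, x2):
    x = np.array([x1, x2])
    w = np.array([-0.5, -0.5])  # flipping the signs turns AND into NAND
    b = 0.7
    return 1 if np.sum(w * x) + b > 0 else 0

def OR(x1, x2):
    x = np.array([x1, x2])
    w = np.array([0.5, 0.5])
    b = -0.2
    return 1 if np.sum(w * x) + b > 0 else 0

def XOR(x1, x2):
    # Two layers: NAND and OR feed into AND
    s1 = NAND(x1, x2)
    s2 = OR(x1, x2)
    return AND(s1, s2)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, XOR(x1, x2))  # matches the XOR truth table
```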
The basic idea is the same as the perceptron.
The perceptron outputs 1 if the weighted sum mentioned earlier exceeds 0, and outputs 0 otherwise. In other words, the output of one layer is passed to the next layer through a step function. This kind of function is called the **activation function**.
In neural networks, this activation function is replaced with, for example, the sigmoid function:
h(x)=\frac{1}{1+exp(-x)}
A function simply returns some output when given an input; the sigmoid function is just such a function.
Put very simply, the difference from the step function is that the sigmoid has a smooth shape.
I haven't done it yet
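For reference, here is a minimal sketch comparing the step function and the sigmoid as activation functions (NumPy assumed):

```python
import numpy as np

def step_function(x):
    # 1 where the input exceeds 0, else 0
    return (x > 0).astype(int)

def sigmoid(x):
    # h(x) = 1 / (1 + exp(-x)), a smooth version of the step function
    return 1 / (1 + np.exp(-x))

x = np.array([-1.0, 0.0, 1.0])
print(step_function(x))  # [0 0 1]
print(sigmoid(x))        # [0.269... 0.5 0.731...]
```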
If the input is $X$, the weights of the first layer are $W$, and the biases of the first layer are $B$, then the weighted sum of the first layer can be written as $A = XW + B$. This $A$ is passed through the sigmoid function, and its output becomes the input to the next layer. The output layer uses a different function instead of the sigmoid (described below).
```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def identity_function(x):
    # Output layer: return the input as it is
    return x

def init_network():
    # Weights and biases for a 3-layer network (example values)
    network = {}
    network['W1'] = np.array([[0.1, 0.3, 0.5], [0.2, 0.4, 0.6]])
    network['B1'] = np.array([0.1, 0.2, 0.3])
    network['W2'] = np.array([[0.1, 0.4], [0.2, 0.5], [0.3, 0.6]])
    network['B2'] = np.array([0.1, 0.2])
    network['W3'] = np.array([[0.1, 0.3], [0.2, 0.4]])
    network['B3'] = np.array([0.1, 0.2])
    return network

def forward(network, x):
    W1, W2, W3 = network['W1'], network['W2'], network['W3']
    B1, B2, B3 = network['B1'], network['B2'], network['B3']
    # Layer 1: weighted sum plus bias, then the activation function
    a1 = np.dot(x, W1) + B1
    z1 = sigmoid(a1)
    # Layer 2
    a2 = np.dot(z1, W2) + B2
    z2 = sigmoid(a2)
    # Output layer: identity function instead of sigmoid
    a3 = np.dot(z2, W3) + B3
    y = identity_function(a3)
    return y

# I think it will be object-oriented in the future
network = init_network()
x = np.array([1.0, 2.0])
y = forward(network, x)
print(y)
```
The output layer uses either the identity function or the softmax function. The identity function outputs its input as it is. The softmax function is given by the following formula:
y_k = \frac{exp(a_k)}{\sum_{i=1}^{n}exp(a_i)}
However, when implementing the softmax function in code, you have to be careful about overflow, since the exponentials can become very large. Subtracting the maximum value of the vector from each element of $a$ before exponentiating avoids this without changing the result.
The softmax converts the output of the neural network into probabilities of belonging to each class. However, since it does not change the ordering of the outputs, it is generally omitted in the inference phase; the softmax function is mainly used in the learning phase.
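A minimal sketch of the overflow-safe softmax described above (NumPy assumed):

```python
import numpy as np

def softmax(a):
    # Subtract the max for numerical stability; softmax(a) == softmax(a - c)
    c = np.max(a)
    exp_a = np.exp(a - c)
    return exp_a / np.sum(exp_a)

a = np.array([1010, 1000, 990])  # naive exp() would overflow here
print(softmax(a))                # roughly [1.0, 4.5e-05, 2.1e-09]
print(np.sum(softmax(a)))        # 1.0
```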