[Reference] [Deep Learning from scratch](https://www.amazon.co.jp/%E3%82%BC%E3%83%AD%E3%81%8B%E3%82%89%E4% BD% 9C% E3% 82% 8BDeep-Learning-% E2% 80% 95Python% E3% 81% A7% E5% AD% A6% E3% 81% B6% E3% 83% 87% E3% 82% A3% E3 % 83% BC% E3% 83% 97% E3% 83% A9% E3% 83% BC% E3% 83% 8B% E3% 83% B3% E3% 82% B0% E3% 81% AE% E7% 90 % 86% E8% AB% 96% E3% 81% A8% E5% AE% 9F% E8% A3% 85-% E6% 96% 8E% E8% 97% A4-% E5% BA% B7% E6% AF % 85 / dp / 4873117585)
The activation function is responsible for determining how the sum of the input signals is activated. This serves as an arrangement for the values passed to the next layer.
Typically, In the activation function of "simple perceptron", "step function" etc. are used, In the activation function of "multilayer perceptron (neural network)", "sigmoid function, softmax function" and identity function are used. Also, while these "step function, sigmoid function, softmax function" are called non-linear functions, functions such as "y = cx" are called linear functions. In general, neural networks do not use linear functions. The reason for this is "[Why not use linear functions in Multilayer Perceptron?](Https://qiita.com/namitop/items/d3d5091c7d0ab669195f#%E3%81%AA%E3%81%9C%E5%A4%" 9A% E5% B1% A4% E3% 83% 91% E3% 83% BC% E3% 82% BB% E3% 83% 97% E3% 83% 88% E3% 83% AD% E3% 83% B3% E3% 81% A7% E7% B7% 9A% E5% BD% A2% E9% 96% A2% E6% 95% B0% E3% 82% 92% E4% BD% BF% E3% 82% 8F% E3% 81% AA% E3% 81% 84% E3% 81% AE% E3% 81% 8B) ”. In general, softmax and identity functions are used in the output layer.
The step function is a function that switches the output at the threshold value. Also called a "step function". A function commonly used in simple perceptrons (single-layer networks).
As an example, a step function that becomes 0 when the entered value is 0 or less and 1 when it is greater than 0 The implementation and operation are shown below.
def step_function(x):
if x>0:
return 1
else:
return 0
# step_function(4) => 1
# step_function(-3) => 0
The graph can be implemented as follows.
step_function is implemented so that it can handle array [np.array ()
] as an argument.
The behavior is the same except for handling arrays.
import numpy as np
import matplotlib.pylab as plt
def step_function(x):
y = x > 0
return y.astype(np.int)
x = np.arange(-5.0, 5.0, 0.1)
y = step_function(x)
plt.plot(x, y)
plt.ylim(-0.1, 1.1)
plt.show()
The formula is as follows.
h(x) = \frac{1}{1+e^{-x}}\\
e^{-x}Is np in numpy.exp(-x)Can be written.
The larger the value you enter, the closer to 1 The smaller the value you enter, the closer it is to 0. It can be said that the output of the step function does not kill too much the value of the original input compared to being 0 or 1.
The implementation and operation are as follows.
import numpy as np
def sigmoid(x):
return 1 / (1+np.exp(-x))
x = np.array([-3.0, 1.0, 4.0])
# sigmoid(x) => [ 0.04742587 0.73105858 0.98201379]
The graph looks like this:
import numpy as np
import matplotlib.pylab as plt
def sigmoid(x):
return 1 / (1+np.exp(-x))
x = np.arange(-5.0, 5.0, 0.1)
y = sigmoid(x)
plt.plot(x, y)
plt.ylim(-0.1, 1.1)
plt.show()
When the input value is 0 or less, it becomes 0, and when it is larger than 1, the input is output as it is. The graph will look like the one below.
import numpy as np
import matplotlib.pylab as plt
def relu(x):
return np.maximum(0, x)
x = np.arange(-5.0, 5.0, 0.1)
y = relu(x)
plt.plot(x, y)
plt.show()
A function used in the output layer. This function always returns the same value as the input value. [Question] Mathematics: What is an identity function
Commonly used in regression problems.
import numpy as np
import matplotlib.pylab as plt
def koutou(a):
return a
x = np.arange(-5, 5, 0.1)
y = koutou(x)
plt.plot(x, y)
plt.show()
A function used in the output layer. Commonly used in classification problems.
import numpy as np
import matplotlib.pylab as plt
def softmax(a):
exp_a = np.exp(a)
sum_exp_a = np.sum(exp_a)
y = exp_a / sum_exp_a
return y
x = np.arange(-5, 5, 0.1)
y = softmax(x)
plt.plot(x, y)
plt.show()
A linear function is a function whose output is a constant multiple of the input.
For example, y = cx (c is a constant)
. If you make a graph, it will be a straight line.
Non-linear functions are those that are not linear functions.
The reason is that using a linear function as the activation function makes it meaningless to deepen the layer in a neural network.
The problem with linear functions is due to the fact that no matter how deep the layers are, there is always a "network without hidden layers" that does the same thing.
As a simple example, if the linear function h (x) = cx
is used as the activation function and three layers are stacked, it becomesy = h (h (h (x))
. That is, this isy = c *. It can be expressed even in a network without a hidden layer, such as c * c * x (y = ax)
.
If you use a linear function as the activation function as in this example, you cannot take advantage of the multiple layers. In other words, you should not use linear functions for the activation function, as it may lead to unnecessary processing.
Regression problems are problems that make (continuous) numerical predictions from some input data. (Example: The problem of predicting the weight of a person from an image of the person)
The classification problem is the question of which class the data belongs to. (Example: A problem that classifies whether a person is male or female from an image of a person)
Recommended Posts