Continuing from the previous post, I will compare the generalized linear model (GLM) and the multi-layer perceptron.
"Generalized linear model (GLM) and neural network are together (1)"
Neural networks and perceptrons are explained very clearly in the pages below, so please refer to them.
" 3rd Simple Perceptron · Levelfour / machine-learning-2014 Wiki · GitHub " " 3rd Multilayer Perceptron · Levelfour / machine-learning-2014 Wiki · GitHub "
With a simple perceptron, the parameters of the discriminant function converge only when the data are linearly separable, that is, when the data can be divided by a straight line. If you feed it data that is not linearly separable, it will keep searching for parameters forever and never converge.
Writing the training data as x_1, x_2, …, x_i, … and the connection weights (coupling coefficients) as w_1, w_2, …, w_i, …, the simple perceptron is expressed by the following formula.
z = \sum_{i=0}^n w_i x_i = W^{\mathrm{T}}X
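As a small sketch of this formula (my own illustration in NumPy, not code from the article), the perceptron learning rule below converges on linearly separable data such as the AND gate, while on inseparable data such as XOR the loop would keep updating until the epoch limit:

```python
import numpy as np

def train_perceptron(X, t, lr=0.1, max_epochs=100):
    """Simple perceptron: z = sum_i w_i x_i, with x_0 = 1 as the bias input."""
    X = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend x_0 = 1
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for x_i, t_i in zip(X, t):            # labels t_i are -1 or +1
            if t_i * np.dot(w, x_i) <= 0:     # misclassified (or on the line)
                w += lr * t_i * x_i           # nudge the boundary toward x_i
                errors += 1
        if errors == 0:                       # linearly separable: converged
            return w
    return w                                  # inseparable: never converged

# AND gate: linearly separable, so this converges
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([-1, -1, -1, 1])
print(train_perceptron(X, t))
```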
However, the world is not made up only of simple problems that can be separated by a straight line. To handle linearly inseparable problems, hidden (intermediate) layers were stacked, with each unit passing its input through a sigmoid function. The sigmoid function may seem to come out of nowhere, but the reasons for using it here are:

- It is nonlinear. If the activation were linear, adding layers would gain nothing, because the whole stack could be compressed back into a single linear sum (see the sketch below).
- It is monotonically increasing and differentiable with respect to all explanatory variables.

So, strictly speaking, it seems the function used here needs to have these properties.
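Here is a minimal sketch of the first point (my own example, assuming NumPy): two stacked linear layers collapse into a single linear map, while putting a sigmoid between them breaks the collapse:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4))        # "layer 1" weights
W2 = rng.normal(size=(4, 2))        # "layer 2" weights
x = rng.normal(size=3)

# Two linear layers compress into a single linear map W1 @ W2
print(np.allclose((x @ W1) @ W2, x @ (W1 @ W2)))   # True

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# With a sigmoid between the layers, the compression no longer holds
print(np.allclose(sigmoid(x @ W1) @ W2, x @ (W1 @ W2)))  # False
```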
The output of the j-th unit in the hidden layer is given by

y_j = \frac{1}{1+\exp\left(-\alpha \sum_{i=0}^{n} w_{ji} x_i\right)} = \frac{1}{1+\exp(-\alpha W_j^{\mathrm{T}}X)} \qquad (2)
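Equation (2) is straightforward to compute directly. Below is a minimal sketch; the weights, inputs, and gain alpha are illustrative assumptions. Each row of W holds the weights w_{ji} of one hidden unit, and x_0 = 1 supplies the bias term:

```python
import numpy as np

def sigmoid(z, alpha=1.0):
    """Logistic sigmoid with gain alpha, as in equation (2)."""
    return 1.0 / (1.0 + np.exp(-alpha * z))

def hidden_layer(W, X, alpha=1.0):
    """y_j = sigmoid(alpha * W_j^T X), computed for all j at once."""
    return sigmoid(W @ X, alpha)

W = np.array([[0.1, -0.4, 0.7],     # weights w_{1i} of hidden unit j = 1
              [0.5, 0.2, -0.3]])    # weights w_{2i} of hidden unit j = 2
X = np.array([1.0, 0.8, -0.5])      # x_0 = 1 (bias), then the input features
print(hidden_layer(W, X))           # outputs y_1, y_2, each in (0, 1)
```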
In machine learning, the task from here on is to estimate the connection weights (parameters) of the "input layer -> hidden layer -> output layer" pipeline while working out the discriminant function.
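As a rough sketch of that pipeline (the layer sizes and random weights below are assumptions for illustration; estimating them from data, e.g. by backpropagation, is the learning step just described), one forward pass looks like this:

```python
import numpy as np

def sigmoid(z, alpha=1.0):
    return 1.0 / (1.0 + np.exp(-alpha * z))

def forward(x, W_hidden, W_out):
    """One pass through input -> hidden -> output."""
    x = np.append(1.0, x)           # x_0 = 1 for the bias
    h = sigmoid(W_hidden @ x)       # hidden layer, as in equation (2)
    h = np.append(1.0, h)           # bias input for the output layer
    return sigmoid(W_out @ h)       # output layer

rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(3, 4))  # 3 hidden units, 3 inputs + bias
W_out = rng.normal(size=(1, 4))     # 1 output unit, 3 hidden units + bias
print(forward(rng.normal(size=3), W_hidden, W_out))
```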
The GLM and the multi-layer perceptron serve different purposes, so naturally parts of the procedure differ, and this rough comparison sacrifices some rigor. Still, if you compare equation (1) (the GLM formula from the previous post) with equation (2) above, I think you can see intuitively that both are doing something similar: drawing a line through scattered data.