Continuing from the previous post, I will compare the generalized linear model (GLM) and the multi-layer perceptron.
"Generalized linear model (GLM) and neural network are together (1)"
Neural networks and perceptrons are explained very clearly in the pages below, so please refer to them.
" 3rd Simple Perceptron · Levelfour / machine-learning-2014 Wiki · GitHub " " 3rd Multilayer Perceptron · Levelfour / machine-learning-2014 Wiki · GitHub "
With a simple perceptron, the parameters of the discriminant function converge only when the data are linearly separable, that is, when the data can be divided by a straight line. If you feed it data that is not linearly separable, it will keep searching for parameters forever and never converge.
Writing the training data as x_1, x_2, …, x_i, … and the connection weights (coupling coefficients) as w_1, w_2, …, w_i, …, the simple perceptron is expressed by the following formula.
z = \sum_{i=0}^n w_i x_i = W^{\mathrm{T}}X
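As a small sketch of this formula (my own illustration in NumPy, not code from the article), the perceptron learning rule below converges on linearly separable data such as the AND gate, while on inseparable data such as XOR the loop would keep updating until the epoch limit:

```python
import numpy as np

def train_perceptron(X, t, lr=0.1, max_epochs=100):
    """Simple perceptron: z = sum_i w_i x_i, with x_0 = 1 as the bias input."""
    X = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend x_0 = 1
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for x_i, t_i in zip(X, t):            # labels t_i are -1 or +1
            if t_i * np.dot(w, x_i) <= 0:     # misclassified (or on the line)
                w += lr * t_i * x_i           # nudge the boundary toward x_i
                errors += 1
        if errors == 0:                       # linearly separable: converged
            return w
    return w                                  # inseparable: never converged

# AND gate: linearly separable, so this converges
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([-1, -1, -1, 1])
print(train_perceptron(X, t))
```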
However, the world is not made up only of simple problems that can be separated by a straight line. To handle linearly inseparable problems, hidden (intermediate) layers were stacked, with each unit passing its input through a sigmoid function. The sigmoid function may seem to come out of nowhere, but the reasons for using it here are:

- It is nonlinear. If the activation were linear, adding layers would gain nothing, because the whole stack could be compressed back into a single linear sum (see the sketch below).
- It is monotonically increasing and differentiable with respect to all explanatory variables.

So, strictly speaking, it seems the function used here needs to have these properties.
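Here is a minimal sketch of the first point (my own example, assuming NumPy): two stacked linear layers collapse into a single linear map, while putting a sigmoid between them breaks the collapse:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4))        # "layer 1" weights
W2 = rng.normal(size=(4, 2))        # "layer 2" weights
x = rng.normal(size=3)

# Two linear layers compress into a single linear map W1 @ W2
print(np.allclose((x @ W1) @ W2, x @ (W1 @ W2)))   # True

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# With a sigmoid between the layers, the compression no longer holds
print(np.allclose(sigmoid(x @ W1) @ W2, x @ (W1 @ W2)))  # False
```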
The output of the j-th unit in the hidden layer is given by

y_j = \frac{1}{1+\exp\left(-\alpha \sum_{i=0}^{n} w_{ji} x_i\right)} = \frac{1}{1+\exp(-\alpha W_j^{\mathrm{T}}X)} \qquad (2)
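Equation (2) is straightforward to compute directly. Below is a minimal sketch; the weights, inputs, and gain alpha are illustrative assumptions. Each row of W holds the weights w_{ji} of one hidden unit, and x_0 = 1 supplies the bias term:

```python
import numpy as np

def sigmoid(z, alpha=1.0):
    """Logistic sigmoid with gain alpha, as in equation (2)."""
    return 1.0 / (1.0 + np.exp(-alpha * z))

def hidden_layer(W, X, alpha=1.0):
    """y_j = sigmoid(alpha * W_j^T X), computed for all j at once."""
    return sigmoid(W @ X, alpha)

W = np.array([[0.1, -0.4, 0.7],     # weights w_{1i} of hidden unit j = 1
              [0.5, 0.2, -0.3]])    # weights w_{2i} of hidden unit j = 2
X = np.array([1.0, 0.8, -0.5])      # x_0 = 1 (bias), then the input features
print(hidden_layer(W, X))           # outputs y_1, y_2, each in (0, 1)
```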
In machine learning, the task from here on is to estimate the connection weights (parameters) of the "input layer -> hidden layer -> output layer" pipeline while working out the discriminant function.
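As a rough sketch of that pipeline (the layer sizes and random weights below are assumptions for illustration; estimating them from data, e.g. by backpropagation, is the learning step just described), one forward pass looks like this:

```python
import numpy as np

def sigmoid(z, alpha=1.0):
    return 1.0 / (1.0 + np.exp(-alpha * z))

def forward(x, W_hidden, W_out):
    """One pass through input -> hidden -> output."""
    x = np.append(1.0, x)           # x_0 = 1 for the bias
    h = sigmoid(W_hidden @ x)       # hidden layer, as in equation (2)
    h = np.append(1.0, h)           # bias input for the output layer
    return sigmoid(W_out @ h)       # output layer

rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(3, 4))  # 3 hidden units, 3 inputs + bias
W_out = rng.normal(size=(1, 4))     # 1 output unit, 3 hidden units + bias
print(forward(rng.normal(size=3), W_hidden, W_out))
```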
The GLM and the multi-layer perceptron serve different purposes, so naturally parts of the procedure differ, and this rough comparison sacrifices some rigor. Still, if you compare equation (1) (the GLM formula from the previous post) with equation (2) above, I think you can see intuitively that both are doing something similar: drawing a line through scattered data.