This article is the day-5 entry of the "Money Forward Advent Calendar 2015". Sorry it's late!
Intro
With IT giants such as Google, Microsoft, and IBM open-sourcing their machine learning systems, I feel that the excitement in this field is accelerating rapidly. In my opinion (or perhaps my hope), the third AI boom has taken root before another winter could arrive, and we are now at the stage of consciously building services that use machine learning as a matter of course and differentiate themselves with it, don't you think?
So, over the last few months I have devoted my study time almost exclusively to taking MOOCs and reading the MLP series.
What I felt along the way echoes a point made in "The shortest route for working programmers who have avoided mathematics to start studying machine learning": "At the very least, if you are not used to handling matrices and vectors, it is easy to end up manipulating matrices in a course without understanding what you are actually doing" (quote). I specialized in control engineering and mechatronics at university, so I had some background, yet I still struggled (in my defense, I had a four-year blank). Since I had a hard time myself, it must be even tougher for people in the same position or with no background at all. So, from a slightly more experienced perspective, I would like to write about machine learning with vectors and matrices, using a concrete example.
Why vectorize in the first place? Simply because execution (training) is faster. Without matrices, you train on each sample with a for-loop, but once n > 1000 the training speed drops dramatically. This is because interpreted languages such as Octave and Python incur overhead on every iteration of a for-loop, whereas matrix operations are dispatched to optimized numerical routines. Therefore, learning with matrices is recommended over for-loops.
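As a quick illustration (my own toy example, not the benchmark below; the sizes and timing method are arbitrary choices), here is a minimal sketch you can run to feel the difference:

python.py
# Minimal sketch comparing a for-loop with a vectorized matrix product
# for computing z = X * theta. Sizes are arbitrary placeholders.
import time
import numpy as np

m, n = 10000, 400                      # number of samples, number of features
X = np.random.randn(m, n)              # design matrix: one sample per row
theta = np.random.randn(n)             # parameter vector

# For-loop version: one inner product per sample
start = time.time()
z_loop = np.empty(m)
for i in range(m):
    z_loop[i] = np.dot(X[i], theta)
loop_time = time.time() - start

# Vectorized version: a single matrix-vector product
start = time.time()
z_vec = np.dot(X, theta)
vec_time = time.time() - start

print("for-loop: %.4fs, vectorized: %.4fs" % (loop_time, vec_time))
print("same result: %s" % np.allclose(z_loop, z_vec))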
For reference, I also measured the performance of training an MLP (Multilayer Perceptron) model on the MNIST data (28x28-pixel images), implemented in both a matrix version and a for-statement version.
Consider simple logistic regression. This time, let's vectorize the calculation of z and of the gradient.
The vector z (element by element) that is fed into the activation function can be calculated as follows. A first implementation would use a for-statement, but the goal is to express it with a matrix.
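Here and below I write m for the number of training samples, x^{(i)} for the i-th feature vector (n features), y^{(i)} for its label, and \theta for the parameter vector; this superscript notation is my own. Each element of z is then:

z^{(i)} = \theta^T x^{(i)} = \sum_{j=1}^{n} \theta_j x_j^{(i)}, \qquad i = 1, \dots, m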
That is, we want to obtain z in the following form with just one command.
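Stacked into a single column vector, that is:

z = \begin{bmatrix} \theta^T x^{(1)} \\ \theta^T x^{(2)} \\ \vdots \\ \theta^T x^{(m)} \end{bmatrix}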
And each of the rows above can be transformed as follows. For now, just accept this as a **given rule**.
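Namely, an inner product of two vectors can be written with its factors swapped:

\theta^T x^{(i)} = (x^{(i)})^T \theta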
As mentioned above, each element of z is the inner product of a vector x^{(i)} and the vector \theta. To express this without a for-statement, build a matrix X by stacking each (transposed) vector x^{(i)} as a row, as shown below.
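Concretely, this is the standard design matrix, with one sample per row:

X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix}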
Then the whole vector collapses to the simple formula below.
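z = X \theta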
Implementing this in Octave / Python gives the neat one-liners below.
octave.m
z = X * theta;
python.py
# Using numpy: note the argument order, X first, matching z = X * theta
import numpy as np
z = np.dot(X, theta)
To find the optimal parameters, we need to implement the partial-derivative formula, because we want values that make the partial derivative of the cost function with respect to each parameter zero (or below a threshold).
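For logistic regression, writing h^{(i)} for the hypothesis value of sample i (the activation applied to z^{(i)}), this partial derivative takes the standard form:

\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h^{(i)} - y^{(i)} \right) x_j^{(i)}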
When this is vectorized, it becomes:
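Stacking the partials for j = 1, \dots, n:

\nabla_\theta J(\theta) = \frac{1}{m} \begin{bmatrix} \sum_{i=1}^{m} (h^{(i)} - y^{(i)}) x_1^{(i)} \\ \vdots \\ \sum_{i=1}^{m} (h^{(i)} - y^{(i)}) x_n^{(i)} \end{bmatrix} = \frac{1}{m} \sum_{i=1}^{m} (h^{(i)} - y^{(i)}) x^{(i)}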
Now, to transform this, first use the following rule. Again, just accept it as a **given rule**.
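A weighted sum of vectors is the same as a matrix-vector product, where the vectors become the columns of the matrix:

\sum_{i=1}^{m} a_i v^{(i)} = \begin{bmatrix} v^{(1)} & v^{(2)} & \cdots & v^{(m)} \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix}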
The matrix whose columns are the x^{(i)} is exactly the transpose of the X introduced when vectorizing z, so:
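\sum_{i=1}^{m} (h^{(i)} - y^{(i)}) x^{(i)} = \begin{bmatrix} x^{(1)} & x^{(2)} & \cdots & x^{(m)} \end{bmatrix} (h - y) = X^T (h - y)

where h and y are the column vectors collecting the h^{(i)} and y^{(i)}.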
Therefore, the vectorized partial derivative of the cost function can be written as shown below.
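\nabla_\theta J(\theta) = \frac{1}{m} X^T (h - y)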
Written briefly in code, this becomes the following.
octave.m
h = activate_function(z);         % e.g. the sigmoid for logistic regression
grad = 1 / m * (X' * (h - y));    % vectorized gradient of the cost function
python.py
h = activate_function(z)                 # e.g. the sigmoid for logistic regression
grad = 1.0 / m * np.dot(X.T, h - y)      # 1.0 guards against integer division on Python 2
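To tie everything together, here is a minimal, self-contained sketch of vectorized gradient descent for logistic regression. The toy data, the sigmoid standing in for activate_function, the learning rate, and the iteration count are placeholder choices of mine, not from the derivation above:

python.py
# Vectorized logistic regression via gradient descent (illustrative sketch)
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: m samples, n features (placeholder values)
m, n = 100, 3
np.random.seed(0)
X = np.hstack([np.ones((m, 1)), np.random.randn(m, n)])  # prepend a bias column
true_theta = np.array([0.5, 1.0, -2.0, 0.7])
y = (np.dot(X, true_theta) > 0).astype(float)            # synthetic labels

theta = np.zeros(n + 1)
alpha = 0.1                                # learning rate (placeholder)
for _ in range(1000):                      # fixed iteration count (placeholder)
    z = np.dot(X, theta)                   # z = X * theta, as derived above
    h = sigmoid(z)
    grad = 1.0 / m * np.dot(X.T, h - y)    # grad = (1/m) X^T (h - y)
    theta -= alpha * grad

print(theta)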
That concludes the explanation of vectorization. I hope it can serve as a small support for beginners. That said, I can't be fully confident there are no mistakes, so I would love to hear your opinions and comments.