This article is the day-5 entry of the "Money Forward Advent Calendar 2015". Sorry it's late!
Intro
With IT giants such as Google, Microsoft, and IBM open-sourcing their machine learning systems, I feel that the excitement in this field is accelerating rapidly. In my opinion (or perhaps my hope), the third AI boom has taken root before another winter could arrive, and we are now at the stage of consciously building services that use machine learning as a matter of course and differentiate themselves with it, don't you think?
So, over the last few months I have devoted my study time almost exclusively to taking MOOCs and reading the MLP series.
What I felt along the way echoes a point made in "The shortest route for working programmers who have avoided mathematics to start studying machine learning": "At the very least, if you are not used to handling matrices and vectors, it is easy to end up manipulating matrices in a course without understanding what you are actually doing" (quote). I specialized in control engineering and mechatronics at university, so I had some background, yet I still struggled (in my defense, I had a four-year blank). Since I had a hard time myself, it must be even tougher for people in the same position or with no background at all. So, from a slightly more experienced perspective, I would like to write about machine learning with vectors and matrices, using a concrete example.
Why vectorize in the first place? Simply because execution (training) is faster. Without matrices, you train on each sample with a for-loop, but once n > 1000 the training speed drops dramatically. This is because interpreted languages such as Octave and Python incur overhead on every iteration of a for-loop, whereas matrix operations are dispatched to optimized numerical routines. Therefore, learning with matrices is recommended over for-loops.
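As a quick illustration (my own toy example, not the benchmark below; the sizes and timing method are arbitrary choices), here is a minimal sketch you can run to feel the difference:

python.py
# Minimal sketch comparing a for-loop with a vectorized matrix product
# for computing z = X * theta. Sizes are arbitrary placeholders.
import time
import numpy as np

m, n = 10000, 400                      # number of samples, number of features
X = np.random.randn(m, n)              # design matrix: one sample per row
theta = np.random.randn(n)             # parameter vector

# For-loop version: one inner product per sample
start = time.time()
z_loop = np.empty(m)
for i in range(m):
    z_loop[i] = np.dot(X[i], theta)
loop_time = time.time() - start

# Vectorized version: a single matrix-vector product
start = time.time()
z_vec = np.dot(X, theta)
vec_time = time.time() - start

print("for-loop: %.4fs, vectorized: %.4fs" % (loop_time, vec_time))
print("same result: %s" % np.allclose(z_loop, z_vec))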
For reference, I also measured the performance of training an MLP (Multilayer Perceptron) model on the MNIST data (28x28-pixel images), implemented in both a matrix version and a for-statement version.
Consider simple logistic regression. This time, let's vectorize the calculation of z and of the gradient.
The vector z (element by element) that is fed into the activation function can be calculated as follows. A first implementation would use a for-statement, but the goal is to express it with a matrix.
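Here and below I write m for the number of training samples, x^{(i)} for the i-th feature vector (n features), y^{(i)} for its label, and \theta for the parameter vector; this superscript notation is my own. Each element of z is then:

z^{(i)} = \theta^T x^{(i)} = \sum_{j=1}^{n} \theta_j x_j^{(i)}, \qquad i = 1, \dots, m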
That is, we want to obtain z in the following form with just one command.
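Stacked into a single column vector, that is:

z = \begin{bmatrix} \theta^T x^{(1)} \\ \theta^T x^{(2)} \\ \vdots \\ \theta^T x^{(m)} \end{bmatrix}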
And each of the rows above can be transformed as follows. For now, just accept this as a **given rule**.
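Namely, an inner product of two vectors can be written with its factors swapped:

\theta^T x^{(i)} = (x^{(i)})^T \theta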
As mentioned above, each element of z is the inner product of a vector x^{(i)} and the vector \theta. To express this without a for-statement, build a matrix X by stacking each (transposed) vector x^{(i)} as a row, as shown below.
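Concretely, this is the standard design matrix, with one sample per row:

X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix}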
Then the whole vector collapses to the simple formula below.
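z = X \theta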
Implementing this in Octave / Python gives the neat one-liners below.
octave.m
z = X * theta;
python.py
# Using numpy: note the argument order, X first, matching z = X * theta
import numpy as np
z = np.dot(X, theta)
To find the optimal parameters, we need to implement the partial-derivative formula, because we want values that make the partial derivative of the cost function with respect to each parameter zero (or below a threshold).
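For logistic regression, writing h^{(i)} for the hypothesis value of sample i (the activation applied to z^{(i)}), this partial derivative takes the standard form:

\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h^{(i)} - y^{(i)} \right) x_j^{(i)}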
When this is vectorized, it becomes:
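Stacking the partials for j = 1, \dots, n:

\nabla_\theta J(\theta) = \frac{1}{m} \begin{bmatrix} \sum_{i=1}^{m} (h^{(i)} - y^{(i)}) x_1^{(i)} \\ \vdots \\ \sum_{i=1}^{m} (h^{(i)} - y^{(i)}) x_n^{(i)} \end{bmatrix} = \frac{1}{m} \sum_{i=1}^{m} (h^{(i)} - y^{(i)}) x^{(i)}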
Now, to transform this, first use the following rule. Again, just accept it as a **given rule**.
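A weighted sum of vectors is the same as a matrix-vector product, where the vectors become the columns of the matrix:

\sum_{i=1}^{m} a_i v^{(i)} = \begin{bmatrix} v^{(1)} & v^{(2)} & \cdots & v^{(m)} \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix}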
The matrix whose columns are the x^{(i)} is exactly the transpose of the X introduced when vectorizing z, so:
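\sum_{i=1}^{m} (h^{(i)} - y^{(i)}) x^{(i)} = \begin{bmatrix} x^{(1)} & x^{(2)} & \cdots & x^{(m)} \end{bmatrix} (h - y) = X^T (h - y)

where h and y are the column vectors collecting the h^{(i)} and y^{(i)}.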
Therefore, the vectorized partial derivative of the cost function can be written as shown below.
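\nabla_\theta J(\theta) = \frac{1}{m} X^T (h - y)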
Written briefly in code, this becomes the following.
octave.m
h = activate_function(z);         % e.g. the sigmoid for logistic regression
grad = 1 / m * (X' * (h - y));    % vectorized gradient of the cost function
python.py
h = activate_function(z)                 # e.g. the sigmoid for logistic regression
grad = 1.0 / m * np.dot(X.T, h - y)      # 1.0 guards against integer division on Python 2
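To tie everything together, here is a minimal, self-contained sketch of vectorized gradient descent for logistic regression. The toy data, the sigmoid standing in for activate_function, the learning rate, and the iteration count are placeholder choices of mine, not from the derivation above:

python.py
# Vectorized logistic regression via gradient descent (illustrative sketch)
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: m samples, n features (placeholder values)
m, n = 100, 3
np.random.seed(0)
X = np.hstack([np.ones((m, 1)), np.random.randn(m, n)])  # prepend a bias column
true_theta = np.array([0.5, 1.0, -2.0, 0.7])
y = (np.dot(X, true_theta) > 0).astype(float)            # synthetic labels

theta = np.zeros(n + 1)
alpha = 0.1                                # learning rate (placeholder)
for _ in range(1000):                      # fixed iteration count (placeholder)
    z = np.dot(X, theta)                   # z = X * theta, as derived above
    h = sigmoid(z)
    grad = 1.0 / m * np.dot(X.T, h - y)    # grad = (1/m) X^T (h - y)
    theta -= alpha * grad

print(theta)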
That concludes the explanation of vectorization. I hope it can serve as a small support for beginners. That said, I can't be fully confident there are no mistakes, so I would love to hear your opinions and comments.