Enjoy Coursera / Machine Learning materials twice

Lately I have been taking Coursera's Machine Learning course (Stanford University, taught by Andrew Ng). The lecture videos are certainly useful, but what makes this course so good are the programming assignments given each week. They are written as Matlab code, so I wanted to rewrite them in Python so that I can review them later. (You can find other people's examples of the same effort on the Internet.)

Here I will introduce how to enjoy the teaching materials of "Coursera / Machine Learning" a second (or third) time, in Python. I also explain **scipy.optimize.minimize()**, which is used for function minimization.


Cost function and its derivative

Many machine learning materials define a cost function and then search for the parameters that minimize it. In Matlab you use fminunc() (or the fmincg() provided with the assignment); in Python, **scipy.optimize.minimize()** provides similar functionality.

First, prepare the cost function and its derivative yourself. For the first assignment, a linear model, they are given by the following equations.


J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (h_{\theta}(x^{(i)}) - y^{(i)})^2
\\
\frac{\partial J}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} (h_{\theta}(x^{(i)}) - y^{(i)}) x_j^{(i)}

$h_{\theta}$ is the hypothesis function assumed in the modeling. (This time, a linear model is assumed.)
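
Written out for this assignment (one feature plus an intercept term), the hypothesis is

h_{\theta}(x) = \theta_0 + \theta_1 x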

import numpy as np

def compute_cost(theta, xmat, ymat):
    m = len(ymat)                           # number of training examples

    one_v = np.ones(len(xmat))              # intercept column
    xmat1 = np.column_stack((one_v, xmat))  # shape: m x 2
    ymat1 = ymat.reshape(len(ymat), 1)      # shape: [m,] --> [m,1]
    theta1 = theta.reshape(len(theta), 1)
    hx2y = np.dot(xmat1, theta1) - ymat1    # h_theta(x) - y, shape: m x 1
    j = 1. / 2. / m * np.dot(hx2y.T, hx2y)  # shape: 1 x 1

    return j[0, 0]                          # 1x1 array -> plain scalar

def compute_grad(theta, xmat, ymat):
    m = len(ymat)                           # number of training examples

    one_v = np.ones(len(xmat))              # intercept column
    xmat1 = np.column_stack((one_v, xmat))
    ymat1 = ymat.reshape(m, 1)              # shape: [m,] --> [m,1]
    theta1 = theta.reshape(len(theta), 1)
    hx2y = np.dot(xmat1, theta1) - ymat1
    j_grad = 1. / m * np.dot(xmat1.T, hx2y)
    j_grad = j_grad.flatten()               # back to shape [2,]

    return j_grad
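
As a quick sanity check (not part of the assignment), the analytic gradient from compute_grad can be compared with a central-difference approximation of compute_cost; the data here is made up purely for illustration.

# sanity check on made-up data: y is roughly 2 + 3*x plus noise
rng = np.random.RandomState(0)
xmat_chk = rng.rand(20)
ymat_chk = 2.0 + 3.0 * xmat_chk + 0.1 * rng.randn(20)

theta_chk = np.array([0.5, 0.5])
eps = 1e-5
num_grad = np.zeros_like(theta_chk)
for j in range(len(theta_chk)):
    t_plus = theta_chk.copy();  t_plus[j] += eps
    t_minus = theta_chk.copy(); t_minus[j] -= eps
    # central difference: (J(theta + eps) - J(theta - eps)) / (2 eps)
    num_grad[j] = (compute_cost(t_plus, xmat_chk, ymat_chk)
                   - compute_cost(t_minus, xmat_chk, ymat_chk)) / (2.0 * eps)

print(num_grad)                                     # numerical gradient
print(compute_grad(theta_chk, xmat_chk, ymat_chk))  # analytic gradient, should match closely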

Using these two functions as building blocks, the search for the parameters (theta) that minimize the cost is performed with gradient descent. First, the formula shown in Coursera is coded as-is and the calculation is run.

Gradient descent equation:

\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} (h_{\theta} (x^{(i)}) - y^{(i)}) x_{j}^{(i)}  \ \ \ \ \ (simultaneously\ update\ \theta_j\ for\ all\ j)


def gradient_descent(theta, alpha, num_iters):
    global xmat, ymat

    m = len(ymat)                    # number of training examples
    j_history = np.zeros(num_iters)  # cost at each iteration, for checking convergence

    for it in range(num_iters):
        # simultaneous update of all theta_j
        delta_t = alpha * compute_grad(theta, xmat, ymat)
        theta = theta - delta_t
        # record the cost (for DEBUG / learning-curve plots)
        j_history[it] = compute_cost(theta, xmat, ymat)

    return theta, j_history
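
A minimal usage sketch (not from the assignment script): with the training data bound to the global xmat and ymat that gradient_descent expects, a call might look like the following; the learning rate and iteration count are just illustrative values.

# xmat, ymat are assumed to be 1-D NumPy arrays holding the ex1 training data (globals)
theta_init = np.zeros(2)
theta_fit, j_history = gradient_descent(theta_init, alpha=0.01, num_iters=2000)
print('theta =', theta_fit)
print('final cost =', j_history[-1])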

The simplest form of gradient descent is as described above, and it is perfectly adequate for the first Coursera / Machine Learning assignment. The fitted line it finds is shown below.

(Figure: ex1_plot_fitted.png, the fitted regression line)

However, when you consider convergence on larger or more complex data sets, and how to deal with local minima, there are cases where you will want to use a well-proven module instead.

Try using scipy.optimize.minimize()

SciPy provides a variety of functions for optimization.

http://docs.scipy.org/doc/scipy/reference/optimize.html

This time I decided to use **scipy.optimize.minimize()**, which minimizes a function with respect to its parameters. An excerpt from the documentation:

scipy.optimize.minimize(fun, x0, args=(), method=None, jac=None, hess=None, hessp=None, bounds=None, constraints=(), tol=None, callback=None, options=None)

The documentation then lists the supported parameters, including solver methods I had rarely seen before; it is reassuring to have them available even without being familiar with all of them. First, I tested it on a simple formula.

import numpy as np
import scipy.optimize as spo

def fun1(x):
    # f(x) = (x0 - 1.5)^2 + (x1 - 2.5)^2, minimized at x = [1.5, 2.5]
    ans = (x[0] - 1.5) ** 2 + (x[1] - 2.5) ** 2

    return ans

def fun1_grad(x):
    # analytic gradient of fun1
    dx = np.zeros(2)
    dx[0] = 2 * x[0] - 3
    dx[1] = 2 * x[1] - 5

    return dx

x_init = np.array([1.0, 1.0])   # initial value

res1 = spo.minimize(fun1, x_init, method='BFGS', jac=fun1_grad,
                    options={'gtol': 1e-6, 'disp': True})
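
The returned OptimizeResult object can then be inspected, for example:

print(res1.x)        # optimal point, should be close to [1.5, 2.5]
print(res1.fun)      # function value at the optimum (close to 0)
print(res1.success)  # True if the optimizer reports convergence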

This gives us $x = [1.5, 2.5]$, which minimizes $f = (x_0 - 1.5)^2 + (x_1 - 2.5)^2$. For the Coursera assignment I did the following. First, prepare wrappers for the cost function and its derivative.

def compute_cost_sp(theta):
    global xmat, ymat
    j = compute_cost(theta, xmat, ymat)
    
    return j

def compute_grad_sp(theta):
    global xmat, ymat
    j_grad = compute_grad(theta, xmat, ymat)
    
    return j_grad

The purpose of these wrappers is to make it clear that the parameter being searched for is theta, by hiding the data arguments (xmat, ymat). scipy.optimize.minimize() is then called with a cost function and derivative that take only theta as an argument. (This does mean relying on global variables...)

theta_ini = np.zeros(2)
res1 = spo.minimize(compute_cost_sp, theta_ini, method='BFGS',
                    jac=compute_grad_sp, options={'gtol': 1e-6, 'disp': True})
print(' theta.processed =', res1.x)

Actually, the necessary data (xmat, ymat) can be passed with **args** without writing a wrapper (a short sketch of this appears after the results below), but here I wanted to make the "adjustable parameters" explicit. The calculation results are as follows.

# My own gradient descent (2000 iterations)
theta =     -3.788,      1.182
# Result of scipy.optimize.minimize()
Optimization terminated successfully.
         Current function value: 4.476971
         Iterations: 4
         Function evaluations: 6
         Gradient evaluations: 6
 theta.processed = [-3.89578088  1.19303364]
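
As mentioned above, the wrappers can be skipped by passing the data through args; a minimal sketch using the same compute_cost / compute_grad defined earlier:

theta_ini = np.zeros(2)
# args is forwarded to both the cost function and the jacobian, so the original
# signatures compute_cost(theta, xmat, ymat) / compute_grad(theta, xmat, ymat) fit as-is
res2 = spo.minimize(compute_cost, theta_ini, args=(xmat, ymat),
                    method='BFGS', jac=compute_grad,
                    options={'gtol': 1e-6, 'disp': True})
print(' theta.processed =', res2.x)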

Making it stochastic

From here on is an extension that is not part of the assignment (I just did it on my own): a quick trial of Stochastic Gradient Descent, which is often used in deep learning.

def stochastic_g_d(theta, alpha, sample_ratio, num_iters):
    # number of sampled examples per iteration (original size * sample_ratio)
    sample_num = int(len(ymat) * sample_ratio)

    m = len(ymat)                    # number of training examples (total)
    n = np.size(xmat) // len(xmat)   # number of features (integer division)
    j_history = np.zeros(num_iters)

    for it in range(num_iters):
        xs = np.zeros((sample_num, n))
        ys = np.zeros(sample_num)

        # draw a random subset of the training data (with replacement)
        for i in range(sample_num):
            myrnd = np.random.randint(m)
            xs[i, :] = xmat[myrnd,]
            ys[i] = ymat[myrnd]

        # gradient step on the sampled subset only
        delta_t = alpha * compute_grad(theta, xs, ys)
        theta = theta - delta_t
        # cost on the full data set, for the learning curve
        j_history[it] = compute_cost(theta, xmat, ymat)

    return theta, j_history

In the code above, everything from sampling the training data to the minimization step is done in my own code, but of course you can also sample the training data and then let scipy.optimize.minimize() find the solution.
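
For example, a rough sketch of that idea (sample a subset once, then hand it to the optimizer via args; xmat and ymat are assumed to be NumPy arrays, and the sampling ratio is just an illustrative value):

sample_ratio = 0.3
sample_num = int(len(ymat) * sample_ratio)
idx = np.random.randint(len(ymat), size=sample_num)   # sample with replacement, as above
xs, ys = xmat[idx], ymat[idx]

res_s = spo.minimize(compute_cost, np.zeros(2), args=(xs, ys),
                     method='BFGS', jac=compute_grad,
                     options={'gtol': 1e-6, 'disp': True})
print(' theta (sampled) =', res_s.x)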

For a detailed explanation of S.G.D. (Stochastic Gradient Descent), see the references (web articles). The key point is to sample the data used to compute the cost function and its derivative. Since this assignment has fewer than 100 training examples, there is no real need for S.G.D., but when handling large-scale data it can greatly improve computational efficiency.

The learning curves are plotted below (horizontal axis: number of iterations, vertical axis: cost) for the normal gradient descent method and the stochastic gradient descent method.

(Figure: ex1_plot_learning_c1.png, learning curve for Gradient Descent, for reference)

(Figure: ex1_plot_learning_c2.png, learning curve for Stochastic Gradient Descent, sampling ratio = 0.3)
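
A minimal matplotlib sketch of how such learning curves can be drawn from the j_history arrays returned above (the variable names j_history_gd / j_history_sgd are assumptions):

import matplotlib.pyplot as plt

# cost histories returned by gradient_descent() and stochastic_g_d() above
plt.plot(j_history_gd, label='Gradient Descent')
plt.plot(j_history_sgd, label='Stochastic Gradient Descent (ratio = 0.3)')
plt.xlabel('number of iterations')
plt.ylabel('cost J(theta)')
plt.legend()
plt.show()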

It is fun to be able to extend the assignments on your own while working through Coursera. I am still in the middle of taking Machine Learning, but I am already wondering what to take after it. Recently the number of paid "Specialization" courses has increased, but some of the free courses also look interesting. (They do take a lot of time, though...)

References
