"Deep Learning from scratch" Self-study memo (Part 8) I drew the graph in Chapter 6 with matplotlib

While reading "Deep Learning from scratch" (by Yasuki Saito, published by O'Reilly Japan), I am making notes of the sites I referred to. Part 7 ← → Part 9

Chapter 5: even after reading about the error backpropagation method, I honestly don't feel that I understand it.

Still, I do understand its advantages: this method makes the gradient calculation very fast, and implementing each operation as a modular "layer" keeps the code manageable.
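For example, the "layer" idea amounts to giving every operation a forward method and a backward method. A minimal sketch from memory of a multiplication layer, along the lines of the MulLayer the book builds in Chapter 5 (the apple-and-tax numbers are the book's worked example):

class MulLayer:
    def __init__(self):
        self.x = None
        self.y = None

    def forward(self, x, y):
        # Remember the inputs; backward needs them
        self.x = x
        self.y = y
        return x * y

    def backward(self, dout):
        # d(x*y)/dx = y and d(x*y)/dy = x, so multiply the
        # upstream gradient dout by the "other" input
        dx = dout * self.y
        dy = dout * self.x
        return dx, dy

# 2 apples at 100 yen, with 10% tax
apple_layer = MulLayer()
tax_layer = MulLayer()
price = apple_layer.forward(100, 2)          # 200
total = tax_layer.forward(price, 1.1)        # 220.0
# Backward pass: gradients of the total with respect to each input
dprice, dtax = tax_layer.backward(1.0)       # 1.1, 200
dapple, dnum = apple_layer.backward(dprice)  # 2.2, 110

Because each layer only needs to know its own local derivative, layers can be stacked freely and the chain rule takes care of the rest.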

A training program that uses the error backpropagation method is listed from P162, but to run it you also need the programs with the various definitions listed from P142 onward.

Chapter 6 then explains the things I had been wondering about on the way here, things like "what am I supposed to do with this?" ...

But just because something is explained doesn't mean I can understand it.

In a case like this, you either pin down the parts you do understand and move on, or, the book's contents aside, just try various things yourself.

For now, I drew the figure on P169.

6-1.jpg

from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np

def function_2(x, y):
    return x**2/20 + y**2

# Coordinate ranges for x and y
x = np.arange(-10.0, 10.0, 0.1)
y = np.arange(-10.0, 10.0, 0.1)
# Grid data for x and y
X, Y = np.meshgrid(x, y)
# Evaluate the function on the grid
Z = function_2(X, Y)

# Create the figure
fig = plt.figure(figsize=(10.0, 8.0))
# Create a 3D axis
ax = fig.add_subplot(111, projection='3d')
# Set the title and axis labels
ax.set_title("Figure 6-1 f(x,y)=x**2/20+y**2", size = 14)
ax.set_xlabel("x", size = 14)
ax.set_ylabel("y", size = 14)
ax.set_zlabel("f(x, y)", size = 14)
# Set the axis ticks
ax.set_xticks([-10.0, -5.0, 0.0, 5.0, 10.0])
ax.set_yticks([-10.0, -5.0, 0.0, 5.0, 10.0])
ax.set_zticks([0.0, 20.0, 40.0, 60.0, 80.0, 100.0])
# Draw the wireframe (the commented-out lines below give other plot types)
ax.plot_wireframe(X, Y, Z)
#ax.plot_surface(X, Y, Z, rstride=1, cstride=1)
#ax.contour3D(X,Y,Z)
#ax.contourf3D(X,Y,Z)
#ax.scatter3D(np.ravel(X),np.ravel(Y),np.ravel(Z))
plt.show()

If you replace plot_wireframe with one of the commented-out calls, you get a different kind of plot.

While researching various things, I found the following. You can rotate the drawn graph with the mouse and view it from any direction.

import numpy as np
import matplotlib
# This is supposed to set matplotlib's backend; I don't really know what that means.
# However, after adding this line, the graph opens in a new (interactive) window.
matplotlib.use('TkAgg')

#for plotting
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

def function_2(x, y):
    return x**2/20 + y**2

x = np.arange(-10.0, 10.0, 0.1)
y = np.arange(-10.0, 10.0, 0.1)

X, Y = np.meshgrid(x, y)
Z = function_2(X, Y)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
surf = ax.plot_surface(X, Y, Z, cmap='bwr', linewidth=0)
fig.colorbar(surf)
ax.set_title("Surface Plot")
fig.show()

6-2.jpg

The colors of the graph seem to be specified by the cmap parameter (see "matplotlib color example code" among the referenced sites).
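Just to check, cmap names one of matplotlib's colormaps, so swapping the string changes the coloring. A small sketch comparing two standard colormap names, 'viridis' and 'coolwarm' (any valid colormap name works here):

from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np

def function_2(x, y):
    return x**2/20 + y**2

x = np.arange(-10.0, 10.0, 0.1)
y = np.arange(-10.0, 10.0, 0.1)
X, Y = np.meshgrid(x, y)
Z = function_2(X, Y)

fig = plt.figure(figsize=(10.0, 4.0))
# Same surface drawn twice with different colormaps
for i, name in enumerate(['viridis', 'coolwarm'], start=1):
    ax = fig.add_subplot(1, 2, i, projection='3d')
    ax.plot_surface(X, Y, Z, cmap=name, linewidth=0)
    ax.set_title(name)
plt.show()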

The contour lines look like this:

6-3.jpg

import matplotlib.pyplot as plt
import numpy as np

def function_2(x, y):
    return x**2/20 + y**2

x = np.arange(-10.0, 10.0, 0.1)
y = np.arange(-10.0, 10.0, 0.1)
h = np.arange(0., 100.0, 1.0)
X, Y = np.meshgrid(x, y)
Z = function_2(X, Y)

plt.figure()
plt.contour(X, Y, Z, levels=h)
plt.xlim([-10, 10])
plt.ylim([-10, 10])
plt.show()

contour(x positions, y positions, heights at those coordinates, levels=[heights at which to draw lines]). Since the step of the arrays x and y is set to 0.1, the lines are smooth but the plot takes a while to display. If you set the step to 1.0, it displays immediately, but the lines are jagged. h specifies the heights at which to draw the contour lines; in the example, a line is drawn at every integer height from 0 to 99.
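To see that trade-off directly, you can draw the same contours from a fine grid and a coarse grid side by side. A small sketch (only the np.arange step differs between the two panels):

import matplotlib.pyplot as plt
import numpy as np

def function_2(x, y):
    return x**2/20 + y**2

fig, axes = plt.subplots(1, 2, figsize=(10.0, 4.0))
for ax, step in zip(axes, [0.1, 1.0]):
    x = np.arange(-10.0, 10.0, step)
    y = np.arange(-10.0, 10.0, step)
    X, Y = np.meshgrid(x, y)
    Z = function_2(X, Y)
    # One contour line per integer height from 0 to 99
    ax.contour(X, Y, Z, levels=np.arange(0., 100.0, 1.0))
    ax.set_title("step = " + str(step))
plt.show()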

Gradient vector diagram

6-4.jpg

import matplotlib.pyplot as plt
import numpy as np

def _numerical_gradient_no_batch(f, x):
    h = 1e-4  # 0.0001
    grad = np.zeros_like(x)
    
    for idx in range(x.size):
        tmp_val = x[idx]
        x[idx] = float(tmp_val) + h
        fxh1 = f(x)  # f(x+h)
        
        x[idx] = tmp_val - h 
        fxh2 = f(x)  # f(x-h)
        grad[idx] = (fxh1 - fxh2) / (2*h)
        
        x[idx] = tmp_val  #Restore the value
    return grad

def numerical_gradient(f, X):
    if X.ndim == 1:
        return _numerical_gradient_no_batch(f, X)
    else:
        grad = np.zeros_like(X)
        for idx, x in enumerate(X):
            grad[idx] = _numerical_gradient_no_batch(f, x)
        return grad

def function_2(x):
    return (x[0]**2/20+x[1]**2)

x = np.arange(-10.0, 10.0, 1.)
y = np.arange(-10.0, 10.0, 1.)
h = np.arange(0., 100.0, 10.0)
X, Y = np.meshgrid(x, y)

X = X.flatten()
Y = Y.flatten()
grad = numerical_gradient(function_2, np.array([X, Y]).T).T
    
plt.figure()
plt.quiver(X, Y, -grad[0], -grad[1],  angles="xy",color="#666666")
plt.xlim([-10, 10])
plt.ylim([-5, 5])
plt.xlabel('x')
plt.ylabel('y')
plt.grid()
plt.draw()
plt.show()

I just changed function_2 in gradient_2d.py from the ch04 folder. The arguments are quiver(x positions, y positions, x components, y components); note the minus signs on grad in the call above, which make the arrows point downhill, in the direction of descent.
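As a minimal check of that argument order, here is a tiny hand-made vector field: four arrows placed at the corners of a square, each pointing toward the origin:

import matplotlib.pyplot as plt
import numpy as np

X = np.array([1.0, -1.0, -1.0, 1.0])   # x positions
Y = np.array([1.0, 1.0, -1.0, -1.0])   # y positions
U = -X                                 # x component of each arrow
V = -Y                                 # y component of each arrow

plt.quiver(X, Y, U, V, angles="xy")
plt.xlim([-2, 2])
plt.ylim([-2, 2])
plt.grid()
plt.show()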

Diagram of optimization update path

6-5.jpg

import matplotlib.pyplot as plt
import numpy as np

def numerical_gradient(f, x):
    h = 1e-4 # 0.0001
    grad = np.zeros_like(x)
    
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        tmp_val = x[idx]
        x[idx] = tmp_val + h
        fxh1 = f(x) # f(x+h)
        
        x[idx] = tmp_val - h 
        fxh2 = f(x) # f(x-h)
        grad[idx] = (fxh1 - fxh2) / (2*h)
        
        x[idx] = tmp_val #Restore the value
        it.iternext()   
        
    return grad

def adagrad(x, lr, grad, v, moment):
    v += grad * grad
    x -= lr * grad / (np.sqrt(v) + 1e-7)
    return x, v

def momentum(x, lr, grad, v, moment):
    v = moment*v - lr*grad
    x += v
    return x, v

def sgd(x, lr, grad, v=None, moment=None):
    x -= lr * grad
    return x, v

def gradient_descent(opt, f, init_x, lr=0.01, step_num=100, moment=0.9):
    x = init_x
    x_history = []
    v = 0
    for i in range(step_num):
        x_history.append( x.copy() )
        grad = numerical_gradient(f, x)
        x, v = opt(x, lr, grad, v, moment)
    return np.array(x_history)

def function_1(x, y):
    return x**2/20 + y**2

def function_2(x):
    return (x[0]**2/20+x[1]**2)

x = np.arange(-10.0, 10.0, 0.1)
y = np.arange(-10.0, 10.0, 0.1)
h = np.arange(0., 10.0, 1.0)
X, Y = np.meshgrid(x, y)
Z = function_1(X, Y)

plt.figure()
plt.contour(X, Y, Z, levels=h)

init_x = np.array([-7.0, 2.0])
x_history = gradient_descent(sgd, function_2, init_x, lr=0.9, step_num=100)
#x_history = gradient_descent(momentum, function_2, init_x, lr=0.2, step_num=20, moment=0.9)
#x_history = gradient_descent(adagrad, function_2, init_x, lr=0.9, step_num=100)

plt.plot(x_history[:,0], x_history[:,1],'-ro')
plt.xlim([-10, 10])
plt.ylim([-10, 10])
plt.xlabel('x')
plt.ylabel('y')
plt.grid()
plt.show()

With SGD, unless the learning rate lr is set just right, you don't get the zigzag path shown in the book's example. At lr = 1.0 the path zigzags but never converges to 0; at 0.7 or below it converges to 0 before the zigzag becomes noticeable. lr = 0.9 reproduces the figure best.

With momentum, unless you tune the moment value as well as the learning rate lr, the path will not look like the example in the book either.
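To compare settings side by side, you can reuse the definitions from the listing above. A sketch that sweeps the SGD learning rate over the three values discussed (it assumes gradient_descent, sgd, function_1 and function_2 are already defined as above):

import matplotlib.pyplot as plt
import numpy as np

x = np.arange(-10.0, 10.0, 0.1)
y = np.arange(-10.0, 10.0, 0.1)
X, Y = np.meshgrid(x, y)
Z = function_1(X, Y)

fig, axes = plt.subplots(1, 3, figsize=(15.0, 4.0))
for ax, lr in zip(axes, [0.7, 0.9, 1.0]):
    ax.contour(X, Y, Z, levels=np.arange(0., 10.0, 1.0))
    # Fresh init_x on every call, because sgd updates x in place
    hist = gradient_descent(sgd, function_2, np.array([-7.0, 2.0]), lr=lr, step_num=100)
    ax.plot(hist[:, 0], hist[:, 1], '-ro')
    ax.set_title("SGD lr=" + str(lr))
    ax.set_xlim([-10, 10])
    ax.set_ylim([-10, 10])
plt.show()

Swapping sgd for momentum and passing a moment= argument lets you sweep that combination the same way.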

Diagram of the optimization update path in 3D

w6-9.jpg

import numpy as np
import matplotlib
matplotlib.use('TkAgg')

def numerical_gradient(f, x):
    h = 1e-4 # 0.0001
    grad = np.zeros_like(x)

    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        tmp_val = x[idx]
        x[idx] = tmp_val + h
        fxh1 = f(x) # f(x+h)

        x[idx] = tmp_val - h 
        fxh2 = f(x) # f(x-h)
        grad[idx] = (fxh1 - fxh2) / (2*h)

        x[idx] = tmp_val #Restore the value
        it.iternext()   

    return grad

def adagrad(x, lr, grad, v, moment):
    v += grad * grad
    x -= lr * grad / (np.sqrt(v) + 1e-7)
    return x, v

def momentum(x, lr, grad, v, moment):
    v = moment*v - lr*grad
    x += v
    return x, v

def sgd(x, lr, grad, v=None, moment=None):
    x -= lr * grad
    return x, v

def gradient_descent(opt, f, init_x, lr=0.01, step_num=100, moment=0.9):
    x = init_x
    x_history = []
    v = 0
    for i in range(step_num):
        w = x.tolist()
        z = f(x)
        w.append(z)
        x_history.append( w )
        grad = numerical_gradient(f, x)
        x, v = opt(x, lr, grad, v, moment)
    return np.array(x_history)

def function_1(x, y):
    return x**2/20 + y**2

def function_2(x):
    return (x[0]**2/20+x[1]**2)


#for plotting
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

x = np.arange(-8.0, 8.0, .1)
y = np.arange(-4.0, 4.0, .1)

X, Y = np.meshgrid(x, y)
Z = function_1(X, Y)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
surf = ax.plot_surface(X, Y, Z, cmap='bwr', linewidth=0)

init_x = np.array([-7.0, 2.0])
x_history = gradient_descent(sgd, function_2, init_x, lr=0.9, step_num=100)
#x_history = gradient_descent(momentum, function_2, init_x, lr=0.2, step_num=20, moment=0.9)
#x_history = gradient_descent(adagrad, function_2, init_x, lr=0.9, step_num=100)
ax.plot(x_history[:,0], x_history[:,1], x_history[:,2],'-ro')

fig.colorbar(surf)
ax.set_title("Surface Plot")
fig.show()

About array copies

In the definition of gradient_descent there is the line x_history.append( x.copy() ). This means "make a duplicate with the same contents as x and add it to x_history". If it were written x_history.append( x ), it would mean "add a reference to the object named x to x_history", so when the contents of x are later rewritten, the contents stored in x_history change with them. The same thing happens with the assignment a = x. This seems to be a matter of how Python handles arrays (names are references to mutable objects, and copies can be shallow or deep), and it is explained in various places.
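A minimal demonstration of the difference:

import numpy as np

x = np.array([1.0, 2.0])
history_ref = []
history_copy = []

history_ref.append(x)          # stores a reference to the same array
history_copy.append(x.copy())  # stores an independent duplicate

x -= 1.0  # rewrite the contents of x in place

print(history_ref)   # [array([0., 1.])]  <- changed together with x
print(history_copy)  # [array([1., 2.])]  <- kept the old values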

nditer, an iterator?

In the definition of numerical_gradient there is a call np.nditer(x, flags=['multi_index'], op_flags=['readwrite']), and the iterator it returns is used to control the loop. When you don't understand something like this, print out what happens inside the loop and check:

x = np.array([[-7.0, 2.0],[-6., 1.],[-5., 0.]])
it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
while not it.finished:
    idx = it.multi_index
    print("x[" + str(idx) + "] : " + str(x[idx]))
    it.iternext()  

x[(0, 0)] : -7.0
x[(0, 1)] : 2.0
x[(1, 0)] : -6.0
x[(1, 1)] : 1.0
x[(2, 0)] : -5.0
x[(2, 1)] : 0.0

I see. Now let's change the input a little.

x = np.array([[-7.0, 2.0,-6.],[1., -5., 0.]])
it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
while not it.finished:
    idx = it.multi_index
    print("x[" + str(idx) + "] : " + str(x[idx]))
    it.iternext()   

x[(0, 0)] : -7.0
x[(0, 1)] : 2.0
x[(0, 2)] : -6.0
x[(1, 0)] : 1.0
x[(1, 1)] : -5.0
x[(1, 2)] : 0.0

Even if the number of elements or the number of dimensions of x changes, the loop can handle it without any change to the program's code.
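The op_flags=['readwrite'] part declares that the array may be modified while iterating, which is exactly what numerical_gradient does when it temporarily overwrites x[idx] with x[idx] + h and x[idx] - h. A small sketch of writing back through the same loop pattern:

import numpy as np

x = np.array([[-7.0, 2.0], [-6.0, 1.0]])
it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
while not it.finished:
    idx = it.multi_index
    x[idx] = x[idx] * 10  # overwrite each element in place
    it.iternext()

print(x)
# [[-70.  20.]
#  [-60.  10.]]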

This is the end of Chapter 6, Section 1. I was just playing around drawing graphs, but along the way I learned about arrays and Python grammar, and I came to understand what the gradient means and how it shows up on a graph.

Referenced sites

A very summary of matplotlib
Python 3: How to write a 3D graph
mplot3d tutorial
matplotlib color example code
matplotlib axes.plot

Part 7 ← → Part 9

Unreadable Glossary
