Let's implement a neural network to reproduce Figure 5.3 from PRML Chapter 5. As mentioned earlier, although this is a working implementation, the figures it reproduces are not crisp. I feel bad posting it with these imperfections still in place, but please use it as a reference.
First, for Figures 5.3 (b), (c), and (d), the prediction accuracy does not look as good as in the figures in PRML. Worse, for Figure 5.3 (a) the network returns a completely wrong prediction. I went through a lot of trial and error, but if anyone notices a mistake I made, please point it out.
I will leave the explanation of neural networks and backpropagation themselves to PRML and Hajipata, and only briefly review the parts needed for the implementation.
(1) The output of the neural network is given by (5.9). The expression in the PRML text assumes a sigmoid for the activation function $h(\cdot)$, but note that $\tanh(\cdot)$ is specified for Figure 5.3.
y_k({\bf x}, {\bf w}) = \sigma \left( \sum_{j=0}^M w^{(2)}_{kj} h \left( \sum_{i=0}^D w^{(1)}_{ji} x_i \right) \right) (5.9)
(2) To learn the weights ${\bf w}$ between nodes, we need the error between the output and the target value at each node. First, run the forward pass: the activation of hidden unit $j$ is (5.62), its output is (5.63), and the output of output unit $k$ is (5.64).
a_j = \sum_{i=0}^D w^{(1)}_{ji} x_i (5.62)
z_j = \tanh(a_j) (5.63)
y_k = \sum_{j=0}^M w^{(2)}_{kj} z_j (5.64)
(3) Next, compute the error $\delta_k$ at the output layer.
\delta_k = y_k - t_k (5.65)
(4) Then backpropagate to obtain the error $\delta_j$ at the hidden layer.
\delta_j = (1 - z_j^2) \sum_{k=1}^K w_{kj} \delta_k (5.66)
(5) Finally, update the weights between nodes. The gradient of the per-sample error with respect to each weight is (5.67), and the weights are updated with the gradient-descent rule (5.43); a minimal sketch of one full training step follows this list.

\frac{\partial E_n}{\partial w_{ji}} = \delta_j z_i (5.67)

{\bf w}^{(\tau+1)} = {\bf w}^{(\tau)} - \eta \nabla E({\bf w}) (5.43)
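To tie steps (1) through (5) together, here is a minimal sketch of one training step for a single sample, using the same shapes and bias convention as the full script below (the helper name train_step and the example values are mine, not from PRML):

import numpy as np

def train_step(x, t, W1, W2, eta):
    # one stochastic-gradient step for a single sample (x, t);
    # assumed shapes: W1 is (n_hidden, 2) incl. the bias column, W2 is (1, n_hidden)
    X = np.insert(x, 0, 1)             # prepend the bias input
    A = np.dot(W1, X)                  # hidden activations (5.62)
    Z = np.tanh(A)                     # hidden outputs (5.63)
    Z[0] = 1.0                         # first hidden unit is fixed to 1 as the bias
    Y = np.dot(W2, Z)                  # linear output (5.64)
    D2 = Y - t                         # output error (5.65)
    D1 = (1 - Z**2) * W2 * D2          # hidden error (5.66)
    W1 = W1 - eta * np.outer(D1, X)    # gradient (5.67) + descent step (5.43)
    W2 = W2 - eta * D2 * Z             # gradient (5.67) + descent step (5.43)
    return W1, W2

For instance, with W1 = np.random.uniform(-1, 1, (4, 2)) and W2 = np.random.uniform(-1, 1, (1, 4)), repeatedly calling train_step(np.array([0.5]), np.array([0.25]), W1, W2, 0.1) drives the output toward the target. The full script follows.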
import matplotlib.pyplot as plt
import numpy as np
def heaviside(x):
    return 0.5 * (np.sign(x) + 1)  # step function: 0 for x < 0, 0.5 at 0, 1 for x > 0
def NN(x_train, t, n_input, n_hidden, n_output, eta, W1, W2, n_loop):
    for _ in range(n_loop):
        for n in range(len(x_train)):
            x = np.array([x_train[n]])
            # feedforward
            X = np.insert(x, 0, 1)      # prepend the bias term
            A = np.dot(W1, X)           # (5.62)
            Z = np.tanh(A)              # (5.63)
            Z[0] = 1.0                  # first hidden unit serves as the bias
            Y = np.dot(W2, Z)           # (5.64)
            # backpropagation
            D2 = Y - t[n]               # (5.65)
            D1 = (1 - Z**2) * W2 * D2   # (5.66)
            W1 = W1 - eta * D1.T * X    # (5.67), (5.43)
            W2 = W2 - eta * D2.T * Z    # (5.67), (5.43)
    return W1, W2
def output(x, W1, W2):
    X = np.insert(x, 0, 1)  # prepend the bias term
    A = np.dot(W1, X)       # (5.62)
    Z = np.tanh(A)          # (5.63)
    Z[0] = 1.0              # first hidden unit serves as the bias
    Y = np.dot(W2, Z)       # (5.64)
    return Y, Z
if __name__ == "__main__":
    # set the shape of the neural network
    n_input = 2
    n_hidden = 4
    n_output = 1
    eta = 0.1
    W1 = np.random.random((n_hidden, n_input))
    W2 = np.random.random((n_output, n_hidden))
    n_loop = 1000

    # set the training data
    x_train = np.linspace(-4, 4, 300).reshape(300, 1)
    y_train_1 = x_train * x_train
    y_train_2 = np.sin(x_train)
    y_train_3 = np.abs(x_train)
    y_train_4 = heaviside(x_train)

    W1_1, W2_1 = NN(x_train, y_train_1, n_input, n_hidden, n_output, eta, W1, W2, n_loop)
    W1_2, W2_2 = NN(x_train, y_train_2, n_input, n_hidden, n_output, eta, W1, W2, n_loop)
    W1_3, W2_3 = NN(x_train, y_train_3, n_input, n_hidden, n_output, eta, W1, W2, n_loop)
    W1_4, W2_4 = NN(x_train, y_train_4, n_input, n_hidden, n_output, eta, W1, W2, n_loop)
    Y_1 = np.zeros((len(x_train), n_output))
    Z_1 = np.zeros((len(x_train), n_hidden))
    Y_2 = np.zeros((len(x_train), n_output))
    Z_2 = np.zeros((len(x_train), n_hidden))
    Y_3 = np.zeros((len(x_train), n_output))
    Z_3 = np.zeros((len(x_train), n_hidden))
    Y_4 = np.zeros((len(x_train), n_output))
    Z_4 = np.zeros((len(x_train), n_hidden))
    for n in range(len(x_train)):
        Y_1[n], Z_1[n] = output(x_train[n], W1_1, W2_1)
        Y_2[n], Z_2[n] = output(x_train[n], W1_2, W2_2)
        Y_3[n], Z_3[n] = output(x_train[n], W1_3, W2_3)
        Y_4[n], Z_4[n] = output(x_train[n], W1_4, W2_4)
    # Figure 5.3(a): quadratic target
    plt.plot(x_train, Y_1, "r-")
    plt.plot(x_train, y_train_1, "bo", markersize=3)
    for i in range(n_hidden):
        plt.plot(x_train, Z_1[:, i], 'm--')
    plt.xlim([-1, 1])
    plt.ylim([0, 1])
    plt.title("Figure 5.3(a)")
    plt.show()

    # Figure 5.3(b): sine target
    plt.plot(x_train, Y_2, "r-")
    plt.plot(x_train, y_train_2, "bo", markersize=2)
    for i in range(n_hidden):
        plt.plot(x_train, Z_2[:, i], 'm--')
    plt.xlim([-3.14, 3.14])
    plt.ylim([-1, 1])
    plt.title("Figure 5.3(b)")
    plt.show()

    # Figure 5.3(c): absolute-value target
    plt.plot(x_train, Y_3, "r-")
    plt.plot(x_train, y_train_3, "bo", markersize=4)
    for i in range(n_hidden):
        plt.plot(x_train, Z_3[:, i], 'm--')
    plt.xlim([-1, 1])
    plt.ylim([0, 1])
    plt.title("Figure 5.3(c)")
    plt.show()

    # Figure 5.3(d): Heaviside target
    plt.plot(x_train, Y_4, "r-")
    plt.plot(x_train, y_train_4, "bo", markersize=2)
    for i in range(n_hidden):
        plt.plot(x_train, Z_4[:, i], 'm--')
    plt.xlim([-2, 2])
    plt.ylim([-0.05, 1.05])
    plt.title("Figure 5.3(d)")
    plt.show()
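As a closing note on the badly fitted Figure 5.3 (a): the weights above are initialized with np.random.random, so every initial weight lies in [0, 1) and starts positive. With tanh hidden units, a zero-centered initialization is the more common choice and might give the network a better starting point; the snippet below is an untested suggestion along those lines, not a confirmed fix.

# untested suggestion: zero-centered initial weights for the tanh units
W1 = np.random.uniform(-1.0, 1.0, (n_hidden, n_input))
W2 = np.random.uniform(-1.0, 1.0, (n_output, n_hidden))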