In "Deep Learning from scratch", the stance of implementing the contents of Tensorflow with Numpy is taken. In this article, we'll follow suit and consider the problem of implementing a simple classification problem in Numpy.
Suppose we want to classify some data (4 samples) into 3 distinct classes: 0, 1, and 2. We have set up a network whose last layer produces the pre-activation output z; applying softmax gives the final model output:
input X ---> some network --> z --> y_model = softmax(z)
We quantify the agreement between truth (y) and model (y_model) using the categorical cross-entropy: $J = -\sum_i y_i \log(y_{\mathrm{model}}(x_i))$.
In the following, you are to implement softmax and the categorical cross-entropy and evaluate them for the given values of z.
We will use y_cl = np.array([0, 0, 2, 1]) as the class labels, and this time we will not look at the hidden layers in detail. That is, suppose that after the input has passed through the hidden layers and some computation, the last layer outputs the array z. When this z is passed through the softmax function, does the resulting output give back the same classes as y_cl? We will also discuss the accuracy.
A TensorFlow implementation is already given below; the task is to write a program that behaves the same using only NumPy. (Section 3.5 of the reference book should be helpful.) There are five problems, 1) through 5).
from __future__ import print_function
import numpy as np
import tensorflow as tf
# Data: 4 samples with the following class labels (input features X irrelevant here)
y_cl = np.array([0, 0, 2, 1])
# output of the last network layer before applying softmax
z = np.array([
[ 4, 5, 1],
[ -1, -2, -3],
[0.1, 0.2, 0.3],
[ -1, 100, 1]
])
# TensorFlow implementation as reference. Make sure you get the same results!
print('\nTensorFlow ------------------------------ ')
with tf.Session() as sess:
    z_ = tf.constant(z, dtype='float64')
    y_ = tf.placeholder(dtype='float64', shape=(None, 3))
    y = np.array([[1., 0., 0.], [1., 0., 0.], [0., 0., 1.], [0., 1., 0.]])
    print('one-hot encoding of data labels')
    print(y)
    y_model = tf.nn.softmax(z_)
    crossentropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_model), reduction_indices=[1]))
    print('softmax(z)')
    print(sess.run(y_model))
    print('cross entropy = %f' % sess.run(crossentropy, feed_dict={y_: y}))
print('\nMy solution ------------------------------ ')
# 1) Write a function that turns any class labels y_cl into one-hot encodings y. (2 points)
# 0 --> (1, 0, 0)
# 1 --> (0, 1, 0)
# 2 --> (0, 0, 1)
# Make sure that np.shape(y) = (4, 3) for np.shape(y_cl) = (4).
def to_onehot(y_cl, num_classes):
    y = np.zeros((len(y_cl), num_classes))
    return y
# 2) Write a function that returns the softmax of the input z along the last axis. (2 points)
def softmax(z):
    return None
# 3) Compute the categorical cross-entropy between data and model (2 points)
# 4) Which classes are predicted by the model (maximum entry). (1 point)
# 5) How many samples are correctly classified (accuracy)? (1 point)
from __future__ import print_function
import numpy as np
import tensorflow as tf
# Data: 4 samples with the following class labels (input features X irrelevant here)
y_cl = np.array([0, 0, 2, 1])
# output of the last network layer before applying softmax
z = np.array([
[ 4, 5, 1],
[ -1, -2, -3],
[0.1, 0.2, 0.3],
[ -1, 100, 1]
])
#The Tensorflow part is omitted
print('\n☆My solution ------------------------------ ')
# 1) Write a function that turns any class labels y_cl into one-hot encodings y. (2 points)
# 0 --> (1, 0, 0)
# 1 --> (0, 1, 0)
# 2 --> (0, 0, 1)
# Make sure that np.shape(y) = (4, 3) for np.shape(y_cl) = (4).
def to_onehot(num_classes, y_cl):
    y_one = np.eye(num_classes)[y_cl]
    return y_one
print('one-hot encoding of data labels by Numpy')
y_one = (to_onehot(3,y_cl)).astype(np.float32)
print(y_one)
#2) Write a function that returns the softmax of the input z along the last axis. (2 points)
def softmax(z):
    e = np.exp(z)
    dist = e / np.sum(e, axis=1, keepdims=True)
    return dist
print('softmax(z) by Numpy')
y_my = softmax(z)
print(y_my)
# 3) Compute the categorical cross-entropy between data and model (2 points)
crossentropy_my = np.mean(-np.sum(y_one*np.log(y_my),axis=1))
print('cross entropy by Numpy: %f' % crossentropy_my)
# 4) Which classes are predicted by the model (maximum entry). (1 point)
print('The predicted class by Numpy:')
y_pre_cl= np.argmax(y_my,axis=1)
print(y_pre_cl)
# 5) How many samples are correctly classified (accuracy)? (1 point)
accuracy_my = np.mean(y_pre_cl == y_cl)
print('accuracy by Numpy: %f' % accuracy_my)
Output
■Input data with 4 samples:[0 0 2 1]
☆TensorFlow ------------------------------
one-hot encoding of data labels
[[ 1. 0. 0.]
[ 1. 0. 0.]
[ 0. 0. 1.]
[ 0. 1. 0.]]
softmax(z)
[[ 2.65387929e-01 7.21399184e-01 1.32128870e-02]
[ 6.65240956e-01 2.44728471e-01 9.00305732e-02]
[ 3.00609605e-01 3.32224994e-01 3.67165401e-01]
[ 1.36853947e-44 1.00000000e+00 1.01122149e-43]]
cross entropy: 0.684028
The predicted class:
[1 0 2 1]
accuracy: 0.750000
☆My solution ------------------------------
one-hot encoding of data labels by Numpy
[[ 1. 0. 0.]
[ 1. 0. 0.]
[ 0. 0. 1.]
[ 0. 1. 0.]]
softmax(z) by Numpy
[[ 2.65387929e-01 7.21399184e-01 1.32128870e-02]
[ 6.65240956e-01 2.44728471e-01 9.00305732e-02]
[ 3.00609605e-01 3.32224994e-01 3.67165401e-01]
[ 1.36853947e-44 1.00000000e+00 1.01122149e-43]]
cross entropy by Numpy: 0.684028
The predicted class by Numpy:
[1 0 2 1]
accuracy by Numpy: 0.750000
Answers and questions can be downloaded from here.
- Write a function that turns any class labels y_cl into one-hot encodings y. (2 points) 0 --> (1, 0, 0) 1 --> (0, 1, 0) 2 --> (0, 0, 1) Make sure that np.shape(y) = (4, 3) for np.shape(y_cl) = (4).
def to_onehot(num_classes, y_cl):
    y_one = np.eye(num_classes)[y_cl]
    return y_one
By indexing the identity matrix np.eye(num_classes) with the NumPy array y_cl, you get an array that has a 1 at the position given by each element of y_cl. num_classes must be the number of distinct classes, i.e. (maximum − minimum of y_cl) + 1. (Here the labels are 0, 1, 2, so there are three classes.) Each row is a one-hot vector; a small check follows below.
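As a quick check, here is a minimal sketch (using only names that already appear above) of what this indexing does for the labels in this problem:
# np.eye(3) is the 3x3 identity matrix; its k-th row is the one-hot vector for class k.
# Indexing it with y_cl picks one row per label, giving an array of shape (4, 3).
print(np.eye(3)[y_cl])
# [[1. 0. 0.]
#  [1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]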
- Write a function that returns the softmax of the input z along the last axis. (2 points)
def softmax(z):
e = np.exp(z)
dist = e / np.sum(e, axis=1, keepdims=True)
return dist
Implement the softmax function according to its definition. For large inputs in particular, subtracting the maximum value from z before exponentiating does not change the result (p. 69), so this trick may be used to avoid numerical overflow; a small sketch follows below. Also note that if the keepdims option is not set to True, the sum along axis=1 has shape (4,) instead of (4, 1): each row sum collapses to a plain number instead of a length-1 array, and the division no longer broadcasts row by row.
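Here is a minimal sketch of that variant (the name softmax_stable is just for illustration, not part of the assignment). The row-wise maximum is subtracted before exponentiating; for the z used here the result agrees with the plain softmax above:
def softmax_stable(z):
    z = np.asarray(z, dtype=float)
    # Subtracting the row-wise maximum leaves the result unchanged, because the
    # common factor exp(-max) cancels between numerator and denominator,
    # but it prevents overflow in np.exp when entries are very large (e.g. 100).
    z_shift = z - np.max(z, axis=1, keepdims=True)
    e = np.exp(z_shift)
    return e / np.sum(e, axis=1, keepdims=True)

print(np.allclose(softmax_stable(z), softmax(z)))  # True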
- Compute the categorical cross-entropy between data and model (2 points)
crossentropy_my = np.mean(-np.sum(y_one*np.log(y_my),axis=1))
print('cross entropy by Numpy: %f' % crossentropy_my)
Implement it according to the definition of the categorical cross-entropy, $E = -\sum_k t_k \log y_k$, where $t_k$ is the one-hot vector and $y_k$ is the model output; taking the mean over the samples then gives the final value.
In addition, tf.reduce_mean and np.mean behave the same way; for the meaning of axis in NumPy, see the NumPy documentation. Since this is a classification problem, we work with an array of arrays (for example [[A, B, C], [D, E, F]]), and the inner arrays [A, B, C] and [D, E, F] have to be handled separately. Specifying axis=1 makes the sum run within each of these inner arrays, i.e. row by row, as the small example below illustrates.
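A tiny illustration with made-up numbers of what axis=1 (and keepdims) does:
a = np.array([[1., 2., 3.],
              [4., 5., 6.]])
print(np.sum(a, axis=0))                 # [5. 7. 9.]  -> sums across the inner arrays
print(np.sum(a, axis=1))                 # [ 6. 15.]   -> one sum per inner array (row)
print(np.sum(a, axis=1, keepdims=True))  # same sums, but shape (2, 1), so the row structure is kept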
To organize what we have so far:
- y_cl (the given labels) was converted into one-hot vectors -> y_one
- the array z that came out of the layers was passed through the softmax function -> y_my
These two correspond to $t_k$ and $y_k$ respectively.
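As a minimal sketch of an equivalent check (using y_my, y_cl and crossentropy_my as defined in the solution above): because each row of y_one contains a single 1, the sum over axis=1 just picks out the log-probability of the true class, so the same value can be computed by indexing y_my directly with y_cl.
# Per-sample cross-entropy: minus the log of the probability the model assigns to the true class.
per_sample = -np.log(y_my[np.arange(len(y_cl)), y_cl])
print(per_sample)           # one value per sample
print(np.mean(per_sample))  # ~0.684028, matching crossentropy_my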
- Which classes are predicted by the model (maximum entry). (1 point)
print('The predicted class by Numpy:')
y_pre_cl = np.argmax(y_my,axis=1)
print(y_pre_cl)
See page 70 for details. In each of the rows dealt with above (e.g. [A, B, C] and [D, E, F]), the elements express how strongly the sample is judged to belong to each class. To extract the predicted class, take the index of the largest value; this is exactly what np.argmax does. As before, specify axis=1 so that it operates row by row.
For classification problems, softmax is used as the final activation, because the list of output values tells you into which category the data is classified (and with what ratio or probability) simply by looking at the indices. For example, suppose that for a three-class problem you obtain the list [0.10, 0.05, 0.85] (this corresponds to one of the rows of softmax(z) in the present question). Since np.argmax([0.10, 0.05, 0.85]) = 2, you can read off that this sample belongs to category 2 with a probability of about 0.85 × 100 = 85%.
Taking the index of the largest entry like this tells you the category. Turning this around, if you mark the correct label by putting a 1 at its position and a 0 everywhere else, the correct category can be found just by looking at the index where the 1 sits. This is called a one-hot vector.
Next, consider converting the labels y_cl into one-hot vectors. It suffices to define an array that has a 1 at the position given by each element 0, 0, 2, 1 of y_cl, so the one-hot encoding in this case is
[ [1, 0, 0], # 0
[1, 0, 0], # 0
[0, 0, 1], # 2
[0, 1, 0] ] # 1
This is the resulting array. (Minor notational differences are ignored here.)
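As a quick sanity check (a minimal sketch; y_one_example is just an illustrative name), taking np.argmax along axis=1 of this one-hot array recovers the original labels:
y_one_example = np.array([[1, 0, 0],
                          [1, 0, 0],
                          [0, 0, 1],
                          [0, 1, 0]])
print(np.argmax(y_one_example, axis=1))  # [0 0 2 1] -> the original y_cl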
- How many samples are correctly classified (accuracy)? (1 point)
accuracy_my = np.mean(y_pre_cl == y_cl)
print('accuracy by Numpy: %f' % accuracy_my)
As briefly summarized above, the softmax output tells you how confidently each sample is assigned to each class. To find out how well the final array y_pre_cl matches the true labels y_cl, compare them element-wise: y_pre_cl == y_cl gives a boolean array, and taking its mean (True counts as 1, False as 0) gives the fraction of correctly classified samples, i.e. the accuracy.
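With the values from this problem, that last step looks as follows (a minimal sketch using y_pre_cl and y_cl from the solution above):
print(y_pre_cl == y_cl)           # [False  True  True  True]
print(np.mean(y_pre_cl == y_cl))  # 0.75 -> 3 of the 4 samples are classified correctly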
Classification problems are an important task in machine learning, but I think they can be hard to grasp at first (for example, why use one-hot vectors?). I would be very happy if this article answered even one of those questions.