Want to solve a simple classification problem?

Introduction

In "Deep Learning from scratch", the stance of implementing the contents of Tensorflow with Numpy is taken. In this article, we'll follow suit and consider the problem of implementing a simple classification problem in Numpy.

Problem

Suppose we want to classify some data (4 samples) into 3 distinct classes: 0, 1, and 2. We have set up a network with a pre-activation output z in the last layer; applying softmax gives the final model output:

input X --> some network --> z --> y_model = softmax(z)

We quantify the agreement between truth (y) and model using the categorical cross-entropy:

J = - \sum_{i} y_{i} \log(y_{\mathrm{model}}(x_i))

In the following, you are to implement softmax and categorical cross-entropy and evaluate them for the given values of z.

We will use y_cl = np.array([0, 0, 2, 1]) as the class labels, and will not look at the hidden layers in detail this time. That is, suppose that after some computation in the hidden layers, the input is turned into the array z below. When we pass z through the softmax function, does the predicted output reproduce y_cl? We will also discuss the accuracy.

A TensorFlow implementation is already given below; please implement a program that behaves the same using only NumPy. (Section 3.5 of the reference book should be helpful.) There are five problems, 1) through 5).


from __future__ import print_function
import numpy as np
import tensorflow as tf


# Data: 4 samples with the following class labels (input features X irrelevant here)
y_cl = np.array([0, 0, 2, 1])

# output of the last network layer before applying softmax
z = np.array([
    [  4,   5,   1],
    [ -1,  -2,  -3],
    [0.1, 0.2, 0.3],
    [ -1, 100,   1]
    ])



# TensorFlow implementation as reference. Make sure you get the same results!
print('\nTensorFlow ------------------------------ ')
with tf.Session() as sess:
    z_ = tf.constant(z, dtype='float64')
    y_ = tf.placeholder(dtype='float64', shape=(None,3))

    y = np.array([[1., 0., 0.], [1., 0., 0.], [0., 0., 1.], [0., 1., 0.]])
    print('one-hot encoding of data labels')
    print(y)

    y_model = tf.nn.softmax(z_)
    crossentropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_model), reduction_indices=[1]))

    print('softmax(z)')
    print(sess.run(y_model))

    print('cross entropy = %f' % sess.run(crossentropy, feed_dict={y_: y}))


print('\nMy solution ------------------------------ ')
# 1) Write a function that turns any class labels y_cl into one-hot encodings y. (2 points)
#    0 --> (1, 0, 0)
#    1 --> (0, 1, 0)
#    2 --> (0, 0, 1)
#    Make sure that np.shape(y) = (4, 3) for np.shape(y_cl) = (4,).

def to_onehot(y_cl, num_classes):
    y = np.zeros((len(y_cl), num_classes))
    return y


# 2) Write a function that returns the softmax of the input z along the last axis. (2 points)
def softmax(z):
    return None


# 3) Compute the categorical cross-entropy between data and model (2 points)


# 4) Which classes are predicted by the model (maximum entry). (1 point)


# 5) How many samples are correctly classified (accuracy)? (1 point)

Answer

from __future__ import print_function
import numpy as np
import tensorflow as tf


# Data: 4 samples with the following class labels (input features X irrelevant here)
y_cl = np.array([0, 0, 2, 1])

# output of the last network layer before applying softmax
z = np.array([
    [  4,   5,   1],
    [ -1,  -2,  -3],
    [0.1, 0.2, 0.3],
    [ -1, 100,   1]
    ])

# The TensorFlow part is omitted

print('\n☆My solution ------------------------------ ')
# 1) Write a function that turns any class labels y_cl into one-hot encodings y. (2 points)
#    0 --> (1, 0, 0)
#    1 --> (0, 1, 0)
#    2 --> (0, 0, 1)
#    Make sure that np.shape(y) = (4, 3) for np.shape(y_cl) = (4,).

def to_onehot(num_classes, y_cl):
    y_one = np.eye(num_classes)[y_cl]
    return y_one

print('one-hot encoding of data labels by Numpy')
y_one = (to_onehot(3,y_cl)).astype(np.float32)
print(y_one)

#2) Write a function that returns the softmax of the input z along the last axis. (2 points)
def softmax(z):
    e = np.exp(z)
    dist = e / np.sum(e, axis=1, keepdims=True)
    return dist

print('softmax(z) by Numpy')
y_my = softmax(z)
print(y_my)

# 3) Compute the categorical cross-entropy between data and model (2 points)
crossentropy_my = np.mean(-np.sum(y_one*np.log(y_my),axis=1))
print('cross entropy by Numpy: %f' % crossentropy_my)

# 4) Which classes are predicted by the model (maximum entry). (1 point)
print('The predicted class by Numpy:')
y_pre_cl= np.argmax(y_my,axis=1)
print(y_pre_cl)

# 5) How many samples are correctly classified (accuracy)? (1 point)
accuracy_my = np.mean(y_pre_cl == y_cl)
print('accuracy by Numpy: %f' % accuracy_my)

Output

■ Input data with 4 samples: [0 0 2 1]

☆TensorFlow ------------------------------ 
one-hot encoding of data labels
[[ 1.  0.  0.]
 [ 1.  0.  0.]
 [ 0.  0.  1.]
 [ 0.  1.  0.]]
softmax(z)
[[  2.65387929e-01   7.21399184e-01   1.32128870e-02]
 [  6.65240956e-01   2.44728471e-01   9.00305732e-02]
 [  3.00609605e-01   3.32224994e-01   3.67165401e-01]
 [  1.36853947e-44   1.00000000e+00   1.01122149e-43]]
cross entropy: 0.684028
The predicted class:
[1 0 2 1]
accuracy: 0.750000

☆My solution ------------------------------ 
one-hot encoding of data labels by Numpy
[[ 1.  0.  0.]
 [ 1.  0.  0.]
 [ 0.  0.  1.]
 [ 0.  1.  0.]]
softmax(z) by Numpy
[[  2.65387929e-01   7.21399184e-01   1.32128870e-02]
 [  6.65240956e-01   2.44728471e-01   9.00305732e-02]
 [  3.00609605e-01   3.32224994e-01   3.67165401e-01]
 [  1.36853947e-44   1.00000000e+00   1.01122149e-43]]
cross entropy by Numpy: 0.684028
The predicted class by Numpy:
[1 0 2 1]
accuracy by Numpy: 0.750000

Commentary

Answers and questions can be downloaded from here.

Problem 1.

  1. Write a function that turns any class labels y_cl into one-hot encodings y. (2 points)
     0 --> (1, 0, 0)
     1 --> (0, 1, 0)
     2 --> (0, 0, 1)
     Make sure that np.shape(y) = (4, 3) for np.shape(y_cl) = (4,).
def to_onehot(num_classes, y_cl):
    y_one = np.eye(num_classes)[y_cl]
    return y_one

Indexing np.eye(num_classes) with the NumPy array y_cl returns, for each element of y_cl, a row with a 1 at that element's position. num_classes must equal the maximum of y_cl minus its minimum plus 1 (here the labels are 0, 1, 2, so there are three classes). These rows are the one-hot vectors.
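For illustration, a quick check of this indexing trick with the labels from this problem (a sketch; any labels from 0 to num_classes - 1 work the same way):

print(np.eye(3)[np.array([0, 0, 2, 1])])
# [[1. 0. 0.]
#  [1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]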

Problem 2.

  1. Write a function that returns the softmax of the input z along the last axis. (2 points)
def softmax(z):
    e = np.exp(z)
    dist = e / np.sum(e, axis=1, keepdims=True)
    return dist

Implement the softmax function according to its definition. For large inputs, subtracting the maximum value of z does not change the result (p. 69), so this trick may be used to avoid overflow. Also note that if the keepdims option is not True, the summed axis is dropped: a result like [1] collapses to the plain number 1 (the former is a 1-dimensional array, the latter a 0-dimensional scalar), and the division e / np.sum(...) no longer broadcasts row by row.
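For reference, a minimal sketch of a numerically stable softmax using this max-subtraction trick (the name softmax_stable is chosen here for illustration):

def softmax_stable(z):
    # shift each row so its largest entry becomes 0; softmax is invariant to this shift
    e = np.exp(z - np.max(z, axis=1, keepdims=True))
    return e / np.sum(e, axis=1, keepdims=True)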

Problem 3.

  1. Compute the categorical cross-entropy between data and model (2 points)
crossentropy_my = np.mean(-np.sum(y_one*np.log(y_my),axis=1))
print('cross entropy by Numpy: %f' % crossentropy_my)

Definition of the categorical cross-entropy:

E = - \sum_{k} t_{k} \log{y_k}

Implement it according to this definition, where $t_k$ is the one-hot vector and $y_k$ is the model output.

In addition, tf.reduce_mean and np.mean do the same thing; for the meaning of axis in NumPy, refer to here. Since this is a classification problem, a nested array such as [[A, B, C], [D, E, F]] must be handled so that [A, B, C] and [D, E, F] are treated separately; specifying axis=1 restricts the operation to each inner array.
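A small demonstration of axis=1 with hypothetical values:

a = np.array([[1., 2., 3.], [4., 5., 6.]])
print(np.sum(a, axis=1))                 # [ 6. 15.] -- one sum per inner array
print(np.sum(a, axis=1, keepdims=True))  # [[ 6.] [15.]] -- summed axis kept for broadcasting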

To organize what we have so far:

- The class labels y_cl were converted to one-hot vectors -> y_one
- The array z that came out of the layers was passed through the softmax function -> y_my

These correspond to $t_k$ and $y_k$, respectively.
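As a sanity check, the mean cross-entropy can be recomputed by hand: each one-hot row $t_k$ picks out exactly one probability from softmax(z), namely the one for the true class.

# probabilities of the true classes, read off the softmax(z) output above
p_true = np.array([0.265388, 0.665241, 0.367165, 1.0])
print(np.mean(-np.log(p_true)))  # ~0.684, matching the result above up to rounding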

Problem 4.

  1. Which classes are predicted by the model (maximum entry). (1 point)
print('The predicted class by Numpy:')
y_pre_cl = np.argmax(y_my,axis=1)
print(y_pre_cl)

See page 70 for details. In the nested arrays dealt with above (e.g. [[A, B, C], [D, E, F]]), each element is the model's confidence that the sample belongs to that class. To retrieve the predicted class, take the index of the largest value, which np.argmax does; as before, specify axis=1.
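For illustration, with hypothetical softmax outputs:

probs = np.array([[0.2, 0.7, 0.1], [0.6, 0.3, 0.1]])
print(np.argmax(probs, axis=1))  # [1 0] -- index of the largest entry in each row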

A brief description of the one-hot vector

For classification problems, softmax is used as the final activation. This is because, by looking at the indices of the output values, you can see which category the data is classified into and with what ratio (probability). For example, in a three-category problem, suppose you get the list

[0.10, 0.05, 0.85]

(This corresponds to one row of softmax(z) in the current problem.) Here np.argmax([0.10, 0.05, 0.85]) = 2, so the result can be read as: "the sample this array describes belongs to category 2 with a probability of about 0.85 × 100 = 85%."

Taking the index of the largest value thus reveals the category. Put the other way around, if you mark the correct label with a special value, the category can be recovered from that mark. Placing a 1 at the correct label's position and 0 everywhere else, the correct class can be read off from the index holding the 1. Such a vector is called a one-hot vector.

Next, consider converting the labels y_cl to one-hot vectors. Since it suffices to define an array with a 1 at the position given by each element 0, 0, 2, 1 of y_cl, the one-hot encoding in this case is:

[ [1, 0, 0], # 0
  [1, 0, 0], # 0
  [0, 0, 1], # 2
  [0, 1, 0] ] # 1

(Minor differences in notation are ignored here.)

Problem 5.

  1. How many samples are correctly classified (accuracy)? (1 point)
accuracy_my = np.mean(y_pre_cl == y_cl)
print('accuracy by Numpy: %f' % accuracy_my)

As briefly summarized above, the softmax output tells you, at each position (class), how strongly the sample is classified into that class. To find out how well the final array y_pre_cl matches the labels y_cl, compare them elementwise and take the mean: True counts as 1 and False as 0, so the mean of the boolean array is the fraction of correctly classified samples.
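For illustration, with the predicted and true labels from this problem:

y_pre_cl = np.array([1, 0, 2, 1])
y_cl = np.array([0, 0, 2, 1])
print(y_pre_cl == y_cl)           # [False  True  True  True]
print(np.mean(y_pre_cl == y_cl))  # 0.75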

In conclusion

Classification is an important task in machine learning, but I think it can be hard to grasp (why use one-hot vectors, for instance?). I would be very happy if this article answered even one such question.
