2. Mean and standard deviation with neural network!

Introduction

This is the second article in the series. Can a neural network (NN) be trained to output the mean of some given numerical data? And what about the standard deviation? Let's try it.

The problems an NN can learn are roughly divided into "classification problems" and "regression problems". This one is a regression problem.

Policy

  1. Generate sets of 10 random numbers (the count per set is fixed at 10).
  2. Calculate the mean and the standard deviation of the 10 numbers.
  3. Give the 10 numbers, together with their mean and standard deviation, to an NN built with Keras as training data.
  4. The NN has 2 outputs (one for the "mean" and one for the "standard deviation").
  5. After training the NN, evaluate the performance of the trained NN using a dataset that was not used for training.

Let's do it. In general, preparing training data for an NN is difficult, but this time it is easy.

Preparation of training data

For the time being, I decided to prepare 50,000 datasets. One set consists of 10 numbers drawn from a normal distribution with mean a and standard deviation b, using numpy's random.normal(a, b, 10). Here a and b themselves are also random: each is generated by numpy's random.rand(), so both lie in [0, 1).

For each set, calculate the mean and standard deviation of the 10 numbers, and store everything in Python lists first.

001.py


import numpy as np
trainDataSize = 50000  # Number of datasets to create
dataLength = 10  # Number of values per set
d = []  # Will hold one array of 10 numbers per set
average_std = []  # Will hold the targets, two numbers (mean, std) per set
for num in range(trainDataSize):
    # The mean and standard deviation of the distribution are themselves random in [0, 1)
    xx = np.random.normal(np.random.rand(), np.random.rand(), dataLength)
    average_std.append(np.mean(xx))
    average_std.append(np.std(xx))  # Population standard deviation (ddof=0)
    d.append(xx)

Once the lists hold all 50000 sets, convert them to ndarrays.

002.py


d = np.array(d)  # Convert to ndarray: shape (50000, 10)
average_std = np.array(average_std)  # Convert to ndarray: flat, 100000 values

The reason I don't use an ndarray from the beginning is that appending to one is slow: np.append copies the whole array on every call.

002.py


# Bad code, because it is slow.
d = np.array([])  # Empty numpy array
for num in range(trainDataSize):
    xx = np.random.normal(np.random.rand(), np.random.rand(), dataLength)
    d = np.append(d, xx)  # This is slow: the whole array is copied every time
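To get a feel for the difference, the two approaches can be timed side by side. The following is only a rough sketch: the function names are made up for this illustration, and the number of sets is reduced to 5000 (an assumption, not the article's 50000) so that the slow version finishes quickly. The actual timings depend on the machine.

```python
import time
import numpy as np

def build_with_list(n_sets, length):
    """Collect the rows in a Python list and convert to ndarray once at the end."""
    rows = []
    for _ in range(n_sets):
        rows.append(np.random.normal(np.random.rand(), np.random.rand(), length))
    return np.array(rows)

def build_with_np_append(n_sets, length):
    """Grow an ndarray with np.append, which copies the whole array on every call."""
    data = np.array([])
    for _ in range(n_sets):
        data = np.append(data, np.random.normal(np.random.rand(), np.random.rand(), length))
    return data.reshape(n_sets, length)

n_sets, length = 5000, 10  # reduced size for the comparison (assumption)

start = time.perf_counter()
a = build_with_list(n_sets, length)
t_list = time.perf_counter() - start

start = time.perf_counter()
b = build_with_np_append(n_sets, length)
t_append = time.perf_counter() - start

print(f"list + np.array: {t_list:.3f} s")
print(f"np.append loop : {t_append:.3f} s")
```

Both functions return an array of the same shape; only the time needed to build it differs, and the gap widens as the number of sets grows.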

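The targets in average_std are stored as one flat sequence (mean, standard deviation, mean, standard deviation, ...), so reshape it into one row per set. After np.array(d) on the list of sets, d already has the shape (50000, 10), so reshaping it only makes the intended shape explicit.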

003.py


d = d.reshape(trainDataSize, dataLength)  # (50000, 10)
average_std = average_std.reshape(trainDataSize, 2)  # (50000, 2): one [mean, std] row per set
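The effect of the conversion and the reshape is easy to check on a tiny example. This sketch uses 4 sets instead of 50000, and the names n_sets, rows, targets, d_small and t_small are introduced only for this illustration.

```python
import numpy as np

n_sets, length = 4, 10  # tiny stand-ins for 50000 and 10
rows, targets = [], []
for _ in range(n_sets):
    xx = np.random.normal(np.random.rand(), np.random.rand(), length)
    rows.append(xx)
    targets.append(np.mean(xx))
    targets.append(np.std(xx))

d_small = np.array(rows)     # a list of equal-length arrays becomes a 2-D array
t_small = np.array(targets)  # a flat list of numbers stays 1-D
print(d_small.shape)  # (4, 10)
print(t_small.shape)  # (8,)

t_small = t_small.reshape(n_sets, 2)  # row i now holds [mean, std] of set i
print(t_small.shape)  # (4, 2)
```

Because the mean and the standard deviation were appended alternately, the default row-major reshape puts each pair on its own row.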

Split the 50000 sets into two: 40000 sets for training and 10000 sets for evaluation. Since no hyperparameters are tuned this time, a separate validation set is not needed, so a two-way split is enough.

004.py


# First 40000 sets for training, last 10000 sets for evaluation.
d_training_x = d[:40000,:]
d_training_y = average_std[:40000,:]
d_test_x = d[40000:,:]
d_test_y = average_std[40000:,:]

NN design

The point is

  1. Set the input shape to 10 (required)
  2. Set the number of outputs of the last layer to 2 (required).
  3. Make the last activation function linear. softmax, sigmoid, and ReLU are not suitable here, because none of them can output a negative value and the mean can be below 0.
  4. Set the loss function to mean_squared_error. Cross entropy is not suitable in this case (because it is not a classification problem).
  5. The number of other layers, the number of outputs of each layer, and the activation functions were chosen more or less arbitrarily.
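Point 3 can be illustrated without Keras. The sketch below re-implements the activation functions in plain numpy (these are hand-written stand-ins, not the Keras implementations) and applies them to a made-up pair of pre-activation values for the two output units.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

def relu(z):
    return np.maximum(z, 0.0)

# Hypothetical pre-activation values of the two output units
z = np.array([-3.0, 0.5])

print("linear :", z)           # passes the negative value through unchanged
print("sigmoid:", sigmoid(z))  # every value is squeezed into (0, 1)
print("softmax:", softmax(z))  # values are positive and sum to 1
print("relu   :", relu(z))     # negative values are clipped to 0
```

Only the linear output can reproduce a negative mean, which is why it is used for the last layer here.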

005.py


import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

model = Sequential()
model.add(Dense(100, activation='tanh', input_shape=(10,)))  # 10 inputs
model.add(Dense(100, activation='tanh'))
model.add(Dense(40, activation='sigmoid'))
model.add(Dense(20, activation='sigmoid'))
model.add(Dense(2, activation='linear'))  # 2 outputs: mean and standard deviation
# Optimizer: Adam
optimizer = Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999)  # the argument was named lr before Keras 2.3
# Loss function: mean squared error
model.compile(loss='mean_squared_error', optimizer=optimizer)
model.summary()  # Print a summary of the NN

NN training

Finally, feed in the training data.

006.py


history = model.fit(d_training_x, d_training_y,
    batch_size=256,  # Number of sets fed to the NN per weight update
    epochs=20,  # Number of passes over the training data
    verbose=1,  # 1 prints the training progress for each epoch
    validation_data=(d_test_x, d_test_y))

Checking the progress of learning

Here, the return value of fit() is stored in the variable history. Checking it with type() shows that it is a History object. Let's look inside it with vars().

007.py


type(history)  # <class 'keras.callbacks.callbacks.History'>
vars(history)
# A lot of information is output.
# The history object has the following fields:
#   validation_data (list)
#   model (reference to the NN model)
#   params (dictionary; keys are 'batch_size', 'epochs', 'steps', 'samples', 'verbose', 'do_validation', 'metrics')
#   epoch (list)
#   history (dictionary; keys are 'val_loss' and 'loss')
#
# loss is the loss on the training data; val_loss is the loss on the evaluation data.
# Since the variable is named history here, history.history['val_loss'] gives the per-epoch record of how learning progressed.

Let's plot how learning progresses.

008.py


import matplotlib.pyplot as plt
plt.plot(history.history['val_loss'], label = "val_loss")
plt.plot(history.history['loss'], label = "loss")
plt.legend() #Show legend
plt.title("Can NN learn to calculate average and standard deviation?")
plt.xlabel("epoch")
plt.ylabel("Loss")
plt.show()

The resulting graph: Figure_1.png

Evaluation of NN

You can see that learning has progressed, but how accurately can the NN "calculate"? Feed the first 200 sets of evaluation data into the NN and plot its output (vertical axis) against the mathematically calculated values (horizontal axis).

009.py


#Give data to the trained NN
inp = d_test_x[:200,:]
out = d_test_y[:200,:]
pred = model.predict(inp, batch_size=1)

# Make a graph: mean
plt.scatter(out[:,0], pred[:,0], label="NN output")  # label is needed for legend()
plt.legend()  # Show legend
plt.title("average")
plt.xlabel("mathematical calculation")
plt.ylabel("NN output")
# Draw the line y = x. Points on this line are predicted exactly.
x = np.arange(-0.5, 2, 0.01)
y = x
plt.plot(x, y)
plt.show()

Figure_2.png

You can see that the mean is "calculated" with fairly high accuracy. Then what about the standard deviation?

009.py


# Make a graph: standard deviation
plt.scatter(out[:,1], pred[:,1], label="NN output")  # label is needed for legend()
plt.legend()  # Show legend
plt.title("standard deviation")
plt.xlabel("mathematical calculation")
plt.ylabel("NN output")
x = np.arange(0, 1.5, 0.01)
y = x
plt.plot(x, y)
plt.show()

Figure_3.png

Not bad? The mean is predicted well, but the standard deviation is not accurate enough.
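The visual impression can be backed up with numbers by computing an error per output column. The helper below, per_output_errors, is a hypothetical function written for this illustration, and the sample values are made up purely to show the call; they are not results from the trained NN.

```python
import numpy as np

def per_output_errors(y_true, y_pred):
    """Return (MAE, RMSE), each as an array with one entry per output column."""
    diff = np.asarray(y_pred) - np.asarray(y_true)
    mae = np.mean(np.abs(diff), axis=0)
    rmse = np.sqrt(np.mean(diff ** 2, axis=0))
    return mae, rmse

# Made-up values: column 0 is the mean, column 1 is the standard deviation
y_true = np.array([[0.50, 0.30], [0.20, 0.80], [0.90, 0.10]])
y_pred = np.array([[0.52, 0.25], [0.18, 0.90], [0.91, 0.20]])

mae, rmse = per_output_errors(y_true, y_pred)
print("mean: MAE=%.4f RMSE=%.4f" % (mae[0], rmse[0]))
print("std : MAE=%.4f RMSE=%.4f" % (mae[1], rmse[1]))
```

With the variables from 009.py, the call would be per_output_errors(out, pred), which makes the gap between the two outputs directly comparable.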

Discussion

Roughly speaking, a neural network multiplies each input value x by its weight parameter w, sums these products, and passes the sum to an activation function to obtain the output value.
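As a minimal sketch of that calculation for a single unit: the function name neuron and the numbers are made up for this illustration, and the bias term b is the constant that a Keras Dense layer adds to the weighted sum by default.

```python
import numpy as np

def neuron(x, w, b, activation=np.tanh):
    """One unit: weighted sum of the inputs plus a bias, passed through an activation."""
    return activation(np.dot(w, x) + b)

x = np.array([1.0, 2.0, 3.0])     # input values
w = np.array([0.5, -0.25, 0.1])   # one weight per input
b = 0.2                           # bias

# Weighted sum: 0.5 - 0.5 + 0.3 + 0.2 = 0.5, so the output is tanh(0.5)
print(neuron(x, w, b))
```

A Dense layer is many such units evaluated on the same input, each with its own weights and bias.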

For the mean, multiplying each input value by 0.1 (1/10 = 0.1, because there are 10 input values) and adding the products gives the mean exactly, so it is easy to imagine that an NN can learn to calculate the mean of 10 values with high accuracy.
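This can be confirmed directly: a single linear unit with every weight set to 0.1 and no bias reproduces np.mean. The sketch below uses hand-set weights and a seeded random input chosen for this illustration; it does not use the trained network.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.5, 0.3, size=10)  # 10 input values

w = np.full(10, 0.1)          # every weight is 1/10
linear_output = np.dot(w, x)  # linear activation, no bias

print(linear_output, np.mean(x))  # the two values agree up to rounding
```

So the exact solution for the mean lies inside the family of functions the network can represent, and training only has to find it.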

On the other hand, what about the standard deviation? Calculate the mean, subtract it from each input value (that is, add the mean multiplied by -1), square each difference, add the squares, divide by 10, and take the square root. (np.std divides by the number of values by default; dividing by 9 instead would give the sample standard deviation.) The tricky part of this process is the squaring.
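Written out step by step, the calculation looks as follows. Note that np.std, as called in 001.py, uses its default ddof=0 and therefore divides by the number of values (10); dividing by 9 corresponds to np.std(x, ddof=1). The input values here are seeded random numbers chosen for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.5, 0.3, size=10)

mean = np.sum(x) / len(x)            # step 1: the mean
deviations = x - mean                # step 2: difference from the mean
squared = deviations ** 2            # step 3: squaring (the non-linear step)
variance = np.sum(squared) / len(x)  # step 4: sum and divide by 10
std = np.sqrt(variance)              # step 5: square root

print(std, np.std(x))  # matches the default np.std (ddof=0)

sample_std = np.sqrt(np.sum(squared) / (len(x) - 1))  # dividing by 9 instead
print(sample_std, np.std(x, ddof=1))
```

Steps 1, 2 and 4 are weighted sums that a linear layer can express; steps 3 and 5 are the parts the activation functions have to approximate.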

Internally, an NN multiplies the input values by fixed weight parameters, adds them, and passes the result to an activation function. Can that really return the square of an arbitrary input value with almost no error? In principle any curve can be approximated by adding more parameters, but I'm not sure what kind of calculation the network ends up doing.
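One way to see that squaring is at least approximately reachable: two sigmoid units with the same bias and opposite input weights, combined by a linear output, cancel the odd terms of the Taylor expansion and leave a term proportional to x squared. The sketch below uses hand-picked weights (a = 0.1, b = 1.0 are assumptions for this illustration, not values learned by the article's network), and the approximation only holds on a limited input range.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def approx_square(x, a=0.1, b=1.0):
    """Approximate x**2 with two sigmoid hidden units and one linear output unit.

    sigmoid(b + a*x) + sigmoid(b - a*x) = 2*sigmoid(b) + sigmoid''(b) * (a*x)**2 + higher-order terms
    """
    s = sigmoid(b)
    curvature = s * (1.0 - s) * (1.0 - 2.0 * s)  # second derivative of sigmoid at b
    hidden_sum = sigmoid(b + a * x) + sigmoid(b - a * x)
    return (hidden_sum - 2.0 * s) / (a ** 2 * curvature)

x = np.linspace(-2.0, 2.0, 9)
max_error = np.max(np.abs(approx_square(x) - x ** 2))
print(max_error)  # small on this range; it grows as |x| increases
```

So a small network can get close to squaring on a bounded interval, but the result is an approximation whose error depends on the input range, which may be part of why the standard deviation is harder to learn than the mean.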

Perhaps it would be useful to have an activation function that squares its input (or, more generally, raises it to the nth power). I would like to think about this another time.
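As a thought experiment, here is what such an activation would buy. With a hypothetical squaring activation, one hidden layer with hand-set weights computes the variance exactly; only the final square root remains. Everything in this sketch (the function square, the weight matrices, the seeded input) is made up for illustration and is plain numpy, not a Keras model.

```python
import numpy as np

def square(z):
    """Hypothetical activation function that squares its input."""
    return z ** 2

n = 10
# Hidden layer weights: identity minus 1/n everywhere, so hidden unit i receives x_i - mean(x)
W_hidden = np.eye(n) - np.full((n, n), 1.0 / n)
# Output layer weights: average the squared deviations
w_out = np.full(n, 1.0 / n)

rng = np.random.default_rng(2)
x = rng.normal(0.5, 0.3, size=n)

variance = np.dot(w_out, square(W_hidden @ x))
print(variance, np.var(x))          # the network output equals the variance
print(np.sqrt(variance), np.std(x))  # the standard deviation still needs a square root
```

In other words, a squaring activation would make the variance exactly representable, while the square root would still have to be approximated or applied outside the network.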

Summary

Now that we have an NN that can output values close to the mean and standard deviation, I would like to conclude Part 2.

Series 1st: Preparation
Series 2nd: Mean and Standard Deviation
Series 3rd: Normal Distribution
Series 4th: Circle
