Deep learning learned by implementation 1 (regression)

First deep learning

Introduction

I first came into contact with deep learning in my third year of undergraduate study, and there were few lectures on it in my undergraduate classes. Most of my deep learning study was self-taught, so I have written up the steps I took to understand and implement its internal structure and posted them here. This is also my first time writing a blog post and it is a practice article, so it may be hard to read; please bear with me.

What is deep learning?

First, before the implementation, I will explain what deep learning actually does. Deep learning is, in a nutshell, function optimization. If you optimize a function that takes an image as input and outputs the probability that the image is a cat, you get a classifier that separates cat images from everything else; if you optimize a function that takes $x$ as input and returns $\sin(x)$, you get a regression model. First, we will implement regression to see for ourselves that deep learning is function optimization.

Environment setup

If you have a personal computer and an internet connection, you have everything you need for deep learning. Google Colaboratory, provided by Google, already has all the necessary libraries installed, so we will use it (what a convenient world we live in). Google Colaboratory can be selected from the "New" button in Google Drive, under "More". When it opens, a Jupyter notebook appears and an interactive environment is ready immediately.
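If you want to make sure everything is in place, the following optional check (it only prints library versions; nothing later in the article depends on it) should run in a Colab cell without installing anything:

import numpy as np
import matplotlib
import keras

# Confirm the libraries used in this article are already available in Colab
print("numpy:", np.__version__)
print("matplotlib:", matplotlib.__version__)
print("keras:", keras.__version__)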

Regression implementation (predicting y = sin(x))

I think it is easier to understand if we discuss things while seeing what the implementation does, so let's implement the regression model first. Start by entering the following code.

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 20, 0.1)  # real numbers from 0 to 20 in increments of 0.1
y = np.sin(x)              # their sine values: the targets we want to predict
plt.plot(x, y)

When you run this, a sine curve should be displayed as a graph. matplotlib is the library used here to draw plots, and we will use it frequently from now on. At this point the training data has been generated: real numbers from 0 to 20 in increments of 0.1, and their sine values.

Next, define the model.


from keras import layers
from keras import models
from keras import optimizers

model = models.Sequential()
model.add(layers.Dense(256, activation="relu", input_shape=(1,)))  # input: a single number x
model.add(layers.Dense(256, activation="relu"))
model.add(layers.Dense(256, activation="relu"))
model.add(layers.Dense(256, activation="relu"))
model.add(layers.Dense(256, activation="relu"))
model.add(layers.Dense(256, activation="relu"))
model.add(layers.Dense(256, activation="relu"))
model.add(layers.Dense(1))                                          # output: the predicted sin(x)
model.compile(loss="mse", optimizer="adam")

Here I imported what I needed from the keras library. The model is the function we are going to build this time; there are several ways to describe one, but Sequential() is used here because it is easy to understand. You can see that we are adding many layers to the model. This code uses layers.Dense a lot, so let me explain what it is.

Fully connected layer

As the subheading says, layers.Dense is a fully connected layer: it receives a vector and returns a vector. Many of the layers take 256 as an argument; this is the number of dimensions of the output. For example, if the input is an n-dimensional vector x and the output is an m-dimensional vector y, the fully connected layer is expressed by the following equation.

$y = Ax + b$

Here A is an m × n matrix and b is an m-dimensional vector. Where do this matrix and vector come from? We keep them all as variables and optimize them later. In other words, a fully connected layer can reproduce an arbitrary linear transformation.
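As a rough sketch of what one of these layers computes (this is not how Keras implements it internally; the names n, m, A, and b are just the symbols from the equation above):

import numpy as np

n, m = 1, 256                    # input and output dimensions
A = np.random.randn(m, n) * 0.1  # the weight matrix (learned during optimization)
b = np.zeros(m)                  # the bias vector (also learned)

def dense(x):
    # a single fully connected layer without activation: y = Ax + b
    return A @ x + b

y = dense(np.array([0.5]))
print(y.shape)  # (256,): a 256-dimensional output for a 1-dimensional input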

Activation function

As written so far, the model above tries to obtain $y$ by applying linear transformations many times, but a composition of linear transformations can always be reproduced by a single linear transformation, so just stacking fully connected layers would be meaningless. This is where the activation function, a non-linear map, comes in. It applies a non-linear function to every element of the vector. By inserting it between the fully connected layers, stacking layers actually increases the expressive power of the model. The hidden layers in this code all use ReLU, which is $\max(0, x)$ and is certainly non-linear. ReLU is used very often in deep learning.
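For reference, ReLU itself is a one-liner in NumPy (a sketch, not the Keras internals):

import numpy as np

def relu(x):
    # ReLU: max(0, x) applied to every element; negative values become 0
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # [0.  0.  0.  1.5]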

Optimization

The fully connected layers hold a large number of internal variables ($A$ and $b$) in order to represent arbitrary linear transformations, and how to optimize them is specified in model.compile. The loss is an index that becomes smaller as the model becomes better; this time we use mse (mean squared error), that is, the mean of the squared differences between the predicted values and the correct values. Parameters cannot be optimized just by computing the loss of the current model: we have to work out in which direction, and by how much, to move each parameter to reduce the loss. Basically it is enough to descend along the gradient of the loss with respect to the parameters, but for stability and speed, Adam, which uses running averages of the gradient and of its square from previous steps rather than the raw gradient alone, is said to work well for deep learning optimization.
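To make the loss concrete, mse can be computed by hand like this (a small sketch with made-up numbers; model.compile does this internally along with the gradient bookkeeping):

import numpy as np

def mse(y_true, y_pred):
    # mean squared error: the average of the squared differences
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([0.0, 0.5, 1.0])
y_pred = np.array([0.1, 0.4, 1.2])
print(mse(y_true, y_pred))  # 0.02 (up to floating-point error)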

Training

Let's actually train.

hist = model.fit(x, y, steps_per_epoch=10, epochs=10)

In learning, the training data is often huge, so it is usually impossible to use all of it for a single gradient step (one parameter update). Instead, the training data is split into smaller batches. This time the whole training set is split into 10 batches (steps_per_epoch=10). Taking one gradient step on each of those batches makes one epoch (an epoch is the unit corresponding to one full pass over the training data), and this time we trained for 10 epochs, i.e. 10 full passes over the training data.
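Conceptually (a sketch of the loop, not literally what model.fit does internally), the training looks like this:

import numpy as np

steps_per_epoch = 10
epochs = 10
batches = np.array_split(np.arange(len(x)), steps_per_epoch)  # split the data into 10 batches

for epoch in range(epochs):        # 10 full passes over the training data
    for idx in batches:            # one gradient step per batch, 10 steps per epoch
        x_batch, y_batch = x[idx], y[idx]
        # here model.fit computes the loss on (x_batch, y_batch)
        # and moves the parameters one step down the gradient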

Prediction

Let's predict.

test_x = x + 0.05              # shift the inputs by 0.05 so they differ from the training data
acc_y = np.sin(test_x)         # the true values
pre_y = model.predict(test_x)  # the model's predictions
plt.plot(test_x, acc_y)
plt.plot(test_x, pre_y)
plt.show()

This shows how closely the model behaves like sin(x) on data shifted from the training data by 0.05 in the x direction. Run as is, the result looks like the following figure.

(Figure 10_10.png: the predicted curve plotted against the true sin curve.)

The value is far off at $x > 10$.

By improving the model itself and training it more, it can be improved as shown below, so give it a try.

(Figure 30_30.png: the predicted curve against the true sin curve after further training.)
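The simplest thing to try is just training longer; a sketch (the exact settings behind the improved figure are not given here, so these numbers are only a guess):

# Train for longer, then plot the predictions again on the shifted data
hist = model.fit(x, y, steps_per_epoch=30, epochs=30)

pre_y = model.predict(test_x)
plt.plot(test_x, acc_y)
plt.plot(test_x, pre_y)
plt.show()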

GPU

If your model takes a long time to learn, or you need a long training run, go to Edit -> Notebook settings in Google Colaboratory and change the hardware accelerator from None to GPU. Training should finish much earlier with the power of the GPU.
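One way to confirm the GPU really is attached (an optional check; it assumes the notebook's Keras is backed by TensorFlow, which is the Colab default):

# Check whether a GPU is visible (assumes a TensorFlow 2.x backend;
# older TensorFlow versions use tf.test.is_gpu_available() instead)
import tensorflow as tf
print(tf.config.list_physical_devices("GPU"))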

In conclusion

This time, a sine curve was expressed using only linear transformations and ReLU. Deep learning can approximate not just sine curves but arbitrary functions, from image recognition to image generation, so the possibilities are endless. Next time, I would like to explain the convolution layer by implementing image recognition, the classic MNIST handwritten digit recognition, in as few lines as possible.
