This time, I would like to introduce a Deep Learning implementation I wrote for study, ***without using any library***. The language is Python.
That said, there is a book I referred to. In this entry, I will introduce that book and explain the code I reimplemented in Python based on the Java code it describes.
I probably do not need to point out here that deep learning is drawing attention as the catalyst of the recent AI boom. Considering how many books of this kind have been published lately, it is easy to see how much attention it is getting.

However, not everyone in the world specializes in AI; if anything, such people are limited to a rather small group. I suspect many people who are not working directly on deep learning right now still feel it cannot be ignored, given that AI will become part of the infrastructure in the near future. That is why so many people think they should study it for the time being, and I am one of them.
When I picked up books on the subject, I felt they tended to be polarized into the following two types (it is only a tendency, and it is quite possible I have simply missed the good ones):

- the type that wears you down with a barrage of formulas
- the type that explains the outline and then leaves you with indigestion by jumping straight to how to use a library
I was (perhaps unnecessarily) reluctant to use a library without knowing anything about its internals, so I kept wondering whether there was a book that went one step further (while neglecting to implement anything myself, orz. Well, deep learning is not my specialty ^^;).

Then a book with just the right feel came out, so I jumped on it and read it in one sitting. It was a big hit.
- [Sugomori, Deep Learning Java Programming: Theory and Implementation of Deep Learning, Impress, 2016.](https://px.a8.net/svt/ejp?a8mat=2NZCQW+6MQUCY+249K+BWGDT&a8ejpredirect=https%3A%2F%2Fwww.amazon.co.jp%2Fdp%2F4844381288%2F%3Ftag%3Da8-affi-271202-22)
The book's concept of implementing everything with simple, even deliberately minimal, models fits my own belief that this is what helps understanding most. Rather than proving everything analytically and ending on a lofty note, the explanations let you grasp the differences in character between methods intuitively, and I think that is worth noting.

After reading it, I wanted to fill in the places where the mathematical development takes a leap, and to dig a little deeper. Reading the following standard books next cleared my head considerably.
- [Okatani, Deep Learning, Kodansha, 2015.](https://px.a8.net/svt/ejp?a8mat=2NZCQW+6MQUCY+249K+BWGDT&a8ejpredirect=https%3A%2F%2Fwww.amazon.co.jp%2Fdp%2F4061529021%2F%3Ftag%3Da8-affi-271202-22)
So, [Sugomori, Deep Learning Java Programming: Theory and Implementation of Deep Learning, Impress, 2016.](https://px.a8.net/svt/ejp?a8mat=2NZCQW+6MQUCY+249K+BWGDT&a8ejpredirect=https%3A%2F%2Fwww.amazon.co.jp%2Fdp%2F4844381288%2F%3Ftag%3Da8-affi-271202-22) goes beyond merely using a library, and I think it is a suitable first read for beginners who find it too high a hurdle to dive straight into a specialized book.
If your goal is specifically to understand Convolutional Neural Networks (CNNs), the following book is by far the best. Although it focuses on CNNs, it definitely contributes to understanding the implementation presented here as well.
- [Saito, Deep Learning from Scratch: Theory and Implementation of Deep Learning with Python, O'Reilly Japan, 2016.](https://px.a8.net/svt/ejp?a8mat=2NZCQW+6MQUCY+249K+BWGDT&a8ejpredirect=https%3A%2F%2Fwww.amazon.co.jp%2Fdp%2F4873117585%2F%3Ftag%3Da8-affi-271202-22)
[Deep Learning Java Programming: Theory and Implementation of Deep Learning](https://px.a8.net/svt/ejp?a8mat=2NZCQW+6MQUCY+249K+BWGDT&a8ejpredirect=https%3A%2F%2Fwww.amazon.co.jp%2Fdp%2F4844381288%2F%3Ftag%3Da8-affi-271202-22) also covers CNNs, so of course you can learn a good deal about them from it as well. I think that book excels at letting you compare the typical methods from a bird's-eye view in a short time, while Deep Learning from Scratch carefully builds up from the basics of neural networks and, before you know it, has arrived at CNNs; it reads almost like a story.

I was able to run the Java sample code described in [Deep Learning Java Programming: Theory and Implementation of Deep Learning](https://px.a8.net/svt/ejp?a8mat=2NZCQW+6MQUCY+249K+BWGDT&a8ejpredirect=https%3A%2F%2Fwww.amazon.co.jp%2Fdp%2F4844381288%2F%3Ftag%3Da8-affi-271202-22), but just running it seemed boring. So this time I decided to port it to Python myself.
The targets are ***Deep Belief Nets (DBN)*** and ***Stacked Denoising Autoencoders (SDA)***. All the credit goes to the author, Mr. Sugomori; I am merely riding on his coattails. Still, porting the code without thinking seemed a rather mindless exercise, so I set myself the following rule:

- **No copy and paste**

In other words, it amounted to a slightly stricter form of copying the code out by hand, lol.
The code is published below.

- GitHub repository
Each algorithm can be run as follows.

DBN:

```
cd <cloned path>/DeepLearningWithPython/DeepNeuralNetworks
python DeepBeliefNets.py
```

SDA:

```
cd <cloned path>/DeepLearningWithPython/DeepNeuralNetworks
python StackedDenoisingAutoencoders.py
```
The software configuration follows that of [Deep Learning Java Programming: Theory and Implementation of Deep Learning](https://px.a8.net/svt/ejp?a8mat=2NZCQW+6MQUCY+249K+BWGDT&a8ejpredirect=https%3A%2F%2Fwww.amazon.co.jp%2Fdp%2F4844381288%2F%3Ftag%3Da8-affi-271202-22), so please refer to the book for details.
The following three types of results are output.
```
-------------------------------
DBN(or SDA) Regression model evaluation
-------------------------------
Accuracy: 100.0 %
Precision:
class 1: 100.0 %
class 2: 100.0 %
class 3: 100.0 %
Recall:
class 1: 100.0 %
class 2: 100.0 %
class 3: 100.0 %
```
The meaning of each result is as follows.
- Accuracy: the percentage of all data that is classified correctly
- Precision: of the data predicted to be positive, the percentage that actually is positive
- Recall: of the data that actually is positive, the percentage that is predicted to be positive
Expressed as formulas:

```math
Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \\
Precision = \frac{TP}{TP + FP} \\
Recall = \frac{TP}{TP + FN}
```
The breakdown of TP, TN, FP and FN is shown in the table below.
| | Predicted positive | Predicted negative |
|---|---|---|
| Actually positive | True Positive (TP) | False Negative (FN) |
| Actually negative | False Positive (FP) | True Negative (TN) |
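As a minimal sketch of how these numbers could be computed (illustrative NumPy only, not the evaluation code from the repository), starting from a confusion matrix whose rows are the true classes and whose columns are the predicted classes:

```python
import numpy as np

def evaluate(confusion):
    """Accuracy, per-class precision and recall from a confusion matrix
    (rows = true classes, columns = predicted classes)."""
    c = np.asarray(confusion, dtype=float)
    accuracy = np.trace(c) / c.sum()        # correct predictions / all data
    precision = np.diag(c) / c.sum(axis=0)  # TP / (TP + FP), per class
    recall = np.diag(c) / c.sum(axis=1)     # TP / (TP + FN), per class
    return accuracy, precision, recall

# Three classes, all predictions correct -> 100 % everywhere, as above.
acc, prec, rec = evaluate([[10, 0, 0],
                           [0, 10, 0],
                           [0, 0, 10]])
print(f"Accuracy: {acc:.1%}")
for k, (p, r) in enumerate(zip(prec, rec), start=1):
    print(f"class {k}: precision {p:.1%}, recall {r:.1%}")
```

With three perfectly predicted classes, this prints 100.0% across the board, matching the output shown earlier.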
The network model used is identical for DBN and SDA.

- Input layer neurons: 60
Both DBN and SDA are trained in the following two stages, as sketched below.

- Pre-training
- Fine-tuning
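The real training code is in the repository above; purely as an illustration of the flow shared by DBN and SDA, the two stages could be skeletonized like this (all names here are hypothetical, not the ones from the book or from my port):

```python
# Hypothetical skeleton of the two-stage training flow (names illustrative).
def train(network, x_train, t_train, pretrain_epochs=1000, finetune_epochs=1000):
    # 1. Pre-training: greedy, layer-wise, unsupervised.
    #    Each hidden layer learns parameters that reproduce its own input
    #    (contrastive divergence for DBN, a denoising autoencoder for SDA).
    layer_input = x_train
    for layer in network.hidden_layers:
        for _ in range(pretrain_epochs):
            layer.pretrain(layer_input)
        layer_input = layer.forward(layer_input)

    # 2. Fine-tuning: supervised training of the whole stack at once,
    #    starting from the pre-trained weights instead of random ones.
    for _ in range(finetune_epochs):
        network.finetune(x_train, t_train)
```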
Both methods use exactly the same model, and both can properly learn test data generated in the same way. But if different methods yield almost identical results, what actually differs between them? That was what I found myself wondering about.

What the two methods have in common is that, layer by layer, they obtain the parameters of a two-layer network whose output matches its input data. In the case of DBN, owing to the nature of the Boltzmann machine, it is the ***input data*** and the network ***state*** that are compared, but if you squint past the details, it is doing essentially the same thing.
They also share the idea of training on data with various kinds of noise added, so that the model becomes more robust to noise when it identifies data in production. The difference can be interpreted as lying in how that noise is introduced; a rough sketch of the contrast follows below.

In the case of DBN, whether a neuron is activated is decided stochastically: even in exactly the same state, it may or may not fire. This characteristic amounts to noise being added automatically while the parameters are trained; the algorithm injects noise on its own.

In the case of SDA, on the other hand, noise is explicitly added to the training data before it is fed to the learner. Because the algorithm proceeds deterministically, it has no way of generating noise by itself.
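Here is a rough sketch of that contrast (illustrative NumPy only; not the code from the repository):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# DBN/RBM style: the activation itself is a stochastic binary sample,
# so noise arises inside the algorithm even for identical inputs.
def stochastic_activation(x, W, b):
    p = sigmoid(x @ W + b)                           # firing probability
    return (rng.random(p.shape) < p).astype(float)   # may differ each call

# SDA style: the forward pass is deterministic, so noise must be
# added explicitly to the input (here: masking noise).
def corrupt(x, corruption_level=0.3):
    mask = rng.random(x.shape) >= corruption_level
    return x * mask                                  # randomly zero components
```

Calling `stochastic_activation` twice with the same `x` can return different binary states, which is exactly the implicit noise described above; `corrupt` produces the same kind of perturbation explicitly on the data side.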
Seen from this angle, the difference in character between the two comes down to this: DBN introduces noise implicitly through its stochastic activations, whereas SDA introduces it explicitly by corrupting the input data.
As for how these characteristics affect learning, and why initializing a deep network this way makes its training go so well, it seems there are still various views. My own thoughts on it are not yet in order, so I will refrain from commenting here lol ^^;
Even with a reference sample program available, implementing it with my own hands deepened my understanding of deep learning at least a little.
If the information posted here is of any use to anyone, I would be more than happy.
I have not implemented dropout or CNNs yet, but I will get to them when I feel like it ^^
- [Sugomori, Deep Learning Java Programming: Theory and Implementation of Deep Learning, Impress, 2016.](https://px.a8.net/svt/ejp?a8mat=2NZCQW+6MQUCY+249K+BWGDT&a8ejpredirect=https%3A%2F%2Fwww.amazon.co.jp%2Fdp%2F4844381288%2F%3Ftag%3Da8-affi-271202-22)
- [Okatani, Deep Learning, Kodansha, 2015.](https://px.a8.net/svt/ejp?a8mat=2NZCQW+6MQUCY+249K+BWGDT&a8ejpredirect=https%3A%2F%2Fwww.amazon.co.jp%2Fdp%2F4061529021%2F%3Ftag%3Da8-affi-271202-22)
- [Aso et al., Deep Learning, Kindai Kagaku-sha, supervised by the Japanese Society for Artificial Intelligence, edited by Kamishima, 2015.](https://px.a8.net/svt/ejp?a8mat=2NZCQW+6MQUCY+249K+BWGDT&a8ejpredirect=https%3A%2F%2Fwww.amazon.co.jp%2Fdp%2F476490487X%2F%3Ftag%3Da8-affi-271202-22)
- [Saito, Deep Learning from Scratch: Theory and Implementation of Deep Learning with Python, O'Reilly Japan, 2016.](https://px.a8.net/svt/ejp?a8mat=2NZCQW+6MQUCY+249K+BWGDT&a8ejpredirect=https%3A%2F%2Fwww.amazon.co.jp%2Fdp%2F4873117585%2F%3Ftag%3Da8-affi-271202-22)