Large-scale deep learning technology

This is a summary of Chapter 4 of Deep Learning (supervised by the Japanese Society for Artificial Intelligence), a book published on 2015/11/5: http://www.amazon.co.jp/dp/476490487X/ref=pd_lpo_sbs_dp_ss_1?pf_rd_p=187205609&pf_rd_s=lpo-top-stripe&pf_rd_t=201&pf_rd_i=4061529021&pf_rd_m=AN1VRQENFRJN5&pf_rd_r=1ZJ9T7KCKHC8QV0QKN9H

DistBelief is the distributed deep learning framework developed by Google from which TensorFlow originated. It is a good implementation of distributed parallel processing.

Until then, MapReduce-based approaches required costly communication; DistBelief was designed to solve that.

There are two types of parallelization:

- Model (task) parallelism: the processing is divided across machines, like an assembly line.
- Data parallelism: the data flowing through the processing is divided.

DistBelief uses both together.

With DistBelief, users only need to focus on "how each node is computed" and "what information to send to the next node"; the system decides where to split the model and where to split the data.

Gradient computation and parameter updates are parallelized over the model, while computation on the actual data is parallelized over the data. Downpour SGD (an asynchronous variant of stochastic gradient descent) is used for gradient computation; even if the model replica processing one group of data fails, the rest keep working.
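The data-parallel side of Downpour SGD can be sketched as follows. This is a toy single-process simulation, not the DistBelief API: a central parameter store holds the weights, and each model replica fetches them, computes a gradient on its own data shard, and pushes its update back without any global synchronization. All names and the linear-regression task are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

params = np.zeros(3)                        # "parameter server" state
shards = np.array_split(np.arange(100), 4)  # one data shard per replica
lr = 0.1

for step in range(200):
    for shard in shards:       # the replicas would run concurrently in reality
        w = params.copy()      # fetch (possibly stale) parameters
        Xb, yb = X[shard], y[shard]
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(shard)
        params -= lr * grad    # push update; no barrier, no waiting

print(np.round(params, 2))     # converges toward true_w
```

Because each replica only ever touches its own shard and the shared parameter vector, losing one replica just means its shard's updates stop arriving; the others continue unaffected.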

Sandblaster L-BFGS is used for batch optimization (processing the data a little at a time). Since this is data parallelism, synchronization is required at the end of each step. Waiting for the slowest machine is too wasteful, so the batch is split into small chunks that are handed to the nodes one at a time, and remaining tasks are reassigned starting from the end.
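The small-chunk scheduling idea can be sketched with a shared work queue (an illustrative simulation, not the Sandblaster implementation): fast workers simply pull more chunks, so the step finishes without everyone idling behind the slowest machine.

```python
import queue
import threading
import time

chunks = queue.Queue()
for i in range(20):          # one big batch split into 20 small chunks
    chunks.put(i)

done = {0: [], 1: [], 2: []}  # chunks processed per worker

def worker(wid, delay):
    while True:
        try:
            c = chunks.get_nowait()   # pull the next small chunk
        except queue.Empty:
            return                    # batch finished
        time.sleep(delay)             # simulate differing machine speeds
        done[wid].append(c)

threads = [threading.Thread(target=worker, args=(w, d))
           for w, d in [(0, 0.001), (1, 0.002), (2, 0.01)]]
for t in threads:
    t.start()
for t in threads:
    t.join()

print({w: len(c) for w, c in done.items()})  # fast workers take more chunks
```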

GPU usage

In language processing, only a small fraction of a long vector is non-zero and the rest is zero (sparse). Image processing, on the other hand, deals with dense vectors. Image processing also has few branches, and the same operation is repeated over and over. This is exactly the kind of work GPUs are good at, which is why many people offload it to the GPU.

However, the transfer speed between the GPU, the CPU, and memory can become a bottleneck.

InfiniBand

An extremely fast interconnect. With throughput of 56 Gbps and up, it can relieve the transfer-speed bottleneck peculiar to GPUs.

Accelerating learning convergence

Batch normalization

A technique to address internal covariate shift: the phenomenon where the distribution of a layer's input x changes significantly during training. Because the weights are busy adapting to this shift, the layer's own learning can only proceed afterwards, which slows training down.

Batch normalization normalizes away this shift. At the same time, whitening (normalization plus decorrelation) is performed.
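The per-feature normalization step can be sketched as follows, assuming the standard formulation: normalize each feature over the mini-batch, then apply a learnable scale (gamma) and shift (beta).

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch-normalization forward pass, training mode."""
    mu = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                    # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * x_hat + beta            # learnable rescaling

# A mini-batch whose features are far from zero-mean/unit-variance:
x = np.random.default_rng(1).normal(5.0, 3.0, size=(64, 4))
out = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))
```

After normalization each feature has (approximately) zero mean and unit variance regardless of how the input distribution drifted, which is what removes the internal covariate shift.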

Distillation

Ensemble learning, which averages the predictions of multiple models at inference time, is accurate, but it takes too long.

For neural networks, shallow means fast and deep means slow. Distillation transfers what a deep neural network has learned into a shallow one, so the result is fast.
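One common formulation of this transfer (temperature-softened teacher outputs as training targets, in the style of Hinton et al.; the temperature value and logits here are illustrative, not necessarily the book's exact method) looks like this:

```python
import numpy as np

def softmax(z, T=1.0):
    """Softmax with temperature T; higher T gives softer probabilities."""
    z = z / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

teacher_logits = np.array([5.0, 2.0, 0.1])  # output of the deep/ensemble model
hard = softmax(teacher_logits)              # near one-hot
soft = softmax(teacher_logits, T=4.0)       # softened: reveals class similarity

# The shallow student is trained with cross-entropy against the soft targets:
student_logits = np.array([3.0, 1.5, 0.2])
loss = -np.sum(soft * np.log(softmax(student_logits, T=4.0)))
print(hard.round(3), soft.round(3), round(loss, 3))
```

The softened distribution carries information about which wrong classes the teacher considers plausible, which is what lets the small student learn more than it could from hard labels alone.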

When the optimal θ is chosen, the gradient of E is 0. The gradients of the error term L and the regularization term R of E then balance each other exactly. In other words, if you know how the model errs, you also know the regularization term. Distillation is a technique that makes good use of this.
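In standard notation (my reading of the summary), writing the objective as an error term L plus a regularization (normalization) term R:

```latex
E(\theta) = L(\theta) + R(\theta), \qquad
\nabla E(\theta^{\ast}) = 0
\;\Longrightarrow\;
\nabla L(\theta^{\ast}) = -\nabla R(\theta^{\ast})
```

So at the optimum, the gradient of the error term determines the gradient of the regularization term, sign-flipped.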

Dropout

A technique for controlling overfitting: during training, nodes are ignored at a fixed rate, and the set of ignored nodes changes every time the training data changes. Nothing is ignored at test time. Typical drop rates are 0.2 for the input layer and 0.5 for the hidden layers.

Types of activation functions

ReLU

max(0,x)

Gradient computation is fast and needs no special tricks, and the error propagates to deep nodes without the gradient vanishing.

MaxOut

max_i (w_i x + b_i)

A method that outputs the largest of several linear functions. It is piecewise linear and extremely simple, yet can express complex functions. It is said to perform very well.
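The two activations above can be sketched directly; the weight matrix W and bias b for maxout are illustrative values, one row per linear piece.

```python
import numpy as np

def relu(x):
    """ReLU: elementwise max(0, x)."""
    return np.maximum(0.0, x)

def maxout(x, W, b):
    """Maxout: the largest of k linear functions of x.
    W has shape (k, d), b has shape (k,)."""
    return np.max(W @ x + b)

x = np.array([1.0, -2.0])
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, -1.0]])
b = np.zeros(3)
print(relu(np.array([-3.0, 2.0])))   # [0. 2.]
print(maxout(x, W, b))               # max(1, -2, 1) = 1.0
```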
