Introduction
These notes cover Course 4, Week 2 (C4W2) of the Deep Learning Specialization.
(C4W2L01) Why look at case studies?
Contents
- This week's outline
- Classic Networks
- LeNet-5
- AlexNet
- VGG
- ResNet (152 layers)
- Inception
(C4W2L02) Classic Networks
Contents
LeNet-5 (1998)
- Input ; 32 x 32 x 1
- CONV (5x5, s=1) ; 28 x 28 x 6
- Avg POOL (f=2, s=2) ; 14 x 14 x 6
- CONV (5x5, s=1) ; 10 x 10 x 16
- Avg POOL (f=2, s=2) ; 5 x 5 x 16
- FC; 120 units
- FC; 84 units
- \hat{y}
- Number of parameters; 60k
- $n_H$, $n_W$ become smaller and $n_C$ becomes larger
- Typical layer pattern: CONV, POOL, CONV, POOL, FC, FC (a tf.keras sketch follows this list)
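A minimal tf.keras sketch of this LeNet-5-style layer pattern. Assumptions not in the notes: ReLU activations and a 10-way softmax output are used instead of the original activations and output layer.

```python
# Sketch of the LeNet-5 layer pattern (assumption: ReLU activations, 10-way softmax).
import tensorflow as tf
from tensorflow.keras import layers, models

lenet5 = models.Sequential([
    layers.Input(shape=(32, 32, 1)),
    layers.Conv2D(6, kernel_size=5, strides=1, activation='relu'),   # 28 x 28 x 6
    layers.AveragePooling2D(pool_size=2, strides=2),                 # 14 x 14 x 6
    layers.Conv2D(16, kernel_size=5, strides=1, activation='relu'),  # 10 x 10 x 16
    layers.AveragePooling2D(pool_size=2, strides=2),                 # 5 x 5 x 16
    layers.Flatten(),
    layers.Dense(120, activation='relu'),
    layers.Dense(84, activation='relu'),
    layers.Dense(10, activation='softmax'),                          # \hat{y}
])
lenet5.summary()  # roughly 60k parameters in total
```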
AlexNet (2012)
- Input ; 227x227x3
- CONV (11x11, s=4) ; 55 x 55 x 96
- Max POOL (3x3, s=2) ; 27 x 27 x 96
- CONV (5x5, same) ; 27 x 27 x 256
- Max POOL (3x3, s=2) ; 13 x 13 x 256
- CONV (3x3, same) ; 13 x 13 x 384
- CONV (3x3, same) ; 13 x 13 x 384
- CONV (3x3, same) ; 13 x 13 x 256
- Max POOL (3x3, s=2) ; 6 x 6 x 256
- FC; 4096 units
- FC; 4096 units
- Softmax; 1000 units
- Similar to LeNet, but much bigger ($\sim$ 60M parameters)
- ReLU
- Multiple GPUs
- Local Response Normalization
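A minimal tf.keras sketch following the layer sizes listed above. Assumptions not in the notes: single-GPU version, Local Response Normalization omitted, ReLU throughout.

```python
# Sketch of AlexNet (assumptions: single GPU, no Local Response Normalization).
import tensorflow as tf
from tensorflow.keras import layers, models

alexnet = models.Sequential([
    layers.Input(shape=(227, 227, 3)),
    layers.Conv2D(96, 11, strides=4, activation='relu'),        # 55 x 55 x 96
    layers.MaxPooling2D(pool_size=3, strides=2),                 # 27 x 27 x 96
    layers.Conv2D(256, 5, padding='same', activation='relu'),   # 27 x 27 x 256
    layers.MaxPooling2D(pool_size=3, strides=2),                 # 13 x 13 x 256
    layers.Conv2D(384, 3, padding='same', activation='relu'),   # 13 x 13 x 384
    layers.Conv2D(384, 3, padding='same', activation='relu'),   # 13 x 13 x 384
    layers.Conv2D(256, 3, padding='same', activation='relu'),   # 13 x 13 x 256
    layers.MaxPooling2D(pool_size=3, strides=2),                 # 6 x 6 x 256
    layers.Flatten(),
    layers.Dense(4096, activation='relu'),
    layers.Dense(4096, activation='relu'),
    layers.Dense(1000, activation='softmax'),
])
alexnet.summary()  # ~60M parameters
```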
VGG-16
- CONV = 3x3 filter, s=1, same
- Max POOL = 2x2, s=2
- Input ; 224 x 224 x 3
- CONV64 x 2 ; 224 x 224 x 64
- POOL ; 112 x 112 x 64
- CONV128 x 2 ; 112 x 112 x 128
- POOL ; 56 x 56 x 128
- CONV256 x 3; 56 x 56 x 256
- POOL ; 28 x 28 x 256
- CONV512 x 3; 28 x 28 x 512
- POOL ; 14 x 14 x 512
- CONV512 x 3 ; 14 x 14 x 512
- POOL ; 7 x 7 x 512
- FC; 4096 units
- FC; 4096 units
- Softmax; 1000 units
- Number of parameters; $\sim$ 138M
- Relatively uniform structure (a tf.keras sketch follows this list)
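A minimal tf.keras sketch that makes the uniform block structure explicit: repeated 3x3 "same" convolutions followed by a 2x2 max pool. The `vgg_block` helper is just for illustration, not part of the notes.

```python
# Sketch of VGG-16, emphasizing its uniform CONV(3x3, same) + POOL(2x2, s=2) blocks.
import tensorflow as tf
from tensorflow.keras import layers, models

def vgg_block(model, num_convs, num_filters):
    """Add num_convs CONV(3x3, s=1, same) layers, then one 2x2/s=2 max pool."""
    for _ in range(num_convs):
        model.add(layers.Conv2D(num_filters, 3, padding='same', activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=2, strides=2))

vgg16 = models.Sequential([layers.Input(shape=(224, 224, 3))])
for num_convs, num_filters in [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]:
    vgg_block(vgg16, num_convs, num_filters)
vgg16.add(layers.Flatten())
vgg16.add(layers.Dense(4096, activation='relu'))
vgg16.add(layers.Dense(4096, activation='relu'))
vgg16.add(layers.Dense(1000, activation='softmax'))
vgg16.summary()  # ~138M parameters
```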
Impressions
- It shows how fast the field moves that networks from around 2015 are already described as "classic"
(C4W2L03) Residual Networks (ResNet)
Contents
- Residual block (a tf.keras sketch follows this list)
- $a^{[l]}$
- $z^{[l+1]} = W^{[l+1]}a^{[l]} + b^{[l+1]}$
- $a^{[l+1]} = g(z^{[l+1]})$
- $z^{[l+2]} = W^{[l+2]}a^{[l+1]} + b^{[l+2]}$
- $a^{[l+2]} = g(z^{[l+2]} + a^{[l]})$
- In a plain network, the deeper the network, the larger the training error tends to become.
- With ResNet, training error keeps decreasing even beyond 100 layers, so it is effective for training very deep networks.
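A minimal tf.keras sketch of a residual block implementing $a^{[l+2]} = g(z^{[l+2]} + a^{[l]})$. One assumption: convolutional layers with "same" padding and matching filter counts, so the shortcut and the main path have the same shape and no 1x1 projection is needed.

```python
# Sketch of a residual block with a skip connection (shortcut) from a^{[l]}.
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(a_l, num_filters):
    """Two CONV(3x3, same) layers plus a shortcut: a^{[l+2]} = g(z^{[l+2]} + a^{[l]})."""
    z1 = layers.Conv2D(num_filters, 3, padding='same')(a_l)    # z^{[l+1]}
    a1 = layers.Activation('relu')(z1)                         # a^{[l+1]} = g(z^{[l+1]})
    z2 = layers.Conv2D(num_filters, 3, padding='same')(a1)     # z^{[l+2]}
    return layers.Activation('relu')(layers.Add()([z2, a_l]))  # g(z^{[l+2]} + a^{[l]})

# Example usage on a hypothetical 28x28x64 activation map:
inputs = layers.Input(shape=(28, 28, 64))
outputs = residual_block(inputs, num_filters=64)
model = tf.keras.Model(inputs, outputs)
```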
(C4W2L04) Why ResNets work
Contents
- If $W^{[l+2]} = 0$ and $b^{[l+2]} = 0$, then $a^{[l+2]} = g(a^{[l]}) = a^{[l]}$ (with ReLU, $a^{[l]} \ge 0$, so $g(a^{[l]}) = a^{[l]}$)
- The identity function is easy for a residual block to learn, so adding the block does not hurt performance
(C4W2L05) Network in network and 1x1 convolutions
Contents
- What does a 1x1 convolution do?
- $(6 \times 6 \times 32) \ast (1 \times 1 \times 32) = (6 \times 6 \times \textrm{\#filters})$
- Think of it as applying a fully connected (FC) layer to each pixel of the input
- Also known as Network in Network
- Using 1x1 convolutions
- Input ; 28 \times 28 \times 192
- ReLU, CONV 1x1, 32 filters → Output ; 28 \times 28 \times 32
- $n_C$ can be reduced
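A minimal tf.keras sketch of the example above: a 1x1 convolution reducing $n_C$ from 192 to 32 (ReLU assumed).

```python
# Sketch: 1x1 convolution used to shrink the channel dimension, 28x28x192 -> 28x28x32.
import tensorflow as tf
from tensorflow.keras import layers

x = layers.Input(shape=(28, 28, 192))
y = layers.Conv2D(32, kernel_size=1, activation='relu')(x)  # 28 x 28 x 32
model = tf.keras.Model(x, y)
print(model.output_shape)  # (None, 28, 28, 32)
```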
(C4W2L06) Inception network motivation
Contents
- Apply all of the following to the input (28x28x192) and concatenate the outputs (see the sketch after this list)
- 1x1 → Output ; 28x28x64
- 3x3 → Output ; 28x28x128
- 5x5 → Output ; 28x28x32
- Max POOL → Output ; 28x28x32
- Total: 28x28x256
- Apply all the different filter sizes and pooling, and let the network learn which ones to use
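A minimal tf.keras sketch of this motivation example: parallel 1x1, 3x3, 5x5 and pooling branches concatenated along the channel axis. Two assumptions not in the list above: ReLU activations, and a 1x1 convolution after the max-pooling branch so its output has 32 channels and the totals match; the bottleneck layers before the 3x3 and 5x5 branches (discussed next) are omitted.

```python
# Sketch of an inception-style module: parallel branches, channel-wise concatenation.
import tensorflow as tf
from tensorflow.keras import layers

x = layers.Input(shape=(28, 28, 192))
b1 = layers.Conv2D(64, 1, padding='same', activation='relu')(x)   # 28 x 28 x 64
b2 = layers.Conv2D(128, 3, padding='same', activation='relu')(x)  # 28 x 28 x 128
b3 = layers.Conv2D(32, 5, padding='same', activation='relu')(x)   # 28 x 28 x 32
pool = layers.MaxPooling2D(pool_size=3, strides=1, padding='same')(x)
b4 = layers.Conv2D(32, 1, activation='relu')(pool)                # 28 x 28 x 32
out = layers.Concatenate()([b1, b2, b3, b4])                      # 28 x 28 x 256
print(tf.keras.Model(x, out).output_shape)  # (None, 28, 28, 256)
```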
- The problem of computational cost
- Input ; 28x28x192
- CONV 5x5, same, 32 → Output ; 28x28x32
- Computational cost; 28x28x32x5x5x192 ≈ 120M multiplications
- Using 1x1 convolution
- Input ; 28x28x192
- CONV 1x1, 16 → Output ; 28x28x16 (bottleneck layer)
- CONV 5x5, 32 → Output ; 28x28x32
- Computational cost; 28x28x16x192 + 28x28x32x5x5x16 ≈ 12.4M (about 1/10 of the above)
- If the bottleneck layer is sized sensibly, the computational cost can be reduced without hurting performance (see the check below)
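A quick check of the cost figures above, counting multiplications only:

```python
# Verifying the computational-cost comparison (number of multiplications).
direct = 28 * 28 * 32 * 5 * 5 * 192           # CONV 5x5 applied directly to 28x28x192
bottleneck = (28 * 28 * 16 * 192              # CONV 1x1 (bottleneck layer)
              + 28 * 28 * 32 * 5 * 5 * 16)    # then CONV 5x5 on 28x28x16
print(direct)      # 120,422,400  (~120M)
print(bottleneck)  # 12,443,648   (~12.4M, about 1/10)
```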
(C4W2L07) Inception network
Contents
- Description of the full Inception network built from inception modules with bottleneck layers
- Also known as GoogLeNet
(C4W2L08) Using open-source implementation
Contents
- Description of downloading source code from GitHub (`git clone`)
(C4W2L09) Transfer Learning
Contents
- $x$ → layer → layer → $\cdots$ → layer → softmax → $\hat{y}$
- With little data, train only the softmax layer (keep the other parameters frozen)
- With more data, train, for example, the later layers and keep the earlier layers frozen
- With very large data sets, train the entire network (a tf.keras sketch follows this list)
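A minimal tf.keras sketch of these regimes, assuming a pre-trained VGG16 from `tf.keras.applications`; `num_classes` is a hypothetical value for your own task.

```python
# Sketch of transfer learning: freeze pre-trained layers, train only the new softmax.
import tensorflow as tf
from tensorflow.keras import layers, models

num_classes = 5  # hypothetical number of classes in your data set

base = tf.keras.applications.VGG16(include_top=False, weights='imagenet',
                                   input_shape=(224, 224, 3), pooling='avg')
base.trainable = False  # little data: keep all pre-trained parameters fixed

model = models.Sequential([
    base,
    layers.Dense(num_classes, activation='softmax'),  # train only this softmax
])

# With more data, unfreeze the later layers and keep the earlier ones frozen:
# base.trainable = True
# for layer in base.layers[:-4]:
#     layer.trainable = False
model.compile(optimizer='adam', loss='categorical_crossentropy')
```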
(C4W2L10) Data augmentation
Contents
- Common augmentation methods
- mirroring
- random cropping
- The following are rarely used
- rotation
- shearing
- local warping
- color shifting
- Add (or subtract) values to the R, G, B channels
- The AlexNet paper describes PCA color augmentation
- Implementing distortions during training
- Load images, apply the distortions, and feed the results to training as mini-batches (a sketch follows)
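A minimal sketch of these distortions applied on the fly with `tf.image` and `tf.data`; the `dataset` pipeline, crop size, and distortion magnitudes are assumptions for illustration.

```python
# Sketch of on-the-fly augmentation: mirroring, random cropping, color shifting.
import tensorflow as tf

def distort(image, label):
    image = tf.image.random_flip_left_right(image)            # mirroring
    image = tf.image.random_crop(image, size=[224, 224, 3])   # random cropping
    image = tf.image.random_hue(image, max_delta=0.05)        # simple color shifting
    image = tf.image.random_brightness(image, max_delta=0.1)
    return image, label

# `dataset` is assumed to yield (image, label) pairs with images larger than 224x224:
# dataset = dataset.map(distort, num_parallel_calls=tf.data.AUTOTUNE).batch(32)
```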
(C4W2L11) The state of computer vision
Contents
- Roughly speaking, the amount of data currently available: speech recognition $\gt$ image recognition $\gt$ object detection (which also localizes where objects are)
- With a lot of data, simpler algorithms and less hand-engineering are enough
- With little data, more hand-engineering and "hacks" are needed
- Two sources of knowledge
- labeled data
- hand-engineering features / network architecture / other components
- Transfer learning is useful when there is little data
- Tips for doing well on benchmark / winning competitions
- Ensemble
- Train several networks independently and average their outputs (3-15 networks); a sketch follows this list
- Multi-crop at test time
- Run the classifier on multiple versions of the test images and average the results (e.g., 10 crops)
- Use open source code
- Use architecture of network published in the literature
- Use open source implementations if possible
- Use pre-trained models and fine-tune on your data set
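A minimal sketch of test-time ensembling and multi-crop averaging; the `models` list and the `crops` tensor are hypothetical.

```python
# Sketch: average softmax outputs over several trained models (and several crops).
import numpy as np

def ensemble_predict(models, images):
    """Average the softmax outputs of all models over a batch of images."""
    preds = [m.predict(images, verbose=0) for m in models]  # each: (batch, num_classes)
    return np.mean(preds, axis=0)

# Hypothetical usage: `models` is a list of 3-15 trained Keras models and
# `crops` holds 10 crops of one test image with shape (10, H, W, 3).
# probs = ensemble_predict(models, crops).mean(axis=0)  # multi-crop average
```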
Note
- This week's programming exercise is an implementation of ResNet