Deep Learning Specialization (Coursera) Self-study record (C1W2)

Introduction

These are my notes on Course 1, Week 2 (C1W2) of the Deep Learning Specialization.

(C1W2L01) Binary Classification

Contents

--Binary classification, explained with the example of deciding whether an image shows a cat
--Notation (meaning of the symbols)
  -$X$: the features of one example run down the rows ($n_x$ of them) and the $m$ training examples run across the columns ($X \in \mathbb{R}^{n_x \times m}$)

Impressions

--The roles of the rows and columns of $X$ are reversed compared with the Machine Learning course.

(C1W2L02) Logistic Regression

Contents

--Predicted value $\hat{y} = P(y = 1 \mid x)$ (the probability that $y = 1$)
--Define the parameters $w \in \mathbb{R}^{n_x}$ and $b \in \mathbb{R}$
--$\hat{y} = \sigma(w^T x + b)$, where $\sigma$ is the sigmoid function
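
As a minimal sketch of the prediction $\hat{y} = \sigma(w^T x + b)$: the helper names `sigmoid` and `predict` and the sample values are my own assumptions, not something from the lecture.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, b, x):
    """Return y_hat = sigmoid(w^T x + b); w and x are (n_x, 1) column vectors."""
    return sigmoid(np.dot(w.T, x) + b)

# Hypothetical values with n_x = 3
w = np.array([[0.5], [-0.25], [0.1]])
b = 0.2
x = np.array([[1.0], [2.0], [3.0]])

y_hat = predict(w, b, x)   # array of shape (1, 1)
print(y_hat.item())        # probability that y = 1
```

Here $w^T x + b = 0.5$, so the predicted probability is $\sigma(0.5)$, a little above 0.62.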

Impressions

--The notation here also differs from the Machine Learning course: $x_0^{(i)} = 1$ is not used, and the constant term $b$ is kept separate from $w$.

(C1W2L03) Logistic Regression Cost Function

Contents

(C1W2L04) Gradient Descent

Contents

--Intuitive explanation of gradient descent
--$\frac{\partial J(w, b)}{\partial w}$ is often written as `dw` in programs
--$\frac{\partial J(w, b)}{\partial b}$ is often written as `db` in programs
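
To make the `dw` naming concrete, here is a small sketch of the update $w := w - \alpha \, dw$ on a one-dimensional toy cost. The cost $J(w) = (w - 3)^2$, the learning rate and the iteration count are assumptions chosen only for illustration.

```python
def J(w):
    """Toy convex cost with its minimum at w = 3 (assumed for illustration)."""
    return (w - 3.0) ** 2

def dJ_dw(w):
    """Derivative of the toy cost with respect to w."""
    return 2.0 * (w - 3.0)

alpha = 0.1   # learning rate (assumed value)
w = 0.0       # initial guess

for _ in range(100):
    dw = dJ_dw(w)        # "dw" holds dJ/dw, as in the lecture's naming
    w = w - alpha * dw   # gradient descent update

print(w, J(w))           # w ends up very close to 3
```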

(C1W2L05) Derivatives

Contents

--A brief explanation of differentiation

Impressions

--The content is basic, so you can watch the video at 1.75x speed.

(C1W2L06) More Derivatives Example

Contents

--A brief explanation of differentiation

Impressions

--The content is basic, so you can watch the video at 1.75x speed.

(C1W2L07) Computation Graph

Contents

--For $J(a, b, c) = 3(a + bc)$, the computation is decomposed into $u = bc$, $v = a + u$, $J = 3v$ and drawn as a graph to show how it is calculated

(C1W2L08) Derivatives With Computation Graph

Contents

--Explanation of derivatives via the chain rule ($\frac{dJ}{da} = \frac{dJ}{dv} \frac{dv}{da}$) using the computation graph
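
The graph for $J(a, b, c) = 3(a + bc)$ can be traced by hand in a few lines. This is only a sketch; the input values are my own choice, and the `da`, `db`, `dc` names follow the "dvar" convention for $\frac{dJ}{d\mathrm{var}}$.

```python
# Forward pass: left to right through the graph
a, b, c = 5.0, 3.0, 2.0   # example inputs (assumed values)
u = b * c                 # u = bc
v = a + u                 # v = a + u
J = 3.0 * v               # J = 3v

# Backward pass: right to left, applying the chain rule at each node
dv = 3.0                  # dJ/dv
da = dv * 1.0             # dJ/da = dJ/dv * dv/da
du = dv * 1.0             # dJ/du = dJ/dv * dv/du
db = du * c               # dJ/db = dJ/du * du/db
dc = du * b               # dJ/dc = dJ/du * du/dc

print(J, da, db, dc)      # 33.0 3.0 6.0 9.0
```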

(C1W2L09) Logistic Regression Gradient Descent

Contents

--Explanation of the derivative of the logistic regression loss $L(a, y)$
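
For $a = \sigma(z)$ and $L(a, y) = -y \log a - (1 - y) \log(1 - a)$, the chain rule gives $\frac{dL}{da} = -\frac{y}{a} + \frac{1 - y}{1 - a}$ and $\frac{da}{dz} = a(1 - a)$, so $\frac{dL}{dz} = a - y$. The sketch below checks that result against a finite difference; the test point and step size are arbitrary assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(z, y):
    """Cross-entropy loss L(a, y) with a = sigmoid(z)."""
    a = sigmoid(z)
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

z, y = 0.7, 1.0               # arbitrary test point
a = sigmoid(z)

dz_analytic = a - y           # dL/dz from the chain rule
eps = 1e-6                    # step for the central difference
dz_numeric = (loss(z + eps, y) - loss(z - eps, y)) / (2 * eps)

print(dz_analytic, dz_numeric)   # the two values should agree closely
```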

(C1W2L10) Gradient Descent on m Example

Contents

--How to compute the derivatives of the cost function $J(w, b)$ and apply them in gradient descent when there are $m$ training examples
--The explanation uses explicit for loops, but for loops are inefficient, so vectorization is important
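
A loop-based version of one gradient descent step might look like the sketch below. The function name `gradients_with_loops`, the tiny data set and the learning rate are assumptions of mine; the accumulate-then-divide-by-$m$ structure follows the per-example result $dz = a - y$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradients_with_loops(w, b, X, Y):
    """Cost and gradients over m examples; X is (n_x, m), Y is (1, m), w is (n_x, 1)."""
    n_x, m = X.shape
    J, db = 0.0, 0.0
    dw = np.zeros((n_x, 1))
    for i in range(m):                    # loop over training examples
        z = b
        for j in range(n_x):              # loop over features
            z += w[j, 0] * X[j, i]
        a = sigmoid(z)
        J += -(Y[0, i] * np.log(a) + (1 - Y[0, i]) * np.log(1 - a))
        dz = a - Y[0, i]
        for j in range(n_x):
            dw[j, 0] += X[j, i] * dz
        db += dz
    return J / m, dw / m, db / m

# Tiny made-up data set: n_x = 2, m = 3
X = np.array([[1.0, 2.0, -1.0],
              [0.5, -1.5, 2.0]])
Y = np.array([[1.0, 0.0, 1.0]])
w = np.zeros((2, 1))
b = 0.0

J, dw, db = gradients_with_loops(w, b, X, Y)

alpha = 0.1                               # learning rate (assumed value)
w = w - alpha * dw
b = b - alpha * db
print(J, dw.ravel(), db)
```

The two nested loops (over examples and over features) are exactly what vectorization removes.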

(C1W2L11) Vectorization

Contents

--The concept of vectorization, using the $w^T x$ term of $z = w^T x + b$ as an example
--Demo in a Jupyter notebook comparing the run time of a for loop with the vectorized calculation (`z = np.dot(w, x) + b`); the difference was about 300 times

I also timed the for loop against the vectorized calculation myself.

vectorization.py


import numpy as np
import time

a = np.random.rand(1000000)
b = np.random.rand(1000000)

tic = time.time()
c = np.dot(a, b)
toc = time.time()

print(c)
print("Vectorization version:" + str(1000*(toc-tic)) + "ms")

c = 0
tic = time.time()
for i in range(1000000):
    c += a[i]*b[i]
toc = time.time()

print(c)
print("for loop:" + str(1000*(toc-tic)) + "ms")

The result: about 12 ms for the vectorized version and 821 ms for the for loop, a difference of a little under 70 times.

249840.57440415953
Vectorization version:12.021541595458984ms
249840.57440415237
for loop:821.0625648498535ms

(C1W2L12) More Vectorization Examples

Contents

--Neural network programming guideline: **whenever possible, avoid explicit for-loops**

example.py


import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])  # example matrix (arbitrary values)
v = np.array([[0.5], [2.0]])            # example column vector with positive entries

u = np.dot(A, v)      # matrix-vector product

u = np.exp(v)         # element-wise exp
u = np.log(v)         # element-wise log
u = np.abs(v)         # element-wise absolute value
u = np.maximum(v, 0)  # element-wise max with 0 (negative elements become 0)
u = v ** 2            # element-wise square
u = 1 / v             # element-wise reciprocal

(C1W2L13) Vectorizing Logistic Regression

Contents

--Vectorizing the logistic regression calculations

$$
\begin{aligned}
X &= \left[ x^{(1)} \ x^{(2)} \ \cdots \ x^{(m)} \right] \quad (X \in \mathbb{R}^{n_x \times m}) \\
Z &= \left[ z^{(1)} \ z^{(2)} \ \cdots \ z^{(m)} \right] \quad (Z \in \mathbb{R}^{1 \times m}) \\
A &= \left[ a^{(1)} \ a^{(2)} \ \cdots \ a^{(m)} \right] \quad (A \in \mathbb{R}^{1 \times m}) \\
Z &= w^T X + \left[ b \ b \ \cdots \ b \right] \\
A &= \mathrm{sigmoid}(Z) \quad (\text{the sigmoid function must be implemented appropriately})
\end{aligned}
$$

--In Python, `Z = np.dot(w.T, X) + b` (the scalar `b` is automatically broadcast to a row vector of shape (1, m))
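
A minimal sketch of the vectorized forward pass follows. The data values are made up, and the `sigmoid` helper is my own assumption about what a suitable implementation looks like.

```python
import numpy as np

def sigmoid(Z):
    """Element-wise sigmoid; works on scalars and on whole arrays."""
    return 1.0 / (1.0 + np.exp(-Z))

X = np.array([[1.0, 2.0, -1.0],
              [0.5, -1.5, 2.0]])   # shape (n_x, m) = (2, 3)
w = np.array([[0.1], [-0.2]])      # shape (n_x, 1)
b = 0.3                            # scalar, broadcast across all m columns

Z = np.dot(w.T, X) + b             # shape (1, m), no loop over examples
A = sigmoid(Z)                     # shape (1, m)

print(Z.shape, A.shape)
print(Z)
```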

(C1W2L14) Vectorizing Logistic Regression's Gradient Computation

Contents

--Vectorizing the gradient computation for logistic regression

$$
\begin{aligned}
db &= \frac{1}{m} \cdot \mathrm{np.sum}(dZ) \\
dw &= \frac{1}{m} \cdot X \, dZ^T
\end{aligned}
$$
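
Put together, one fully vectorized gradient descent step could be sketched as below. Here `dZ = A - Y` is the array form of the per-example result $dz = a - y$; the data, the zero initialization and the learning rate are assumptions for illustration.

```python
import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

X = np.array([[1.0, 2.0, -1.0],
              [0.5, -1.5, 2.0]])   # shape (n_x, m) = (2, 3)
Y = np.array([[1.0, 0.0, 1.0]])    # shape (1, m)
w = np.zeros((2, 1))               # shape (n_x, 1)
b = 0.0
m = X.shape[1]

A = sigmoid(np.dot(w.T, X) + b)    # forward pass, shape (1, m)
dZ = A - Y                         # shape (1, m)
dw = np.dot(X, dZ.T) / m           # shape (n_x, 1)
db = np.sum(dZ) / m                # scalar

alpha = 0.1                        # learning rate (assumed value)
w = w - alpha * dw
b = b - alpha * db
print(dw.ravel(), db)
```

An outer loop over gradient descent iterations is still needed; only the loops over examples and features disappear.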

Impressions

--The formulas mix ordinary mathematical notation with Python code. That is easy enough to follow during the lecture, but may be hard to understand when looking back later.

(C1W2L15) Broadcasting in Python

Contents

--Explanation of broadcasting in Python
  -When an (m, n) matrix and a (1, n) matrix are added, the (1, n) matrix is automatically expanded to (m, n)
  -When an (m, n) matrix and an (m, 1) matrix are added, the (m, 1) matrix is automatically expanded to (m, n)
--See the NumPy broadcasting documentation for details
--The `bsxfun` function in Matlab / Octave is slightly different (?)

example.py


>>> import numpy as np
>>> a = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
>>> b = np.array([100, 200, 300, 400])
>>> a + b
array([[101, 202, 303, 404],
       [105, 206, 307, 408]])
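
The (m, 1) case and the scalar case behave the same way. A short sketch, with values chosen by me; `np.tile` is used only to show the explicit equivalent of what broadcasting does implicitly.

```python
import numpy as np

a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])       # shape (2, 4)
col = np.array([[10], [20]])       # shape (2, 1)

by_column = a + col                # col is stretched across the 4 columns
by_scalar = a + 100                # a scalar is stretched to the full shape

# Explicit equivalent of the (2, 1) -> (2, 4) expansion
explicit = a + np.tile(col, (1, 4))

print(by_column)
print(by_scalar)
print(np.array_equal(by_column, explicit))
```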

(C1W2L16) A Note on Python/numpy vectors

Contents

--The flexibility of Python / NumPy is both an advantage and a disadvantage
--Adding a row vector and a column vector raises no error and still returns some result, which makes bugs hard to find
--**Do not use arrays of shape `(n,)` (do not use rank 1 arrays)**

example.py


>>> import numpy as np
>>> a = np.random.rand(5) #Rank 1 array
>>> print(a)
[0.4721318  0.73582028 0.78261299 0.25030022 0.69326545]
>>> print(a.T)
[0.4721318  0.73582028 0.78261299 0.25030022 0.69326545] # The output does not change after transposing
>>> print(np.dot(a, a.T)) # Meant as an inner product, but with a rank 1 array it is unclear whether this is an inner or an outer product
1.9200902050946715
>>>
>>> a = np.random.rand(5, 1) # (5, 1) matrix
>>> print(a) # Column vector
[[0.78323543]
 [0.18639053]
 [0.45103025]
 [0.48060903]
 [0.93265189]]
>>> print(a.T)
[[0.78323543 0.18639053 0.45103025 0.48060903 0.93265189]] # Row vector after transposing
>>> print(np.dot(a, a.T)) # The outer product of a column vector and a row vector is computed correctly
[[0.61345774 0.14598767 0.35326287 0.37643002 0.73048601]
 [0.14598767 0.03474143 0.08406777 0.08958097 0.17383748]
 [0.35326287 0.08406777 0.20342829 0.21676921 0.42065422]
 [0.37643002 0.08958097 0.21676921 0.23098504 0.44824092]
 [0.73048601 0.17383748 0.42065422 0.44824092 0.86983955]]

--If you are unsure of the dimensions, add a check such as `assert a.shape == (5, 1)`
--Explicitly reshape a rank 1 array, as in `a = a.reshape((5, 1))`
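
A short sketch of these two habits, which also shows the silent row-plus-column pitfall. The variable names `outer`, `inner` and `s` are mine.

```python
import numpy as np

a = np.random.rand(5)        # rank 1 array, shape (5,)
assert a.shape == (5,)

a = a.reshape((5, 1))        # make it an explicit column vector
assert a.shape == (5, 1)
assert a.T.shape == (1, 5)   # the transpose is now a real row vector

outer = np.dot(a, a.T)       # (5, 1) x (1, 5) -> (5, 5) outer product
inner = np.dot(a.T, a)       # (1, 5) x (5, 1) -> (1, 1) inner product

# Pitfall: column + row raises no error and broadcasts to (5, 5)
s = a + a.T

print(outer.shape, inner.shape, s.shape)
```

The `assert` lines cost almost nothing at run time and fail immediately if a shape is not what was intended.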

Impressions

--This matters, because I often lost track of matrix sizes when I took Machine Learning.

(C1W2L17) Quick tour of Jupyter/ipython notebooks

Contents

--How to use Jupyter / IPython notebooks in the Coursera course

(C1W2L18) Explanation of Logistic Regression Cost Function (Optional)

Contents

--A second explanation of the logistic regression cost function

Impressions

--Honestly, I didn't understand much of it :-p
--Starting directly from $L(\hat{y}, y) = -y \log \hat{y} - (1 - y) \log(1 - \hat{y})$ was more intuitive and easier for me to understand
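
For reference, one way to connect the two views, assuming the usual Bernoulli reading of the output: if $P(y \mid x) = \hat{y}^{y} (1 - \hat{y})^{1 - y}$ for $y \in \{0, 1\}$, then $\log P(y \mid x) = y \log \hat{y} + (1 - y) \log(1 - \hat{y}) = -L(\hat{y}, y)$, so minimizing the loss maximizes the log-likelihood. The sketch below only checks that identity numerically; the function names and the value of `y_hat` are my own.

```python
import numpy as np

def likelihood(y_hat, y):
    """P(y | x) = y_hat^y * (1 - y_hat)^(1 - y) for y in {0, 1}."""
    return y_hat ** y * (1 - y_hat) ** (1 - y)

def loss(y_hat, y):
    """Cross-entropy loss L(y_hat, y)."""
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y_hat = 0.8   # arbitrary predicted probability
for y in (0, 1):
    # The loss and the negative log-likelihood should match for both labels
    print(y, loss(y_hat, y), -np.log(likelihood(y_hat, y)))
```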

Reference

--Deep Learning Specialization (Coursera) Self-study record (table of contents)
