Introduction

This is the content of Course 3, Week 2 (C3W2) of the Deep Learning Specialization.

(C3W2L01) Carry Out Error Analysis

Error analysis
- Get ~100 mislabeled dev set examples
  - Count up how many are dogs (the task is a cat classifier, so a dog photo classified as "cat" is an error)
  - If dogs account for only 5/100 of the errors, fixing the dog problem cannot improve performance much; if they account for 50/100, the improvement can be large
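The "is it worth fixing dogs?" reasoning above puts a ceiling on the possible gain. A minimal sketch, assuming a hypothetical overall dev error of 10% (the notes don't give one) and assuming that, at best, fixing a category removes all of its errors; `best_case_error` is an illustrative helper, not a library function:

```python
def best_case_error(dev_error: float, category_count: int, sample_size: int) -> float:
    """Lowest dev error reachable by fixing one error category completely.

    dev_error:      current dev set error rate (0.10 means 10%)
    category_count: misclassified examples in the sample that fall in the category
    sample_size:    number of misclassified dev examples inspected (~100)
    """
    share_of_errors = category_count / sample_size
    # Removing every error of this category removes only its share of the total error.
    return dev_error * (1 - share_of_errors)


# Hypothetical overall dev error of 10%; 100 misclassified images inspected.
for dogs in (5, 50):
    print(f"dogs {dogs}/100 -> best case dev error {best_case_error(0.10, dogs, 100):.1%}")
```

With these assumed numbers, a 5/100 dog share caps the gain at 10% → 9.5%, while a 50/100 share allows 10% → 5%.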
Evaluate multiple ideas in parallel
- Ideas for cat detection
  - Fix pictures of dogs being recognized as cats
  - Fix great cats (lions, panthers, etc.) being misrecognized
  - Improve performance on blurry images
  - ...
- Look through the misclassified images, record the cause of each error, and focus on the categories that account for the largest share

Image	Dogs	Great cat	Blurry	...
1	x
2		x
3		x	x
...
% of total	8%	48%	61%	...
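The "% of total" row of the table can be computed from per-image tags. A minimal sketch, assuming each misclassified image is recorded as a set of tag names (the three rows below are hypothetical stand-ins for rows 1 to 3 of the table, and `percent_of_total` is an illustrative helper):

```python
from collections import Counter

# Hypothetical tags for the first three rows of the table above;
# one set of tags per misclassified dev image.
rows = [
    {"Dogs"},
    {"Great cat"},
    {"Great cat", "Blurry"},
]


def percent_of_total(rows):
    """Share (in %) of inspected errors carrying each tag, largest first."""
    counts = Counter(tag for tags in rows for tag in tags)
    total = len(rows)
    shares = {tag: 100 * n / total for tag, n in counts.items()}
    return dict(sorted(shares.items(), key=lambda item: item[1], reverse=True))


print(percent_of_total(rows))
```

Because one image can carry several tags (as in row 3), the percentages can add up to more than 100%, just as 8% + 48% + 61% does in the table. Sorting largest first points at the category to work on.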

(C3W2L02) Cleaning Up Incorrectly Labeled Data

- Incorrect labels in the training set
  - DL algorithms are quite robust to random errors in the training set
  - Systematic errors do hurt (for example, if all white dogs are labeled "cat")
- Incorrect labels in the dev/test set
  - Add an "Incorrectly labeled" column to the error analysis, and fix the labels if their impact is large

Image	Dogs	Great cat	Blurry	...	Incorrectly labeled
1	x
2		x			x
3		x	x
...
% of total	8%	48%	61%	...	5%
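Whether fixing labels is worth it depends on how much of the overall dev error they explain. A minimal sketch, assuming the 5% "Incorrectly labeled" share from the table and a hypothetical overall dev error of 10% (not stated in the notes); `split_dev_error` is an illustrative helper:

```python
def split_dev_error(dev_error: float, mislabeled_share: float) -> tuple[float, float]:
    """Split overall dev error into the part caused by incorrect labels and the rest.

    mislabeled_share: fraction of inspected errors tagged "Incorrectly labeled"
    """
    due_to_labels = dev_error * mislabeled_share
    due_to_other_causes = dev_error - due_to_labels
    return due_to_labels, due_to_other_causes


# 5% comes from the table above; the 10% overall dev error is hypothetical.
labels, other = split_dev_error(0.10, 0.05)
print(f"incorrect labels: {labels:.1%}, other causes: {other:.1%}")
```

Here incorrect labels explain only 0.5 of the 10 percentage points, so other causes come first. If the overall dev error dropped while the label-related part stayed the same, incorrect labels would become a large share and fixing them would be worthwhile.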

Correcting incorrect dev/test set examples
- Apply the same process to your dev and test sets to make sure they continue to come from the same distribution
- Consider examining examples your algorithm got right as well as ones it got wrong
- Train and dev/test data may now come from slightly different distributions (this is usually acceptable)

(C3W2L03) Build First System Quickly, Then Iterate

- Set up a dev/test set and metrics
- Build an initial system quickly
- Use bias/variance analysis and error analysis to prioritize next steps
- It's easy to overthink and build an overly complicated system from the start

comment

- The video is black after 5:30.

(C3W2L04) Training and Testing on Different Distributions

- Goal: an algorithm that classifies images uploaded through a mobile app
- Data: 200k (high-resolution) images from web pages, but only 10k images from the mobile app. How should the train/dev/test sets be built?

Option 1: combine both into 210k images and shuffle
- train set: 205k
- dev set: 2.5k
- test set: 2.5k
- Advantage: train, dev, and test all come from the same distribution
- Disadvantage: the dev/test sets are mostly web page images, with very little mobile app data
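Option 1 is a pool-shuffle-slice split. A minimal sketch, assuming each example is a placeholder `(source, index)` tuple rather than a real image, and a fixed seed for reproducibility; `option1_split` is a hypothetical helper:

```python
import random


def option1_split(web, mobile, dev_size=2_500, test_size=2_500, seed=0):
    """Option 1: pool web and mobile images, shuffle, then carve out dev/test."""
    pooled = list(web) + list(mobile)
    random.Random(seed).shuffle(pooled)
    dev = pooled[:dev_size]
    test = pooled[dev_size:dev_size + test_size]
    train = pooled[dev_size + test_size:]
    return train, dev, test


# Placeholder examples tagged with their source; real data would be images.
web = [("web", i) for i in range(200_000)]
mobile = [("mobile", i) for i in range(10_000)]
train, dev, test = option1_split(web, mobile)
print(len(train), len(dev), len(test))  # 205000 2500 2500
print(sum(src == "mobile" for src, _ in dev), "mobile images in the dev set")
```

Only 10k of the 210k pooled images are mobile, so on average about 2,500 × 10/210 ≈ 119 dev images come from the app. That is the disadvantage noted above.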

Option 2
- train set: web page 200k + mobile app 5k
- dev set: mobile app 2.5k
- test set: mobile app 2.5k
- Advantage: the dev/test sets match the distribution the algorithm is aimed at (classifying mobile app images)
- Disadvantage: the train set distribution differs from the dev/test distribution
- Option 2 is better than Option 1
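Option 2 draws dev/test only from the mobile data. A minimal sketch, again assuming placeholder `(source, index)` tuples and a fixed seed; `option2_split` is a hypothetical helper:

```python
import random


def option2_split(web, mobile, dev_size=2_500, test_size=2_500, seed=0):
    """Option 2: dev/test come only from mobile; leftover mobile images join train."""
    mobile = list(mobile)
    random.Random(seed).shuffle(mobile)
    dev = mobile[:dev_size]
    test = mobile[dev_size:dev_size + test_size]
    # All web images plus the remaining mobile images form the training set.
    train = list(web) + mobile[dev_size + test_size:]
    return train, dev, test


web = [("web", i) for i in range(200_000)]
mobile = [("mobile", i) for i in range(10_000)]
train, dev, test = option2_split(web, mobile)
print(len(train), len(dev), len(test))  # 205000 2500 2500
```

Only the mobile images are shuffled here: dev and test are aimed entirely at the target distribution, and the leftover 5k mobile images still contribute to training.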

(C3W2L05) Bias and Variance with Mismatched Data Distributions

reference

- Deep Learning Specialization (Coursera) Self-study record (table of contents)

Deep Learning Specialization (Coursera) Self-study record (C3W2)
