[Memo] Difference between train_test_split and cross-validation when verifying generalization performance in deep learning

I'm new to Python and deep learning. While implementing a neural network, I looked into methods for verifying generalization performance and am leaving a note about it here.

What I was wondering

- There is a generalization-performance verification method called k-fold cross-validation (kCV) (Reference 1).
  - The training data is divided into k parts; k-1 parts are used for training and the remaining one for performance evaluation, and training is repeated k times so that each part serves as the evaluation set once.
- I already knew that `sklearn.model_selection.train_test_split` (TTS) can split the data at hand into training data and test data to verify generalization performance.
- a. Is it correct to think of kCV as essentially repeating TTS multiple times?
- b. Is it correct to think that kCV can evaluate a model's generalization more accurately than TTS?
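The single-holdout idea behind TTS can be sketched in pure Python. This is a simplified illustration of the concept, not sklearn's actual implementation, and the function name is my own:

```python
import random

def simple_train_test_split(data, test_ratio=0.25, seed=0):
    """A single random holdout split, conceptually what train_test_split does."""
    rng = random.Random(seed)
    indices = list(range(len(data)))
    rng.shuffle(indices)                      # randomize before splitting
    n_test = int(len(data) * test_ratio)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train = [data[i] for i in train_idx]
    test = [data[i] for i in test_idx]
    return train, test

data = list(range(20))
train, test = simple_train_test_split(data, test_ratio=0.25)
print(len(train), len(test))  # 15 5
```

Note that the samples assigned to `test` are never seen during training; this is the single fixed split that kCV generalizes by rotating the held-out part.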

Answer

- a. I think so. Moreover, with kCV every one of the k partitions is used for validation exactly once, without exception.
- b. That seems to be the case.
  - If TTS is used only once, the data held out for validation can never be used as training data, so depending on how the validation data happens to be chosen, an unnecessary bias may arise in training. kCV overcomes this (Reference 2).
  - A drawback of kCV has also been pointed out: if the data within each of the k partitions is biased, the learning result will be biased (e.g., one partition contains only dog data and another only cat data). A countermeasure for this is stratified k-fold cross-validation (Stratified kCV) (Reference 1).
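The rotation that distinguishes kCV from a single TTS can be sketched in pure Python. Again this is a minimal illustration of the splitting logic, not sklearn's `KFold` implementation, and the function name is my own:

```python
def kfold_splits(n_samples, k):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    indices = list(range(n_samples))
    # distribute samples as evenly as possible across k folds
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

folds = list(kfold_splits(10, k=5))
# collect every validation index across all folds
all_val = sorted(i for _, val in folds for i in val)
print(all_val)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Every index appears in exactly one validation fold, which is why no data is "wasted" the way it is with a single holdout split. In practice one would shuffle the indices first (or use stratified folds to keep class proportions balanced within each fold, as Stratified kCV does).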

Summary

This is all fairly obvious, but I'm leaving it here as a memo.

References

- Reference 1: I tried to sort out the types of cross validation
- Reference 2: KFolds Cross Validation vs train_test_split
