What is this?
This article summarizes what I noticed and researched while learning machine learning with Chainer. This time, I study scikit-learn.
It is written based on my own understanding, so it may contain mistakes. I will correct any errors I find; please bear with me.
Content
scikit-learn **It seems that mastering this library is enough to train a model.** Model training? What does that mean? You may feel that way, but I hope it becomes clear as you read on. As a dataset for training,

- We use the Boston house prices dataset, which collects information on the living environment of 506 districts in Boston, USA, together with the median price of owner-occupied homes in each district.

I will try using it. The idea is to predict the median home price from the 506-sample dataset and compare the prediction with the actual median.
Now, given the data from the Boston house prices dataset, suppose all of it is used for training (= model optimization). The model will then be optimized only for those 506 samples, and when it is given data it has never seen, its predictions may not match reality at all; this is called overfitting, and it means there was little point in training. To detect it, some of the data should be set aside for validating the model. **Randomly splitting the data into a training portion and a test portion is called the holdout method.** The split can be done with the following function:
# Split into training and test datasets
from sklearn.model_selection import train_test_split

x_train, x_test, t_train, t_test = train_test_split(x, t, test_size=0.3, random_state=0)
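To make the holdout idea concrete, here is a minimal end-to-end sketch: split, train on the training portion only, then evaluate on the held-out portion. Since recent scikit-learn releases no longer ship the Boston dataset, synthetic data of the same shape (506 samples, 13 features) stands in for it; the variable names `x` and `t` follow the snippet above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Boston data: 506 samples, 13 features,
# with a target that is a noisy linear function of the features.
rng = np.random.default_rng(0)
x = rng.normal(size=(506, 13))
t = x @ rng.normal(size=13) + rng.normal(scale=0.1, size=506)

# Holdout method: 70% for training, 30% reserved for testing.
x_train, x_test, t_train, t_test = train_test_split(
    x, t, test_size=0.3, random_state=0
)

model = LinearRegression()
model.fit(x_train, t_train)            # optimize on the training split only

print(len(x_train), len(x_test))       # 354 152
print(model.score(x_test, t_test))     # R^2 on data the model has never seen
```

Because the model never sees the test split during `fit`, a high test score here indicates the model generalizes rather than merely memorizing the 506 training samples.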
It seems that methods to prevent overfitting have become a subject of academic research in their own right. With this library, data can also be preprocessed so that each feature of the dataset has mean 0 and variance 1 (standardization).
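This preprocessing is what scikit-learn's `StandardScaler` does: a short sketch with a tiny hand-made array, checking that every column ends up with mean 0 and variance 1.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two features on very different scales.
x_train = np.array([[1.0, 200.0],
                    [2.0, 400.0],
                    [3.0, 600.0]])

scaler = StandardScaler()
# fit() learns each column's mean and standard deviation;
# transform() subtracts the mean and divides by the std.
x_train_scaled = scaler.fit_transform(x_train)

print(x_train_scaled.mean(axis=0))  # approximately [0. 0.]
print(x_train_scaled.std(axis=0))   # approximately [1. 1.]
```

In practice the scaler is fitted on the training split only, and the same learned transform is then applied to the test split, so no information from the test data leaks into preprocessing.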
The flow from data preprocessing through multiple regression analysis to evaluation with the decision function can be combined into a single processing step using a pipeline.
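A sketch of that combined flow with scikit-learn's `Pipeline`: standardization and multiple regression are chained into one object, so `fit` runs both steps in order and `score` reports the coefficient of determination (R^2) on held-out data. The data is again synthetic, standing in for the Boston dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Boston data.
rng = np.random.default_rng(0)
x = rng.normal(size=(506, 13))
t = x @ rng.normal(size=13) + rng.normal(scale=0.1, size=506)

x_train, x_test, t_train, t_test = train_test_split(
    x, t, test_size=0.3, random_state=0
)

pipeline = Pipeline([
    ("scaler", StandardScaler()),       # preprocessing: mean 0, variance 1
    ("regressor", LinearRegression()),  # multiple regression
])
pipeline.fit(x_train, t_train)          # fits the scaler, then the regressor
r2 = pipeline.score(x_test, t_test)     # evaluation on the held-out split
print(r2)
```

Bundling the steps this way also guarantees that the scaler is fitted only on the training data inside `fit`, which avoids accidentally leaking test-set statistics into preprocessing.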
Comment
I am studying hard because I want to build something. I have played a competition called DeepRacer, and I wanted to compete in AWS DeepRacer.