Continuing from the previous session.
This is a review of the first half of Chapter 2 of the book Machine Learning Beginning with Python. This article will probably be a mess to read unless you have the book in hand, so let's buy it and study together!
When learning something, you first have to pick up the vocabulary of that discipline or field before you can talk about it or discuss it. As a beginner, I can't get onto this road without climbing that big wall little by little. It makes things a bit long, but I'll explain each term I'm hearing for the first time as it comes up.
Supervised machine learning can be roughly divided into:
・ Classification
・ Regression
(That division is itself a classification.)
The iris classification done in Chapter 1 and judging whether an email is spam are both classification tasks.
Regression, on the other hand, refers to machine learning that predicts continuous values. For example:
・ Corn yield
・ Annual income prediction
・ Stock price movements
The goal of machine learning is to generalize. If a prediction model built from training data can accurately predict unknown data, the model is said to generalize. Roughly speaking, it's a model that still "feels right" on data it has never seen.
Building a model that is given more information (complexity) than necessary is called overfitting. The opposite is called underfitting. Starting from an underfit model, accuracy gradually rises as you give it more information, but past a certain point the model becomes overfit and accuracy starts to fall. The sweet spot in between (the maximum of the accuracy curve) is where generalization performance is best, and that is the model we want.
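As a rough picture of that sweet spot, here is a toy NumPy sketch of my own (not from the book): fit polynomials of increasing degree to noisy data, and the training error keeps falling while the error on held-out data typically bottoms out somewhere in the middle and then rises again.

```python
import numpy as np

rng = np.random.RandomState(0)

# Noisy samples of a simple underlying function
x_train = np.sort(rng.uniform(-3, 3, 30))
y_train = np.sin(x_train) + rng.normal(scale=0.3, size=30)
x_test = np.sort(rng.uniform(-3, 3, 30))
y_test = np.sin(x_test) + rng.normal(scale=0.3, size=30)

for degree in [1, 3, 9]:
    # Higher degree = more "information" (complexity) given to the model
    coefs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```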
```python
# Data set generation
import mglearn

X, y = mglearn.datasets.make_forge()
```
This time, a warning like the following appeared:

```
DeprecationWarning: Function make_blobs is deprecated; Please import make_blobs directly from scikit-learn
warnings.warn(msg, category=DeprecationWarning)
```

It still runs, so I'll ignore it for now. A lot of warning text shows up whenever mglearn is used after this, but I'm ignoring all of it.
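For reference, plotting the dataset looks roughly like this (a minimal sketch assuming mglearn and matplotlib are installed; discrete_scatter is the helper the book uses for scatter plots):

```python
import matplotlib.pyplot as plt
import mglearn

X, y = mglearn.datasets.make_forge()

# Scatter plot colored by class label
mglearn.discrete_scatter(X[:, 0], X[:, 1], y)
plt.legend(["Class 0", "Class 1"], loc=4)
plt.xlabel("First feature")
plt.ylabel("Second feature")
plt.show()
```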
The University of Wisconsin is an excellent school that appears in the World University Rankings and has produced Nobel laureates. If you read this book, you might even start to feel on familiar terms with it. "Ah, that Wisconsin one." "Oh yeah, the breast cancer dataset!" Something like that.
How to use Python's zip function: get elements from multiple lists at once. See here for how to use the zip function.
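A tiny sketch of the sort of thing zip gets used for in this chapter, namely walking over several lists in parallel (for example, a list of k values and a list of labels or plot axes):

```python
# zip pairs up elements from multiple lists, one tuple per iteration
n_neighbors_list = [1, 3, 9]
labels = ["k=1", "k=3", "k=9"]

for k, label in zip(n_neighbors_list, labels):
    print(k, label)  # -> 1 k=1 / 3 k=3 / 9 k=9
```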
The product of two features is called an interaction (interaction feature).
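In code, that just means multiplying feature columns together (my own toy example, not from the book):

```python
import numpy as np

# Two features for three samples
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Interaction feature: element-wise product of feature 0 and feature 1
interaction = X[:, 0] * X[:, 1]
print(interaction)  # [ 2. 12. 30.]
```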
k-nearest neighbors (classification): a method that looks at the k closest data points and decides the label by voting (majority vote). In the book's setup the neighbors are not weighted by distance, so the predicted label can end up differing from the label of the single closest point. If you sweep over the two features, you can trace out the boundary between the classes; this is called the decision boundary. The larger k is, the smoother the decision boundary; the smaller k is, the more complex the model. In the book, k was varied to find the number of neighbors that gives the best accuracy.
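To get a feel for how k changes things, here is a sketch along the lines of the book's experiment (varying n_neighbors and comparing training and test accuracy on the breast cancer dataset; the exact split settings here are my own):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=66)

for n_neighbors in range(1, 11):
    # Small k = complex model, large k = smoother decision boundary
    clf = KNeighborsClassifier(n_neighbors=n_neighbors)
    clf.fit(X_train, y_train)
    print(n_neighbors,
          "train:", round(clf.score(X_train, y_train), 3),
          "test:", round(clf.score(X_test, y_test), 3))
```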
The regression version of k-nearest neighbors. When k = 1 it adopts the value of the single closest point; when k = 3 or 9 it adopts the average of those neighbors' values. I think it's the simplest method anyone could come up with.
Advantages: easy to understand, and fairly high accuracy without any tuning.
Disadvantages: prediction slows down as the training set grows, and it performs poorly on sparse datasets.
For these reasons it is rarely used in practice.
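A small sketch of k-nearest neighbors regression (this assumes mglearn's make_wave dataset, which the book uses; the parameters here are my own):

```python
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
import mglearn

X, y = mglearn.datasets.make_wave(n_samples=40)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Predict the mean of the 3 nearest neighbors' target values
reg = KNeighborsRegressor(n_neighbors=3)
reg.fit(X_train, y_train)
print("test R^2:", reg.score(X_test, y_test))
```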
This is ordinary least squares: a method that chooses the parameters that minimize the sum of squared errors.
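A minimal sketch of least squares with scikit-learn's LinearRegression (the toy data is my own):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.1, 4.9, 7.2])  # roughly y = 2x + 1 plus noise

# Fit picks the slope w and intercept b that minimize the squared error
lr = LinearRegression().fit(X, y)
print("slope w:", lr.coef_, "intercept b:", lr.intercept_)
```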
Suddenly this got unclear. L2 regularization isn't explained in much detail... Overfitting and L2 regularization: if you check that page, it might be easier to picture. In short, the slope w is optimized not only for predictions on the training data but also under an extra constraint (this is the part I'm shaky on): we want w to be smaller than what pure fitting to the training data (the least squares solution) would give. So a penalty on the sum of squares of w is added, deliberately shrinking w.
The smaller the penalty, the closer the result is to the plain linear model; the larger the penalty, the closer w gets to 0. In other words, if you want to generalize better you can increase the penalty, but how much to increase it depends on the model and the data.
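A sketch of how the penalty strength (alpha in scikit-learn's Ridge) changes the size of w; the toy data here is my own, not the book's dataset:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.normal(size=(50, 10))
y = X[:, 0] * 3.0 + rng.normal(scale=0.5, size=50)  # only feature 0 matters

for alpha in [0.01, 1.0, 100.0]:
    # Larger alpha = stronger L2 penalty = coefficients pulled toward 0
    ridge = Ridge(alpha=alpha).fit(X, y)
    print(f"alpha={alpha}: sum of |w| = {np.abs(ridge.coef_).sum():.3f}")
```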
Yeah. I'm not sure I can explain this well myself.
Lasso: this one penalizes the sum of absolute values of the coefficients, and with that penalty the coefficients tend to become exactly 0. The book says it is used when you want to reduce the number of variables. I'm not sure I fully get it, so I'll look into it in detail and write a separate article.
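A sketch of that "coefficients become 0" behavior with scikit-learn's Lasso (toy data of my own, same shape as the ridge example above):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X = rng.normal(size=(50, 10))
y = X[:, 0] * 3.0 + rng.normal(scale=0.5, size=50)  # only feature 0 matters

# The L1 penalty drives most coefficients to exactly 0 (feature selection effect)
lasso = Lasso(alpha=0.5).fit(X, y)
print("coefficients:", np.round(lasso.coef_, 2))
print("features used:", np.sum(lasso.coef_ != 0))
```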
A linear model used for classification decides the class by whether the value of the function is greater than or less than 0. Logistic regression has "regression" in its name, but it is actually a classification algorithm. The difference between LinearSVC and logistic regression wasn't clear to me.
The one-vs.-rest approach was explained: you classify one class against all the others, and you do this for every class. I understand what it's saying, but just calling scikit-learn doesn't seem to deepen my understanding... I know how to use it now, so I guess the rest comes from getting used to it.
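A small sketch comparing the two linear classifiers mentioned here on the forge dataset (this assumes mglearn; the book does something similar, but the details here are mine):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
import mglearn

X, y = mglearn.datasets.make_forge()

for model in [LogisticRegression(), LinearSVC()]:
    clf = model.fit(X, y)
    # Both learn a line w[0]*x0 + w[1]*x1 + b = 0 and classify by the sign
    print(type(clf).__name__, "coef:", clf.coef_, "intercept:", clf.intercept_)
```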
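To see one-vs.-rest concretely, here is a sketch with a 3-class dataset (similar in spirit to the book's make_blobs example; the exact parameters are my own): one binary classifier is learned per class, so coef_ gets one row of weights per class.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC

X, y = make_blobs(n_samples=120, centers=3, random_state=42)

linear_svm = LinearSVC().fit(X, y)
# One "this class vs. the rest" classifier per class
print("coef_ shape:", linear_svm.coef_.shape)            # (3, 2)
print("intercept_ shape:", linear_svm.intercept_.shape)  # (3,)
```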