Learning record 9 (13th day)

Started studying: Saturday, December 7

Teaching materials, etc.:

- Miyuki Oshige, "Details! Python 3 Introductory Note" (Sotec, 2017): completed Thursday, December 19
- Progate Python course (5 courses in total): completed Saturday, December 21
- **Andreas C. Müller and Sarah Guido, "Introduction to Machine Learning with Python" (O'Reilly Japan, 2017)**: started Saturday, December 21

Chapter 1 Introduction

- To apply a machine learning model to new data, the model must generalize well.
- Typically, about 25% of the data is set aside as the test set.
- Inspect the data first, to see whether machine learning is necessary at all and whether the data contains the information you need.
- One way to inspect it is to create a pair plot with pandas' scatter_matrix (see the sketch below).
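
A minimal sketch of the two points above (the default 25% test split and the pandas pair plot), using the iris dataset; the figure size and marker are arbitrary choices, not prescriptions from the book:

```python
# Minimal sketch: 75/25 train/test split and a pandas pair plot (iris data).
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
# train_test_split reserves 25% of the data for the test set by default
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

# Pair plot of the training features, colored by class label
df = pd.DataFrame(X_train, columns=iris.feature_names)
pd.plotting.scatter_matrix(df, c=y_train, figsize=(8, 8), marker='o')
```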

Chapter 2 Supervised Learning

- Supervised learning can be roughly divided into two types: classification and regression. If the output is continuous, it is regression.
- Look for the sweet spot that gives the best generalization performance in the trade-off between underfitting and overfitting.

K-Nearest Neighbors Classifier

- Finds the closest point(s) in the training dataset to make a prediction.
- A good baseline for small datasets: in many cases it performs well enough without much tuning. Use it as a baseline before moving to more advanced techniques.
- However, it does not work well on datasets with many features (hundreds or more), and performance suffers on sparse datasets where most feature values are 0.
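
A minimal k-NN baseline might look like the following; iris and n_neighbors=3 are my choices for illustration:

```python
# Minimal k-NN baseline sketch (n_neighbors=3 is an arbitrary choice).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X_train, X_test, y_train, y_test = train_test_split(
    *load_iris(return_X_y=True), random_state=0)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
print("test accuracy:", knn.score(X_test, y_test))
```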

Linear model

- Predicts using a linear function of the input features (imagine drawing a line that comes as close as possible to the data points).
- Very effective when there are many features; one of the first algorithms to try.
- If performance differs significantly between the training set and the test set, that is a sign of overfitting; conversely, if the two scores are very close, it is a sign of underfitting.
- Ridge: one form of linear regression. Its constraint reduces the risk of overfitting and improves generalization performance.
- Lasso: effectively performs automatic feature selection. Useful, for example, when you expect many features but only a few of them to be important (see the sketch below).
- scikit-learn also has an ElasticNet class that combines the two above.
- Logistic regression: a linear model for classification.
- Linear support vector machine (linear SVM): likewise a linear model for classification.
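
A quick sketch of the Ridge/Lasso contrast on synthetic data; make_regression, the alpha values, and the feature counts are assumptions for illustration, not the book's examples:

```python
# Sketch comparing Ridge and Lasso on a regression task; alpha controls
# the strength of the constraint (higher alpha = stronger regularization).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=100, n_features=80, noise=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ridge = Ridge(alpha=1.0).fit(X_train, y_train)
lasso = Lasso(alpha=1.0).fit(X_train, y_train)
print("ridge test R^2:", ridge.score(X_test, y_test))
print("lasso test R^2:", lasso.score(X_test, y_test))
# Lasso drives many coefficients to exactly 0 (automatic feature selection)
print("features used by lasso:", np.sum(lasso.coef_ != 0))
```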

Naive Bayes classifiers

- A family of classifiers closely related to linear models. Training is very fast, but they can only be used for classification.
- Useful as a baseline for large datasets where even a linear model takes too long to train.
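
A minimal baseline sketch with GaussianNB (scikit-learn also provides MultinomialNB and BernoulliNB for count and binary data; the iris dataset here is just a stand-in):

```python
# GaussianNB as a fast baseline on continuous features.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X_train, X_test, y_train, y_test = train_test_split(
    *load_iris(return_X_y=True), random_state=0)

nb = GaussianNB().fit(X_train, y_train)  # training is essentially one pass
print("test accuracy:", nb.score(X_test, y_test))
```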

Decision Tree

- Widely used for classification and regression tasks.
- Learns a hierarchy of questions that can be answered yes or no ("Is feature a larger than b?" and so on; it feels like Akinator).
- Can be visualized and is easy to explain. Very fast.
- If the depth of the tree is not constrained, it grows as deep and complex as possible, which tends to cause overfitting and hurt generalization performance.
- The tree can be visualized with export_graphviz from the tree module (see the sketch below).
- Feature importances give an idea of how each feature contributes, but a feature with low importance is not necessarily useless; it may simply not have been picked by the tree.
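
A sketch of the points above: a depth-constrained tree, export_graphviz, and feature importances (max_depth=3 and the output filename are arbitrary choices):

```python
# Decision tree with constrained depth, export_graphviz, feature importances.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_graphviz

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

# max_depth=3 constrains the tree to limit overfitting
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

# Write a .dot file that graphviz can render as a diagram of the tree
export_graphviz(tree, out_file="tree.dot",
                feature_names=iris.feature_names,
                class_names=iris.target_names, filled=True)

# A feature with importance 0 is not necessarily uninformative --
# the tree may simply not have used it
print(dict(zip(iris.feature_names, tree.feature_importances_)))
```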

Ensembles of decision trees

- A technique that combines multiple machine learning models to build a more powerful model.

Random Forest

- One way to address the decision tree's tendency to overfit the training data.
- One of the most widely used machine learning methods, for both regression and classification.
- Not suitable for high-dimensional, sparse data.
- Overfitting can be reduced by building many decision trees that each overfit in a different direction and averaging their predictions.
- Bootstrap sample: data points are drawn at random with replacement, and a decision tree is built on the resulting new dataset. In addition, each split considers only a subset of the features, controlled by max_features (see the sketch below).
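
A minimal random forest sketch; n_estimators=100, max_features="sqrt", and the breast cancer dataset are illustrative choices:

```python
# Random forest: many trees on bootstrap samples, feature subsets via max_features.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    *load_breast_cancer(return_X_y=True), random_state=0)

# n_estimators trees, each fit on a bootstrap sample; max_features limits
# the features considered at each split (decorrelates the trees)
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                random_state=0)
forest.fit(X_train, y_train)
print("test accuracy:", forest.score(X_test, y_test))
```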

Gradient Boosting

- Builds decision trees sequentially, with each tree correcting the mistakes of the previous one.
- Combines a large number of weak learners.
- With correctly set parameters it outperforms random forests, but training takes longer.
- Like random forests, it does not work well on high-dimensional, sparse data such as text.
- Parameters such as learning_rate, n_estimators, and max_depth are important (see the sketch below).
- Try random forests first (they are more robust); if prediction time matters a lot, or if you need to squeeze out the last 1% of performance, try gradient boosting.
- For large-scale problems, look at the xgboost package and its Python interface.
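
A sketch with the three parameters named above; the values are common starting points, not tuned settings:

```python
# Gradient boosting: the three key parameters made explicit.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    *load_breast_cancer(return_X_y=True), random_state=0)

# Shallow trees (max_depth) and a low learning_rate are typical starting
# points; n_estimators sets how many trees are added sequentially
gbrt = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                  max_depth=3, random_state=0)
gbrt.fit(X_train, y_train)
print("test accuracy:", gbrt.score(X_test, y_test))
```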

Kernelized support vector machines

- An extension of linear SVMs that allows more complex models.
- Powerful on medium-sized datasets whose features have similar meanings, but sensitive to parameter settings.
- In low dimensions, linear models are very restrictive because lines and hyperplanes have limited flexibility. To make them more flexible, interactions (products) of input features and polynomial terms are added.
- Two common kernels: the polynomial kernel, which computes all polynomials of the original features up to a specific degree, and the radial basis function (RBF) kernel, also known as the Gaussian kernel.
- Only the training points that lie on the boundary between the two classes determine the decision boundary. These data points are called support vectors (the origin of the name).
- Differences in the scale of the features have a devastating effect on SVMs. One remedy is MinMaxScaler, which rescales all features to roughly the same range (between 0 and 1).
- Strength: complex decision boundaries can be generated even when the data has only a few features.
- Weaknesses: data preprocessing and parameter tuning must be done carefully, which is why many applications use decision tree based models such as gradient boosting instead. It is also hard to inspect and understand why a particular prediction was made, and hard to explain the model to non-experts.
- Still, SVMs are worth trying when the features are measurements with similar meanings, such as camera pixels.
- The parameters gamma (the inverse of the width of the Gaussian kernel) and C (the regularization parameter) are important (see the sketch below).
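
A sketch combining MinMaxScaler with an RBF-kernel SVC; C=1.0 and gamma="scale" are scikit-learn's defaults, written out only to make the two key parameters explicit:

```python
# Kernel SVM with MinMaxScaler; gamma and C are the key parameters.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X_train, X_test, y_train, y_test = train_test_split(
    *load_breast_cancer(return_X_y=True), random_state=0)

# Rescale every feature to [0, 1]; fit the scaler on the training set only
scaler = MinMaxScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

svm = SVC(kernel="rbf", C=1.0, gamma="scale")  # RBF (Gaussian) kernel
svm.fit(X_train_scaled, y_train)
print("test accuracy:", svm.score(X_test_scaled, y_test))
```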

Neural networks (deep learning)

- Covers the multilayer perceptron (MLP).
- Effective for very large datasets, but sensitive to parameters, and training takes time.
- In an MLP, what is learned are the weights that connect the inputs to the output.
- Between input and output there are intermediate processing steps (hidden units) that compute weighted sums; a further weighted sum over these values produces the output.
- Since a chain of weighted sums is mathematically the same as a single weighted sum, a nonlinear function is applied to each hidden unit's result to make the model more powerful than a linear one. In most cases relu (rectified linear unit) or tanh (hyperbolic tangent) is used.
- By default, the MLP uses a single hidden layer with 100 hidden units; this should be adjusted to the size of the dataset.
- As with SVMs, the data must be rescaled; the book uses StandardScaler.
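
A minimal MLP sketch with StandardScaler; the dataset is an arbitrary stand-in, and with the default number of iterations this can emit exactly the ConvergenceWarning quoted below:

```python
# MLP with StandardScaler; hidden_layer_sizes=(100,) is the default:
# one hidden layer of 100 units.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X_train, X_test, y_train, y_test = train_test_split(
    *load_breast_cancer(return_X_y=True), random_state=0)

scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

mlp = MLPClassifier(hidden_layer_sizes=(100,), activation="relu",
                    random_state=0)
mlp.fit(X_train_scaled, y_train)
print("test accuracy:", mlp.score(X_test_scaled, y_test))
```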

```
ConvergenceWarning:
Stochastic Optimizer: Maximum iterations reached and the optimization
hasn't converged yet.
```

- The warning above comes from the adam algorithm used to train the model; it means the number of training iterations should be increased.
- Generalization performance may also improve by strengthening the regularization of the weights via the alpha parameter (see the sketch below).
- For more flexible or larger models, use keras, lasagne, or tensorflow.
- With enough computation time, data, and careful parameter tuning, neural networks often outperform other machine learning algorithms. But this is also their drawback: large, powerful networks take a very long time to train, and parameter tuning is an art in itself.
- Decision tree based models perform better on heterogeneous data with various types of features.
- The number of hidden layers and the number of hidden units per layer are the most important parameters.
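
A sketch of the two remedies above, raising max_iter and increasing alpha; both values are illustrative, not tuned:

```python
# Addressing the ConvergenceWarning: raise max_iter so adam runs longer,
# and increase alpha to strengthen the L2 penalty on the weights.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X_train, X_test, y_train, y_test = train_test_split(
    *load_breast_cancer(return_X_y=True), random_state=0)
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

mlp = MLPClassifier(max_iter=1000, alpha=1.0, random_state=0)
mlp.fit(X_train_scaled, y_train)
print("test accuracy:", mlp.score(X_test_scaled, y_test))
```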


Finished up to Chapter 2, Supervised Learning (p. 126).
