Previous article: Can machine learning predict parallelograms? (1) Can it extrapolate?
Re-reading my previous article, I realized I didn't write much about actual programming techniques. Still, since the calculations are done in Python, the post is tagged python. Plenty of people do machine learning with Python, so posting this to Qiita should be fine, right?
The calculations use scikit-learn.
Well, in the previous article I found that extrapolation was not possible. So what about the following problem?
 | a | b | Angle c |
---|---|---|---|
For training | 0~50 and 1000~1100 | 0~100 | 0~90 |
Interpolation? Extrapolation? | 150~900 | 50 | 45 |
It's a simple question: if the model is trained on base lengths a in 0~50 and 1000~1100, can it predict the gap from 150 to 900 in between?
Lasso, being a linear regression, can probably manage it. But what about random forests and neural networks? With this dataset, I wonder whether they can predict across the whole range from 0 to 1100. Real-world data often looks like this.
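For reference, a dataset like the one in the table could be generated along these lines. This is a minimal sketch, not the article's actual script: the sample counts are my assumption, as is the target being the parallelogram area a·b·sin(c) (following the theme of the previous article).

```python
import numpy as np

rng = np.random.default_rng(0)

def make_dataset(n_per_range, a_ranges):
    # Draw the base a from the given intervals; b and c are uniform as in the table.
    a = np.concatenate([rng.uniform(lo, hi, n_per_range) for lo, hi in a_ranges])
    b = rng.uniform(0, 100, a.size)       # side b: 0~100
    c = rng.uniform(0, 90, a.size)        # angle c in degrees: 0~90
    area = a * b * np.sin(np.deg2rad(c))  # target: parallelogram area a*b*sin(c)
    return np.column_stack([a, b, c]), area

# Training data: base a in 0~50 and 1000~1100
X_train, y_train = make_dataset(1000, [(0, 50), (1000, 1100)])

# "Gap" queries: a in 150~900 with b = 50 and c = 45 degrees fixed
a_gap = np.linspace(150, 900, 100)
X_gap = np.column_stack([a_gap,
                         np.full_like(a_gap, 50.0),
                         np.full_like(a_gap, 45.0)])
```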
The results. First, the coefficients of determination and the graphs for the training runs.
Coefficient of determination | Training | Test |
---|---|---|
Lasso regression | 0.686 | 0.661 |
Random forest | 0.999 | 0.975 |
Neural network | 0.997 | 0.997 |
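A comparison like the one in the table can be reproduced with scikit-learn roughly as follows. This is a sketch under my own assumptions: the hyperparameters, sample size, and area target a·b·sin(c) are not taken from the article, so the scores will differ somewhat from the table.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Training distribution from the table: a in 0~50 and 1000~1100, b in 0~100, c in 0~90 deg
a = np.concatenate([rng.uniform(0, 50, 1000), rng.uniform(1000, 1100, 1000)])
b = rng.uniform(0, 100, a.size)
c = rng.uniform(0, 90, a.size)
X = np.column_stack([a, b, c])
y = a * b * np.sin(np.deg2rad(c))  # assumed target: parallelogram area

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "Lasso": Lasso(alpha=1.0),
    "Random forest": RandomForestRegressor(random_state=0),
    "Neural network": MLPRegressor(hidden_layer_sizes=(100, 100),
                                   max_iter=500, random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    # R^2 on training and held-out test splits
    scores[name] = (model.score(X_tr, y_tr), model.score(X_te, y_te))
    print(name, scores[name])
```

Note that the held-out test split here is drawn from the *same* ranges as the training data, so these scores only measure interpolation inside the training distribution.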
(Graphs: Lasso / Random forest / Neural network) The base a only takes small and large values, but because b and the angle c are generated randomly, the resulting areas appear to form a connected range.
Looking at the graphs, I'm not sure why Random Forest produced those two outliers, but from these graphs and the coefficients of determination it feels like there's no reason to choose Lasso. Random Forest is great! Neural networks are the best! ...or so you'd think.
So what happens when we predict at values of the base a that fall inside the gap in the training range? Let's look at the graphs. (Graphs: Lasso / Random forest / Neural network)
When I fed in values of the base a from the gap in the training range, the neural network was completely useless. Seeing this, regression with deep learning is dangerous unless you are sure you're firmly interpolating. The same goes for Random Forest. You need to clearly distinguish interpolation from extrapolation.
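The gap check behind these graphs can be sketched like this. Again a toy reconstruction under my assumptions (area target a·b·sin(c), default hyperparameters), not the article's code; it just demonstrates that a model with a near-perfect training R² can still fail badly in the unseen range.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Train only on a in 0~50 and 1000~1100
a = np.concatenate([rng.uniform(0, 50, 1000), rng.uniform(1000, 1100, 1000)])
b = rng.uniform(0, 100, a.size)
c = rng.uniform(0, 90, a.size)
X = np.column_stack([a, b, c])
y = a * b * np.sin(np.deg2rad(c))

lasso = Lasso(alpha=1.0).fit(X, y)
rf = RandomForestRegressor(random_state=0).fit(X, y)

# Query the gap: a in 150~900 with b = 50 and c = 45 degrees fixed
a_gap = np.linspace(150, 900, 50)
X_gap = np.column_stack([a_gap,
                         np.full_like(a_gap, 50.0),
                         np.full_like(a_gap, 45.0)])
y_true = a_gap * 50.0 * np.sin(np.deg2rad(45.0))

errors = {}
for name, model in [("Lasso", lasso), ("Random forest", rf)]:
    pred = model.predict(X_gap)
    errors[name] = float(np.abs(pred - y_true).mean())
    print(f"{name}: mean abs error in the gap = {errors[name]:.0f}")
```

In this setup the random forest can only output values it memorized near a≈0~50 or a≈1000~1100, so its gap predictions jump between those two plateaus instead of following the true line.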
As you can see, Lasso still predicts values close to the truth. If there is any risk of extrapolation, linear regression may be the safer choice.
Next time, let's think about descriptor design.