Last time we evaluated performance on uniform data (the same number of samples per class), so this time we trained on data where the sample count varies per class.
The dataset is the same as last time: MNIST.
Last time, 2000 samples were extracted for each digit;
this time, 1100, 1300, 1500, 1700, 1900, 2100, 2300, 2500, 2700, and 2900 samples were extracted for digits 0 through 9, in order.
The test data is, as before, 10000 samples with a uniform class distribution.
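For reference, a minimal sketch of how such an imbalanced training set could be assembled with scikit-learn; this is my assumption of the setup, not the original script (the fetch_openml loading, the 60000/10000 split, and the seed are all assumed).

```python
import numpy as np
from sklearn.datasets import fetch_openml

# Assumed loading: OpenML's mnist_784, first 60000 rows as the training pool,
# last 10000 rows as the uniform test set (standard MNIST ordering).
mnist = fetch_openml("mnist_784", version=1, as_frame=False)
X, y = mnist.data, mnist.target.astype(int)
X_pool, y_pool = X[:60000], y[:60000]
X_test, y_test = X[60000:], y[60000:]

rng = np.random.default_rng(0)                 # assumed seed
counts = [1100 + 200 * d for d in range(10)]   # 1100 for digit 0 ... 2900 for digit 9

# Draw the specified number of samples for each digit, without replacement.
idx = []
for digit, n in enumerate(counts):
    candidates = np.where(y_pool == digit)[0]
    idx.extend(rng.choice(candidates, size=n, replace=False))
idx = rng.permutation(np.array(idx))

X_train, y_train = X_pool[idx], y_pool[idx]
```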
The variables changed are the following three:

- Number of trees
- Search depth
- Number of features
First, the number of trees was varied over four values: 10, 100, 1000, and 10000.
The results are shown below.
Even looking at the best value, the accuracy has dropped slightly compared with last time (the best was about 0.965?), but the trend is the same.
I think around 1000 trees is enough.
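A sketch of how this tree-count sweep might look with scikit-learn's RandomForestClassifier (assumed code, not the original; other parameters are left at their defaults):

```python
from sklearn.ensemble import RandomForestClassifier

# Vary only the number of trees; everything else stays at the defaults.
for n_trees in [10, 100, 1000, 10000]:
    clf = RandomForestClassifier(n_estimators=n_trees, n_jobs=-1, random_state=0)
    clf.fit(X_train, y_train)
    print(n_trees, clf.score(X_test, y_test))
```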
Next, the search depth.
As before, models were trained with the depth varied from 2 to 20.
The number of trees is 1000, and the number of features is sqrt(number of features).
The results are shown below.
This is also the same as last time: the accuracy is almost unchanged, and overfitting does not occur even when searching deeply.
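A corresponding sketch for the depth sweep (assumed code): 1000 trees, max_features="sqrt", and max_depth varied from 2 to 20.

```python
from sklearn.ensemble import RandomForestClassifier

# 1000 trees, sqrt(number of features) candidates per split, depth 2..20.
for depth in range(2, 21):
    clf = RandomForestClassifier(n_estimators=1000, max_depth=depth,
                                 max_features="sqrt", n_jobs=-1, random_state=0)
    clf.fit(X_train, y_train)
    print(depth, clf.score(X_test, y_test))
```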
Finally, the number of features.
It was varied from 10 to 55.
The number of trees is 1000, and the depth is fixed at the maximum.
Since sqrt(784) is 28, it seems that using fewer features than that is slightly better this time?
However, since the differences are on the order of 0.001, it is fair to say there is no real difference as long as 20 or more features are used.
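A sketch of the feature-count sweep (assumed code): 1000 trees, unlimited depth, and max_features varied; the step size of 5 is my assumption, since only the range 10 to 55 is stated.

```python
from sklearn.ensemble import RandomForestClassifier

# 1000 trees, unlimited depth; vary the number of candidate features per split.
# The step size of 5 is an assumption; only the range 10 to 55 is given.
for n_features in range(10, 56, 5):
    clf = RandomForestClassifier(n_estimators=1000, max_depth=None,
                                 max_features=n_features, n_jobs=-1, random_state=0)
    clf.fit(X_train, y_train)
    print(n_features, clf.score(X_test, y_test))
```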
Finally, the SVM result for comparison.
An RBF kernel with C = 1.0 and gamma = 1/784.
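A sketch of that SVM baseline (assumed code; scaling the pixel values to [0, 1] is my assumption, and with 784 features, gamma = 1/784 corresponds to scikit-learn's "auto" setting):

```python
from sklearn.svm import SVC

# RBF kernel, C = 1.0, gamma = 1/784; pixels scaled to [0, 1] (assumption).
svm = SVC(C=1.0, kernel="rbf", gamma=1.0 / 784)
svm.fit(X_train / 255.0, y_train)
print(svm.score(X_test / 255.0, y_test))
```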
Random Forest is still more accurate, but
the SVM accuracy is higher than last time...?
That is possible, considering the samples are drawn randomly,
but given that the accuracy of Random Forest dropped by about 0.05,
perhaps SVM is more robust to variation in the data...?
The accuracy on MNIST is so high that it is hard to evaluate much from it, though...