Rather than wander into the darkness of Deep Learning, I want to head into the thicket of Deep Forest.
One algorithm that may serve as an alternative to DNNs is Deep Forest. Reading the paper [Deep Forest: Towards an Alternative to Deep Neural Networks](https://arxiv.org/pdf/1702.08835.pdf), the idea is to build a deep structure out of multiple collections of decision trees called random forests, arranging them both in the width and in the depth direction.
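To make that "width and depth" idea concrete, here is a toy sketch of my own (not the paper's reference implementation): each cascade level holds several forests side by side, and their class-probability outputs are appended to the original features before being handed to the next level.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def cascade_level(X, y, n_forests=2):
    """One cascade level: n_forests forests side by side (the 'width');
    their class-probability outputs are appended to the original
    features and handed to the next level (the 'depth')."""
    probas = []
    for _ in range(n_forests):
        forest = RandomForestClassifier(n_estimators=100).fit(X, y)
        # the real algorithm uses k-fold cross-fitting here to avoid overfitting
        probas.append(forest.predict_proba(X))
    return np.hstack([X] + probas)

rng = np.random.RandomState(0)
X, y = rng.rand(200, 8), rng.randint(0, 3, 200)
X = cascade_level(X, y)  # level 1
X = cascade_level(X, y)  # level 2: deeper simply by repeating
```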
For random forests themselves, this article will be helpful. It explains the random forest in scikit-learn, a machine-learning library for Python; since the code we run this time is also Python and also uses scikit-learn, it is enormously useful.
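For reference, training a single random forest in scikit-learn takes only a few lines; the cascade sketched above is built out of blocks like this:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# a small standard dataset, just to show the fit/score flow
X_train, X_test, y_train, y_test = train_test_split(
    *load_iris(return_X_y=True), test_size=0.2, random_state=0)

clf = RandomForestClassifier(n_estimators=100, n_jobs=-1)
clf.fit(X_train, y_train)
print('accuracy:', clf.score(X_test, y_test))
```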
Besides Python, there also seems to be an implementation in R. (Deep Forest implementation code example)
Building a Deep Forest from scratch is hard, and I would quickly get lost, so instead let's grab a Deep Forest implemented in Python from GitHub.
https://github.com/leopiney/deep-forest
Following the README, everything from training to testing works fine. The accuracy looks reasonable, not bad at all. Note that the implementation is CPU-only, so CPU usage gets quite high during training.
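The training-to-testing flow looks roughly like this. The `estimators_config` layout (an `'mgs'` list and a `'cascade'` list) follows the repo's README as I understand it, but treat the parameter values below as placeholders rather than tuned settings:

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

from deep_forest import MGCForest

X_train, X_test, y_train, y_test = train_test_split(
    *load_digits(return_X_y=True), test_size=0.2, random_state=0)

# placeholder configuration in the README's format (not tuned values)
estimators_config = {
    'mgs': [{
        'estimator_class': RandomForestClassifier,
        'estimator_params': {'n_estimators': 30, 'n_jobs': -1},
    }],
    'cascade': [{
        'estimator_class': RandomForestClassifier,
        'estimator_params': {'n_estimators': 1000, 'n_jobs': -1},
    }],
}

forest = MGCForest(estimators_config=estimators_config, stride_ratios=[1.0 / 4])
forest.fit(X_train, y_train)
y_pred = forest.predict(X_test)
print('accuracy:', (y_pred == y_test).mean())
```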
I was eager to try it out for real, but it is a little inconvenient that the trained model cannot be saved, so I added the following two member functions to the MGCForest class in deep_forest.py.
deep_forest.py

```python
import glob
import re

import joblib  # with older scikit-learn: from sklearn.externals import joblib
import numpy as np


class MGCForest():
    # ... (existing implementation unchanged) ...

    def save_model(self):
        # save multi-grained scanner
        for mgs_instance in self.mgs_instances:
            stride_ratio = mgs_instance.stride_ratio
            folds = mgs_instance.folds
            for i, estimator in enumerate(mgs_instance.estimators):
                joblib.dump(estimator, 'model/mgs_submodel_%.4f_%d_%d.pkl' % (stride_ratio, folds, i + 1))

        # save cascade forest
        for n_level, one_level_estimators in enumerate(self.c_forest.levels):
            for i, estimator in enumerate(one_level_estimators):
                joblib.dump(estimator, 'model/cforest_submodel_%d_%d.pkl' % (n_level + 1, i + 1))

    def load_model(self):
        # load multi-grained scanner
        for mgs_instance in self.mgs_instances:
            stride_ratio = '%.4f' % mgs_instance.stride_ratio
            folds = mgs_instance.folds
            for i in range(len(mgs_instance.estimators)):
                model_name = 'model/mgs_submodel_%s_%d_%d.pkl' % (stride_ratio, folds, i + 1)
                print('load model: {}'.format(model_name))
                mgs_instance.estimators[i] = joblib.load(model_name)

        # load cascade forest, grouping the files by level
        model_files = glob.glob('model/cforest_submodel_*.pkl')
        model_files.sort()  # lexicographic; fine for single-digit level/estimator counts
        max_level = 0
        model_dict = dict()
        for model_name in model_files:
            model_subname = re.sub('model/cforest_submodel_', '', model_name)
            model_level = int(model_subname.split('_')[0])
            if max_level < model_level:
                max_level = model_level
            if model_level not in model_dict.keys():
                model_dict[model_level] = list()
            print('load model: {}'.format(model_name))
            model_dict[model_level].append(joblib.load(model_name))

        # rebuild the cascade level list in order
        self.c_forest.levels = list()
        for n_level in range(1, max_level + 1):
            self.c_forest.levels.append(model_dict[n_level])

        n_classes_ = self.c_forest.levels[0][0].n_classes_
        self.c_forest.classes = np.unique(np.arange(n_classes_))
```
Calling save_model after training with fit writes the model parameters into the model directory (create the model directory and empty its contents before calling save_model). To restore the trained parameters, call load_model.
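A save/load round trip would then look like this (reusing the placeholder `estimators_config` from the sketch above; note that load_model only fills the pickled sub-forests back into an already-constructed MGCForest, so you must rebuild it with the same configuration):

```python
import os

os.makedirs('model', exist_ok=True)  # must exist (and be empty) before saving

forest.fit(X_train, y_train)
forest.save_model()

# later, e.g. in a fresh process: rebuild with the same configuration,
# then restore every pickled sub-forest
forest = MGCForest(estimators_config=estimators_config, stride_ratios=[1.0 / 4])
forest.load_model()
y_pred = forest.predict(X_test)
```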
Since a Deep Forest is built from many random forests, saving the model means writing a separate parameter file for each individual forest, which is why the model directory ends up containing multiple pkl files.
Random forests have the inherent advantage of being unaffected by differences in the value ranges of individual features. Neural networks are not: there, each feature has to be normalized, for example to the range 0 to 1. I therefore expect Deep Forest to really show its strength when you want to combine not only images but all sorts of other features as well.
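For comparison, here is the extra preprocessing step a neural network would need, which a (deep) forest can simply skip:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# two features on very different scales, e.g. a ratio and a raw count
X = np.array([[0.1, 1200.0], [0.5, 45000.0], [0.9, 98000.0]])

scaler = MinMaxScaler()          # rescales each feature to [0, 1]
print(scaler.fit_transform(X))   # forests split on raw thresholds,
                                 # so they handle the unscaled X just as well
```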