scikit-learn is practically the de facto standard machine learning library for Python. Its strength is not only that it implements many algorithms, but also that it is designed consistently, so different algorithms can be handled in a common way. If you implement an algorithm that scikit-learn does not have, or wrap an algorithm from another library so that it behaves like a scikit-learn estimator, you can cross-validate it, evaluate its performance, and tune its parameters with grid search, just as you would a built-in estimator. Below is a minimal estimator implementation. This article targets classifiers and regressors (not clustering or other unsupervised learning).
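from sklearn.base import BaseEstimator


class MyEstimator(BaseEstimator):
    # The default values are arbitrary placeholders; they let MyEstimator() be called without arguments
    def __init__(self, param1=0, param2=1):
        self.param1 = param1
        self.param2 = param2

    def fit(self, x, y):
        # Dummy: learns nothing and returns the estimator itself
        return self

    def predict(self, x):
        # Dummy: always predicts 1.0 for every input row
        return [1.0] * len(x)

    def score(self, x, y):
        # Dummy: always reports a score of 1
        return 1

    def get_params(self, deep=True):
        return {'param1': self.param1, 'param2': self.param2}

    def set_params(self, **parameters):
        for parameter, value in parameters.items():
            setattr(self, parameter, value)
        return self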
Define the estimator class by inheriting from sklearn.base.BaseEstimator. Replace the body of each method with your own logic as appropriate.
Cross validation:
from sklearn.model_selection import cross_val_score

x = [[2, 3], [4, 5], [6, 1], [2, 0]]
y = [0.0, 9.4, 2.1, 0.9]
estimator = MyEstimator()
cross_val_score(estimator, x, y, cv=3)
Result:
array([ 1., 1., 1.])
Grid search:
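from sklearn.model_selection import GridSearchCV

# cv=3 is passed explicitly because the data has only 4 samples
gs = GridSearchCV(estimator, {'param1': [0, 10], 'param2': (1, 1e-1, 1e-2)}, cv=3)
gs.fit(x, y)
gs.best_estimator_, gs.best_params_, gs.best_score_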
Result:
(MyEstimator(), {'param1': 0, 'param2': 1}, 1.0)
cross_validation
To run cross-validation, the estimator needs a fit method that learns from the training data and a score method that takes test data, compares the values predicted from it with the correct values, and returns a score.
fit(self, x, y)
Learns from the data so that the input x produces the output y.
predict(self, x)
Returns the prediction y_pred for the input x. predict is not strictly required for cross-validation alone, but in most cases score calls predict internally. If you inherit from sklearn.base.ClassifierMixin or sklearn.base.RegressorMixin in addition to BaseEstimator (multiple inheritance), you only need to implement predict and can use the score method the mixin provides.
score(self, x, y)
Predicts y_pred for the input x, compares y_pred with the correct answer y, and returns a score (for example an error measure, or the fraction of labels that match).
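To make the roles of fit, predict, and score concrete, here is a minimal sketch in plain Python. MeanRegressor and simple_cross_val_score are hypothetical names introduced for illustration only. The loop is a simplified stand-in for what cross-validation does and is not scikit-learn's implementation: the real cross_val_score, among other differences, clones the estimator for each fold and splits the data differently.

```python
class MeanRegressor:
    """Toy regressor (hypothetical) that always predicts the mean training target."""

    def fit(self, x, y):
        # "Learning" here is just remembering the mean of the targets
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, x):
        # One prediction per input row
        return [self.mean_] * len(x)

    def score(self, x, y):
        # Negative mean squared error, so that a larger score means a better fit
        y_pred = self.predict(x)
        return -sum((p - t) ** 2 for p, t in zip(y_pred, y)) / len(y)


def simple_cross_val_score(estimator, x, y, cv):
    """Simplified k-fold loop: fit on the training part, score on the held-out part."""
    n = len(x)
    scores = []
    for k in range(cv):
        # Assumption for brevity: sample i goes to test fold i % cv
        test_idx = [i for i in range(n) if i % cv == k]
        train_idx = [i for i in range(n) if i % cv != k]
        estimator.fit([x[i] for i in train_idx], [y[i] for i in train_idx])
        scores.append(
            estimator.score([x[i] for i in test_idx], [y[i] for i in test_idx])
        )
    return scores


x = [[2, 3], [4, 5], [6, 1], [2, 0]]
y = [0.0, 9.4, 2.1, 0.9]
scores = simple_cross_val_score(MeanRegressor(), x, y, cv=2)
print(scores)
```

Only fit and score are called by the loop; predict is used indirectly through score, which matches the description above.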
grid_search
To run a grid search, the estimator must also expose its parameters, in addition to the fit and score methods described above. Implement get_params to read the data-independent parameters (hyperparameters) and set_params to set them.
get_params(self, deep=True)
get_params returns a dictionary whose keys are the parameter (attribute) names and whose values are the current parameter values.
set_params(self, **parameters)
The setter for the parameters. Pass them as keyword arguments, using the same names as the keys returned by get_params.
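As a small illustration of the parameter interface, the sketch below relies on the get_params and set_params that BaseEstimator itself provides. These work when __init__ stores each keyword argument unchanged in an attribute of the same name. ParamOnlyEstimator, its parameter names, and its defaults are hypothetical.

```python
from sklearn.base import BaseEstimator, clone


class ParamOnlyEstimator(BaseEstimator):
    # Hypothetical class: param1/param2 and their defaults are placeholders
    def __init__(self, param1=0, param2=1):
        self.param1 = param1
        self.param2 = param2


est = ParamOnlyEstimator(param1=10)
params = est.get_params()        # inherited: reads the names from __init__'s signature
est.set_params(param2=0.1)       # inherited: sets the attribute and returns est
copied = clone(est)              # builds a new object from get_params()
print(params, copied.get_params())
```

Grid search depends on this round trip: it copies the estimator and sets each parameter combination on the copy before fitting.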
Mixin
By additionally inheriting from sklearn.base.ClassifierMixin for a classification model, or sklearn.base.RegressorMixin for a regression model, you can use the methods those mixins implement.
If you inherit these:
--sklearn.base.ClassifierMixin sets _estimator_type to classifier and provides a score method. Since score calls predict, predict must be implemented.
--sklearn.base.RegressorMixin sets _estimator_type to regressor and provides a score method. Since score calls predict, predict must be implemented.
You can check whether your estimator is compatible with scikit-learn using sklearn.utils.estimator_checks.check_estimator. Incidentally, the sample in this article fails that check with an error saying the input is not validated. That should not be a problem if you only use the estimator yourself.
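The following sketch shows the mixin approach for a regressor: only fit and predict are written by hand, and score comes from RegressorMixin, which returns the coefficient of determination (R^2). MeanRegressor and the toy data are hypothetical. Listing the mixin before BaseEstimator is a choice made here, following the ordering scikit-learn's documentation recommends.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, is_regressor


class MeanRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical regressor that always predicts the mean training target."""

    def fit(self, X, y):
        # Remember the mean of the training targets
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        # Constant prediction, one value per input row
        return np.full(len(X), self.mean_)


X = [[0], [1], [2], [3]]
y = [1.0, 2.0, 3.0, 4.0]

reg = MeanRegressor().fit(X, y)
r2 = reg.score(X, y)   # score is inherited from RegressorMixin and calls predict
print(is_regressor(reg), r2)
```

Because the model predicts the mean of the same data it is scored on, R^2 comes out at 0 here; a model that explains more of the variance scores closer to 1.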
--Create your own estimator class by inheriting sklearn.base.BaseEstimator
--You need fit and score methods to run cross-validation.
--To run a grid search, you additionally need get_params and set_params methods.
--If you inherit ClassifierMixin or RegressorMixin, you get a score method that computes the score using the predict you implemented.
Most of what is written here is based on the API reference for the sklearn.base module and the developer information on the official website.