scikit-learn is close to being the de facto standard machine learning library for Python. Its strength is not only that it implements many algorithms, but also that it is designed consistently, so different algorithms can be handled through a common interface. If you implement a new algorithm that scikit-learn does not provide, or wrap an algorithm from another library, so that it can be treated like any other scikit-learn estimator, you can cross-validate it, evaluate its performance, and optimize its parameters with grid search, just as with the built-in estimators. Below is a minimal estimator implementation. This article considers only classifiers and regressors (not clustering or other unsupervised learning).
from sklearn.base import BaseEstimator

class MyEstimator(BaseEstimator):
    # Default values (None) let MyEstimator() be created without arguments.
    def __init__(self, param1=None, param2=None):
        self.param1 = param1
        self.param2 = param2

    def fit(self, x, y):
        # Dummy: nothing is learned.
        return self

    def predict(self, x):
        # Dummy: always predicts 1.0.
        return [1.0] * len(x)

    def score(self, x, y):
        # Dummy: always returns a perfect score.
        return 1

    def get_params(self, deep=True):
        return {'param1': self.param1, 'param2': self.param2}

    def set_params(self, **parameters):
        for parameter, value in parameters.items():
            setattr(self, parameter, value)
        return self
Define the estimator class by inheriting from sklearn.base.BaseEstimator. Rewrite the body of each method as appropriate.
Cross validation:
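# cross_val_score lives in sklearn.model_selection (scikit-learn 0.18 and later)
from sklearn.model_selection import cross_val_score

x = [[2, 3], [4, 5], [6, 1], [2, 0]]
y = [0.0, 9.4, 2.1, 0.9]
estimator = MyEstimator()
cross_val_score(estimator, x, y, cv=3)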
Result:
array([ 1., 1., 1.])
Grid search:
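# GridSearchCV lives in sklearn.model_selection (scikit-learn 0.18 and later)
from sklearn.model_selection import GridSearchCV

# cv=3 is given explicitly because the toy data has only 4 samples
gs = GridSearchCV(estimator, {'param1': [0, 10], 'param2': (1, 1e-1, 1e-2)}, cv=3)
gs.fit(x, y)
gs.best_estimator_, gs.best_params_, gs.best_score_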
Result:
(MyEstimator(), {'param1': 0, 'param2': 1}, 1.0)
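The dummy estimator above returns the same score for every parameter combination, so the grid search has nothing to choose between. As a rough sketch of a case where the parameter actually matters, the following uses a hypothetical ThresholdClassifier (not part of scikit-learn, invented here for illustration) whose only parameter is a threshold on the first feature. It relies on the get_params, set_params, and score methods that BaseEstimator and ClassifierMixin provide.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.model_selection import GridSearchCV


class ThresholdClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical classifier: predicts 1 when the first feature exceeds `threshold`."""

    def __init__(self, threshold=0.0):
        self.threshold = threshold

    def fit(self, x, y):
        # Nothing is learned from the data; only the class labels are recorded.
        self.classes_ = np.unique(y)
        return self

    def predict(self, x):
        return (np.asarray(x)[:, 0] > self.threshold).astype(int)


# Toy data (made up for this sketch): the label flips between x=2 and x=3.
x = [[0], [1], [2], [3], [4], [5]]
y = [0, 0, 0, 1, 1, 1]

gs = GridSearchCV(ThresholdClassifier(), {'threshold': [0.5, 2.5, 4.5]}, cv=3)
gs.fit(x, y)
best_threshold = gs.best_params_['threshold']
print(best_threshold, gs.best_score_)
```

Only the threshold of 2.5 separates the two classes in this toy data, so that is the value the search should select.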
cross_validation
To perform cross-validation, you need a fit method that learns from the training data and a score method that takes test data, compares the values estimated from it with the correct values, and returns a score.
fit(self, x, y)
A function that learns from the input x so that the output is y.
predict(self, x)
A function that returns the output y_pred for the input x. You don't need predict if you only want to do cross-validation, but in most cases you will call predict inside score. If you additionally inherit from sklearn.base.ClassifierMixin or sklearn.base.RegressorMixin (multiple inheritance), you only need to implement predict and can use the score function they already implement.
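As a minimal sketch of that last point, the hypothetical MeanRegressor below (a name made up for this example) implements only fit and predict and gets score from RegressorMixin, which computes the coefficient of determination R^2. The mixin is listed before BaseEstimator in the class definition, which is the order scikit-learn's own estimators use.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin


class MeanRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical regressor that always predicts the mean of the training targets."""

    def fit(self, x, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, x):
        return np.full(len(x), self.mean_)


x = [[2, 3], [4, 5], [6, 1], [2, 0]]
y = [0.0, 9.4, 2.1, 0.9]

reg = MeanRegressor().fit(x, y)
r2 = reg.score(x, y)  # score() is inherited from RegressorMixin and calls predict()
print(r2)
```

Because predicting the training mean explains none of the variance, R^2 on the training data is zero up to floating-point rounding.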
score(self, x, y)
A function that estimates the output y_pred for the input x, compares y_pred with the correct answer y, and returns a score (for example, based on the error or on whether the labels match).
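For illustration, a hand-written score might look like the sketch below. The class name ConstantEstimator and the choice of metric (negative mean absolute error) are assumptions made for this example. The sign is flipped because scikit-learn's cross-validation and grid search tools treat a larger score as better.

```python
import numpy as np
from sklearn.base import BaseEstimator


class ConstantEstimator(BaseEstimator):
    """Hypothetical estimator that always predicts the same constant."""

    def __init__(self, constant=0.0):
        self.constant = constant

    def fit(self, x, y):
        return self

    def predict(self, x):
        return np.full(len(x), self.constant)

    def score(self, x, y):
        # Compare the predictions with the correct answers.
        y_pred = self.predict(x)
        mean_abs_error = float(np.mean(np.abs(np.asarray(y) - y_pred)))
        # Negate so that a smaller error gives a larger score.
        return -mean_abs_error


est = ConstantEstimator(constant=1.0).fit([[0], [0]], [1.0, 3.0])
result = est.score([[0], [0]], [1.0, 3.0])
print(result)  # errors are 0 and 2, so the score is -1.0
```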
grid_search
To do a grid search, you need to be able to manipulate parameters in addition to the learning and scoring defined above. Implement a get_params method that returns the data-independent parameters and a set_params method that sets them.
get_params(self, deep=True)
The get_params method should return a dictionary whose keys are the parameter (attribute) names and whose values are the corresponding parameter values.
set_params(self, **parameters)
This is the parameter setter. Parameters are passed as keyword arguments, using the same names as the keys of the dictionary returned by get_params.
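Note that BaseEstimator already provides default get_params and set_params methods, which read the parameter names from the signature of __init__. The sketch below assumes every __init__ argument is stored on an attribute of the same name; under that assumption there is no need to write the two methods by hand. The class name ParamsOnly is made up for this example.

```python
from sklearn.base import BaseEstimator


class ParamsOnly(BaseEstimator):
    """Hypothetical estimator that defines parameters but no methods of its own."""

    def __init__(self, param1=0, param2=1.0):
        # Each argument is stored unchanged under the same name.
        self.param1 = param1
        self.param2 = param2


est = ParamsOnly(param1=10)
params = est.get_params()           # inherited from BaseEstimator
returned = est.set_params(param2=0.1)  # inherited; returns the estimator itself
print(params, est.param2)
```

The inherited set_params also rejects names that are not parameters of the estimator, which a plain setattr loop does not.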
Mixin
By additionally inheriting from sklearn.base.ClassifierMixin for a classification model or sklearn.base.RegressorMixin for a regression model (multiple inheritance), you can use the methods they implement.
Inheriting from these mixins gives you the following.
sklearn.base.ClassifierMixin
--_estimator_type is set to classifier
--A score method (mean accuracy). Since score calls predict, the predict method needs to be implemented.
sklearn.base.RegressorMixin
--_estimator_type is set to regressor
--A score method (coefficient of determination R^2). Since score calls predict, the predict method needs to be implemented.
You can check whether your estimator is compatible with scikit-learn using sklearn.utils.estimator_checks.check_estimator. Incidentally, the sample shown in this article fails this check with an error saying that the input is not validated. That should not be a problem if you only use the estimator yourself.
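If you do want to validate the input, one possible approach is sketched below using the helpers check_X_y, check_array, and check_is_fitted from sklearn.utils.validation. The class ValidatedMeanRegressor is hypothetical, and this sketch does not claim to pass every check in check_estimator, since the set of checks differs between scikit-learn versions.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted


class ValidatedMeanRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical regressor that validates its input and predicts the training mean."""

    def fit(self, X, y):
        # Converts to arrays and rejects non-2D X, NaN/inf values, and mismatched lengths.
        X, y = check_X_y(X, y)
        self.n_features_in_ = X.shape[1]
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        # Raises NotFittedError if fit() has not been called yet.
        check_is_fitted(self, "mean_")
        X = check_array(X)
        return np.full(X.shape[0], self.mean_)


reg = ValidatedMeanRegressor().fit([[2, 3], [4, 5]], [1.0, 3.0])
pred = reg.predict([[0, 0]])

try:
    reg.predict([1, 2])  # 1D input is rejected by check_array
    rejected = False
except ValueError:
    rejected = True
print(pred, rejected)
```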
--Create your own estimator class by inheriting from sklearn.base.BaseEstimator.
--You need fit and score methods to do cross-validation.
--To do a grid search, you additionally need get_params and set_params methods.
--If you inherit from ClassifierMixin or RegressorMixin, you get a score method that calculates the score using the predict method you implemented.
Most of what is written here is based on the API reference for the sklearn.base module and the information for developers on the official website.