Google translation of http://scikit-learn.org/0.18/modules/model_persistence.html [scikit-learn 0.18 User Guide 3. Model Selection and Evaluation](http://qiita.com/nazoking@github/items/267f2371757516f8c168#3-%E3%83%A2%E3%83%87%E3%83%AB%E3%81%AE%E9%81%B8%E6%8A%9E%E3%81%A8%E8%A9%95%E4%BE%A1)
After training a scikit-learn model, it is desirable to have a way to persist the model for future use without retraining. The following section gives an example of how to persist a model with pickle. It also reviews some security and maintainability issues that arise when working with pickle serialization.
It is possible to save scikit-learn models using Python's built-in persistence module, pickle:
>>> from sklearn import svm
>>> from sklearn import datasets
>>> clf = svm.SVC()
>>> iris = datasets.load_iris()
>>> X, y = iris.data, iris.target
>>> clf.fit(X, y)
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)
>>> import pickle
>>> s = pickle.dumps(clf)
>>> clf2 = pickle.loads(s)
>>> clf2.predict(X[0:1])
array([0])
>>> y[0]
0
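The session above keeps the pickled model in memory as a byte string. As a minimal sketch of the more common case, where the model is written to a file and read back later, the following uses pickle.dump and pickle.load. The file name and the temporary directory are assumptions made for illustration only.

```python
import os
import pickle
import tempfile

from sklearn import datasets, svm

# Fit the same kind of classifier as in the session above.
X, y = datasets.load_iris(return_X_y=True)
clf = svm.SVC().fit(X, y)

# Hypothetical location for the saved model (a fresh temporary directory).
model_path = os.path.join(tempfile.mkdtemp(), "svc_iris.pickle")

# Pickle data is binary, so the file is opened in 'wb' and 'rb' mode.
with open(model_path, "wb") as f:
    pickle.dump(clf, f, protocol=pickle.HIGHEST_PROTOCOL)

with open(model_path, "rb") as f:
    restored = pickle.load(f)

# Compare the predictions of the original and the restored estimator.
same_predictions = bool((restored.predict(X) == clf.predict(X)).all())
print(same_predictions)
```

Comparing predictions on the full training set is a cheap sanity check that the round trip preserved the fitted state.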
In the specific case of scikit-learn, it may be more interesting to use joblib's replacement for pickle (joblib.dump and joblib.load). It is more efficient for fitted scikit-learn estimators, which often hold large numpy arrays internally. However, joblib has no dumps function, so the model is written to a file rather than returned as a string:
>>> from sklearn.externals import joblib
>>> joblib.dump(clf, 'filename.pkl')
You can later load the pickled model (perhaps in another Python process):
>>> clf = joblib.load('filename.pkl')
**Note:** joblib.dump and joblib.load also accept file-like objects instead of filenames. For more information on data persistence with joblib, see https://pythonhosted.org/joblib/persistence.html.
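Because file-like objects are accepted, an in-memory buffer can stand in for a file on disk. The sketch below assumes the standalone joblib package is installed and imports it directly, rather than through sklearn.externals as in the session above. It round-trips a fitted classifier through io.BytesIO.

```python
import io

import joblib  # assumption: standalone joblib package, not sklearn.externals
from sklearn import datasets, svm

X, y = datasets.load_iris(return_X_y=True)
clf = svm.SVC().fit(X, y)

# Write the model into an in-memory buffer instead of a named file.
buffer = io.BytesIO()
joblib.dump(clf, buffer)
model_bytes = buffer.getvalue()  # raw bytes, e.g. for a database or object store

# Load it back from a fresh buffer wrapping the same bytes.
restored = joblib.load(io.BytesIO(model_bytes))

same_predictions = bool((restored.predict(X) == clf.predict(X)).all())
print(same_predictions)
```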
pickle (and joblib by extension) has some maintainability and security issues. For this reason:
- Never unpickle untrusted data, as it can execute malicious code when loaded.
- Models saved with one version of scikit-learn may load in another version, but this is unsupported and not recommended. Operations performed on such a model can also give different and unexpected results.
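To make the security warning concrete, here is a small, harmless sketch of how a pickle can run code at load time. The Payload class is hypothetical, and eval with a trivial expression stands in for whatever an attacker would actually run.

```python
import pickle


class Payload:
    """Hypothetical object whose pickle asks the loader to call a function."""

    def __reduce__(self):
        # pickle records "call eval('6 * 7')" instead of any object state.
        # An attacker could name any importable callable and arguments here.
        return (eval, ("6 * 7",))


data = pickle.dumps(Payload())

# Loading does not rebuild a Payload: it executes the recorded call.
result = pickle.loads(data)
print(result)  # 42
```

The loading process does not need the Payload class at all, which is why inspecting your own code base is not enough to decide whether a pickle file from elsewhere is safe.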
To rebuild a similar model with a future version of scikit-learn, save additional metadata along with the pickled model:
- The training data, for example a reference to an immutable snapshot
- The Python source code used to generate the model
- The versions of scikit-learn and its dependencies
- The cross-validation score obtained on the training data
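One possible way to record the metadata listed above is a JSON document stored next to the pickled model. The sketch below uses only the standard library. The function names, the field names, the data and source references, and the scores are all hypothetical placeholders. The SHA-256 digest of the model bytes is an extra field, not part of the list, added here to tie the metadata to one specific file.

```python
import hashlib
import json
import platform
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def build_model_metadata(model_bytes, training_data_ref, source_ref, cv_scores):
    """Collect the metadata items listed above into one dictionary."""
    versions = {"python": platform.python_version()}
    for name in ("scikit-learn", "numpy", "scipy", "joblib"):
        versions[name] = installed_version(name)
    return {
        "model_sha256": hashlib.sha256(model_bytes).hexdigest(),  # extra field
        "training_data": training_data_ref,
        "source_code": source_ref,
        "versions": versions,
        "cv_scores": list(cv_scores),
    }


meta = build_model_metadata(
    model_bytes=b"placeholder for pickle.dumps(clf)",
    training_data_ref="snapshots/iris-v1.csv",  # hypothetical reference
    source_ref="train.py at revision 0123abc",  # hypothetical reference
    cv_scores=[0.97, 0.93, 1.0],  # hypothetical scores
)
metadata_json = json.dumps(meta, indent=2, sort_keys=True)
print(metadata_json)
```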
This makes it possible to check that the cross-validation score is in the same range as before. To learn more about these issues and about other possible serialization methods, see the talk by Alex Gaynor.
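The text does not say how "in the same range" should be measured. As one possible reading, the sketch below accepts a rebuilt model when its mean cross-validation score lies within a chosen number of standard deviations of the recorded scores. The function name, the threshold, and all scores are assumptions made for illustration.

```python
from statistics import mean, stdev


def score_within_recorded_range(new_scores, recorded_scores, n_std=2.0):
    """Check the new mean score against the recorded mean and spread."""
    recorded_mean = mean(recorded_scores)
    tolerance = n_std * stdev(recorded_scores)
    return abs(mean(new_scores) - recorded_mean) <= tolerance


# Hypothetical scores saved with the original model.
recorded = [0.96, 0.98, 0.94, 0.97, 0.95]

# Hypothetical scores from retraining with a newer scikit-learn version.
rebuilt_ok = [0.95, 0.97, 0.96, 0.94, 0.96]
rebuilt_bad = [0.80, 0.82, 0.79, 0.81, 0.80]

ok_result = score_within_recorded_range(rebuilt_ok, recorded)
bad_result = score_within_recorded_range(rebuilt_bad, recorded)
print(ok_result, bad_result)
```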
© 2010-2016, scikit-learn developers (BSD license).