Google translation of http://scikit-learn.org/0.18/modules/model_persistence.html. From [scikit-learn 0.18 User Guide 3. Model Selection and Evaluation](http://qiita.com/nazoking@github/items/267f2371757516f8c168#3-%E3%83%A2%E3%83%87%E3%83%AB%E3%81%AE%E9%81%B8%E6%8A%9E%E3%81%A8%E8%A9%95%E4%BE%A1)

3.4. Model persistence

After training a scikit-learn model, it is desirable to have a way to persist the model for future use without having to retrain it. The following section gives an example of how to persist a model with pickle. It also reviews a few security and maintainability issues that arise when working with pickle serialization.

3.4.1. Persistence example

It is possible to save a scikit-learn model by using Python's built-in persistence module, pickle:

>>> from sklearn import svm
>>> from sklearn import datasets
>>> clf = svm.SVC()
>>> iris = datasets.load_iris()
>>> X, y = iris.data, iris.target
>>> clf.fit(X, y)  
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)

>>> import pickle
>>> s = pickle.dumps(clf)
>>> clf2 = pickle.loads(s)
>>> clf2.predict(X[0:1])
array([0])
>>> y[0]
0

In the specific case of scikit-learn, it may be more interesting to use joblib's replacement for pickle (joblib.dump & joblib.load). It is more efficient on objects that carry large numpy arrays internally, as is often the case for fitted scikit-learn estimators. However, it has no dumps method, so it can only pickle to disk and not to a string:

>>> from sklearn.externals import joblib
>>> joblib.dump(clf, 'filename.pkl')

You can later load the pickled model (perhaps in another Python process):

>>> clf = joblib.load('filename.pkl')

**Note:** joblib.dump and joblib.load also accept file-like objects instead of filenames. For more information on data persistence with Joblib, see https://pythonhosted.org/joblib/persistence.html.

3.4.2. Security and maintainability limitations

pickle (and joblib by extension) has some issues regarding maintainability and security. Because of this:

- Never unpickle untrusted data, as it could lead to malicious code being executed upon loading.
- Models saved using one version of scikit-learn might load in other versions, but this is entirely unsupported and inadvisable. Keep in mind that operations performed on such data could give different and unexpected results.
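The first warning can be made concrete with a harmless sketch. Through `__reduce__`, an object can tell pickle to call an arbitrary callable at load time. The `Payload` class below is a hypothetical illustration (it is not part of scikit-learn or joblib) that makes `pickle.loads` evaluate a fixed arithmetic string. A real attacker could substitute any code.

```python
import pickle


class Payload:
    """Hypothetical object whose pickle runs a callable when loaded."""

    def __reduce__(self):
        # pickle stores (callable, args) and calls callable(*args) on load.
        # The expression here is harmless; an attacker could pass any code.
        return (eval, ("6 * 7",))


untrusted_bytes = pickle.dumps(Payload())

# Loading does not rebuild a Payload instance: it evaluates the string.
result = pickle.loads(untrusted_bytes)
print(result)  # 42
```

Nothing in the byte string signals that it is dangerous, which is why the only safe rule is to load pickles from sources you trust.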

In order to rebuild a similar model with future versions of scikit-learn, additional metadata should be saved along with the pickled model:

- A reference to an immutable snapshot of the training data
- The Python source code used to generate the model
- The versions of scikit-learn and its dependencies
- The cross-validation score obtained on the training data
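As a minimal sketch of how the four items listed above could be recorded, the following builds a small JSON document to store next to the pickle. Everything here is an assumption made for illustration: the function name `build_metadata`, the field names, the file names, and the example values are hypothetical, and a SHA-256 hash is only one possible way to reference a data snapshot.

```python
import hashlib
import json
import platform


def build_metadata(training_data, source_file, versions, cv_scores):
    """Collect the four items listed above into a JSON-serializable dict.

    All field names here are illustrative, not a scikit-learn convention.
    """
    return {
        # A content hash identifies the exact training snapshot that was used.
        "training_data_sha256": hashlib.sha256(training_data).hexdigest(),
        # Path (or commit reference) of the script that produced the model.
        "source_file": source_file,
        # Library versions, e.g. read from sklearn.__version__ at training time.
        "versions": dict(versions, python=platform.python_version()),
        # Cross-validation scores obtained on the training data.
        "cv_scores": list(cv_scores),
    }


metadata = build_metadata(
    training_data=b"5.1,3.5,1.4,0.2,0\n",  # stand-in for the real data file
    source_file="train_svc.py",            # hypothetical script name
    versions={"scikit-learn": "0.18", "numpy": "1.11.2"},  # example values
    cv_scores=[0.97, 0.98, 0.96],          # example values, not real results
)

# Store this text next to the pickle, e.g. as filename.pkl.json.
metadata_json = json.dumps(metadata, indent=2, sort_keys=True)
print(metadata_json)
```

Keeping the metadata in a plain-text format means it stays readable even if the pickle itself can no longer be loaded.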

This makes it possible to check that the cross-validation score of the rebuilt model is in the same range as before. To learn more about these issues and explore other possible serialization methods, see the talk by Alex Gaynor.
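The text does not define "the same range", so the check below is only one possible reading: a new score is accepted if it lies within a chosen number of standard deviations of the scores recorded with the old model. The function name `score_in_range`, the `n_std` threshold, and the example scores are assumptions for illustration.

```python
import statistics


def score_in_range(new_score, recorded_scores, n_std=2.0):
    """Return True if new_score lies within n_std standard deviations
    of the mean of the recorded cross-validation scores."""
    mean = statistics.mean(recorded_scores)
    spread = statistics.stdev(recorded_scores)
    return abs(new_score - mean) <= n_std * spread


recorded = [0.97, 0.98, 0.96]  # example values saved with the old model
print(score_in_range(0.965, recorded))  # True
print(score_in_range(0.80, recorded))   # False
```

A score that falls outside the range suggests that the rebuilt model, the data, or the library behaviour has changed and should be investigated before the model is used.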
