Last time, we looked at forecasting stock prices with a decision tree algorithm as an example of using machine learning for prediction. When the task is simply to predict the next value from a sequence representing recent price movements, a simple, transparent method like a decision tree can achieve reasonable accuracy at low cost, without resorting to complicated algorithms.
Mechanical forecasts can be useful in short-term trading, for example. The shorter the time frame, the better such methods tend to work: days rather than weeks, and hours or minutes rather than days, so margin trading may suit them better than spot trading. For medium- to long-term investing, I think the sound basic stance is still to pick stocks with strong fundamentals, a low P/E ratio, and a good ROE.
As you can see from the [list of technical indicators](http://ja.wikipedia.org/wiki/テクニカル指標一覧), these time-honored formulas are by no means complicated, and the criteria for judging their signals are not especially difficult algorithms either. With that in mind, applying today's machine learning might make it possible to devise algorithms with a higher hit rate, or to put them to work in system trading. It is quite conceivable that within five years a new generation of investors armed with machine learning algorithms will enter the market one after another.
By the way, in supervised machine learning a classifier is generally trained on labeled training data. But where is what it learns stored? Humans store learned memories in brain cells; a machine likewise needs somewhere to store its learned knowledge.
Fitting the entire training set every time is expensive, so ideally the trained instance should be reused from the next run onward. The pickle module serves this purpose.
Python's pickle module serializes objects; it is roughly the equivalent of Ruby's Marshal module, which behaves similarly. Using pickle, a trained instance can be saved as serialized data.
The diagram of supervised learning is as follows.
As explained in the previous article, with the machine learning library scikit-learn you create an instance of a machine learning class and call its .fit method to fit (that is, train) it on the training data.
```python
# Create a machine learning instance
# (here, a decision tree classifier)
from sklearn import tree

clf = tree.DecisionTreeClassifier()
clf.fit(features, labels)  # train on the labeled data
```
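As a concrete sketch, here is a minimal runnable version; the feature values and labels below are made up purely for illustration:

```python
from sklearn import tree

# Toy training data (made up for illustration): label is 1
# when both features are positive, 0 otherwise
features = [[1, 1], [1, 2], [-1, -1], [-2, -1]]
labels = [1, 1, 0, 0]

clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(features, labels)

print(clf.predict([[2, 2], [-1, -2]]))  # → [1 0]
```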
The pickle module can handle most Python objects, converting (that is, serializing) them to byte data.
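A minimal round trip looks like this (the dictionary contents are made up for illustration):

```python
import pickle

record = {'symbol': '7203', 'closes': [712, 718, 709]}

blob = pickle.dumps(record)    # object -> bytes
restored = pickle.loads(blob)  # bytes -> object

print(type(blob))          # → <class 'bytes'>
print(restored == record)  # → True
```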
The cPickle module is a C implementation of pickle. Unlike pickle, the classes it defines cannot be subclassed, but it is much faster, so it is generally the recommended choice. A common idiom is to try importing the C implementation first and fall back to the regular pickle if that fails. (In Python 3, the C implementation was folded into pickle itself, so a plain import pickle is enough.)
```python
try:
    import cPickle as pickle  # Python 2: faster C implementation
except ImportError:
    import pickle
```
For subsequent classifications, if a saved object exists (that is, the machine "has a memory"), it is enough to recall the stored knowledge and classify with it.
```python
# Write the trained instance to a file
with open(filename, 'wb') as f:
    pickle.dump(clf, f)
```
For example, if your training data is updated daily, you only need to fit that day's data to the loaded instance. (Note, however, that only estimators with a partial_fit method, such as SGDClassifier, truly support incremental updates; calling fit again on a decision tree retrains it from scratch.)
Create a new instance and retrain on the full training set only when no knowledge has been stored yet.
```python
import os

# Load the saved instance only if the file exists
if os.path.exists(filename):
    with open(filename, 'rb') as f:
        clf = pickle.load(f)
else:
    # No saved file: create a new instance and retrain from scratch
    clf = tree.DecisionTreeClassifier()
    clf.fit(features, labels)
```
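Putting the pieces together, the whole save-or-train-then-reload cycle might be sketched as follows (the file path and toy data are made up for illustration):

```python
import os
import pickle
import tempfile

from sklearn import tree

# Toy training data (made up for illustration)
features = [[1, 1], [1, 2], [-1, -1], [-2, -1]]
labels = [1, 1, 0, 0]

filename = os.path.join(tempfile.mkdtemp(), 'clf.pickle')

if os.path.exists(filename):
    # A memory exists: recall it
    with open(filename, 'rb') as f:
        clf = pickle.load(f)
else:
    # No memory yet: train from scratch and save
    clf = tree.DecisionTreeClassifier(random_state=0)
    clf.fit(features, labels)
    with open(filename, 'wb') as f:
        pickle.dump(clf, f)

# On a later run, the restored instance predicts without refitting
with open(filename, 'rb') as f:
    restored = pickle.load(f)
print(restored.predict([[2, 2]]))  # → [1]
```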
Note that a binary serialized on one host will not necessarily load correctly on a machine with a different architecture, so care is needed when building an analysis platform spanning multiple computers. Likewise, when changing the version of the underlying scikit-learn library, it is safer to discard the accumulated knowledge and start learning from scratch.
Furthermore, by combining this with the clustering techniques described earlier, you can generate multiple classifier instances and let the instance best suited to the data's tendencies make the prediction. Let's illustrate this with a diagram as well.
With unsupervised learning such as k-means clustering, financial data can be grouped by similarity regardless of industry. Creating a classifier instance for each cluster and fitting it in this way yields classifiers with higher generalization ability.
When the data is clustered by k-means and a classifier is trained per cluster, the number of trained models equals k.
For example, one application would be to cluster stocks with similar price movements across industry boundaries and predict the next price movement from the accumulated experience of each cluster.
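A sketch of this per-cluster approach, using randomly generated stand-in data (all values here are made up, not real financial data):

```python
import numpy as np
from sklearn import tree
from sklearn.cluster import KMeans

# Stand-in feature matrix: 20 "stocks", 3 numeric features each
rng = np.random.RandomState(0)
X = rng.rand(20, 3)
y = (X[:, 0] > 0.5).astype(int)  # hypothetical up/down labels

k = 2
km = KMeans(n_clusters=k, n_init=10, random_state=0)
cluster_ids = km.fit_predict(X)

# One classifier instance per cluster, each fit on its own members
classifiers = {}
for c in range(k):
    clf = tree.DecisionTreeClassifier(random_state=0)
    clf.fit(X[cluster_ids == c], y[cluster_ids == c])
    classifiers[c] = clf

# A new sample is routed to the classifier of its nearest cluster
x_new = rng.rand(1, 3)
nearest = km.predict(x_new)[0]
print(classifiers[nearest].predict(x_new))
```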
The pickle module can be used to save and load the multiple instances in such cases as well.
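For instance, a dict of per-cluster classifiers can be pickled in one shot (the data here is a minimal stand-in):

```python
import pickle
from sklearn import tree

X = [[0, 0], [1, 1]]
y = [0, 1]

# Two stand-in per-cluster classifiers kept together in one dict
models = {c: tree.DecisionTreeClassifier().fit(X, y) for c in range(2)}

blob = pickle.dumps(models)    # serialize the whole dict at once
restored = pickle.loads(blob)  # restore every instance together

print(sorted(restored))               # → [0, 1]
print(restored[0].predict([[1, 1]]))  # → [1]
```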
This time, I explained how to serialize and save the state learned from training data with pickle, the machine equivalent of a human committing knowledge to memory via the hippocampus. This is just one example, but I think it is an easy technique to apply with scikit-learn.