I had been building decision trees with Weka, but converting the data to ARFF was a hassle, so I built one with scikit-learn, the Python machine learning library, which I had been wanting to try anyway. Other websites explain how to install scikit-learn in detail, so I will skip that here.
Creating the decision tree object itself is fairly easy. If you want to draw the tree, install Graphviz with brew (Mac). Because the pyparsing library has been updated, you also need to install pydot with pyparsing pinned to the older version:
sudo pip install -U pydot pyparsing==1.5.7
I don't know how to do this on Windows (he says quietly).
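Before running the drawing code, it can help to confirm that the Graphviz `dot` command is actually on your PATH. The following is a minimal sketch, assuming Python 3; the helper name `graphviz_available` is made up for this example.

```python
import shutil


def graphviz_available():
    """Return True if the Graphviz `dot` executable is found on PATH."""
    return shutil.which("dot") is not None


if graphviz_available():
    print("Graphviz found at", shutil.which("dot"))
else:
    # On a Mac with Homebrew, `brew install graphviz` provides the dot command.
    print("Graphviz `dot` not found; install it before drawing the tree")
```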
tree_ex.py
#-*- coding: utf-8 -*-
#Null values cannot be used -> what should I do?
#yes/no is encoded as 1/-1
#Strings cannot be used
from sklearn import tree
#Note: scikit-learn 0.23 and later no longer ship sklearn.externals.six; use "from io import StringIO" there
from sklearn.externals.six import StringIO
import pydot

if __name__ == '__main__':
    X = [
        [0, 1],
        [0, -1],
        [1, 1]
    ]
    Y = [1, 2, 3]  #Labels, in the same order as the rows of X
    clf = tree.DecisionTreeClassifier()
    clf = clf.fit(X, Y)  #This completes the decision tree object
    #Boilerplate for drawing
    dot_data = StringIO()
    tree.export_graphviz(clf, out_file=dot_data)
    graph = pydot.graph_from_dot_data(dot_data.getvalue())
    graph.write_pdf("tree_ex.pdf")
    #pre = clf.predict([[0, 1]])
    #print(pre)  #The predicted class is 1
X is the data, and Y holds the label for each row of X. The fit function ties X and Y together to train the tree (probably).
Once fit has been applied, clf is both the decision tree object and a classifier. With the commented-out predict function you can find out which class new data belongs to.
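To make the fit and predict steps concrete, here is a minimal sketch using the same X and Y. It assumes Python 3 and a recent scikit-learn, where predict expects a list of samples (a 2D input). The `random_state=0` argument is my addition, only there to make the run repeatable.

```python
from sklearn import tree

X = [[0, 1], [0, -1], [1, 1]]  # one row per sample
Y = [1, 2, 3]                  # label for each row of X

clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(X, Y)

# predict takes a list of samples, so even a single sample is wrapped in a list.
new_samples = [[0, 1], [1, 1]]
predicted = clf.predict(new_samples)
print(predicted)
```

Since both new samples also appear in the training data and the tree is grown without a depth limit, they come back with their training labels.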
The rest of tree_ex.py is boilerplate that calls pydot to draw the tree.
The resulting drawing looks like this:
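If pydot is awkward to install, one alternative is to skip it and let Graphviz render the file directly. This is only a sketch, assuming a scikit-learn version where `export_graphviz` returns the DOT source as a string when `out_file=None`; the file name `tree_ex.dot` is my own choice.

```python
from sklearn import tree

X = [[0, 1], [0, -1], [1, 1]]
Y = [1, 2, 3]
clf = tree.DecisionTreeClassifier(random_state=0).fit(X, Y)

# With out_file=None the DOT description is returned instead of written out.
dot_source = tree.export_graphviz(clf, out_file=None)

with open("tree_ex.dot", "w") as f:
    f.write(dot_source)

# Then render it from the shell with Graphviz:
#   dot -Tpdf tree_ex.dot -o tree_ex.pdf
```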
Weka's decision tree is hard to read, so I tried making one in Python. Creating the decision tree itself is very easy, and the result is easy to read. That said, I ran into a few problems:
・ The branching conditions are not shown (I could not work out how to output them)
・ Questionnaire items with three possible answers (1, 2, or 3) cannot be split properly (yes/no can be represented with [1, -1])
・ Null values are not accepted
・ Character strings are not accepted
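One common workaround for the last three points is to turn every answer into numbers before calling fit. The sketch below uses plain Python and a made-up questionnaire (the names `rows`, `Q2_CHOICES`, `Q3_CHOICES`, `one_hot`, and `encode` are all hypothetical): each multiple-choice or string answer becomes a group of 0/1 columns, and a missing answer becomes all zeros. scikit-learn also ships `OneHotEncoder` and `SimpleImputer` for this kind of preprocessing, which may be a better fit for real data.

```python
def one_hot(value, categories):
    """Return a 0/1 list with a 1 at the position of value (all 0 if not found)."""
    return [1 if value == c else 0 for c in categories]


# Hypothetical questionnaire: q1 is yes/no, q2 has three possible answers,
# q3 is a string from a fixed set. None marks a missing answer.
rows = [
    {"q1": "yes", "q2": 1, "q3": "red"},
    {"q1": "no", "q2": 3, "q3": None},
    {"q1": "yes", "q2": 2, "q3": "blue"},
]

Q2_CHOICES = [1, 2, 3]
Q3_CHOICES = ["red", "blue"]


def encode(row):
    features = [1 if row["q1"] == "yes" else -1]  # yes/no as 1/-1, as in the post
    features += one_hot(row["q2"], Q2_CHOICES)    # three answers -> three columns
    features += one_hot(row["q3"], Q3_CHOICES)    # None -> all zeros
    return features


X = [encode(r) for r in rows]
print(X)
```

The resulting X contains only numbers, so it can be passed to fit in the same way as the X in tree_ex.py.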
For now, I feel that Weka is easier to use. Maybe I just need to pass some extra arguments...?
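On the question of extra arguments: the export functions accept a `feature_names` argument, which labels each branching condition with a column name. The sketch below assumes scikit-learn 0.21 or later (where `export_text` is available) and uses made-up column names `q1` and `q2`; `export_graphviz` takes the same `feature_names` argument for the drawn version.

```python
from sklearn import tree

X = [[0, 1], [0, -1], [1, 1]]
Y = [1, 2, 3]
clf = tree.DecisionTreeClassifier(random_state=0).fit(X, Y)

# "q1" and "q2" are hypothetical names for the two columns of X.
rules = tree.export_text(clf, feature_names=["q1", "q2"])
print(rules)
```

Each line of the printed text is either a threshold test on a named column or a leaf with its predicted class.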