2. Multivariate analysis spelled out in Python 7-1. Decision tree (scikit-learn)

**Here, let's first walk through an example of a classification tree.**

⑴ Import library

#Class that builds a decision tree classifier
from sklearn.tree import DecisionTreeClassifier
#Module with decision tree utilities (used later for export_graphviz)
from sklearn import tree

#Built-in datasets for machine learning
from sklearn import datasets
#Utility for splitting data into training and test sets
from sklearn.model_selection import train_test_split

#Class for displaying images in a Notebook
from IPython.display import Image
#Module for visualizing the decision tree model (requires Graphviz to be installed)
import pydotplus

⑵ Data acquisition and reading

iris = datasets.load_iris()

| No. | Variable name | Meaning | Note | Data type |
|---|---|---|---|---|
| 1 | sepal length | Sepal length | Continuous (cm) | float64 |
| 2 | sepal width | Sepal width | Continuous (cm) | float64 |
| 3 | petal length | Petal length | Continuous (cm) | float64 |
| 4 | petal width | Petal width | Continuous (cm) | float64 |
| 5 | species | Iris species | setosa=0, versicolor=1, virginica=2 (as encoded in iris.target) | int64 |

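#Names of the explanatory variables (features)
iris.feature_names

#Shape of the explanatory variables
iris.data.shape

#Show the first 5 rows of the explanatory variables
iris.data[0:5, :]


#Names of the objective variable classes
iris.target_names

#Shape of the objective variable
iris.target.shape

#Show the objective variable
iris.target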
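
Before splitting the data, it can help to check how many samples each class has. The following is a minimal sketch using `np.bincount`; the variable name `class_counts` is introduced here for illustration only.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

# Count how many samples carry each class label (0, 1, 2)
class_counts = np.bincount(iris.target)

# Pair each class name with its sample count
for name, count in zip(iris.target_names, class_counts):
    print(f"{name}: {count} samples")
```

The iris dataset is balanced, with the same number of samples in every class, so accuracy is a reasonable first metric for it.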


⑶ Data preprocessing

#Store the explanatory variables in X and the objective variable in y
X = iris.data
y = iris.target

#Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
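
To see what this split produces, the sketch below repeats the same call and prints the resulting shapes. When `test_size` is not given, `train_test_split` holds out 25% of the samples for testing. The second call with `stratify=y` is an optional variant that is not used in this article; it keeps the class proportions roughly equal in both subsets, and the names `X_tr_s`, `X_te_s`, `y_tr_s`, `y_te_s` are illustrative only.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Same call as in the article: test_size defaults to 0.25
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
print(X_train.shape, X_test.shape)

# Optional variant (an assumption, not used in the article):
# stratify=y keeps the class proportions similar in both subsets
X_tr_s, X_te_s, y_tr_s, y_te_s = train_test_split(
    X, y, random_state=0, stratify=y)
print(np.bincount(y_te_s))
```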

⑷ Building and evaluating the decision tree model

#Create the decision tree classifier (Gini impurity, maximum depth 3)
clf = DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=0)

#Train the decision tree model on the training data
model = clf.fit(X_train, y_train)

#Compute the accuracy on the training and test sets
print('Accuracy (train): {:.3f}'.format(model.score(X_train, y_train)))
print('Accuracy (test): {:.3f}'.format(model.score(X_test, y_test)))
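
The `criterion='gini'` setting means that splits are chosen to reduce Gini impurity, defined as 1 minus the sum of squared class proportions in a node. As a rough illustration, the sketch below computes that quantity by hand; `gini_impurity` is a hypothetical helper written for this explanation and is not part of scikit-learn. It assumes a non-empty array of non-negative integer labels.

```python
import numpy as np

def gini_impurity(labels):
    """Return 1 - sum_k p_k^2 for a non-empty array of integer class labels."""
    counts = np.bincount(labels)
    proportions = counts / counts.sum()
    return 1.0 - np.sum(proportions ** 2)

# Three equally sized classes: the most mixed case for three classes
balanced = np.array([0] * 50 + [1] * 50 + [2] * 50)
# A node that contains only one class
pure = np.array([0] * 50)

print(gini_impurity(balanced))
print(gini_impurity(pure))
```

A node with only one class has impurity 0, while three equally mixed classes give about 0.667, so lower values mean purer nodes.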


⑸ Drawing a tree diagram

  1. **Convert the decision tree model to DOT data**
  2. **Draw a diagram from the DOT data**
  3. **Convert to PNG and display in the Notebook**
#Convert the decision tree model to DOT data
dot_data = tree.export_graphviz(model,                             #Decision tree model to export
                                out_file=None,                     #Return a string instead of writing to a file
                                feature_names=iris.feature_names,  #Display names of the features
                                class_names=iris.target_names,     #Display names of the classes
                                filled=True)                       #Color each node by its majority class

#Draw a diagram
graph = pydotplus.graph_from_dot_data(dot_data)

#View the diagram as a PNG image in the Notebook
Image(graph.create_png())


How to read a tree diagram
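
In the diagram produced by `export_graphviz`, each internal node shows its split condition, the Gini impurity (`gini`), the number of training samples that reach it (`samples`), the per-class breakdown (`value`), and the majority class (`class`). Samples that satisfy the condition go to the left child. As a text-only way to inspect the same structure without Graphviz, the sketch below uses `export_text` and the fitted `tree_` attribute; it rebuilds the model with the same settings as above so that it runs on its own.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

# Same settings as the model in the article
model = DecisionTreeClassifier(
    criterion='gini', max_depth=3, random_state=0).fit(X_train, y_train)

# Print the split rules as indented text
rules = export_text(model, feature_names=list(iris.feature_names))
print(rules)

# Node 0 is the root: it holds every training sample
fitted_tree = model.tree_
print('root samples:', fitted_tree.n_node_samples[0])
print('root gini: {:.3f}'.format(fitted_tree.impurity[0]))
```

Each level of indentation in the printed rules corresponds to one level of depth in the diagram, and the `class:` lines mark the leaves.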



#Export to a PNG file
graph.write_png("iris.png")

#Download from Google Colaboratory
from google.colab import files
files.download("iris.png")


