Python practice 100 knocks: I tried to visualize the decision tree of Chapter 5 using graphviz

Introduction

I am studying with the book "Python practice 100 knocks". I think it is a really good book because you can practice data analysis on data that is close to what you meet in real work (although real-world data is often much messier ...). In Chapter 5, scikit-learn is used to build a decision tree model. After building the model, I wanted to see its tree structure, so this time I tried visualizing it with graphviz.

Target readers

- Those who are reading Python practice 100 knocks
- Those who are looking for a way to visualize scikit-learn decision trees

Target location

Python practice 100 knocks -> Chapter 5: 10 knocks to predict customer withdrawal -> Knock 49: Let's check the variables that contribute to the model

My PC environment

Installation of graphviz

First, install Graphviz itself using Homebrew.

brew install graphviz

In addition, install the Python library using pip (with Anaconda, it seems this can be done with conda instead).

pip install graphviz
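The two steps above install two different things: the dot executable (from brew) and the Python wrapper (from pip). As a minimal sketch, a check like the following can confirm that both are visible before running the code in the next section. It uses only the standard library, and the function name graphviz_status is a hypothetical helper of mine, not part of any library.

```python
import importlib.util
import shutil


def graphviz_status():
    """Report whether the two separate graphviz pieces can be found."""
    return {
        # The `dot` command comes from the Graphviz install (e.g. brew).
        "dot_executable": shutil.which("dot") is not None,
        # The Python wrapper comes from `pip install graphviz`.
        "python_package": importlib.util.find_spec("graphviz") is not None,
    }


print(graphviz_status())
```

If either value is False, the corresponding install step is probably the one to revisit.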

Code

1. How to display in Jupyter Notebook

All you have to do is add the following code after fitting the model. Here, model is the trained decision tree from the knock. It's very easy.

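from sklearn import tree
import graphviz

# Export the fitted decision tree (model) as a DOT-format string
dot_data = tree.export_graphviz(model, out_file=None)
# Wrap the DOT source; as the last expression of a cell it is drawn inline
graph = graphviz.Source(dot_data)
graph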
[Figure: screenshot of the decision tree displayed in the notebook]
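The plain call above labels each node with generic index-based feature names, which makes it hard to see which variables the tree is splitting on. As a sketch, export_graphviz also accepts feature_names, class_names, filled, and rounded. The iris dataset and the max_depth setting below are stand-ins I assume for illustration; in the book, model is trained on the customer data instead.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Stand-in data and model (assumption): the book uses its own customer data.
iris = load_iris()
model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(iris.data, iris.target)

# feature_names / class_names label the nodes; filled=True colors them by class.
dot_data = export_graphviz(
    model,
    out_file=None,
    feature_names=iris.feature_names,
    class_names=list(iris.target_names),
    filled=True,
    rounded=True,
)

# Show the first line of the generated DOT source.
print(dot_data.splitlines()[0])
```

The resulting dot_data string can be passed to graphviz.Source in the same way as in the code above.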

2. How to create a file

You can create a PDF file by changing only the last line of the previous code. In this example, "test.pdf" is created in the current directory.

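from sklearn import tree
import graphviz

# Export the fitted decision tree (model) as a DOT-format string
dot_data = tree.export_graphviz(model, out_file=None)
graph = graphviz.Source(dot_data)
# Saves the DOT source as "test" and renders it to "test.pdf"
graph.render('test')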

[Figure: test.png, the rendered decision tree]

3. How to create a file using the terminal (bonus)

Alternatively, use the export_graphviz function of sklearn.tree to write the decision tree to a file in DOT language format, then run the dot command as a system command from Jupyter Notebook.

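from sklearn import tree

# Write the decision tree to test.dot in DOT format
with open('test.dot', mode='w') as f:
    tree.export_graphviz(model, out_file=f)
# "!" runs a shell command from a notebook cell: convert test.dot to test.png
!dot -T png test.dot -o test.png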
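The "!" prefix only works inside a notebook cell. In a plain Python script, a rough equivalent is to call dot through subprocess. The following is a minimal sketch under that assumption; dot_command and run_dot are hypothetical helper names of mine, and run_dot skips the call when dot is not on the PATH.

```python
import shutil
import subprocess


def dot_command(src, dst, fmt="png"):
    """Build a command equivalent to `dot -T png test.dot -o test.png`."""
    return ["dot", f"-T{fmt}", src, "-o", dst]


def run_dot(src, dst, fmt="png"):
    """Run dot if it is installed; return True when the command was executed."""
    if shutil.which("dot") is None:
        return False  # Graphviz is not on the PATH
    # check=True raises CalledProcessError if dot exits with an error
    subprocess.run(dot_command(src, dst, fmt), check=True)
    return True


print(dot_command("test.dot", "test.png"))
# → ['dot', '-Tpng', 'test.dot', '-o', 'test.png']
```

After test.dot has been written as above, calling run_dot("test.dot", "test.png") should produce the same image as the notebook command.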

References

This article was written with reference to the following sources. See them for more details.

- 1.10. Decision Trees (scikit-learn official documentation)
- sklearn.tree.export_graphviz (scikit-learn official documentation)
- Try the Decision Tree with Python: scikit-learn (used as a reference)
