Visualization memo with pandas and seaborn

Data set visualization

This is a memo from practicing visualization with pandas and seaborn, using `iris.csv` as a sample data set. Since it is a memo for myself, some choices, such as the types of figures and which columns to plot, are arbitrary. Please bear with me _ (._.) _


Drawing histograms

`iris.csv` has four numeric columns and one category value: it consists of sepal_length, sepal_width, petal_length, petal_width and species. Visualize it with the classification by the category value `species` in mind.


First, check the distribution of one column.

・ Distribution of sepal_length

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("iris.csv")
#df = sns.load_dataset("iris")  # use this instead if iris.csv is not at hand

# distplot is deprecated since seaborn 0.11; histplot with kde=True replaces it
sns.histplot(df["sepal_length"], kde=True, stat="density")


Next, I drew the distributions of the four columns on four separate graphs. The plot() method of DataFrame is convenient here: with subplots=True and layout=(2, 2), it outputs four graphs in a 2 × 2 grid. However, I don't know how to overlay the density function from kernel density estimation on a histogram with this method, so the example below draws only the density curves (kind="kde").

・ Distribution of sepal_length, sepal_width, petal_length, petal_width

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("iris.csv")
#df = sns.load_dataset("iris")  # use this instead if iris.csv is not at hand

df.plot(kind="kde", subplots=True, layout=(2, 2))  # use kind="hist" for histograms


・ Distribution of sepal_length by category

Check how the distribution of sepal_length differs between setosa and versicolor.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("iris.csv")
#df = sns.load_dataset("iris")  # use this instead if iris.csv is not at hand



Drawing a scatterplot matrix

A scatterplot matrix is, I think, a useful way to get an overview of the data. In seaborn, you can draw one easily with pairplot(). In the following example, hue="species" is passed to pairplot(), which color-codes the iris dataset by the category value species. With diag_kind="kde", the diagonal panels show the density function obtained by kernel density estimation. If diag_kind is not specified, seaborn chooses automatically: histograms when hue is not given, and density curves when it is. Pass diag_kind="hist" to force histograms.

・ Scatterplot matrix color-coded by species

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("iris.csv")
#df = sns.load_dataset("iris")  # use this instead if iris.csv is not at hand

# pairplot: draw a scatterplot matrix
g = sns.pairplot(df, hue="species", diag_kind="kde")


