I made a method that automatically selects and draws an appropriate graph for a pandas DataFrame

Introduction

I often wonder which graph to use when visualizing data. So last time, I summarized which graphs suit each combination of explanatory variable type and objective variable type (Visualization method of data by explanatory variable and objective variable). However, even while writing it, I thought, "I'll forget this soon!" So I created a method that automatically determines the type of each variable and draws a suitable graph.

Summary of the last article

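The appropriate seaborn methods for each combination of explanatory and objective variable types (discrete or continuous), as used in the code below, are as follows. For details, please refer to the previous article.

- Explanatory discrete, objective discrete: countplot (also split by the objective with hue)
- Explanatory discrete, objective continuous: violinplot
- Explanatory continuous, objective discrete: histogram for each value of the objective
- Explanatory continuous, objective continuous: jointplot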

The method

Below is the code for the method I wrote.

import matplotlib.pyplot as plt
import seaborn as sns

def visualize_data(data, target_col):
    
    length = 10
    subplot_size = (length, length / 2)
    
    # Decide once whether the objective variable is discrete or continuous
    target_is_cat = is_categorical(data, target_col)
    
    for key in data.columns:
        
        # Skip the objective variable itself
        if key == target_col:
            continue
        
        key_is_cat = is_categorical(data, key)
        
        if key_is_cat and target_is_cat:
            # Discrete x discrete: counts per category, then counts split by the objective
            fig, axes = plt.subplots(1, 2, figsize=subplot_size)
            sns.countplot(x=key, data=data, ax=axes[0])
            sns.countplot(x=key, data=data, hue=target_col, ax=axes[1])
            plt.tight_layout()
            plt.show()

        elif key_is_cat and not target_is_cat:
            # Discrete x continuous: counts per category, then the objective's distribution per category
            fig, axes = plt.subplots(1, 2, figsize=subplot_size)
            sns.countplot(x=key, data=data, ax=axes[0])
            sns.violinplot(x=key, y=target_col, data=data, ax=axes[1])
            plt.tight_layout()
            plt.show()

        elif not key_is_cat and target_is_cat:
            # Continuous x discrete: overall histogram, then one histogram per objective value.
            # distplot is deprecated (seaborn >= 0.11); histplot with hue replaces the
            # FacetGrid workaround and draws its own legend. Missing values are dropped.
            fig, axes = plt.subplots(1, 2, figsize=subplot_size)
            sns.histplot(data=data, x=key, ax=axes[0])
            sns.histplot(data=data, x=key, hue=target_col, ax=axes[1])
            plt.tight_layout()
            plt.show()

        else:
            # Continuous x continuous: scatter plot with marginal histograms
            sns.jointplot(x=key, y=target_col, data=data, height=length * 2 / 3)
            plt.show()

The is_categorical helper used above is as follows.

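from pandas.api.types import is_float_dtype, is_integer_dtype

def is_categorical(data, key):
    # Returns True if the column should be treated as a discrete quantity.
    # pandas' dtype checks work for int32/int64/uint columns on every platform,
    # unlike comparing the dtype with the string 'int'.
    col = data[key]
    
    if is_integer_dtype(col):
        # Integers: discrete only when there are 5 or fewer distinct values
        return col.nunique() < 6
    
    elif is_float_dtype(col):
        # Floats are always treated as continuous
        return False
    
    else:
        # Everything else (object, bool, category, ...) is treated as discrete
        return True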

In outline:

- Pass the data you want to visualize (a pandas.DataFrame) as data, and the column name of the objective variable as target_col.
- For each explanatory variable, the is_categorical method decides whether it and the objective variable are discrete or continuous, and the pair is drawn with the matching seaborn method.

For int columns, a column with 6 or more distinct values is treated as a continuous quantity, and one with 5 or fewer as a discrete quantity. Float columns are always continuous, and every other type (object, bool, and so on) is treated as discrete. To be honest, there is room for improvement in this judgment.
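As one possible improvement, here is a sketch of a variant. The name is_categorical_v2 and its max_unique parameter are hypothetical and not part of the original code. It keeps the same (data, key) signature, makes the threshold a parameter (max_unique=5 reproduces the original nunique<6 rule), and handles one common pitfall: when an integer column contains missing values, pandas stores it as float64, so the original rule always treats it as continuous. The variant treats a float column as discrete if all of its non-missing values are whole numbers and there are few distinct ones.

```python
import pandas as pd
from pandas.api import types as ptypes


def is_categorical_v2(data, key, max_unique=5):
    """Hypothetical variant of is_categorical with a configurable threshold.

    Returns True if data[key] should be treated as a discrete quantity.
    """
    col = data[key]

    # bool, object/string, category, datetime, ... -> discrete (as in the original)
    if ptypes.is_bool_dtype(col) or not ptypes.is_numeric_dtype(col):
        return True

    # Integers -> discrete when there are at most max_unique distinct values
    if ptypes.is_integer_dtype(col):
        return col.nunique() <= max_unique

    # Floats -> discrete only if every non-missing value is a whole number,
    # e.g. an integer column that pandas stored as float64 because of NaN
    if ptypes.is_float_dtype(col):
        values = col.dropna()
        if values.empty:
            return False
        is_whole = bool((values == values.round()).all())
        return is_whole and values.nunique() <= max_unique

    # Any other numeric type (e.g. complex) -> continuous
    return False


# Made-up toy data for illustration only
df = pd.DataFrame({
    "Pclass": [1, 2, 3, 3, 2, 1],                   # int, 3 distinct values
    "Sex": ["male", "female", "female", "male", "male", "female"],
    "Fare": [7.25, 71.28, 8.05, 53.1, 8.46, 51.86],  # non-integer floats
    "Parch": [0, 1, None, 2, 0, 0],                 # ints with a gap -> float64
    "Id": [1, 2, 3, 4, 5, 6],                       # int, 6 distinct values
})

for col in df.columns:
    print(col, is_categorical_v2(df, col))
# Pclass True / Sex True / Fare False / Parch True / Id False
```

Whether a "whole-number float" really is discrete depends on the data, so treat this as a starting point rather than a rule.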

Application

Let's apply it to the Titanic data (the output is long, so only part of it is shown).

import pandas as pd

data=pd.read_csv("train.csv")
data=data.drop(["PassengerId", "Name", "Ticket", "Cabin"], axis=1) # drop ID-like columns whose values are (nearly) unique per passenger

visualize_data(data, "Survived")

I was able to automatically draw an appropriate graph for each type!
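If you don't have train.csv at hand, or just want to check how each column will be classified before drawing anything, a dry run like the following can help. plot_plan is a hypothetical helper, not part of the original method: it applies the same is_categorical rule (rewritten with pandas.api.types so the block is self-contained) and returns which kind of plot each column would get. The DataFrame is a made-up toy that only imitates a few Titanic columns.

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype


def is_categorical(data, key):
    # Same rule as the article: ints with < 6 distinct values are discrete,
    # floats are continuous, everything else is discrete
    col = data[key]
    if is_integer_dtype(col):
        return col.nunique() < 6
    if is_float_dtype(col):
        return False
    return True


def plot_plan(data, target_col):
    """Dry run: return {column: plot kind} without drawing anything."""
    target_is_cat = is_categorical(data, target_col)
    plan = {}
    for key in data.columns:
        if key == target_col:
            continue
        key_is_cat = is_categorical(data, key)
        if key_is_cat and target_is_cat:
            plan[key] = "countplot"
        elif key_is_cat:
            plan[key] = "violinplot"
        elif target_is_cat:
            plan[key] = "histogram by target"
        else:
            plan[key] = "jointplot"
    return plan


# Made-up toy data imitating a few Titanic columns
toy = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0],
    "Pclass": [3, 1, 3, 1, 3, 3],
    "Sex": ["male", "female", "female", "female", "male", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0, None, 54.0],
    "Fare": [7.25, 71.28, 7.92, 53.1, 8.05, 51.86],
})

print(plot_plan(toy, "Survived"))
# {'Pclass': 'countplot', 'Sex': 'countplot', 'Age': 'histogram by target', 'Fare': 'histogram by target'}
```

On the real data, calling plot_plan(data, "Survived") after the drop above would show the same kind of table for every remaining column.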

In closing

I have put this on GitHub together with the method from my previous post, "Method to get an overview of data with Pandas". Please use it! In the future, I want to automate various kinds of preprocessing.