Various Python visualization tools

Purpose

A summary of visualization tools that speed up work in data analysis competitions. More will be added over time.

Table of contents

  1. Correlation map
  2. Confusion Matrix
  3. LightGBM feature importance

1. Correlation map

Displays the pairwise correlation between the columns of a pandas DataFrame as a heatmap. It is useful for checking the correlation between features, and between the predictions of different models when building an ensemble.

reference

code

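import matplotlib.pyplot as plt
import seaborn as sns
# df is an existing pandas DataFrame with numeric columns
fig, ax = plt.subplots(1, 1, figsize=(12, 12))
sns.heatmap(df.corr(), annot=True, fmt='.7f', ax=ax)
df.corr()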
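For the ensemble use case mentioned above, the following is a minimal sketch rather than the author's own code. The model names and the random "predictions" are made up for illustration. It hides the upper triangle, which only mirrors the lower one, so each pair of models appears once.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
y = rng.random(200)  # hypothetical target values

# Hypothetical predictions from three models: a and b are accurate, c is noisy
preds = pd.DataFrame({
    "model_a": y + rng.normal(0, 0.05, 200),
    "model_b": y + rng.normal(0, 0.05, 200),
    "model_c": y + rng.normal(0, 0.30, 200),
})

corr = preds.corr()

# True marks cells to hide: everything strictly above the diagonal
mask = np.triu(np.ones_like(corr, dtype=bool), k=1)

fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(corr, mask=mask, annot=True, fmt=".3f", vmin=0, vmax=1, ax=ax)
ax.set_title("Correlation between model predictions")
```

Models whose predictions are less correlated with the rest tend to add more diversity to an ensemble, so pairs with low values are the ones worth a closer look.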

2. Confusion Matrix

reference

code

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.metrics import confusion_matrix

# Thanks to https://www.kaggle.com/marcovasquez/basic-nlp-with-tensorflow-and-wordcloud
def plot_cm(y_true, y_pred, title="", figsize=(14, 14)):
    y_pred = y_pred.astype(int)
    cm = confusion_matrix(y_true, y_pred, labels=np.unique(y_true))
    cm_sum = np.sum(cm, axis=1, keepdims=True)
    cm_perc = cm / cm_sum.astype(float) * 100
    annot = np.empty(cm.shape, dtype=object)  # object dtype so long labels are not truncated
    nrows, ncols = cm.shape
    for i in range(nrows):
        for j in range(ncols):
            c = cm[i, j]
            p = cm_perc[i, j]
            if i == j:
                s = cm_sum[i, 0]  # row total as a scalar
                annot[i, j] = '%.1f%%\n%d/%d' % (p, c, s)
            elif c == 0:
                annot[i, j] = ''
            else:
                annot[i, j] = '%.1f%%\n%d' % (p, c)
    cm = pd.DataFrame(cm, index=np.unique(y_true), columns=np.unique(y_true))
    cm.index.name = 'Actual'
    cm.columns.name = 'Predicted'
    fig, ax = plt.subplots(figsize=figsize)
    plt.title(title)
    sns.heatmap(cm, cmap='viridis', annot=annot, fmt='', ax=ax)
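The percentages in the annotations are row-normalized: each cell is divided by the number of samples of its actual class. The sketch below reproduces only that arithmetic on a made-up 3-class matrix, using NumPy alone. The helper name `row_percentages` and the zero-row guard are illustrative assumptions, not part of the function above.

```python
import numpy as np

def row_percentages(cm):
    """Return each cell as a percentage of its row (actual-class) total."""
    cm = np.asarray(cm, dtype=float)
    row_sums = cm.sum(axis=1, keepdims=True)
    # Guard against empty rows so a class with no samples gives 0% instead of NaN
    safe_sums = np.where(row_sums == 0, 1.0, row_sums)
    return cm / safe_sums * 100

# Hypothetical confusion matrix (rows: actual, columns: predicted)
cm = np.array([[8, 2, 0],
               [1, 6, 3],
               [0, 0, 5]])

perc = row_percentages(cm)
per_class_recall = np.diag(perc)  # diagonal = share of each class predicted correctly
print(per_class_recall)
```

With this normalization the diagonal of the heatmap can be read as per-class recall, which is why the diagonal cells also show the row total.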

3. LightGBM feature importance

Visualizes the feature importance of trained LightGBM models, one model per fold.

code

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

def display_feature_importance(models):
    # Collect the gain importance of every model (one model per fold)
    frames = [
        pd.DataFrame({
            'importance': m.feature_importance(importance_type='gain'),
            'feature': m.feature_name(),
        })
        for m in models
    ]
    fi = pd.concat(frames, axis=0)
    # Average over folds, matching the plot title
    fi = fi.groupby('feature').mean()
    best_features = fi.sort_values(by='importance', ascending=False).reset_index()

    plt.figure(figsize=(16, 16))
    sns.barplot(x="importance", y="feature", data=best_features)
    plt.title('LGB Features (avg over folds)')
    # Print the 20 least important features
    print('worst:\n', best_features['feature'][-20:].values)
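The aggregation over folds can be tried out without training a model. The sketch below uses made-up gain values for three hypothetical folds and pandas only. It also reports the standard deviation, which the function above does not, as a rough check of how stable each feature's importance is across folds.

```python
import pandas as pd

# Hypothetical gain importances from three folds (made-up numbers)
fold_importances = [
    {"age": 120.0, "income": 80.0, "city": 5.0},
    {"age": 100.0, "income": 90.0, "city": 10.0},
    {"age": 110.0, "income": 70.0, "city": 0.0},
]

# Long format: one row per (fold, feature) pair
long_df = pd.DataFrame([
    {"fold": i, "feature": name, "importance": value}
    for i, fold in enumerate(fold_importances)
    for name, value in fold.items()
])

# Mean and standard deviation of the importance per feature, best first
summary = (
    long_df.groupby("feature")["importance"]
    .agg(["mean", "std"])
    .sort_values("mean", ascending=False)
)
print(summary)
```

A feature with a high mean but a large spread across folds may be less reliable than its rank suggests.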
