EXCEL data bar and color scale can also be done with pandas

The table of pandas is a little hard to see ... I want a color ...

I thought at one point. It is troublesome to move it to EXCEL and attach a data bar, so can I do something on Jupyter? I thought that pandas usually had such a function lol I didn't know lol I found it on the following site. https://pbpython.com/styling-pandas.html

DataFrame.style You can see the detailed function of DataFrame.style on the following page. https://pandas.pydata.org/pandas-docs/version/0.18/style.html

Practiced with the dataset boston included in sklearn

#Package import
import pandas as pd
from sklearn import datasets
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

#Data read
boston = datasets.load_boston()
boston_df=pd.DataFrame(boston.data)
boston_df.columns = boston.feature_names #Insert column name
boston_df['PRICE'] = pd.DataFrame(boston.target)
display(boston_df)

image.png

Add a data bar like EXCEL to this

display(boston_df.style.bar())

image.png Oh, it's done. But it's kind of bloody ...

I don't like bloody, so I can choose a column and change the color. It can be customized by specifying the column name in the subset or describing it so that the bar is connected.

#Column specification
display(boston_df.style.bar(subset=['RM','AGE']))
#Column designation and color coding for each
display(boston_df.style\
        .bar(subset=['RM'],color=['lightcoral'])\
        .bar(subset=['AGE'],color=['lightgreen'])\
        .bar(subset=['TAX'],color=['lightblue']))

image.png image.png I can go!

Let's try the color scale as well.

#Creating a correlation matrix
boston_corr=boston_df.corr()
display(boston_corr)
#Color
display(boston_corr.style.background_gradient(cmap='coolwarm'))

image.png image.png I can go! (Second time) I knew that pandas could also process tables in EXCEL like this.

By the way, it is possible to output as it is as an EXCEL file.

#The openpyxl package is already installed
n_df=boston_corr.style.background_gradient(cmap='coolwarm')
n_df.to_excel('test.xlsx')

Contents of'test.xlsx'↓ image.png

It seems that it can be used in various ways to make a notebook that is easy to read!

Bonus (sparklines)

You can write a histogram in a pandas table with a package called sparklines For example, for each CHAS: Charles River (1: around the river, 0: otherwise), calculate the following average. DIS: Weighted distances from five Boston employment facilities ・ PRISE: House price

display(boston_df.groupby('CHAS')[['DIS','PRICE']].mean())

image.png Well, I understand that the area around the Charles River is closer to the employment facilities in Boston and the rent is higher. However, there are outliers and the tendency is almost the same, but the average may be higher. So, define a function

from sparklines import sparklines
#Define function
def sparkline_hist(x, bin_num=10):
    bins=np.histogram(x, bins=bin_num)[0]
    sl = sparklines(bins)[0]
    return sl

If you apply the function to groupby,

display(boston_df.groupby('CHAS')[['DIS','PRICE']].agg(['mean', sparkline_hist]))

image.png A histogram-like thing appears in the table! Apparently, there are many places that are close to the employment facilities in Boston, even if they are not around the Charles River, and the area around the Charles River seems to be a polarization of houses with fairly high rent and those that are not. You can see things on pandas.

However, for some reason, on my Jupyter, there is a part where something like a histogram is convex downward, and that remains a mystery. When I examine the contents of the sparkline_hist function, it doesn't become particularly convex, so why?

# sparkline_Validate the contents of hist function
bins=np.histogram(boston_df[boston_df['CHAS']==0]['DIS'], bins=10)[0]
sl = sparklines(bins)[0]
print(sl)
bins=np.histogram(boston_df[boston_df['CHAS']==1]['DIS'], bins=10)[0]
sl = sparklines(bins)[0]
print(sl)

image.png

By the way, when I actually wrote the histogram, it looked like the following.

fig=plt.figure(figsize=(10,5))
ax=plt.subplot(1,2,1)
ax.hist(boston_df[boston_df['CHAS']==0]['DIS'],bins=10, color='red', label='CHAS=0',alpha=0.5, density=True)
ax.hist(boston_df[boston_df['CHAS']==1]['DIS'],bins=10, color='blue', label='CHAS=1',alpha=0.5, density=True)
ax.legend(loc='upper right')
ax.set_ylabel('density')
ax.set_xlabel('DIS')
ax.set_title('DIS Hist')

ax2=plt.subplot(1,2,2)
ax2.hist(boston_df[boston_df['CHAS']==0]['PRICE'],bins=10, color='red', label='CHAS=0',alpha=0.5, density=True)
ax2.hist(boston_df[boston_df['CHAS']==1]['PRICE'],bins=10, color='blue', label='CHAS=1',alpha=0.5, density=True)
ax2.legend(loc='upper right')
ax2.set_ylabel('density')
ax2.set_xlabel('PRICE')
ax2.set_title('PRICE Hist')
plt.tight_layout()
plt.show()

image.png

sparklines was fun, but I can't think of any particularly useful uses ... lol I draw a histogram normally. Lol

that's all!

Recommended Posts

EXCEL data bar and color scale can also be done with pandas
Implement "Data Visualization Design # 3" with pandas and matplotlib
Interactively visualize data with TreasureData, Pandas and Jupyter.
Morphological analysis and tfidf (with test code) that can be done in about 1 minute
Be careful when reading data with pandas (specify dtype)
Color list that can be set with tkinter (memorial)
Data visualization with pandas
Data manipulation with Pandas!
Shuffle data with pandas
It seems that Skeleton Tracking can be done with RealSense
I investigated the pretreatment that can be done with PyCaret
[Python] Draw elevation data on a sphere with Plotly and draw a globe that can be rotated round and round
Graph Excel data with matplotlib (1)
Graph Excel data with matplotlib (2)
Data processing tips with Pandas