At some point I wondered: moving a table over to Excel just to attach data bars is a hassle, so can I do the same thing directly in Jupyter? I figured pandas would normally have a feature like that, and it does. I just didn't know about it, lol. I found it on the following site: https://pbpython.com/styling-pandas.html
DataFrame.style
The features of DataFrame.style are described in detail on the following page: https://pandas.pydata.org/pandas-docs/version/0.18/style.html
I tried it out with the Boston housing dataset included in sklearn.
#Package import
import pandas as pd
from sklearn import datasets
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
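#Load the data (load_boston was removed in scikit-learn 1.2, so an older version is required)
boston = datasets.load_boston()
boston_df=pd.DataFrame(boston.data)
boston_df.columns = boston.feature_names #Set column names
boston_df['PRICE'] = boston.target #Add the target (house price) as a column
display(boston_df)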
Let's add Excel-style data bars to boston_df.
display(boston_df.style.bar())
Oh, that worked. But it looks kind of blood-red ...
I don't like the all-red look, so let's pick columns and change the colors. You can customize this by passing column names to subset, and by chaining .bar() calls so that each column gets its own color.
#Specify the columns
display(boston_df.style.bar(subset=['RM','AGE']))
#Specify columns and give each one its own color
display(
    boston_df.style
    .bar(subset=['RM'], color='lightcoral')
    .bar(subset=['AGE'], color='lightgreen')
    .bar(subset=['TAX'], color='lightblue')
)
It works!
Let's try a color scale as well.
#Create a correlation matrix
boston_corr=boston_df.corr()
display(boston_corr)
#Apply a color scale
display(boston_corr.style.background_gradient(cmap='coolwarm'))
It works! (Second time.) So pandas can do this kind of Excel-style table formatting too.
By the way, the styled table can be written out to an Excel file as it is.
#Requires the openpyxl package (already installed here)
n_df=boston_corr.style.background_gradient(cmap='coolwarm')
n_df.to_excel('test.xlsx')
Contents of 'test.xlsx' ↓
This looks useful in all sorts of ways for making notebooks easier to read!
With a package called sparklines, you can draw a histogram inside a pandas table.
For example, for each value of CHAS (Charles River; 1: along the river, 0: otherwise), calculate the mean of the following:
DIS: weighted distance to five Boston employment centers
PRICE: house price
display(boston_df.groupby('CHAS')[['DIS','PRICE']].mean())
So the area along the Charles River appears to be closer to Boston's employment centers and to have higher house prices. However, it could also be that the overall tendency is nearly the same and a few outliers are pulling the mean up. So, let's define a function:
from sparklines import sparklines
#Define the function
def sparkline_hist(x, bin_num=10):
    bins=np.histogram(x, bins=bin_num)[0] #Number of values in each bin
    sl = sparklines(bins)[0] #sparklines returns a list of lines; take the first
    return sl
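To make it clearer what such a function does, here is a rough hand-rolled equivalent that does not use the sparklines package. It is only a sketch of the idea: toy_sparkline and its scaling rule are hypothetical, not the package's actual implementation, so the output may differ from what sparklines prints.

```python
import numpy as np

BLOCKS = "▁▂▃▄▅▆▇█"  # eight Unicode block characters, lowest to highest

def toy_sparkline(x, bin_num=10):
    """Return one block character per histogram bin (illustrative only)."""
    counts = np.histogram(x, bins=bin_num)[0]
    top = counts.max()
    if top == 0:
        # no data in any bin: draw a flat line
        return BLOCKS[0] * len(counts)
    # scale each count to an index 0..7; the tallest bin gets the full block
    levels = np.floor(counts / top * (len(BLOCKS) - 1) + 0.5).astype(int)
    return "".join(BLOCKS[i] for i in levels)

# Skewed toy sample: many small values and a few large ones
sample = np.concatenate([np.full(50, 1.0), np.full(10, 5.0), np.full(2, 10.0)])
line = toy_sparkline(sample)
print(line)
```

The result is an ordinary string, which is why it can sit in a DataFrame cell next to the mean.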
Applying sparkline_hist in a groupby:
display(boston_df.groupby('CHAS')[['DIS','PRICE']].agg(['mean', sparkline_hist]))
A histogram-like thing appears in the table! Apparently many places are close to Boston's employment centers even when they are not along the Charles River, and the area along the river looks polarized between fairly expensive houses and the rest. It's nice that you can see this within pandas.
However, for some reason, on my Jupyter some of the bars in the histogram appear to stick out downward, and that remains a mystery. When I check what the sparkline_hist function produces step by step, nothing sticks out downward, so why?
#Validate the contents of the sparkline_hist function
bins=np.histogram(boston_df[boston_df['CHAS']==0]['DIS'], bins=10)[0]
sl = sparklines(bins)[0]
print(sl)
bins=np.histogram(boston_df[boston_df['CHAS']==1]['DIS'], bins=10)[0]
sl = sparklines(bins)[0]
print(sl)
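One way to rule out the data as the cause is to look at the characters themselves. The following sketch uses only the standard library and lists the code point and Unicode name of each character in a string. The literal string is a stand-in, on the assumption that the package emits the block elements U+2581 to U+2588; in the notebook you would pass sl instead. The helper name describe_chars is hypothetical.

```python
import unicodedata

def describe_chars(s):
    """Return (character, code point, Unicode name) for each character in s."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in s]

# Placeholder string; replace with the sparkline string `sl` in the notebook
stand_in = "▁▂▃▄▅▆▇█"
for ch, code, name in describe_chars(stand_in):
    print(ch, code, name)
```

If every character turns out to be an ordinary lower block or full block, the string itself is fine, and the odd look may come from how the font or the HTML table renders those glyphs. That is only a guess, though.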
By the way, when I drew the actual histograms, they looked like this.
fig=plt.figure(figsize=(10,5))
ax=plt.subplot(1,2,1)
ax.hist(boston_df[boston_df['CHAS']==0]['DIS'],bins=10, color='red', label='CHAS=0',alpha=0.5, density=True)
ax.hist(boston_df[boston_df['CHAS']==1]['DIS'],bins=10, color='blue', label='CHAS=1',alpha=0.5, density=True)
ax.legend(loc='upper right')
ax.set_ylabel('density')
ax.set_xlabel('DIS')
ax.set_title('DIS Hist')
ax2=plt.subplot(1,2,2)
ax2.hist(boston_df[boston_df['CHAS']==0]['PRICE'],bins=10, color='red', label='CHAS=0',alpha=0.5, density=True)
ax2.hist(boston_df[boston_df['CHAS']==1]['PRICE'],bins=10, color='blue', label='CHAS=1',alpha=0.5, density=True)
ax2.legend(loc='upper right')
ax2.set_ylabel('density')
ax2.set_xlabel('PRICE')
ax2.set_title('PRICE Hist')
plt.tight_layout()
plt.show()
sparklines was fun, but I can't think of any particularly practical use for it ... lol. I'll just draw histograms the normal way. lol
That's all!