I often draw histograms with Python's matplotlib, but the default vertical and horizontal axes are not always to my liking, so this is a memo on how to fine-tune them.
As an example, consider the following graph.
%matplotlib inline
import matplotlib.pyplot as plt
from scipy import stats
norm_rvs = stats.norm.rvs(loc=50, scale=30, size=100, random_state=0)
plt.hist(norm_rvs, bins=10, alpha=0.5, ec='navy')
plt.show()
Looking at this plot, two things bother me. First, the bin edges of the bars fall on awkward, non-round values. Second, the ticks on the vertical axis are not integers.
So let's fix both.
plt.hist returns the bar heights (the counts per bin) and the bin edges, so you can capture and inspect them as follows. Here Y holds the counts and X holds the edges:
Y, X, _ = plt.hist(norm_rvs, bins=10, alpha=0.5, ec='navy')
print(X)
print(Y)
plt.show()
[-26.58969448 -12.12146116 2.34677216 16.81500548 31.2832388
45.75147212 60.21970544 74.68793876 89.15617208 103.6244054
118.09263872]
[ 1. 5. 7. 13. 17. 18. 16. 11. 7. 5.]
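To see why the edges come out so awkward, here is a minimal sketch using numpy.histogram, which matplotlib's hist is documented to use for the binning. With an integer bins and no range, the edges simply split the interval from the smallest to the largest value into equal parts. The sample below is a stand-in drawn with NumPy's own generator (an assumption for illustration), so its values differ from norm_rvs above.

```python
import numpy as np

# Stand-in sample (assumption): similar in shape to norm_rvs, but not the same values.
rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=30, size=100)

# Same binning rule as plt.hist(data, bins=10), but without drawing anything.
counts, edges = np.histogram(data, bins=10)

# There is always one more edge than there are bins.
print(len(counts), len(edges))

# The edges run from min(data) to max(data) in equal steps,
# which is why they rarely land on round numbers.
widths = np.diff(edges)
print(edges[0], edges[-1], widths[0])
```

Because the first and last edges are tied to the data itself, the only way to get round edges is to pass a range or explicit edges.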
Let's use that information to put integer ticks on the vertical axis.
import numpy as np
Y, X, _ = plt.hist(norm_rvs, bins=10, alpha=0.5, ec='navy')
y_max = int(max(Y)) + 1
plt.yticks(np.arange(0, y_max, 2))  # Ticks every 1 are too crowded to read, so use steps of 2.
plt.show()
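As an alternative to computing the ticks by hand, matplotlib's MaxNLocator can be asked to place ticks only at integer values. This is a sketch of that option, not what the memo above does; the sample is again a stand-in drawn with NumPy (an assumption), and the tick spacing is chosen by matplotlib rather than fixed at 2.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator

# Stand-in sample (assumption): not the same values as norm_rvs.
rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=30, size=100)

fig, ax = plt.subplots()
counts, edges, _ = ax.hist(data, bins=10, alpha=0.5, ec='navy')

# Let matplotlib choose the spacing, restricted to integer tick values.
ax.yaxis.set_major_locator(MaxNLocator(integer=True))

# Read the ticks back to confirm that they are whole numbers.
yticks = ax.get_yticks()
print(yticks)
plt.show()
```

The manual np.arange approach gives full control over the step, while the locator keeps working if the counts change a lot.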
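Next, specify the range of the horizontal axis and choose the number of bins so that the edges land on round values. Here, 13 bins over (-10, 120) give bins of width 10. Note that values outside the range are dropped: the counts below add up to 99, because the one sample smaller than -10 is excluded.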
Y, X, _ = plt.hist(norm_rvs, bins=13, alpha=0.5, ec='navy', range=(-10, 120))
print(X)
print(Y)
y_max = int(max(Y)) + 1
plt.yticks(np.arange(0, y_max, 2))
plt.show()
[-10. 0. 10. 20. 30. 40. 50. 60. 70. 80. 90. 100. 110. 120.]
[ 3. 5. 6. 10. 11. 9. 15. 13. 9. 6. 5. 5. 2.]
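Instead of picking the range and the number of bins by eye, the round edges can also be computed from the data and passed to bins directly, since hist accepts a sequence of edges as well as an integer. The helper below, round_edges, is a hypothetical name of my own and not part of matplotlib or NumPy; it is a minimal sketch that assumes the data are not all one identical value.

```python
import numpy as np

def round_edges(data, width):
    """Return bin edges at multiples of `width` that cover all of `data`.

    Hypothetical helper for illustration, not a library function.
    """
    low = np.floor(np.min(data) / width) * width
    high = np.ceil(np.max(data) / width) * width
    # Add half a width so that `high` itself is included despite float rounding.
    return np.arange(low, high + width / 2, width)

# Stand-in sample (assumption): not the same values as norm_rvs.
rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=30, size=100)

edges = round_edges(data, 10)
counts, _ = np.histogram(data, bins=edges)  # plt.hist(data, bins=edges) bins the same way
print(edges)
print(counts)
```

Because the edges are widened outward to the nearest multiple of the width, no sample is dropped, unlike with a hand-picked range that is too narrow.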
Now, you may want to compare multiple histograms by drawing them on the same axes.
norm_rvs2 = stats.norm.rvs(loc=75, scale=55, size=100, random_state=0)
plt.hist(norm_rvs, bins=10, alpha=0.5, ec='navy')
plt.hist(norm_rvs2, bins=10, alpha=0.5, ec='red')
plt.show()
With the defaults this looks bad, as it often does: each histogram gets its own bin edges, so the bars do not line up. Let's tidy this up as well.
bins = 20
hist_range = (-50, 200)  # Named hist_range so that the built-in range() is not shadowed; 20 bins here are 12.5 wide.
Y1, X1, _ = plt.hist(norm_rvs, bins=bins, alpha=0.5, ec='navy', range=hist_range)
Y2, X2, _ = plt.hist(norm_rvs2, bins=bins, alpha=0.5, ec='red', range=hist_range)
y_max = int(max(max(Y1), max(Y2))) + 1
plt.yticks(np.arange(0, y_max, 2))
plt.show()
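The same idea can be written with one explicit array of edges that both histograms share, which makes the bin width visible in the code. In this sketch I use 25 bins of width 10 rather than the 20 bins above; that choice, the labels, and the stand-in samples drawn with NumPy are all assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in samples (assumption): not the same values as norm_rvs and norm_rvs2.
rng = np.random.default_rng(0)
sample_a = rng.normal(loc=50, scale=30, size=100)
sample_b = rng.normal(loc=75, scale=55, size=100)

# 26 edges from -50 to 200 make 25 bins, each exactly 10 wide.
edges = np.linspace(-50, 200, 26)

fig, ax = plt.subplots()
counts_a, edges_a, _ = ax.hist(sample_a, bins=edges, alpha=0.5, ec='navy', label='sample 1')
counts_b, edges_b, _ = ax.hist(sample_b, bins=edges, alpha=0.5, ec='red', label='sample 2')

# Integer ticks in steps of 2, up to the taller of the two histograms.
y_max = int(max(counts_a.max(), counts_b.max())) + 1
ax.set_yticks(np.arange(0, y_max, 2))
ax.legend()
plt.show()
```

As with range, samples that fall outside the first and last edge are not counted, so the edges should be wide enough for both samples.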
Personally, I prefer to arrange them vertically as follows.
bins = 20
hist_range = (-50, 200)  # Named hist_range so that the built-in range() is not shadowed.
fig, axes = plt.subplots(nrows=2, ncols=1, figsize=(8, 8))
Y1, X1, _ = axes[0].hist(norm_rvs, bins=bins, alpha=0.5, ec='navy', range=hist_range)
Y2, X2, _ = axes[1].hist(norm_rvs2, bins=bins, alpha=0.5, ec='red', range=hist_range)
# Use the same vertical limits and ticks on both panels so the heights are comparable.
y_max = int(max(max(Y1), max(Y2))) + 1
axes[0].set_ylim([0, y_max])
axes[1].set_ylim([0, y_max])
axes[0].set_yticks(np.arange(0, y_max, 2))
axes[1].set_yticks(np.arange(0, y_max, 2))
plt.show()
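If you would rather not set the limits on each panel by hand, plt.subplots can link the axes with sharex and sharey. This is a sketch of that variation rather than the code above; the stand-in samples and the use of MaxNLocator for integer ticks are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator

# Stand-in samples (assumption): not the same values as norm_rvs and norm_rvs2.
rng = np.random.default_rng(0)
sample_a = rng.normal(loc=50, scale=30, size=100)
sample_b = rng.normal(loc=75, scale=55, size=100)

# 21 edges from -50 to 200 make the same 20 bins as bins=20, range=(-50, 200).
edges = np.linspace(-50, 200, 21)

# sharex and sharey keep the limits of both panels in sync.
fig, axes = plt.subplots(nrows=2, ncols=1, figsize=(8, 8), sharex=True, sharey=True)
axes[0].hist(sample_a, bins=edges, alpha=0.5, ec='navy')
axes[1].hist(sample_b, bins=edges, alpha=0.5, ec='red')

# Restrict the vertical ticks to integers on both panels.
for ax in axes:
    ax.yaxis.set_major_locator(MaxNLocator(integer=True))

print(axes[0].get_ylim(), axes[1].get_ylim())
plt.show()
```

With shared axes, the tick labels of the upper panel's horizontal axis are hidden by default, which also saves a little space.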
That's all!