In matplotlib, a histogram can be drawn with any of the following on the vertical axis:
- Frequency (the matplotlib default)
- Relative frequency
- Relative frequency density
Reference articles:
- normed of matplotlib.hist is strange
- behavior of matplotlib: histogram normed
- Statistics (2) Use python to learn the probability density function (normal distribution, standard normal distribution)!
The formulas appear to be:
- Frequency density = frequency / class width
- Relative frequency density = relative frequency / class width
When I experimented with the Python code below, the vertical axis turned out to be the relative frequency density when "density=True" is passed to the hist function. The hist function also had a normed option (now deprecated) that reportedly behaved incorrectly; the density option appears to fix that bug.
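To make the formulas concrete, here is a minimal sketch that applies them by hand and compares the result with `np.histogram(..., density=True)`. The toy values and class edges are made up for illustration and are not part of the experiment in this article.

```python
import numpy as np

# Hypothetical toy data, chosen only so the arithmetic is easy to follow
values = np.array([1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 4.0, 5.0, 5.0])
edges = np.array([1.0, 3.0, 5.0])  # two classes of width 2: [1, 3) and [3, 5]

# Frequency: number of values falling in each class
frequency, _ = np.histogram(values, bins=edges)
class_width = np.diff(edges)

# Relative frequency = frequency / total number of values
relative_frequency = frequency / frequency.sum()
# Relative frequency density = relative frequency / class width
relative_frequency_density = relative_frequency / class_width

# What NumPy returns when density=True is requested
density, _ = np.histogram(values, bins=edges, density=True)

print("frequency:", frequency)
print("relative frequency:", relative_frequency)
print("relative frequency density:", relative_frequency_density)
print("matches density=True:", np.allclose(density, relative_frequency_density))
```

Here the frequencies are 3 and 7, so the relative frequencies are 0.3 and 0.7, and dividing by the class width of 2 gives densities of 0.15 and 0.35.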
Let's draw three histograms. The data consists of 10,000 normal random numbers with a mean of 50 and a standard deviation of 10.
#%%
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt
#%%
# Create the data: 10,000 normal random numbers (mean μ, standard deviation σ)
μ = 50
σ = 10
data = np.random.normal(μ, σ, 10000)
#%%
# Number of classes
num_bins = 20
# Class width
bin_width = (max(data) - min(data)) / num_bins
print(f"Class width = about {bin_width}")
# Draw the graphs
fig = plt.figure(figsize=(8, 24))
# (1) Histogram with frequency on the vertical axis
ax1 = fig.add_subplot(311)
ax1.title.set_text("(1) frequency")
ax1.grid(True)
ax1.hist(data, bins=num_bins)
# (2) Histogram with relative frequency on the vertical axis
ax2 = fig.add_subplot(312)
ax2.title.set_text("(2) relative frequency")
ax2.grid(True)
ax2.set_xlim(ax1.get_xlim())
weights = np.ones_like(data) / len(data)
ax2.hist(data, bins=num_bins, weights=weights)
# (3) Histogram with relative frequency density on the vertical axis (blue) & probability density function of the normal distribution (red)
ax3 = fig.add_subplot(313)
ax3.title.set_text("(3) density")
ax3.grid(True)
ax3.set_xlim(ax1.get_xlim())
ax3.hist(data, bins=num_bins, density=True, color="blue", alpha=0.5)
x = np.arange(0, 100, 1)
y = norm.pdf(x, μ, σ)
ax3.fill_between(x, y, color="red", alpha=0.5)
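# Pass the color only once; combining the 'k' format string with color="red" triggers a warning
ax3.plot(x, y, linewidth=3, color="red", alpha=0.5)
plt.show()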
When I ran the code, it printed "Class width = about 3.718313197105561" and drew the following histograms. The three histograms show (1) frequency on the vertical axis, (2) relative frequency on the vertical axis, and (3) relative frequency density on the vertical axis (blue) with the probability density function of the normal distribution (red) superimposed. Since the "density=True" histogram and the probability density function overlap, it seems that the total area of the histogram becomes 1 when "density=True" is set.
Use the tallest bar of the histogram for verification.
The tallest bar in (2) has a value between 0.14 and 0.15.
The class width printed by the code was about 3.7.
Relative frequency / class width = 0.145 / 3.7 ≒ 0.039, which appears to match the tallest bar in (3). (End of verification)
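Reading heights off a plot is only approximate, so the same check can be sketched numerically. The following assumes NumPy's `np.histogram` bins the data the same way as the hist call with an integer number of bins; the seed and the variable names are arbitrary choices for this illustration, so the printed values will differ from the 0.145 and 3.7 read off the figure.

```python
import numpy as np

# Seeded generator so the sketch is reproducible (the seed is an arbitrary assumption)
rng = np.random.default_rng(0)
sample = rng.normal(50, 10, 10000)
num_bins = 20

# Frequencies and class edges for 20 equal-width classes
counts, edges = np.histogram(sample, bins=num_bins)
bin_width = edges[1] - edges[0]

# Height of the tallest bar on the relative-frequency scale
tallest_relative_frequency = counts.max() / counts.sum()

# Heights on the density scale
density_heights, _ = np.histogram(sample, bins=num_bins, density=True)

print("relative frequency / class width:", tallest_relative_frequency / bin_width)
print("tallest density bar:", density_heights.max())
```

The two printed numbers should agree up to floating-point rounding, which is the same relationship the manual check above relies on.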
It is difficult to read the heights off the histogram above and add up the areas, so set the number of classes (the num_bins variable) in the Python code to 1 and run it again.
The height of the only bar in (2) is now 1.0. Also, since we used 10,000 data points, the height of the only bar in (1) is 10000. In (3), the area of the "density=True" histogram (blue) also looks the same as the area under the probability density function (red), that is, area = 1. (End of verification)
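The area claim can also be checked without relying on the eye. This is a minimal sketch, again assuming `np.histogram` as a stand-in for the plotted histogram and an arbitrary seed: it multiplies each density height by its class width and sums the results, for both 1 class and 20 classes.

```python
import numpy as np

# Seeded generator so the sketch is reproducible (the seed is an arbitrary assumption)
rng = np.random.default_rng(0)
sample = rng.normal(50, 10, 10000)

areas = {}
for num_bins in (1, 20):
    heights, edges = np.histogram(sample, bins=num_bins, density=True)
    # Area of each bar = height * class width; sum over all bars
    areas[num_bins] = float(np.sum(heights * np.diff(edges)))
    print(f"num_bins={num_bins}: total area = {areas[num_bins]}")
```

In both cases the total area should come out as 1 up to floating-point rounding, regardless of the number of classes.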