A histogram of two variables is usually drawn in one of two ways: as a three-dimensional plot, or as a two-dimensional plot in which the frequency is mapped to color or shading. A 3D plot makes changes in frequency easy to see, but some parts are hidden behind others, so the overall distribution can be hard to read. A 2D plot with frequency mapped to color makes subtle differences in frequency harder to see, but the overall shape of the distribution is easy to grasp.
As data, we use samples from two two-dimensional normal distributions, generated as follows.
import numpy as np
x, y = np.vstack((np.random.multivariate_normal([0, 0], [[10.0, 0], [0, 20]], 5000),
                  np.random.multivariate_normal([0, 15], [[10.0, 0], [0, 5]], 5000))).T
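If you want every run to draw the same sample, one option is numpy's seeded Generator API. The sketch below is an assumption, not the article's method: it replaces the global np.random state with rng = np.random.default_rng(0) (the seed value is arbitrary) and otherwise uses the same means and covariances. It also shows how the stack-and-transpose step turns two (5000, 2) arrays into the 1-D arrays x and y.

```python
import numpy as np

# Seeded generator (assumption: the article itself does not seed its data).
rng = np.random.default_rng(0)

# Component 1: mean (0, 0), var(x) = 10, var(y) = 20.
a = rng.multivariate_normal([0, 0], [[10.0, 0], [0, 20.0]], 5000)
# Component 2: mean (0, 15), var(x) = 10, var(y) = 5.
b = rng.multivariate_normal([0, 15], [[10.0, 0], [0, 5.0]], 5000)

# vstack gives shape (10000, 2); .T gives (2, 10000), which unpacks into x and y.
x, y = np.vstack((a, b)).T

print(x.shape, y.shape)
print(y[:5000].mean(), y[5000:].mean())  # close to 0 and 15 respectively
```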
2D histogram
For the 2D histogram we use matplotlib's hist2d. The bin counts are available from its return value, which is the tuple (counts, xedges, yedges, Image); Image is the drawn image object, which is what colorbar takes.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
fig = plt.figure()
ax = fig.add_subplot(111)
H = ax.hist2d(x,y, bins=40, cmap=cm.jet)
ax.set_title('1st graph')
ax.set_xlabel('x')
ax.set_ylabel('y')
fig.colorbar(H[3],ax=ax)
plt.show()
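A quick way to see what hist2d returns is to inspect the pieces directly. The sketch below assumes a seeded generator in place of the article's unseeded data and the non-interactive Agg backend so it runs without a display; the names counts, xedges, yedges, image and ref are illustrative. It checks the array shapes and that the counts match numpy's histogram2d for the same bins.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Seeded stand-in for the article's data (assumption).
rng = np.random.default_rng(0)
x, y = np.vstack((rng.multivariate_normal([0, 0], [[10.0, 0], [0, 20.0]], 5000),
                  rng.multivariate_normal([0, 15], [[10.0, 0], [0, 5.0]], 5000))).T

fig, ax = plt.subplots()
counts, xedges, yedges, image = ax.hist2d(x, y, bins=40)
fig.colorbar(image, ax=ax)

# counts[i, j] is the number of samples with x in bin i and y in bin j;
# there is one more edge than bins along each axis.
print(counts.shape, xedges.shape, yedges.shape)

# With an integer bins value the range spans the data, so every sample is counted,
# and the counts agree with NumPy's histogram2d for the same bins.
ref, _, _ = np.histogram2d(x, y, bins=40)
print(counts.sum(), np.array_equal(counts, ref))
```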
The number of bins is set by the parameter bins. If it is a scalar, the same number of bins is used along both x and y. To set them separately, pass a pair such as [40, 10] (number of x bins, number of y bins), as in the first example below. As with a one-dimensional histogram, you can also pass the bin edges directly, as in the second example.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
fig = plt.figure()
ax = fig.add_subplot(111)
H = ax.hist2d(x,y, bins=[40,10], cmap=cm.jet)
ax.set_title('2nd graph')
ax.set_xlabel('x')
ax.set_ylabel('y')
fig.colorbar(H[3],ax=ax)
plt.show()
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
fig = plt.figure()
ax = fig.add_subplot(111)
H = ax.hist2d(x,y, bins=[np.linspace(-30,30,61),np.linspace(-30,30,61)], cmap=cm.jet)
ax.set_title('3rd graph')
ax.set_xlabel('x')
ax.set_ylabel('y')
fig.colorbar(H[3],ax=ax)
plt.show()
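The two forms of bins can be compared by looking at the shapes they produce. This is a sketch under the same assumptions as before (seeded data, Agg backend, illustrative variable names). It also checks that, with explicit edges, samples outside the edges are simply not counted.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Seeded stand-in for the article's data (assumption).
rng = np.random.default_rng(0)
x, y = np.vstack((rng.multivariate_normal([0, 0], [[10.0, 0], [0, 20.0]], 5000),
                  rng.multivariate_normal([0, 15], [[10.0, 0], [0, 5.0]], 5000))).T

fig, ax = plt.subplots()
# Separate bin counts: 40 bins along x, 10 bins along y.
c1, xe1, ye1, _ = ax.hist2d(x, y, bins=[40, 10])

# Explicit edges: 60 bins of width 1 from -30 to 30 on both axes.
edges = np.linspace(-30, 30, 61)
c2, xe2, ye2, _ = ax.hist2d(x, y, bins=[edges, edges])

# Only samples inside the edges are counted (the right-most edge is inclusive).
inside = np.count_nonzero((x >= -30) & (x <= 30) & (y >= -30) & (y <= 30))
print(c1.shape, c2.shape, c2.sum(), inside)
```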
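To normalize the histogram so that it integrates to 1, set the parameter density to True. (Older versions of matplotlib used a parameter called normed, which has been replaced by density.)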
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
fig = plt.figure()
ax = fig.add_subplot(111)
H = ax.hist2d(x,y, bins=[np.linspace(-30,30,61),np.linspace(-30,30,61)], density=True, cmap=cm.jet)
ax.set_title('4th graph')
ax.set_xlabel('x')
ax.set_ylabel('y')
fig.colorbar(H[3],ax=ax)
plt.show()
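What density=True means can be checked numerically: each bin holds count / (total count in range × bin area), so the bin values multiplied by the bin areas sum to 1. The sketch assumes seeded data and the Agg backend; dens and area are illustrative names.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Seeded stand-in for the article's data (assumption).
rng = np.random.default_rng(0)
x, y = np.vstack((rng.multivariate_normal([0, 0], [[10.0, 0], [0, 20.0]], 5000),
                  rng.multivariate_normal([0, 15], [[10.0, 0], [0, 5.0]], 5000))).T

edges = np.linspace(-30, 30, 61)
fig, ax = plt.subplots()
dens, xe, ye, _ = ax.hist2d(x, y, bins=[edges, edges], density=True)

# Area of every bin: outer product of the bin widths along x and y.
area = np.outer(np.diff(xe), np.diff(ye))
print((dens * area).sum())  # integral of the normalized histogram, 1 up to rounding
```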
To change the color map, specify it in the parameter cmap as shown in the example.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
fig = plt.figure()
ax = fig.add_subplot(111)
H = ax.hist2d(x,y, bins=[np.linspace(-30,30,61),np.linspace(-30,30,61)], density=True, cmap=cm.gray)
ax.set_title('5th graph')
ax.set_xlabel('x')
ax.set_ylabel('y')
fig.colorbar(H[3],ax=ax)
plt.show()
When comparing several histograms, you may want them all to use the same color range. In that case, call set_clim on the returned image, as in the examples. The next two examples use upper limits of 0.05 and 0.01.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
fig = plt.figure()
ax = fig.add_subplot(111)
H = ax.hist2d(x,y, bins=[np.linspace(-30,30,61),np.linspace(-30,30,61)], density=True, cmap=cm.jet)
ax.set_title('6th graph')
ax.set_xlabel('x')
ax.set_ylabel('y')
H[3].set_clim(0,0.05)
fig.colorbar(H[3],ax=ax)
plt.show()
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
fig = plt.figure()
ax = fig.add_subplot(111)
H = ax.hist2d(x,y, bins=[np.linspace(-30,30,61),np.linspace(-30,30,61)], density=True, cmap=cm.jet)
ax.set_title('7th graph')
ax.set_xlabel('x')
ax.set_ylabel('y')
H[3].set_clim(0,0.01)
fig.colorbar(H[3],ax=ax)
plt.show()
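For an actual side-by-side comparison, one approach is to draw each histogram in its own subplot, give every image the same limits with set_clim, and attach a single colorbar to both axes. In this sketch the data are seeded, the sample is split into its two components purely for illustration, and the upper limit 0.03 is an arbitrary choice.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Seeded stand-in for the article's data (assumption).
rng = np.random.default_rng(0)
x, y = np.vstack((rng.multivariate_normal([0, 0], [[10.0, 0], [0, 20.0]], 5000),
                  rng.multivariate_normal([0, 15], [[10.0, 0], [0, 5.0]], 5000))).T

edges = np.linspace(-30, 30, 61)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram each component separately (first 5000 and last 5000 samples).
_, _, _, im1 = ax1.hist2d(x[:5000], y[:5000], bins=[edges, edges], density=True, cmap="jet")
_, _, _, im2 = ax2.hist2d(x[5000:], y[5000:], bins=[edges, edges], density=True, cmap="jet")

# Same color limits on both images, so the same color means the same density.
for im in (im1, im2):
    im.set_clim(0, 0.03)

# Because the mapping is shared, one colorbar can serve both panels.
fig.colorbar(im2, ax=[ax1, ax2])
```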
To color the histogram on a logarithmic scale, pass norm=matplotlib.colors.LogNorm().
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors
import matplotlib.cm as cm
fig = plt.figure()
ax = fig.add_subplot(111)
H = ax.hist2d(x,y, bins=[np.linspace(-30,30,61),np.linspace(-30,30,61)], norm=matplotlib.colors.LogNorm(), cmap=cm.jet)
ax.set_title('8th graph')
ax.set_xlabel('x')
ax.set_ylabel('y')
fig.colorbar(H[3],ax=ax)
plt.show()
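Empty bins have no logarithm, so with LogNorm they end up without a color. If you would rather make that explicit, hist2d's cmin parameter sets bins with fewer than cmin counts to NaN, and they are left blank. The sketch below (seeded data, Agg backend, illustrative names) checks that exactly the empty bins become NaN.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

# Seeded stand-in for the article's data (assumption).
rng = np.random.default_rng(0)
x, y = np.vstack((rng.multivariate_normal([0, 0], [[10.0, 0], [0, 20.0]], 5000),
                  rng.multivariate_normal([0, 15], [[10.0, 0], [0, 5.0]], 5000))).T

edges = np.linspace(-30, 30, 61)
fig, ax = plt.subplots()
# cmin=1: bins with fewer than 1 count are set to NaN and not drawn.
counts, xe, ye, image = ax.hist2d(x, y, bins=[edges, edges],
                                  norm=mcolors.LogNorm(), cmin=1, cmap="jet")
fig.colorbar(image, ax=ax)

# Compare against plain counts: the NaN bins are exactly the empty bins.
ref, _, _ = np.histogram2d(x, y, bins=[edges, edges])
print(np.isnan(counts).sum(), (ref == 0).sum())
```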
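Contour lines are drawn with contour. Note the following points: the counts array returned by hist2d has x along its first axis and y along its second, so it must be transposed before being passed to contour, and contour must be told where the bins lie in data coordinates (here via extent); otherwise the contours are drawn in array-index coordinates.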
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors
import matplotlib.cm as cm
fig = plt.figure()
ax = fig.add_subplot(111)
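counts, xedges, yedges, Image = ax.hist2d(x,y, bins=[np.linspace(-30,30,61),np.linspace(-30,30,61)], norm=matplotlib.colors.LogNorm(), cmap=cm.jet)
# counts is indexed [x, y]; transpose it so rows correspond to y.
# With origin='lower', extent gives the outer bin edges, so each value sits at its bin center.
ax.contour(counts.T, origin='lower', extent=[xedges.min(),xedges.max(),yedges.min(),yedges.max()])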
ax.set_title('9th graph')
ax.set_xlabel('x')
ax.set_ylabel('y')
fig.colorbar(Image,ax=ax)
plt.show()
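An alternative that avoids extent altogether is to pass the bin centers as coordinates: contour(X, Y, Z) with 1-D X and Y expects len(X) columns and len(Y) rows in Z, which is again why counts is transposed. This is a sketch with seeded data and the Agg backend; xc, yc and cs are illustrative names, and the black thin lines are just a styling choice.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

# Seeded stand-in for the article's data (assumption).
rng = np.random.default_rng(0)
x, y = np.vstack((rng.multivariate_normal([0, 0], [[10.0, 0], [0, 20.0]], 5000),
                  rng.multivariate_normal([0, 15], [[10.0, 0], [0, 5.0]], 5000))).T

edges = np.linspace(-30, 30, 61)
fig, ax = plt.subplots()
counts, xe, ye, image = ax.hist2d(x, y, bins=[edges, edges],
                                  norm=mcolors.LogNorm(), cmap="jet")

# Bin centers: midpoint of each pair of neighbouring edges.
xc = 0.5 * (xe[:-1] + xe[1:])
yc = 0.5 * (ye[:-1] + ye[1:])

# Z[row, col] is the value at (xc[col], yc[row]), hence counts.T.
cs = ax.contour(xc, yc, counts.T, colors="k", linewidths=0.5)
fig.colorbar(image, ax=ax)
```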
To make the bins hexagonal, use hexbin.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
fig = plt.figure()
ax = fig.add_subplot(111)
H = ax.hexbin(x,y, gridsize=20, extent=[-30, 30, -30, 30], cmap=cm.jet)
ax.set_title('10th graph')
ax.set_xlabel('x')
ax.set_ylabel('y')
fig.colorbar(H,ax=ax)
plt.show()
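hexbin draws every hexagon in the grid by default, including empty ones. A sketch of one way to hide the empty cells is its mincnt parameter, which keeps only hexagons containing at least mincnt samples; the per-cell counts can then be read back with get_array(). Seeded data, the Agg backend and the names hb and cell_counts are assumptions for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Seeded stand-in for the article's data (assumption).
rng = np.random.default_rng(0)
x, y = np.vstack((rng.multivariate_normal([0, 0], [[10.0, 0], [0, 20.0]], 5000),
                  rng.multivariate_normal([0, 15], [[10.0, 0], [0, 5.0]], 5000))).T

fig, ax = plt.subplots()
# mincnt=1: only hexagons that contain at least one sample are drawn.
hb = ax.hexbin(x, y, gridsize=20, extent=[-30, 30, -30, 30], mincnt=1, cmap="jet")
fig.colorbar(hb, ax=ax)

# One count per drawn hexagon, in the same order as the hexagon centers.
cell_counts = hb.get_array()
centers = hb.get_offsets()
print(len(cell_counts), cell_counts.min(), cell_counts.sum())
```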
A histogram looks blocky, especially when the bins are wide. Kernel density estimation is one way to obtain a smooth result instead: it estimates the probability density function of a random variable from a sample.
Kernel density estimation is available in both scipy and scikit-learn. In scipy's gaussian_kde, the bandwidth is given relative to the spread (standard deviation, or covariance in several dimensions) of the data, so the estimate stays smooth even when the spread of the data changes. For example, in one dimension with bw_method=1.0 the kernel bandwidth is value.std(ddof=1), where value is the data; ddof ("delta degrees of freedom") means the variance is computed by dividing by N - ddof. When bw_method is omitted, Scott's rule is used.
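The relative bandwidth can be seen in gaussian_kde's attributes: the kernel covariance equals the data covariance (computed with ddof=1) multiplied by factor**2, where factor is bw_method when it is a number, and n ** (-1 / (d + 4)) for the default Scott's rule. The sketch below checks this on seeded data; the kde_* names are illustrative.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Seeded stand-in for the article's data (assumption).
rng = np.random.default_rng(0)
x, y = np.vstack((rng.multivariate_normal([0, 0], [[10.0, 0], [0, 20.0]], 5000),
                  rng.multivariate_normal([0, 15], [[10.0, 0], [0, 5.0]], 5000))).T

value = np.vstack([x, y])   # shape (2, n): one row per variable
n = value.shape[1]
data_cov = np.cov(value)    # unbiased covariance, i.e. divides by n - 1

kde_1 = gaussian_kde(value, bw_method=1.0)
kde_half = gaussian_kde(value, bw_method=0.5)
kde_scott = gaussian_kde(value)  # default: Scott's rule

# Kernel covariance = data covariance * factor**2.
print(np.allclose(kde_1.covariance, data_cov))
print(np.allclose(kde_half.covariance, 0.25 * data_cov))
# Scott's factor in d = 2 dimensions: n ** (-1 / 6).
print(np.isclose(kde_scott.factor, n ** (-1 / 6)))

# In one dimension, bw_method=1.0 gives a kernel standard deviation of x.std(ddof=1).
kde_x = gaussian_kde(x, bw_method=1.0)
print(np.isclose(np.sqrt(kde_x.covariance[0, 0]), x.std(ddof=1)))
```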
kernel = gaussian_kde(value) returns a gaussian_kde object rather than density values. To obtain the values, call it with the x and y coordinates of the points at which to evaluate the density, as in kernel(positions).
In practice, a mesh is created with mgrid, the x and y grids are each flattened to one dimension with ravel, and the results are stacked with vstack before being passed to the kernel. Because kernel(positions) returns a one-dimensional array, it is reshaped back to the two-dimensional shape of the grid. Finally, the result is drawn with contourf.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde
import matplotlib.cm as cm
xx,yy = np.mgrid[-30:30:1,-30:30:1]
positions = np.vstack([xx.ravel(),yy.ravel()])
value = np.vstack([x,y])
kernel = gaussian_kde(value)
f = np.reshape(kernel(positions).T, xx.shape)
fig = plt.figure()
ax = fig.add_subplot(111)
ax.contourf(xx,yy,f, cmap=cm.jet)
ax.set_title('11th graph')
ax.set_xlabel('x')
ax.set_ylabel('y')
plt.show()
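The mgrid/ravel/vstack/reshape round trip can be checked directly: since ravel and reshape use the same element order, f[i, j] should be the density at (xx[i, j], yy[i, j]). And because the grid spacing is 1, f.sum() is a Riemann-sum approximation of the total probability, which should be close to 1. This sketch uses seeded data; the test point (i, j) = (10, 45) is an arbitrary choice.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Seeded stand-in for the article's data (assumption).
rng = np.random.default_rng(0)
x, y = np.vstack((rng.multivariate_normal([0, 0], [[10.0, 0], [0, 20.0]], 5000),
                  rng.multivariate_normal([0, 15], [[10.0, 0], [0, 5.0]], 5000))).T

xx, yy = np.mgrid[-30:30:1, -30:30:1]             # xx varies along axis 0, yy along axis 1
positions = np.vstack([xx.ravel(), yy.ravel()])  # shape (2, 3600)
kernel = gaussian_kde(np.vstack([x, y]))
f = kernel(positions).reshape(xx.shape)          # back to the (60, 60) grid

# Same element order in ravel and reshape: f[i, j] is the density at (xx[i, j], yy[i, j]).
i, j = 10, 45
single = kernel([[xx[i, j]], [yy[i, j]]])[0]
print(np.isclose(f[i, j], single))

# Cell area is 1 * 1, so the sum approximates the integral of the density.
total = f.sum()
print(total)
```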
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde
import matplotlib.cm as cm
xx,yy = np.mgrid[-30:30:1,-30:30:1]
positions = np.vstack([xx.ravel(),yy.ravel()])
value = np.vstack([x,y])
kernel = gaussian_kde(value, bw_method=0.5)
f = np.reshape(kernel(positions).T, xx.shape)
fig = plt.figure()
ax = fig.add_subplot(111)
ax.contourf(xx,yy,f, cmap=cm.jet)
ax.set_title('12th graph')
ax.set_xlabel('x')
ax.set_ylabel('y')
plt.show()
You can also compute the histogram with numpy's histogram2d. It returns only the counts and the x and y edges, so you have to draw the graph yourself; here imshow is used. As with matplotlib's hist2d, the first axis of the count array corresponds to x and the second to y, whereas imshow draws the first axis vertically. The array is therefore transposed, and origin='lower' places the start of the axes at the lower left.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
fig = plt.figure()
ax = fig.add_subplot(111)
H = np.histogram2d(x,y, bins=[np.linspace(-30,30,61),np.linspace(-30,30,61)])
im = ax.imshow(H[0].T, interpolation='nearest', origin='lower', extent=[-30,30,-30,30], cmap=cm.jet)
ax.set_title('13th graph')
ax.set_xlabel('x')
ax.set_ylabel('y')
fig.colorbar(im, ax=ax)
plt.show()
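The index layout that makes the transpose necessary can be verified with a single sample. The sketch places one hypothetical point near the upper-left corner of the plotting area (small x, large y) and looks at where it lands before and after transposing.

```python
import numpy as np

edges = np.linspace(-30, 30, 61)

# One sample at small x (-29.5) and large y (29.5): the upper-left corner of the plot.
H, xe, ye = np.histogram2d([-29.5], [29.5], bins=[edges, edges])

# First index is the x bin, second index is the y bin:
# the sample is in H[0, -1] (first x bin, last y bin), not in H[-1, 0].
print(H[0, -1], H[-1, 0])

# After transposing, rows correspond to y. With origin='lower', the last row is drawn
# at the top and the first column at the left, i.e. the upper-left corner.
img = H.T
print(img[-1, 0])
```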