I have already covered data visualization with Python and matplotlib in earlier posts: plotting various kinds of data with pandas + matplotlib, quickly visualizing data sets with pandas, and data visualization methods with matplotlib (+ pandas).
This time I will draw a heat map, but before that, let's review the main visualization methods once more.
Data visualization here means the common charts drawn from array (tabular) data, a format in which each record has multiple attribute values and occupies one row. There are many variations, but the typical methods below are the core ones.
The bar graph is suitable for comparing the size of data. Variations include drawing the bars vertically or horizontally, stacking them, and placing multiple series side by side.
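As a rough illustration of these variations, the sketch below draws the same made-up data side by side, stacked, and horizontally. The sample values, variable names, and output file name are assumptions chosen for the example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample data: two series over four categories
labels = ["A", "B", "C", "D"]
series1 = np.array([3, 5, 2, 6])
series2 = np.array([4, 1, 5, 2])
x = np.arange(len(labels))

fig, (ax_side, ax_stack, ax_horiz) = plt.subplots(1, 3, figsize=(12, 3))

# Multiple series arranged side by side
width = 0.4
ax_side.bar(x - width / 2, series1, width, label="series1")
ax_side.bar(x + width / 2, series2, width, label="series2")
ax_side.set_xticks(x)
ax_side.set_xticklabels(labels)
ax_side.legend()

# Stacked: the second series starts where the first one ends
ax_stack.bar(x, series1, label="series1")
stack_top = ax_stack.bar(x, series2, bottom=series1, label="series2")
ax_stack.set_xticks(x)
ax_stack.set_xticklabels(labels)

# Horizontal bars
horiz = ax_horiz.barh(x, series1)
ax_horiz.set_yticks(x)
ax_horiz.set_yticklabels(labels)

fig.savefig("bar_variations.png")  # hypothetical file name
```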
The line graph is what matplotlib's plot function draws by default. The data points are connected by a line, and each series becomes one line. It is especially well suited to showing how data changes over time.
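A minimal sketch of that default behavior, assuming two made-up random-walk series: calling plot with no style arguments gives one solid line per series.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical time series: two random walks over ten days
rng = np.random.default_rng(0)
days = np.arange(10)
series_a = rng.normal(size=10).cumsum()
series_b = rng.normal(size=10).cumsum()

fig, ax = plt.subplots()
# No style arguments: each call adds one series as a solid line
line_a, = ax.plot(days, series_a, label="series_a")
line_b, = ax.plot(days, series_b, label="series_b")
ax.set_xlabel("day")
ax.legend()
fig.savefig("line_example.png")  # hypothetical file name
```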
The stacked area graph is also called an area chart. Like the line graph, it makes changes in time-series data easy to follow, and it is especially suitable for tracking how totals and each series' share of the total change.
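One way to draw such a chart is matplotlib's stackplot, sketched below with invented monthly figures. The series names and values are assumptions for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical monthly values for three series
months = np.arange(1, 7)
product_a = np.array([10, 12, 15, 14, 18, 20])
product_b = np.array([5, 6, 6, 8, 9, 9])
product_c = np.array([2, 3, 5, 5, 6, 8])
total = product_a + product_b + product_c

fig, ax = plt.subplots()
# Each series is drawn on top of the previous one
layers = ax.stackplot(months, product_a, product_b, product_c,
                      labels=["product_a", "product_b", "product_c"])
ax.set_xlabel("month")
ax.legend(loc="upper left")
fig.savefig("area_example.png")  # hypothetical file name
```

The upper edge of the top layer traces the total, while the thickness of each band shows that series' contribution.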
The scatter plot is a good way to look at the correlation between two data series. Each data point is placed according to its X and Y values, so the relationship between the two axes shows up in the pattern of points. When the relationship is clear, you may be able to grasp the correlation at a glance without calculating it.
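To make that concrete, here is a small sketch that plots two deliberately correlated series and also computes the correlation coefficient with np.corrcoef for comparison. The data is synthetic and the coefficients used to generate it are assumptions for the example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data: y depends linearly on x, plus some noise
rng = np.random.default_rng(42)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(scale=0.5, size=200)

fig, ax = plt.subplots()
points = ax.scatter(x, y, s=10)

# Pearson correlation coefficient, shown in the title for reference
r = np.corrcoef(x, y)[0, 1]
ax.set_title(f"r = {r:.2f}")
ax.set_xlabel("x")
ax.set_ylabel("y")
fig.savefig("scatter_example.png")  # hypothetical file name
```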
The heat map shows frequency mainly through shades of color, making it possible to see where data is concentrated. For example, you can visualize which areas on a map are crowded or hot, and which parts of a website are viewed or clicked most.
You can draw a heat map with matplotlib's pcolor function. The following function draws a simple heat map from an n x m matrix and the labels for its rows and columns.
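import numpy as np
import matplotlib.pyplot as plt

# Pass an n x m matrix plus the labels for its rows and columns
def draw_heatmap(data, row_labels, column_labels):
    # Draw the matrix as a grid of colored cells
    fig, ax = plt.subplots()
    heatmap = ax.pcolor(data, cmap=plt.cm.Blues)
    # Center each tick on its cell: columns run along X, rows along Y
    ax.set_xticks(np.arange(data.shape[1]) + 0.5, minor=False)
    ax.set_yticks(np.arange(data.shape[0]) + 0.5, minor=False)
    # Put the first row at the top and the X labels above the plot
    ax.invert_yaxis()
    ax.xaxis.tick_top()
    ax.set_xticklabels(column_labels, minor=False)
    ax.set_yticklabels(row_labels, minor=False)
    # Save before show(); saving afterwards can produce a blank image
    plt.savefig('image.png')
    plt.show()
    return heatmap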
I generated 900 random numbers as a 30 x 30 matrix and drew them. The result looks like the following.
The shade of each cell is determined by the magnitude of its value.
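The post does not show how the random data was generated, so the following is only one plausible way to reproduce a similar figure. It assumes uniform random values in [0, 1), uses the indices as labels, and adds a colorbar (not part of the function above) to make the value-to-shade mapping visible.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Assumption: 30 x 30 = 900 uniform random values in [0, 1)
rng = np.random.default_rng(0)
data = rng.random((30, 30))
row_labels = [str(i) for i in range(data.shape[0])]     # hypothetical labels
column_labels = [str(j) for j in range(data.shape[1])]  # hypothetical labels

fig, ax = plt.subplots(figsize=(8, 8))
mesh = ax.pcolor(data, cmap=plt.cm.Blues)

# Center the ticks on the cells; columns run along X, rows along Y
ax.set_xticks(np.arange(data.shape[1]) + 0.5)
ax.set_yticks(np.arange(data.shape[0]) + 0.5)
ax.set_xticklabels(column_labels, fontsize=6)
ax.set_yticklabels(row_labels, fontsize=6)
ax.invert_yaxis()
ax.xaxis.tick_top()

# The colorbar shows which shade corresponds to which value
fig.colorbar(mesh, ax=ax)
fig.savefig("random_heatmap.png")  # hypothetical file name
```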
Next, let's use NumPy's histogram2d.
If you pass arrays of a few hundred values each as x and y, the image is colored according to how many points fall in each region.
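def draw_heatmap(x, y):
    # Count how many (x, y) points fall into each cell of a 50 x 50 grid
    heatmap, xedges, yedges = np.histogram2d(x, y, bins=50)
    extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]
    plt.figure()
    # histogram2d puts x along the first axis, so transpose and use
    # origin='lower' to keep x horizontal and y increasing upward
    plt.imshow(heatmap.T, extent=extent, origin='lower')
    # Save before show(); saving afterwards can produce a blank image
    plt.savefig('image.png')
    plt.show()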
The figure below is an example drawn from about 500 normally distributed random numbers each for X and Y.
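A sketch of how such input could be produced, assuming standard normal samples and a fixed seed (neither is stated in the post). It also checks that every one of the 500 points lands in exactly one bin.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Assumption: 500 standard normal samples for each axis
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = rng.normal(size=500)

# counts[i, j] is the number of points in x-bin i and y-bin j
counts, xedges, yedges = np.histogram2d(x, y, bins=50)
extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]

fig, ax = plt.subplots()
# Transpose so x runs horizontally; origin="lower" puts small y at the bottom
image = ax.imshow(counts.T, extent=extent, origin="lower", aspect="auto")
fig.colorbar(image, ax=ax, label="points per bin")
ax.set_xlabel("x")
ax.set_ylabel("y")
fig.savefig("histogram_heatmap.png")  # hypothetical file name

print(int(counts.sum()))  # total number of binned points
```

With 50 x 50 bins and only 500 points most cells stay empty, so a smaller bins value may give a smoother-looking picture.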
Heat maps visualize data by adding color as an extra dimension to a two-dimensional space. Depending on the application, this can be a powerful visualization method.