Create bins with NumPy, get data-bin correspondence

When you divide data into bins to draw a histogram, passing an array to matplotlib's hist() automatically chooses reasonable-looking bins for the data and draws them. Because those bins are chosen for you, you may still want to know which data values fall into which bin.

This time, we will create the bins ourselves and get the correspondence between the data and the bins.

Creating a distribution

First, create an array with an appropriate distribution.

import numpy

n = 100
dist = numpy.random.normal(0, 1, n)
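
Because the samples are random, the bin edges and indices shown later will differ on every run. If reproducible numbers are wanted, one option is a seeded generator. The sketch below assumes NumPy 1.17 or later; the seed value 0 and the names rng and dist_again are arbitrary choices for illustration and are not part of the original code.

```python
import numpy

n = 100
# Seed chosen arbitrarily for illustration; any fixed integer works.
rng = numpy.random.default_rng(0)
dist = rng.normal(0, 1, n)  # mean 0, standard deviation 1, n samples

# Re-creating the generator with the same seed reproduces the same samples.
dist_again = numpy.random.default_rng(0).normal(0, 1, n)
print(numpy.array_equal(dist, dist_again))
```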

Creating a bin

There seems to be some debate about the right number of bins, but Microsoft Excel and similar tools apparently use, as standard, the square root of the number of data points n as the number of bins k.

k=\sqrt{n}

This time, we will use this rule to determine k and create an array of k evenly spaced values spanning the range of the data, to be used as the bin edges.

import math

bin_num = int(math.sqrt(n))  # numpy.linspace needs an integer number of points
bins = numpy.linspace(min(dist), max(dist), bin_num)

An array similar to the following was created.

[-2.28875045 -1.72785426 -1.16695807 -0.60606188 -0.0451657   0.51573049 1.07662668  1.63752287  2.19841906  2.75931524]
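
Note that the array above contains k edge values, which delimit only k - 1 intervals. If exactly k bins are wanted, one way is to ask linspace for k + 1 points. The following is a minimal sketch under that assumption; the sample data and the names data, k and edges are made up for illustration.

```python
import math
import numpy

# Illustrative data: 9 values, so the square-root rule gives k = 3.
data = numpy.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
k = int(math.sqrt(len(data)))

# k + 1 evenly spaced edges delimit exactly k intervals.
edges = numpy.linspace(data.min(), data.max(), k + 1)
print(edges)           # the 4 edge values
print(len(edges) - 1)  # the number of bins they delimit
```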

Verification

Let's see how the data is plotted using the bins we created.

import matplotlib.pyplot as plt

plt.hist(dist, bins=bins) 
plt.show()

[Figure: binning.png, histogram of dist drawn with the created bins]
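
To check a binning numerically rather than visually, numpy.histogram returns the per-bin counts without drawing anything. A minimal sketch, with made-up values (sample, edges) standing in for dist and bins:

```python
import numpy

sample = numpy.array([0.1, 0.4, 1.2, 1.8, 2.0, 2.9, 3.0])  # illustrative values
edges = numpy.array([0.0, 1.0, 2.0, 3.0])                   # 4 edges, 3 bins

# counts[i] is the number of values between edges[i] and edges[i + 1];
# the last bin also includes its right edge.
counts, returned_edges = numpy.histogram(sample, bins=edges)
print(counts)  # [2 2 3]
```

Every value lies inside the edge range here, so the counts add up to the number of values.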

Get the correspondence between data and bins

With numpy.digitize(), you can get, for each data value, the index of the bin it falls into.

bin_indice = numpy.digitize(dist, bins)

In the following result, dist[0] falls in the 4th bin and dist[1] falls in the 5th bin.

[ 4  5  8  6  4  6  8  1  6  6  8  2  6  3  5  4  5  4  5  3  8  2  5  5  4 4  4  4  2  3  5  6  5  3  4  3  7  6  4  3  4  4  8  2  4  4  8  6  6  3 6  2  9  5  5  4  4  1  8  6  5  5  5  5  4  1 10  3  1  8  7  3  4  3  8 2  6  5  6  3  6  7  5  3  3  5  5  5  4  1  3  6  5  6  7  3  4  7  8  4]
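
With its default right=False, numpy.digitize returns the index i such that bins[i-1] <= x < bins[i]. Values below the first edge get 0, and values at or above the last edge get len(bins). This is why the maximum of dist, which equals the last edge, gets index 10 in the result above. A small sketch with made-up edges and values shows the rule:

```python
import numpy

edges = numpy.array([0.0, 1.0, 2.0, 3.0])
values = numpy.array([-1.0, 0.0, 0.5, 1.0, 2.5, 3.0, 4.0])

# Index 0 means "below the first edge"; index len(edges) means
# "at or above the last edge".
indices = numpy.digitize(values, edges)
print(indices)  # [0 1 1 2 3 4 4]
```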

Pair each data value with its bin index using zip().

bin_data_map = list(zip(dist, bin_indice))  # list() because zip() returns an iterator in Python 3
[(-0.16840296791127732, 4), (0.43715458127052381, 5), (1.8635306330264274, 8), (0.89273121368100206, 6),...
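
If the goal is to look up all values in a given bin, the pairs can be regrouped by bin index. The following is one possible sketch, not the only way to do it; the data, the edges and the name values_by_bin are made up for illustration.

```python
from collections import defaultdict

import numpy

values = numpy.array([0.2, 1.5, 0.7, 2.1, 1.1])  # illustrative data
edges = numpy.array([0.0, 1.0, 2.0, 3.0])
indices = numpy.digitize(values, edges)

# Collect the values that share a bin index.
values_by_bin = defaultdict(list)
for value, index in zip(values, indices):
    values_by_bin[int(index)].append(float(value))

print(dict(values_by_bin))
```

Here values_by_bin[1] holds the values in the first bin, values_by_bin[2] those in the second, and so on.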
