Determine the number of classes using Sturges' formula

Sturges' rule

Sturges' rule is a formula that gives a guideline for the number of classes (bins) to use when building a frequency distribution table or histogram. It is calculated as follows, where N is the number of samples and k is the number of classes.

$$ k = 1 + \log_2 N $$

Example

Suppose we have data with 40 samples (N = 40). The number of classes to use for the histogram is calculated as follows.

$$ 1 + \log_2 40 = 6.3219280948874 \approx 6 $$

From this, the number of classes is set to 6.
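The arithmetic above can be checked step by step. The following is a minimal sketch using only the standard library; the variable names are chosen here for illustration, and the base-2 logarithm is written out with the change-of-base identity for readers whose calculator has no log2 key.

```python
import math

N = 40  # number of samples from the example

# Change of base: log2(N) = ln(N) / ln(2)
log2_n = math.log(N) / math.log(2)

k_exact = 1 + log2_n   # unrounded value given by the formula
k = round(k_exact)     # nearest whole number of classes

print(f"log2({N}) = {log2_n:.13f}")
print(f"1 + log2({N}) = {k_exact:.13f}")
print(f"number of classes: {k}")
```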

Caution

The number of classes obtained from Sturges' formula is only a **guideline**. (There is no single correct answer for the number of classes when creating a frequency distribution table or histogram.)
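One way to see why the result is only a guideline: the formula rarely yields a whole number, so the answer depends on how it is rounded. The sketch below is an illustration, not part of the formula itself; the helper name `sturges_candidates` is made up for this example. It compares rounding down, rounding to nearest, and rounding up.

```python
import math

def sturges_candidates(n):
    """Return the class counts obtained with three rounding choices (illustrative helper)."""
    k_exact = 1 + math.log2(n)
    return {
        "floor": math.floor(k_exact),
        "round": round(k_exact),
        "ceil": math.ceil(k_exact),
    }

for n in (10, 40, 100, 1000):
    print(n, sturges_candidates(n))
```

For N = 40, rounding to nearest gives 6 classes while rounding up gives 7, so either can be a reasonable starting point to adjust by looking at the data.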

Implementing it as a function in Python

sturges.py


import math

def sturges_rule(n):
    """
    Return the number of classes for n samples using Sturges' formula.
    """
    # k = 1 + log2(n), rounded to the nearest integer
    return round(1 + math.log2(n))

Check it against the "Example" above.

>>> from sturges import sturges_rule
>>> sturges_rule(40)
6
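Once the number of classes is known, it can be used to build a frequency distribution table. The following is a minimal sketch under a few assumptions: classes have equal width, the data values are not all identical, and the 40 data values are made up for illustration. `sturges_rule` is redefined here only so the block runs on its own, and `frequency_table` is a hypothetical helper name.

```python
import math

def sturges_rule(n):
    """Number of classes from Sturges' formula (same logic as sturges.py)."""
    return round(1 + math.log2(n))

def frequency_table(data):
    """Split data into equal-width classes and count the values in each class."""
    k = sturges_rule(len(data))
    lo, hi = min(data), max(data)
    width = (hi - lo) / k  # assumes hi > lo
    counts = [0] * k
    for x in data:
        # Clamp the index so the maximum value lands in the last class.
        i = min(int((x - lo) / width), k - 1)
        counts[i] += 1
    edges = [lo + i * width for i in range(k + 1)]
    return edges, counts

# Hypothetical data: 40 made-up values, for illustration only.
data = [(i * 37) % 101 for i in range(40)]

edges, counts = frequency_table(data)
for left, right, c in zip(edges, edges[1:], counts):
    print(f"{left:6.1f} - {right:6.1f}: {c}")
```

Since the class count is only a guideline, it is worth re-running this with one class more or fewer and comparing the resulting tables.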
