[Statistics for programmers] There is more than one way to calculate the average value

Table of contents

Statistics for Programmers - Table of Contents

Related article

- Mean / Median / Mode

Overview

The most familiar way to calculate an average is to add up all the values and divide by the number of elements. This is called the arithmetic mean. There are several other kinds of average, however, such as the weighted average and the geometric mean, and which one to use depends on the purpose.

Types of average value:

- Arithmetic mean
- Weighted average
- Geometric mean
- Harmonic mean

Each calculation method is explained below.

Arithmetic mean

This is the familiar average: add up all the values and divide by the number of data points.

\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}
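As a minimal sketch in Python, the formula translates almost directly into code. The function name `arithmetic_mean` and the sample scores are illustrative assumptions, not part of the original article.

```python
from statistics import mean


def arithmetic_mean(values):
    """Sum all values and divide by how many there are."""
    if not values:
        raise ValueError("arithmetic mean of an empty list is undefined")
    return sum(values) / len(values)


# Hypothetical sample data, for illustration only.
scores = [60, 70, 80, 90]
result = arithmetic_mean(scores)  # (60 + 70 + 80 + 90) / 4

print(result)                   # 75.0
print(result == mean(scores))   # True: agrees with the standard library
```

In practice, `statistics.mean` from the standard library does the same job; the hand-written version is only there to mirror the formula.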

To calculate the average from a frequency distribution table, use the class value (the representative value of each class) and the frequency. Define the symbols as follows:

- k: number of classes
- m: class value
- f: frequency

The following formula can be used to calculate the average value of the frequency distribution table.

\frac{\sum_{i=1}^{k} m_i f_i}{\sum_{i=1}^{k} f_i} = \frac{m_1 f_1 + m_2 f_2 + \cdots + m_k f_k}{f_1 + f_2 + \cdots + f_k}
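The frequency-table formula can be sketched in Python as follows. The function name `mean_from_frequency_table` and the three-class table are hypothetical and exist only to illustrate the calculation.

```python
def mean_from_frequency_table(class_values, frequencies):
    """Return sum(m_i * f_i) / sum(f_i) for a frequency distribution table."""
    if len(class_values) != len(frequencies):
        raise ValueError("class_values and frequencies must have the same length")
    total = sum(frequencies)
    if total == 0:
        raise ValueError("total frequency must not be zero")
    return sum(m * f for m, f in zip(class_values, frequencies)) / total


# Hypothetical table: classes 0-10, 10-20, 20-30, using midpoints as class values.
class_values = [5, 15, 25]
frequencies = [2, 5, 3]

table_mean = mean_from_frequency_table(class_values, frequencies)
print(table_mean)  # (5*2 + 15*5 + 25*3) / (2 + 5 + 3) = 160 / 10 = 16.0
```

Because each class is represented by a single class value, the result is an approximation of the mean of the underlying raw data.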

Weighted average

A weighted average takes the weight of each data point into account. For example, if you want to combine the average of 100 data points with the average of 10 data points, simply taking the arithmetic mean of the two averages does not give the correct overall average. In such a case, the number of data points behind each average must be taken into account.

Define the symbols as follows:

- x: data
- w: weight of each data point

The weighted average can then be calculated by the following formula.

\frac{\sum_{i=1}^{n} x_i w_i}{\sum_{i=1}^{n} w_i} = \frac{x_1 w_1 + x_2 w_2 + \cdots + x_n w_n}{w_1 + w_2 + \cdots + w_n}

Example

Suppose that on a test, the average score of Group A was 60 points and the average score of Group B was 50 points. If Group A has 20 people and Group B has 40 people, the average score of the two groups combined can be calculated as follows.

\frac{60\times20 + 50\times40}{20 + 40} = \frac{3200}{60} \simeq 53.3

Therefore, the average score of Group A and Group B combined is about 53.3 points (rounded to one decimal place).
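The same example can be checked with a short Python sketch. The function name `weighted_average` and the variable names are assumptions made for illustration. The group sizes serve as the weights, and the naive average of the two group averages is computed alongside for comparison.

```python
def weighted_average(values, weights):
    """Return sum(x_i * w_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(x * w for x, w in zip(values, weights)) / total_weight


group_means = [60, 50]  # average scores of Group A and Group B
group_sizes = [20, 40]  # number of people in each group (the weights)

combined = weighted_average(group_means, group_sizes)
naive = sum(group_means) / len(group_means)  # ignores the group sizes

print(round(combined, 1))  # 53.3
print(naive)               # 55.0
```

The naive value of 55.0 overstates the combined average because it treats the smaller Group A as if it were as large as Group B.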

Geometric mean

The geometric mean is the n-th root of the product of n values. It is used to average rates of change, such as growth rates and interest rates. Note that the geometric mean can only handle positive numbers.

Define the symbols as follows:

- a: data
- n: number of data

The geometric mean can be calculated by the following formula.

m_g = \sqrt[n]{a_1 a_2 a_3 \cdots a_n}

Example

Suppose a company's sales increased by 3% in the first year, 5% in the second year, and 10% in the third year. Let's calculate the average annual sales growth rate of this company.

| Year | Sales growth over the previous year |
|------|-------------------------------------|
| 2013 | - |
| 2014 | 3% |
| 2015 | 5% |
| 2016 | 10% |

A 3% increase in 2014 means that sales were 103% of the previous year's sales. Likewise, sales were 105% of the previous year's in 2015 and 110% in 2016, so the following holds.

- a: data (1.03, 1.05, 1.1)
- n: number of data (3)

Substitute these into the formula above.

\sqrt[3]{1.03\times1.05\times1.1} \simeq 1.06

The result is 1.059594599927647..., or about 1.06. In other words, the average annual sales growth rate is about 6%.
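The calculation can be reproduced with a small Python sketch. The function name `geometric_mean` is an illustrative choice, and `math.prod` requires Python 3.8 or later. The final check confirms that compounding the average ratio over three years gives the same total growth as the three actual ratios, which is why the geometric mean is the appropriate average here.

```python
import math


def geometric_mean(values):
    """Return the n-th root of the product of n positive values."""
    if not values:
        raise ValueError("geometric mean of an empty list is undefined")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires positive values")
    return math.prod(values) ** (1 / len(values))


ratios = [1.03, 1.05, 1.10]  # year-over-year sales ratios from the example
avg_ratio = geometric_mean(ratios)

print(round(avg_ratio, 4))  # 1.0596

# Compounding the average ratio for three years reproduces the total growth.
assert math.isclose(avg_ratio ** 3, 1.03 * 1.05 * 1.10)
```

The standard library also offers `statistics.geometric_mean` (Python 3.8+), which can be used instead of the hand-written function.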

Harmonic mean

The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals. A typical use is finding the average speed over an entire round trip when the outbound and return speeds differ.

The formula for the harmonic mean is:

m_H = \frac{1}{\frac{1}{n}\left(\frac{1}{x_1} + \frac{1}{x_2} + \frac{1}{x_3} + \cdots + \frac{1}{x_n}\right)} = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \frac{1}{x_3} + \cdots + \frac{1}{x_n}}

Example

Suppose you make a round trip over a one-way distance of 10 km under the following conditions.

| Leg | Speed | Time |
|-----|-------|------|
| Outbound | 40 km/h | 15 minutes |
| Return | 4 km/h | 150 minutes |

Solve without using a formula

This problem can be solved without using a formula.

The round trip is the same as traveling 20 km in 165 minutes. In other words, if x is the average speed in km/h, then:

x \times \frac{165}{60} = 20

Solving this, x is about 7.3 km/h.

Solve using the formula

Now let's solve it using the formula.

Define the symbols as follows:

- x: data (travel speed)
- n: number of data

Substituting the two speeds (n = 2) into the harmonic mean formula gives:

x = \frac{2}{\frac{1}{40} + \frac{1}{4}}

Solving this, x is about 7.3 km/h, the same as the result obtained without the formula. With the formula, you can get the average speed without knowing the distance traveled, as long as both legs cover the same distance.
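Both approaches can be compared in a short Python sketch. The function name `harmonic_mean_manual` and the variable names are assumptions made for illustration; `statistics.harmonic_mean` is the standard library equivalent.

```python
from statistics import harmonic_mean


def harmonic_mean_manual(values):
    """Return n / (1/x_1 + ... + 1/x_n) for positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("harmonic mean requires a non-empty list of positive values")
    return len(values) / sum(1 / v for v in values)


speeds = [40, 4]  # km/h for the outbound and return legs

by_formula = harmonic_mean_manual(speeds)  # 2 / (1/40 + 1/4)
by_distance = 20 / (165 / 60)              # total distance / total time in hours

print(round(by_formula, 1))   # 7.3
print(round(by_distance, 1))  # 7.3

# The hand-written version agrees with the standard library.
assert abs(by_formula - harmonic_mean(speeds)) < 1e-9
```

Note that the arithmetic mean of the two speeds, 22 km/h, would be far too high, because most of the travel time is spent on the slow return leg.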

That's all.

