--What made me decide to write this article
--Execution environment
--Average
--Range
--Median 1
--Median 2 (only in ascending and descending order)
--Standard deviation
--Variance
--Cumulative relative frequency
--Coefficient of variation
--Bivariate analysis
-Covariance
-Correlation coefficient
--Bonus

What made me decide to write this article

When I was reading a statistics book at the library, I thought, "I think I can use Python to find (standard deviation, etc.)." I also wanted to improve my Python skills, so I wrote this article for a change of pace.

Execution environment

macOS Big Sur 11.1. I run the code on paiza.IO (Python 3).

average

program

def mean(l):
    return sum(l)/len(l)

Example of use

print(mean([3,4,5,7,6]))
#5.0
print(mean([3,4,11,7,6]))
#6.2

important point

--The return value is a float.
--It also works for lists that are not in ascending order.
--When the value becomes extremely large, it is printed in exponent notation with "e", as shown below.

print(mean([3,4,1111111111111111111111111111111,7,6]))
#2.222222222222222e+29
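If the exponent notation is inconvenient, one option is to keep the mean as an exact fraction. The following is only a minimal sketch and not part of the original program: exact_mean is a hypothetical name, and it assumes the list holds integers rather than floats.

```python
from fractions import Fraction

def exact_mean(l):
    # Hypothetical helper: keeps the mean as an exact fraction instead of a float.
    # Assumes the elements are ints (or Fractions), not floats.
    return Fraction(sum(l), len(l))

big = exact_mean([3, 4, 1111111111111111111111111111111, 7, 6])
print(big)         # exact value, printed as numerator/denominator
print(float(big))  # converting back gives the same float that mean() returns
```

The fraction can be converted with float() only at the point where a rounded value is acceptable.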

range

program

def Range(l):
    return max(l)-min(l)

It does not have to be in ascending order.

important point

--The first letter of the function name is capitalized so that it does not shadow the built-in range.
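The section has no usage example, so here is a small one. It is a sketch that repeats the definition above and uses sample data chosen only for illustration.

```python
def Range(l):
    # Same definition as above: largest value minus smallest value.
    return max(l) - min(l)

print(Range([3, 4, 11, 7, 6]))  # 11 - 3

# The built-in range is still usable, because the two names differ in case.
print(list(range(3)))
```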

Median 1

program

def chuuouti(l):
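    l2 = sorted(l)  # work on a sorted copy so the input order does not matter
    mid = len(l2)//2
    if len(l2)%2:
        return l2[mid]  # odd length: the single middle element
    else:
        return (l2[mid-1]+l2[mid])/2  # even length: mean of the two middle elements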

Example of use

print(chuuouti([1,4,2,3]))
#2.5

important point

--When the number of elements is even, the average of the two middle elements is returned.
--The list does not have to be in ascending order. However, sorted takes some time, so if the list is already sorted it is better to use "Median 2".

example

4th Algorithm Practical Test A-Median
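One way to gain confidence in a hand-written median is to compare it with statistics.median from the standard library. The sketch below does that for a few sample lists; the lists are arbitrary test data, not taken from the article.

```python
import statistics

def chuuouti(l):
    l2 = sorted(l)
    mid = len(l2) // 2
    if len(l2) % 2:
        return l2[mid]
    return (l2[mid - 1] + l2[mid]) / 2

samples = ([1, 4, 2, 3], [5, 1, 3], [2, 2, 9, 7, 4, 1])
for data in samples:
    # Both functions should agree on every sample.
    print(data, chuuouti(data), statistics.median(data))
```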

Median 2 (only in ascending and descending order)

program

def chuuouti(l):
    mid = len(l)//2
    if len(l)%2:
        return l[mid]  # odd length: the single middle element
    else:
        return (l[mid-1]+l[mid])/2  # even length: mean of the two middle elements

important point

--It can only be used on lists in ascending or descending order.
--When the length is even, the return value is a float; when the length is odd, the return type is the type of the middle element (see below).

#When the length is even
print(chuuouti([3,2,2,1]))
#2.0
print(chuuouti([4,3,2,1]))
#2.5

#When the length is odd
print(chuuouti([1,3,7]))
#3
print(chuuouti([1,3.2,7]))
#3.2
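Because "Median 2" silently returns a wrong answer for unsorted input, a guard can be useful. The following is a sketch under my own assumptions: is_monotonic and chuuouti_checked are hypothetical names that do not appear in the original.

```python
def is_monotonic(l):
    # Hypothetical helper: True when l is entirely non-decreasing or non-increasing.
    pairs = list(zip(l, l[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)

def chuuouti_checked(l):
    # Hypothetical wrapper around the "Median 2" logic that rejects unsorted input.
    if not is_monotonic(l):
        raise ValueError("list must be in ascending or descending order")
    mid = len(l) // 2
    if len(l) % 2:
        return l[mid]
    return (l[mid - 1] + l[mid]) / 2

print(chuuouti_checked([4, 3, 2, 1]))
print(chuuouti_checked([1, 3, 7]))
```

The check is a single pass over the list, so it costs less than sorting, but it does remove part of the speed advantage of "Median 2".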

standard deviation

program

from math import sqrt
def deviation_value(l):
    mean = sum(l)/len(l)  # mean
    deviation_square = [(v-mean)**2 for v in l]  # squared deviations
    deviation_square_sum = sum(deviation_square)  # sum of squared deviations
    return sqrt(deviation_square_sum/len(l))  # population standard deviation

Example of use

print(deviation_value([10,15,5,-22,-18]))
#15.086417732516887

In the example above, the mean is {10 + 15 + 5 + (-22) + (-18)}/5 = -2. The squared deviations are therefore 144, 289, 49, 400, and 256, and their sum is 1138. Dividing this by the number of elements (5) gives the variance, 227.6, and taking the square root gives the standard deviation.
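The arithmetic in this explanation can be reproduced step by step. The sketch below prints each intermediate value and, as a cross-check, compares the result with statistics.pstdev from the standard library, which computes the same population standard deviation.

```python
from math import sqrt, isclose
import statistics

data = [10, 15, 5, -22, -18]
mean = sum(data) / len(data)               # -2.0
squares = [(v - mean) ** 2 for v in data]  # [144.0, 289.0, 49.0, 400.0, 256.0]
total = sum(squares)                       # 1138.0
std = sqrt(total / len(data))

print(mean, squares, total, std)
# True when the hand-written steps agree with the standard library.
print(isclose(std, statistics.pstdev(data)))
```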

Cumulative relative frequency

program

def cumulative_relative_frequency(x):
    wa = sum(x)  # sum of all frequencies
    l = [0]
    y = [v/wa for v in x]  # relative frequencies
    # cumulative sum from here
    for i in range(len(y)):
        l.append(l[-1]+y[i])
    del l[0]
    return l

Example of use

print(cumulative_relative_frequency([1,3,7,6,9]))
#[0.038461538461538464, 0.15384615384615385, 0.4230769230769231, 0.6538461538461539, 1.0]
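The running sum can also be written with itertools.accumulate. This is an alternative sketch, not the original author's version, and cumulative_relative_frequency_acc is a hypothetical name.

```python
from itertools import accumulate

def cumulative_relative_frequency_acc(x):
    # Hypothetical variant: divide each frequency by the total,
    # then let accumulate build the running sums.
    total = sum(x)
    return list(accumulate(v / total for v in x))

print(cumulative_relative_frequency_acc([1, 3, 7, 6, 9]))
```

The last element should always be very close to 1, which is a quick sanity check for this kind of function.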

Coefficient of variation

program

from math import sqrt
def coefficient_of_variation(l):
    mean = sum(l)/len(l)  # mean
    deviation_square = [(v-mean)**2 for v in l]  # squared deviations
    deviation_square_sum = sum(deviation_square)  # sum of squared deviations
    return sqrt(deviation_square_sum/len(l))/mean  # standard deviation divided by the mean

Example of use

print(coefficient_of_variation([4,5,3,5]))
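#Example from https://bellcurve.jp/statistics/course/19515.html

The reference value is 0.2, while the function prints about 0.195. This is not an error introduced by the square root, since sqrt is accurate to about 15 significant digits. Rounding 0.195 to one decimal place gives 0.2, so the difference most likely comes from rounding in the reference value.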
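Printing the intermediate values shows how the result relates to 0.2. The sketch below uses the same data as the example; for this particular list the mean and variance happen to be exactly representable as floats, so they can be checked by hand.

```python
from math import sqrt

data = [4, 5, 3, 5]
mean = sum(data) / len(data)                               # 4.25
variance = sum((v - mean) ** 2 for v in data) / len(data)  # 0.6875
cv = sqrt(variance) / mean

print(mean, variance)
print(cv)            # unrounded coefficient of variation
print(round(cv, 1))  # rounded to one decimal place
```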

Bivariate analysis

Covariance

program

def bivariate_analysis(x,y):
    mean_x = sum(x)/len(x)
    mean_y = sum(y)/len(y)
    deviation_x = [x[i]-mean_x for i in range(len(x))]
    deviation_y = [y[i]-mean_y for i in range(len(y))]
    product_of_deviations = [deviation_x[i]*deviation_y[i] for i in range(len(x))]
    return sum(product_of_deviations)/len(x)

Example of use

print(bivariate_analysis([4,7],[3,6]))
#2.25

By using this function, the covariance of the two lists can be obtained.

merit

--It gives the exact value.
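The covariance function indexes both lists by position, so lists of different lengths either raise an IndexError or silently ignore the extra elements. The sketch below is a hypothetical variant (the name covariance is mine) that iterates with zip and checks the lengths first.

```python
def covariance(x, y):
    # Hypothetical variant of bivariate_analysis with an explicit length check.
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Mean of the products of paired deviations.
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / len(x)

# Means are 5.5 and 4.5; both deviation products are 2.25.
print(covariance([4, 7], [3, 6]))
```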

Correlation coefficient

program

def correlation_coefficient(x,y):
    return bivariate_analysis(x,y)/deviation_value(x)/deviation_value(y)

Program with standard deviation and covariance

from math import sqrt

def deviation_value(l):#standard deviation
    mean = sum(l)/len(l)  # mean
    deviation_square = [(v-mean)**2 for v in l]  # squared deviations
    deviation_square_sum = sum(deviation_square)  # sum of squared deviations
    return sqrt(deviation_square_sum/len(l))  # population standard deviation
    
    
def bivariate_analysis(x,y):#Covariance
    mean_x = sum(x)/len(x)
    mean_y = sum(y)/len(y)
    deviation_x = [x[i]-mean_x for i in range(len(x))]
    deviation_y = [y[i]-mean_y for i in range(len(y))]
    product_of_deviations = [deviation_x[i]*deviation_y[i] for i in range(len(x))]
    return sum(product_of_deviations)/len(x)
    
def correlation_coefficient(x,y):
    return bivariate_analysis(x,y)/deviation_value(x)/deviation_value(y)

important point

--This function relies on the standard deviation and covariance functions defined above.
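This section has no usage example, so here is a short one. It is a sketch that restates the three functions in compact form and uses made-up data in which y rises or falls exactly in proportion to x.

```python
from math import sqrt, isclose

def deviation_value(l):  # standard deviation
    mean = sum(l) / len(l)
    return sqrt(sum((v - mean) ** 2 for v in l) / len(l))

def bivariate_analysis(x, y):  # covariance
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / len(x)

def correlation_coefficient(x, y):
    return bivariate_analysis(x, y) / deviation_value(x) / deviation_value(y)

up = correlation_coefficient([1, 2, 3, 4], [2, 4, 6, 8])    # y rises with x
down = correlation_coefficient([1, 2, 3, 4], [8, 6, 4, 2])  # y falls as x rises
print(up, down)  # close to 1 and -1, up to floating-point rounding
```

Note that if all values in one list are identical, its standard deviation is 0 and the division raises ZeroDivisionError.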

Bonus edition

There is a class of n people. Find the probability that at least two students share a birthday.

I would like to implement this based on "Probability of having a duo with the same birthday". According to that article, the probability to be calculated is

$$1 - \frac{{}_{365}P_n}{365^n}$$

where ${}_{365}P_n$ is the number of ways to assign n different birthdays out of 365 days.

program

from math import factorial
def permutations_count(n, r):  # excerpted from the article mentioned below
    return factorial(n) // factorial(n - r)
def birthday(n):  # probability that at least two people in a class of n share a birthday
    return 1-permutations_count(365,n)/365**n

The code for P(n, r) is taken from "Calculate and generate factorial, permutation / combination with Python".
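As an alternative, the standard library provides math.perm, which computes the same permutation count. The sketch below assumes Python 3.8 or later, where math.perm was added; I have not confirmed which version paiza.IO runs.

```python
from math import factorial, perm

def permutations_count(n, r):
    # Same helper as above, kept here only for comparison.
    return factorial(n) // factorial(n - r)

def birthday(n):
    # Variant that uses math.perm (assumes Python 3.8+).
    return 1 - perm(365, n) / 365**n

print(perm(365, 23) == permutations_count(365, 23))
print(birthday(23))
print(birthday(366))  # perm returns 0 when n > 365, so the probability becomes 1.0
```

One difference worth noting: with the factorial version, n greater than 365 raises ValueError because factorial is given a negative number, while math.perm returns 0 in that case.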

Using this program, let's find the smallest n for which the probability reaches 75% or more, and also the probability at that point.

program

people = 0
while True:
    if birthday(people) >= 0.75:
        print(people)
        print(birthday(people))
        break
    people += 1

people is the number of people in the class. If you already know that birthday(100) exceeds 75%, a for statement over a fixed range would also work. Incidentally, with 23 people the probability is already about 50.7%.
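The for-statement version mentioned here could look like the following. This is a sketch only: first_at_least and its limit parameter are hypothetical names, and the upper bound of 100 follows the remark above.

```python
from math import factorial

def permutations_count(n, r):
    return factorial(n) // factorial(n - r)

def birthday(n):
    return 1 - permutations_count(365, n) / 365**n

def first_at_least(threshold, limit=100):
    # Hypothetical helper: smallest class size whose probability reaches threshold.
    for people in range(limit + 1):
        if birthday(people) >= threshold:
            return people
    return None  # threshold not reached within limit

n = first_at_least(0.75)
print(n, birthday(n))
```

Unlike the while loop, this version cannot run forever if the threshold is never reached; it simply returns None.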

References

--Translation
-goo dictionary
-Google Translate
--How to write
-Implemented in python after understanding the phenomenon that various probability distributions occur
--Program
-The Programing Guide (https://strawberryprogrami.wixsite.com/programing)
--Meaning of terms
-Ready-to-use statistics by Tamio Kan and Yuko Hijikata, 1st edition, 1st printing, Publisher: SoftBank Creative
-Meaning of covariance and simple method
-5-3. Let's find the coefficient of variation
-Mathematical symbols (equal sign, inequality sign, operator, set)
--Illustrated trivia statistics by Norio Konno, Natsumesha

I made a familiar function that can be used in statistics with Python
