[Statistics for programmers] Random variables, probability distributions, and probability density functions

Table of contents

Statistics for Programmers - Table of Contents

What is a random variable?

A random variable is a variable whose range of possible values is known, but whose actual value is not known until the outcome is observed. For example, when you roll a die, the number that comes up is one of 1, 2, 3, 4, 5, 6. The result of the roll, which takes one of these values from 1 to 6, is a random variable.

Written as a formula, the probability of each outcome is:

P(X = x) = \frac{1}{6} \quad (x = 1, 2, 3, 4, 5, 6)

Likewise, the probability of the event that a die is rolled and shows 5 can be expressed by the following formula.

P(X = 5) = \frac{1}{6}
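To see P(X = 5) = 1/6 empirically, here is a minimal Python sketch. It assumes a fair six-sided die modeled with the standard `random` module; the function name `estimate_probability` is hypothetical. It rolls a simulated die many times and counts how often the target face appears.

```python
import random


def estimate_probability(target: int, n_rolls: int = 60_000, seed: int = 0) -> float:
    """Estimate P(X = target) for a fair six-sided die by simulation."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    hits = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == target)
    return hits / n_rolls


estimate = estimate_probability(5)
print(f"estimated P(X = 5) = {estimate:.4f}, exact = {1 / 6:.4f}")
```

The estimate approaches 1/6 (about 0.1667) as `n_rolls` grows. The simulation only approximates the exact value given by the formula.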

Probability distribution

A probability distribution lists each value a random variable can take together with the probability that the value occurs. For a die, it is as follows.

| Die roll | Probability |
|---|---|
| 1 | 1/6 |
| 2 | 1/6 |
| 3 | 1/6 |
| 4 | 1/6 |
| 5 | 1/6 |
| 6 | 1/6 |
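The distribution table for a die can also be built in code. This is a minimal Python sketch that assumes every listed outcome is equally likely, as for a fair die. `distribution_from_outcomes` is a hypothetical helper, and `Fraction` keeps the probabilities exact rather than rounded floats.

```python
from collections import Counter
from fractions import Fraction


def distribution_from_outcomes(outcomes):
    """Map each value to its probability, assuming every listed outcome is equally likely."""
    counts = Counter(outcomes)
    total = len(outcomes)
    return {value: Fraction(count, total) for value, count in sorted(counts.items())}


die = distribution_from_outcomes([1, 2, 3, 4, 5, 6])
for value, prob in die.items():
    print(value, prob)  # prints "1 1/6", "2 1/6", ..., "6 1/6"
```

Because repeated outcomes are counted, the same helper also handles unequal probabilities. For example, the outcome list `[1, 1, 2]` gives 2/3 for 1 and 1/3 for 2.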

Discrete random variable

Discrete random variables are random variables that take only separate, distinct values. For example, a die roll jumps from 1 straight to 2; values such as 1.1 or 1.2 never occur. Such variables are called discrete random variables. In contrast, quantities such as height and weight are continuous random variables.

Discrete probability distribution

The probability distribution of a discrete random variable is called a discrete probability distribution.

Probability mass function

When the probability that a discrete random variable X takes the value x is written f(x) = P(X = x), f(x) is called the probability mass function. Since the probabilities of all possible outcomes add up to 1, the following equation holds.

\sum_{i=1}^{n} P(x_i) = P(x_1) + P(x_2) + \cdots + P(x_n) = 1

For a die, this becomes:

\frac{1}{6} + \frac{1}{6} + \frac{1}{6} + \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = 1
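The probability mass function and its normalization condition translate directly into code. This is a minimal Python sketch assuming a fair die. `die_pmf` and `is_valid_pmf` are hypothetical names, and exact `Fraction` arithmetic lets the check compare the sum with 1 without floating-point rounding.

```python
from fractions import Fraction


def die_pmf(x):
    """Probability mass function f(x) of a fair six-sided die (0 outside 1..6)."""
    return Fraction(1, 6) if x in {1, 2, 3, 4, 5, 6} else Fraction(0)


def is_valid_pmf(pmf, support):
    """Check that f(x) >= 0 on the support and that the probabilities sum to exactly 1."""
    values = [pmf(x) for x in support]
    return all(p >= 0 for p in values) and sum(values) == 1


support = range(1, 7)
print(sum(die_pmf(x) for x in support))  # 1
print(is_valid_pmf(die_pmf, support))    # True
```

A function that assigns 1/5 to each of the six faces fails the check, because its probabilities add up to 6/5 rather than 1.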

Continuous probability distribution

A continuous random variable, unlike the discrete random variable above, can take infinitely many values between any two nearby values; height and weight are examples. For instance, between 180 cm and 181 cm there are infinitely many possible heights, such as 180.01 cm, 180.001 cm, 180.0001 cm, and so on. The probability distribution of such a continuous random variable is called a continuous probability distribution.

Suppose a continuous random variable takes values in the range 1 to 6. Unlike the discrete case, the probability that it takes exactly 3 is not 1/6. The value 3 is only one of infinitely many possible values, so the probability of exactly 3 is 0.

P(X = 3) = \lim_{n \to \infty} \frac{1}{n} = 0
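A simulation illustrates why a single point has probability 0. This Python sketch assumes X is uniformly distributed on [1, 6] and draws it with `random.uniform`. The text does not fix a distribution, so the uniform one is only an illustrative choice, and `simulate` is a hypothetical helper. Draws that equal exactly 3 essentially never occur, while the fraction landing in an interval around 3 stays close to the interval's width divided by 5.

```python
import random


def simulate(n: int = 100_000, seed: int = 0):
    """Draw X ~ Uniform(1, 6); return the share of exact hits on 3 and of hits in [2.5, 3.5]."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    exact_hits = 0
    interval_hits = 0
    for _ in range(n):
        x = rng.uniform(1, 6)
        if x == 3:
            exact_hits += 1
        if 2.5 <= x <= 3.5:
            interval_hits += 1
    return exact_hits / n, interval_hits / n


p_exact, p_interval = simulate()
print(f"P(X = 3) ~ {p_exact}")                   # 0.0 in practice
print(f"P(2.5 <= X <= 3.5) ~ {p_interval:.3f}")  # close to 1/5 = 0.2
```

Shrinking the interval toward the single point 3 drives the estimated probability toward 0, which matches P(X = 3) = 0.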

Probability density function

For a continuous random variable, the probability that X takes any one specific value is 0. Instead, we consider the probability that X falls within an interval, a ≤ X ≤ b.

When the probability that the continuous random variable X is greater than or equal to a and less than or equal to b is given by the integral below, f(x) is called the probability density function. Because the probabilities of all events sum to 1, integrating f(x) over the entire range of X must give 1. Conversely, a function whose total integral is not 1 (or that takes negative values) is not a probability density function.

P(a \leq X \leq b) = \int_a^b f(x)\,dx, \qquad \int_{-\infty}^{\infty} f(x)\,dx = 1

This formula computes the area under f(x) between a and b, as shown in the figure below: the definite integral of f(x) from a to b is the probability that X lies between a and b.

[Figure: probability density function f(x) and the area under it between a and b]

Source: Beautiful story of high school mathematics
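Both properties of a density (total area 1, and interval probability as the area under f(x)) can be checked numerically. This is a minimal Python sketch using the midpoint rule, a simple numerical approximation rather than an exact integral. `integrate`, `uniform_pdf`, `normal_pdf`, and `normal_interval_exact` are hypothetical helpers. The uniform density on [1, 6] and the standard normal density are example choices, not part of the original text. The normal result is compared with the closed form based on `math.erf`.

```python
import math


def integrate(f, a, b, n=100_000):
    """Approximate the definite integral of f from a to b with the midpoint rule."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


def uniform_pdf(x, low=1.0, high=6.0):
    """Density of the uniform distribution on [low, high]; 0 outside the range."""
    return 1.0 / (high - low) if low <= x <= high else 0.0


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def normal_interval_exact(a, b, mu=0.0, sigma=1.0):
    """P(a <= X <= b) for N(mu, sigma^2), computed from the CDF via the error function."""
    def cdf(x):
        return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))
    return cdf(b) - cdf(a)


# The total area under a valid density is 1 (integrate over its whole support).
print(integrate(uniform_pdf, 1, 6))     # approximately 1.0
# A function whose total area is not 1 is not a probability density function.
print(integrate(lambda x: 0.25, 1, 6))  # approximately 1.25
# P(a <= X <= b) is the area under f(x) between a and b.
print(integrate(uniform_pdf, 2, 4))     # approximately 0.4
print(integrate(normal_pdf, -1, 1), normal_interval_exact(-1, 1))  # both about 0.6827
```

The midpoint rule is used here only because it is short; any numerical integration method would illustrate the same point.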

