Whenever I need to generate random numbers, I can never remember which function to use, so I put together this memo of the random number generation functions I am most likely to use.

The random numbers drawn from the various probability distributions in the second half come with histograms, so I hope they also help with understanding the distributions themselves. The chi-square distribution in particular was one I never had a clear picture of, so I have tried to explain it intuitively.

The examples below assume the following libraries have been imported.

import numpy as np
import numpy.random as rd
import scipy.stats as st
import matplotlib.pyplot as plt

Uniformly distributed random numbers

rand(d0, d1, ..., dn) ###

x = rd.rand(2, 3)
print(x)

`result`


[[ 0.49748253  0.88897543  0.65014384]
 [ 0.68424239  0.19667014  0.83407881]]

Generates random numbers from the uniform distribution on [0, 1). The arguments give the number of elements along each dimension of the result; the example above produces 2 rows and 3 columns. With no arguments, a single random number is returned.

randn(d0, d1, ..., dn) ###

x1 = rd.randn(2, 4)
print(x1)

x2 = 2.5 * rd.randn(3, 3) + 3
print(x2)

`result`


[[-0.42016216  0.41704326 -0.93713613  0.23174941]
 [-0.95513093  1.00766086 -0.5724616   1.32460314]]

[[-1.51762436  4.88306835  3.21346622]
 [ 0.93229257  4.0592773   4.99599127]
 [ 3.77544739 -0.20112058  2.47063097]]

Generates random numbers from the normal distribution with mean 0 and standard deviation 1. The arguments give the number of elements along each dimension; the first example above produces 2 rows and 4 columns. With no arguments, a single random number is returned. To get a different mean and standard deviation, write sigma * rd.randn() + mu, as in the second example (sigma = 2.5, mu = 3).
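
As a rough check of the sigma * rd.randn() + mu recipe, the sketch below draws many samples and compares the sample mean and standard deviation with the requested values. The seed, the sample size, and the variable names are arbitrary choices made for this illustration.

```python
import numpy.random as rd

rd.seed(0)  # arbitrary seed, only to make the run repeatable

mu, sigma = 3.0, 2.5
samples = sigma * rd.randn(100000) + mu  # scale and shift standard normal draws

sample_mean = samples.mean()
sample_std = samples.std()
print(sample_mean, sample_std)  # both should land close to mu and sigma
```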

randint(low, high=None, size=None) ###

x = rd.randint(low=0, high=5, size=10)
print(x)
li = np.array(["Math", "Science", "society", "National language", "English"])

for subject in li[x]:
    print(subject)

`result`


[2 0 1 0 0 0 1 3 1 4]
society
Math
Science
Math
Math
Math
Science
National language
Science
English

Generates random integers from the discrete uniform distribution over the range given by the arguments. high and size can be omitted. Note that if high is omitted the range is [0, low), and if high is given the range is [low, high); in both cases the upper limit is excluded.

This is handy when you want to pick elements at random from an array, as in the subject list above.
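
To make the two range rules concrete, here is a small sketch that draws many values each way and looks at the smallest and largest ones. The seed and the sample size of 1000 are arbitrary choices for this illustration.

```python
import numpy.random as rd

rd.seed(0)  # arbitrary seed for repeatability

only_low = rd.randint(5, size=1000)     # high omitted: values fall in [0, 5)
low_high = rd.randint(2, 5, size=1000)  # high given: values fall in [2, 5)

print(only_low.min(), only_low.max())   # never below 0, never 5 or above
print(low_high.min(), low_high.max())   # never below 2, never 5 or above
```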

random_integers(low, high=None, size=None) ###

x = rd.random_integers(low=1, high=10, size=(2, 5))
print(x)

dice = rd.random_integers(1, 6, 100)  # simulate rolling a die 100 times
print(dice)

`result`


[[10  5  7  7  8]
 [ 3  5  6  9  6]]

[4 5 2 2 1 1 6 4 5 5 5 5 1 5 1 1 3 2 4 4 5 3 6 6 3 3 5 3 6 1 1 4 1 1 2 1 1
 5 1 6 6 6 6 2 6 3 4 5 1 6 3 1 2 6 1 5 2 3 4 4 3 1 2 1 1 3 5 2 2 1 4 1 6 6
 2 5 4 3 2 1 4 1 2 4 2 5 3 3 1 4 4 1 6 4 1 1 3 6 1 6]

Like randint(), it generates random integers from the discrete uniform distribution over the range given by the arguments, and high and size can be omitted. The main difference is the range: if high is omitted the range is [1, low], and if high is given the range is [low, high]. In other words, the upper limit is included, and when only low is given the range starts at 1 instead of 0.
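
One caveat: random_integers has been deprecated since NumPy 1.11 in favour of randint. Here is a minimal sketch of the same dice roll written with randint, assuming the only change needed is to pass high + 1 because randint excludes the upper limit. The counting step with np.unique is just something I added for illustration.

```python
import numpy as np
import numpy.random as rd

rd.seed(0)  # arbitrary seed for repeatability

dice = rd.randint(1, 6 + 1, size=100)  # high is exclusive, so pass 7 to include 6

# Count how often each face came up
faces, counts = np.unique(dice, return_counts=True)
print(dict(zip(faces.tolist(), counts.tolist())))
```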

random_sample(size=None), random(size=None), ranf(size=None), sample(size=None) ###

x = np.random.random_sample((4,3))
print(x)

`result`


[[ 0.613437    0.38902499  0.91052787]
 [ 0.80291265  0.81324739  0.06631052]
 [ 0.62305967  0.44327718  0.2650803 ]
 [ 0.76565352  0.42962876  0.40136025]]

As the heading suggests, there are four names for this. According to http://stackoverflow.com/questions/18829185/difference-between-various-numpy-random-functions they all do the same thing (everything other than random_sample is an alias). What on earth (laughs). The difference from rand() is how the shape is passed: these functions take it as a single tuple, whereas rand() takes each dimension as a separate argument.
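
The difference in calling convention is easy to see side by side. This is only a sketch using rand, random_sample, and random; the variable names are mine.

```python
import numpy.random as rd

a = rd.rand(4, 3)             # each dimension is a separate argument
b = rd.random_sample((4, 3))  # the shape is a single tuple
c = rd.random((4, 3))         # same convention as random_sample

print(a.shape, b.shape, c.shape)  # all three are 4 rows by 3 columns
```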

choice(a, size=None, replace=True, p=None) ###

x1 = rd.choice(5, 5, replace=False)  # equivalent to shuffling 0-4
print(x1)

x2 = rd.choice(5, 5, p=[0.1, 0.1, 0.1, 0.1, 0.6])  # 4 comes up with high probability
print(x2)

`result`


[1 4 2 3 0]
[4 4 4 4 2]

The signature is choice(a, size=None, replace=True, p=None). When a is an integer, values are drawn from range(a), and size is the number of values to generate. replace is the interesting part: think of it as sampling from range(a), where replace=False means a drawn number is not put back, so the same value never appears twice. Consequently, with replace=False an error is raised if a is smaller than size. p is also distinctive: instead of drawing uniformly, you can give the probability of each value. The length of p must therefore match a, otherwise an error is raised.

Like the other functions so far, this returns a NumPy ndarray rather than a standard Python list.
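
The behaviour of replace and p can be checked with a short sketch like the one below. The seed, the sample size of 10000, and the variable names are assumptions made for this illustration.

```python
import numpy as np
import numpy.random as rd

rd.seed(0)  # arbitrary seed for repeatability

# replace=False: every value appears at most once, so this is a shuffle of 0-4
no_repeat = rd.choice(5, 5, replace=False)
print(sorted(no_repeat.tolist()))

# Asking for more values than exist fails when replace=False
try:
    rd.choice(5, 6, replace=False)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# With p, the share of 4s should come out near the requested 0.6
weighted = rd.choice(5, 10000, p=[0.1, 0.1, 0.1, 0.1, 0.6])
share_of_4 = np.mean(weighted == 4)
print(share_of_4)
```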

shuffle(x) ###

x = list(range(10))  # shuffle needs a mutable sequence
rd.shuffle(x)
print(x)

`result`


[3, 4, 2, 5, 8, 9, 6, 1, 7, 0]

Randomly shuffles the order of a sequence. Note that it modifies the sequence passed as the argument in place rather than returning a shuffled copy.
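
A quick way to confirm the in-place behaviour is to look at the return value. This is just a sketch; the name result is mine.

```python
import numpy.random as rd

x = list(range(10))
result = rd.shuffle(x)  # x itself is reordered

print(result)                        # None: nothing is returned
print(sorted(x) == list(range(10)))  # same elements, only the order changed
```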

permutation(x) ###

x1 = rd.permutation(10)
print(x1)

li = ['cat', 'dog', 'tiger', 'lion', 'elephant']
x2 = rd.permutation(li)
print(x2)

`result`


[4 0 6 5 3 8 7 1 9 2]

['elephant' 'tiger' 'lion' 'dog' 'cat']

If an int is given as the argument, the integers from 0 up to (but not including) that value are generated internally and shuffled. If a list is given, its elements are shuffled and returned as a new array. The elements do not have to be numbers; a list of strings works too.
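
Unlike shuffle(), permutation() leaves its argument alone. The sketch below shows this with the same animal list; the name shuffled is mine.

```python
import numpy.random as rd

li = ['cat', 'dog', 'tiger', 'lion', 'elephant']
shuffled = rd.permutation(li)

print(li)        # the original list keeps its order
print(shuffled)  # a new ndarray holding the same elements in random order
```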

uniform(low=0.0, high=1.0, size=None) ###

x = rd.uniform(-2,5,30)
print(x)

`result`


[-1.79969471  0.6422639   4.36130597 -1.99694629  3.23979431  4.75933857
  1.39738979  0.12817182  1.64040588  3.0256498   0.14997201  2.0023698
  3.76051422 -1.80957115 -0.2320044  -1.82575799  1.26600285 -0.27668411
  0.77422678  0.71193145 -1.42972204  4.62962696 -1.90378575  1.84045518
  1.06136363  4.83948262  3.57364714  1.73556559 -0.97367223  3.84649039]

Generates random numbers from a uniform distribution. Unlike the uniform random number functions described so far, the range can be specified. The arguments are (low=0.0, high=1.0, size=None), and values fall in the half-open interval [low, high), which excludes the upper end.
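
For comparison, the same kind of values can be built by hand from rand() by scaling and shifting. This is only a sketch of that idea, with an arbitrary seed and sample size.

```python
import numpy.random as rd

rd.seed(0)  # arbitrary seed for repeatability

low, high = -2.0, 5.0
x = rd.uniform(low, high, 10000)
y = low + (high - low) * rd.rand(10000)  # rand() stretched onto [low, high)

print(x.min(), x.max())  # stays inside [-2, 5)
print(y.min(), y.max())  # same range
```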

Random numbers from probability distributions

binomial (n, p, size = None): Binomial distribution

x = rd.binomial(10, 0.5, 20)
print(x)

`result`


[5 4 5 5 4 3 8 3 6 6 3 4 5 1 5 7 6 4 2 6]

Generates random numbers from the binomial distribution: the number of successes in n trials, each with success probability p. The code below can be read as follows: toss a fair coin (probability 0.5) 30 times and note the number of successes, repeat that 3000 times, and draw a histogram of the results.

x = rd.binomial(30, 0.5, 3000)
plt.hist(x, 17)
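
Besides the histogram, a numeric sanity check is possible: the binomial distribution has mean n*p and variance n*p*(1-p). The sketch below compares these with the sample values; the seed is an arbitrary choice.

```python
import numpy.random as rd

rd.seed(0)  # arbitrary seed for repeatability

n, p = 30, 0.5
x = rd.binomial(n, p, 3000)

print(x.mean(), n * p)           # sample mean next to the theoretical mean
print(x.var(), n * p * (1 - p))  # sample variance next to the theoretical variance
```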

poisson (lam = 1.0, size = None): Poisson distribution

x = rd.poisson(30, 20)
print(x)

`result`


[25 31 38 20 36 29 28 31 22 31 27 24 24 26 32 42 27 20 30 31]

Generates random numbers from the Poisson distribution for an event that occurs lam times per unit time on average. Taking ad clicks as an example, the code above corresponds to an ad that is clicked 30 times an hour on average.

The code below can be read as a histogram of hourly click counts for an ad that averages 5 clicks per hour, observed 1000 times (that is, 1000 hours of data).

x = rd.poisson(5, 1000)
plt.hist(x, 14)
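
A characteristic of the Poisson distribution is that its mean and variance are both equal to lam. The sketch below checks this on the same 1000-hour setting; the seed is an arbitrary choice.

```python
import numpy.random as rd

rd.seed(0)  # arbitrary seed for repeatability

lam = 5
x = rd.poisson(lam, 1000)

print(x.mean(), x.var())  # both should be close to lam
```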

hypergeometric (ngood, nbad, nsample, size = None): Hypergeometric distribution

ngood, nbad, nsamp = 90, 10, 10
x = rd.hypergeometric(ngood, nbad, nsamp, 100)
print(x)
print(np.average(x))

`result`


[ 9 10  8  9  8  7  7  9 10  7 10  9  9  8  9  9  9  9  8 10  5 10  9  9  9
  9  9 10 10  8 10  9  9  9  7  9  9 10 10  7  9  9 10 10  8  9 10 10  8 10
 10  9  9 10  9 10  8  9  9  9  8  9 10  9 10 10 10  9  9  9 10  9  8 10  7
  7 10 10  9 10 10  9 10  9  7  9  9  8  8 10  7  8  9 10  9  9 10  9  8 10]
8.97

Generates random numbers from the hypergeometric distribution. For example, if a lot contains ngood good items and nbad defective items, it returns the number of good items found when nsamp items are drawn without replacement for a defect inspection.

The code below considers a box of 200 products, 190 good and 10 defective (a defect rate of 5%), from which 20 products are sampled and the number of good ones is counted. The histogram shows the result of doing this for 3000 boxes, each containing exactly the same numbers of good and defective products.

ngood, nbad, nsamp = 190, 10, 20
x = rd.hypergeometric(ngood, nbad, nsamp, 3000)
plt.hist(x, 6)
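
The expected number of good items in a sample is nsamp * ngood / (ngood + nbad), which is 19 for the box above. The sketch below compares that with the sample mean; the seed and the name expected are mine.

```python
import numpy.random as rd

rd.seed(0)  # arbitrary seed for repeatability

ngood, nbad, nsamp = 190, 10, 20
x = rd.hypergeometric(ngood, nbad, nsamp, 3000)

expected = nsamp * ngood / float(ngood + nbad)
print(x.mean(), expected)  # sample mean next to the theoretical mean
print(x.min(), x.max())    # at most 20 good, and at least 10 since only 10 are defective
```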

geometric (p, size = None): Geometric distribution

x = rd.geometric(p=0.01, size=100)
print(x)

`result`


[294  36  25  18 171  24 145 280 132  15  65  88 180 103  34 105   3  34
 111 143   5  26 204  27   1  24 442 213  25  93  97  28  80  93   6 189
  90  31 213  13 124  50 110  47  45  66  21   1  88  79 332  80  32  19
  17   2  38  62 121 136 175  81 115  82  35 136  49 810 302  31 147 207
  80 125  33  53  32  98 189   4 766  72  68  10  23 233  14  21  61 362
 179  56  13  55   2  48  41  54  39 279]

Generates random numbers from the geometric distribution. When a trial with success probability p is repeated until it succeeds, it returns the number of trials needed to get the first success.

The code below repeats a trial with a 1% success probability until it succeeds and records the number of trials required. The histogram shows the data from doing this 1000 times.

x = rd.geometric(p=0.01, size=1000)
plt.hist(x, 30)
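
On average it takes 1/p trials to get the first success, so 100 trials for p = 0.01. The sketch below compares that with the sample mean; the seed is an arbitrary choice and, with only 1000 draws, the match is rough.

```python
import numpy.random as rd

rd.seed(0)  # arbitrary seed for repeatability

p = 0.01
x = rd.geometric(p=p, size=1000)

print(x.mean(), 1.0 / p)  # sample mean next to the theoretical mean
print(x.min())            # at least 1, because the successful trial itself is counted
```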

normal (loc = 0.0, scale = 1.0, size = None): Normal distribution

x = np.random.normal(5, 2, 20)
print(x)

`result`


[-0.28713217  2.07791879  2.48991635  5.36918301  4.32797397  1.40568929
  6.36821312  3.22562844  4.16203214  3.91913171  6.26830012  4.74572788
  4.78666884  6.76617469  5.05386902  3.20053316  9.04530241  5.71373444
  5.95406987  2.61879994]

Generates random numbers from the normal distribution, the classic among probability distributions. loc is the mean and scale is the standard deviation. The histogram is drawn below.

x = np.random.normal(5, 2, 10000)
plt.hist(x,20)

Incidentally, random numbers following the chi-square distribution introduced below can be built by squaring random numbers drawn from the standard normal distribution and adding them up.

# Draw random numbers from a normal distribution with mean 0 and standard deviation 1, then square them
x1 = np.random.normal(0, 1, 10000)**2
x2 = np.random.normal(0, 1, 10000)**2
x3 = np.random.normal(0, 1, 10000)**2
x4 = np.random.normal(0, 1, 10000)**2
x5 = np.random.normal(0, 1, 10000)**2
x6 = np.random.normal(0, 1, 10000)**2

# The sum of two squared standard normal values follows a chi-square distribution with 2 degrees of freedom (blue graph)
plt.hist(x1+x2, 20, color='b')
plt.show()
# Adding three gives a chi-square distribution with 3 degrees of freedom (green graph)
plt.hist(x1+x2+x3, 20, color='g')
plt.show()
# Adding all six gives a chi-square distribution with 6 degrees of freedom (red graph)
plt.hist(x1+x2+x3+x4+x5+x6, 20, color='r')
plt.show()
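
To check the construction numerically: the sum of k independent squared standard normal values follows a chi-square distribution with k degrees of freedom, whose mean is k. The sketch below compares the hand-built sums with rd.chisquare for k = 2, 3, 6. The seed, the sample size, and the names by_hand, built_in, and means are mine.

```python
import numpy.random as rd

rd.seed(0)  # arbitrary seed for repeatability

n = 10000
means = {}
for k in (2, 3, 6):
    # k rows of standard normal draws, squared and summed column by column
    by_hand = (rd.normal(0, 1, (k, n)) ** 2).sum(axis=0)
    built_in = rd.chisquare(k, n)
    means[k] = (by_hand.mean(), built_in.mean())
    print(k, means[k])  # both sample means should be close to k
```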

chisquare (df, size = None): Chi-square distribution

x = rd.chisquare(3, 20)
print(x)

`result`


[ 0.69372667  0.94576453  3.7221214   6.25174061  3.07001732  1.14520278
  0.92011307  0.46210561  4.16801678  5.89167331  2.57532324  2.07169671
  3.91118545  3.12737954  1.02127029  0.69982098  1.27009033  2.25570581
  4.66501179  2.06312544]

Returns random numbers from a chi-square distribution with df degrees of freedom. As described in the previous section, the chi-square distribution is the distribution followed by the sum of the squares of df independent random numbers drawn from the standard normal distribution.

# Histograms of random numbers drawn from chi-square distributions with 2, 5, and 20 degrees of freedom
for df, c in zip([2,5,20], "bgr"):
    x = rd.chisquare(df, 1000)
    plt.hist(x, 20, color=c)
    plt.show()

f (dfnum, dfden, size = None): F distribution

x = rd.f(6, 28, 30)
print(x)

`result`


[ 0.54770358  0.90513244  1.32533065  0.75125196  1.000936    1.00622822
  1.18431869  0.73399399  0.6237275   1.51806607  1.12040041  1.67777055
  0.40309609  0.29640278  0.49408306  1.97680072  0.51474868  0.28782202
  0.90206995  0.30968917  1.29931934  1.19406178  1.28635087  2.73510067
  0.41310779  1.36155992  0.2887777   0.78830371  0.25557871  0.96761269]

Returns random numbers from an F distribution with the two degrees of freedom dfnum and dfden. The F distribution is the distribution of a ratio whose numerator and denominator are two independent chi-square random variables, each divided by its degrees of freedom. A chi-square variable is a sum of squares of standardized values, so it can be viewed as a kind of variance, which is why the F distribution is used to test whether two variances are equal.

The code below draws histograms of random numbers generated from F distributions with degrees of freedom (1,4), (5,7), (10,10), and (40,50), respectively.

for df, c in zip([(1,4), (5,7), (10,10), (40,50)], "bgry"):
    x = rd.f(df[0], df[1], 1000)
    plt.hist(x, 100, color=c)
    plt.show()
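
The ratio definition can be tried directly: divide two independent chi-square samples by their degrees of freedom and take the quotient. The sketch below compares this with rd.f, using the fact that the mean of an F distribution is dfden / (dfden - 2) when dfden > 2. The seed, the sample size, and the variable names are mine.

```python
import numpy.random as rd

rd.seed(0)  # arbitrary seed for repeatability

dfnum, dfden, n = 6, 28, 100000

# Ratio of two independent chi-square samples, each divided by its degrees of freedom
by_hand = (rd.chisquare(dfnum, n) / dfnum) / (rd.chisquare(dfden, n) / dfden)
built_in = rd.f(dfnum, dfden, n)

expected_mean = dfden / (dfden - 2.0)  # only valid for dfden > 2
print(by_hand.mean(), built_in.mean(), expected_mean)
```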

exponential (scale = 1.0, size = None): Exponential distribution

lam = 0.1  # occurs 0.1 times per minute on average
x = rd.exponential(1./lam, size=20)
print(x)

`result`


[ 11.2642272   41.01507264  11.5756986   27.10318556  10.7079342
   0.17961819  24.49974467   6.46388826   9.69390641   2.85354527
   0.55508868   4.04772073  24.60029857  23.10866     19.83649067
  12.12219301  10.24395203   0.16056754   8.9401544    8.86083473]

Returns random numbers from the exponential distribution with parameter lam, the average number of occurrences per unit time. exponential takes scale rather than lam, so pass the reciprocal of lam as scale. The returned value is how many units of time pass before the next occurrence of an event that happens lam times per unit time on average.

In other words, for an event that occurs 0.1 times per minute on average, a returned value of 3 means the next occurrence came 3 minutes later. The code below draws a histogram of random numbers generated from the exponential distribution with lam = 0.1.

lam = 0.1  # occurs 0.1 times per minute on average
x = rd.exponential(1./lam, size=10000)
plt.hist(x, 100)
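
Two numbers make the scale = 1/lam convention easier to trust: the mean waiting time should be 1/lam (10 minutes here), and the share of waits longer than t minutes should be about exp(-lam * t). The sketch below checks both for t = 10; the seed and the name frac_over_10 are mine.

```python
import numpy as np
import numpy.random as rd

rd.seed(0)  # arbitrary seed for repeatability

lam = 0.1  # occurs 0.1 times per minute on average
x = rd.exponential(1.0 / lam, size=10000)

print(x.mean(), 1.0 / lam)  # mean waiting time next to 1/lam

frac_over_10 = np.mean(x > 10)
print(frac_over_10, np.exp(-lam * 10))  # share of waits over 10 minutes next to exp(-1)
```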

Referenced site

numpy reference site http://docs.scipy.org/doc/numpy/reference/routines.random.html
