Whenever I try to generate random numbers, I can't remember which function to use, so I've put together a memo of the random number generation functions I'm likely to use often.
The random numbers generated from the various probability distributions in the second half are shown with graphs, so I hope they also help in understanding the distributions themselves. I never had a clear picture of the chi-square distribution in particular, so I have tried to explain it intuitively.
The code below assumes that the following libraries have been imported.
import numpy as np
import numpy.random as rd
import scipy.stats as st
import matplotlib.pyplot as plt
rand(d0, d1, ..., dn) ###
x = rd.rand(2, 3)
print(x)
result
[[ 0.49748253 0.88897543 0.65014384]
[ 0.68424239 0.19667014 0.83407881]]
Generates random numbers from a uniform distribution on [0, 1). The number of elements along each dimension of the result can be specified with the arguments; in the example above, it is 2 rows by 3 columns. With no arguments, a single random number is generated.
randn(d0, d1, ..., dn) ###
x1 = rd.randn(2, 4)
print(x1)
x2 = 2.5 * rd.randn(3, 3) + 3
print(x2)
result
[[-0.42016216 0.41704326 -0.93713613 0.23174941]
[-0.95513093 1.00766086 -0.5724616 1.32460314]]
[[-1.51762436 4.88306835 3.21346622]
[ 0.93229257 4.0592773 4.99599127]
[ 3.77544739 -0.20112058 2.47063097]]
Generates random numbers from a normal distribution with mean 0 and standard deviation 1. The number of elements along each dimension of the result can be specified with the arguments; in the example above, it is 2 rows by 4 columns. With no arguments, a single random number is generated.
To specify the mean and standard deviation, write sigma * rd.randn() + mu.
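As a rough check of the sigma * rd.randn() + mu recipe, the sketch below draws a large sample and compares its mean and standard deviation with the requested values. The fixed seed, the sample size, and the variable names are assumptions made only for this illustration.

```python
import numpy.random as rd

rd.seed(0)  # fixed seed for reproducibility (an assumption for this sketch)

mu, sigma = 3.0, 2.5
x = sigma * rd.randn(100000) + mu   # scale by sigma, then shift by mu

sample_mean = x.mean()
sample_std = x.std()
print(sample_mean, sample_std)      # both should land close to mu and sigma
```

With a sample this large, the two printed values should differ from 3 and 2.5 only in the second decimal place or so.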
randint(low, high=None, size=None) ###
x = rd.randint(low=0, high=5, size=10)
print(x)
li = np.array([u"Math", u"Science", u"society", u"National language", u"English"])
for l in li[x]:
    print(l)
result
[2 0 1 0 0 0 1 3 1 4]
society
Math
Science
Math
Math
Math
Science
National language
Science
English
Generates random integers from a discrete uniform distribution over the range given by the arguments. high and size can be omitted. Note that if high is omitted, the range is [0, low), and if high is given, the range is [low, high); in both cases the upper limit is not included.
This is useful when you want to pick some elements at random from an array.
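The two range rules can be checked empirically. The following is a minimal sketch, assuming a fixed seed and 10000 draws so that every value in the range is practically certain to appear; the variable names are made up for the illustration.

```python
import numpy.random as rd

rd.seed(0)  # fixed seed for reproducibility (an assumption for this sketch)

only_low = rd.randint(5, size=10000)      # high omitted: values come from [0, 5)
low_high = rd.randint(2, 5, size=10000)   # high given: values come from [2, 5)

# Smallest and largest value seen in each sample
print(only_low.min(), only_low.max())
print(low_high.min(), low_high.max())
```

In neither sample should the value 5 ever appear, because the upper limit is excluded.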
random_integers(low, high=None, size=None) ###
x = rd.random_integers(low=1, high=10, size=(2,5))
print(x)
dice = rd.random_integers(1, 6, 100)  # simulate rolling a die 100 times
print(dice)
result
[[10 5 7 7 8]
[ 3 5 6 9 6]]
[4 5 2 2 1 1 6 4 5 5 5 5 1 5 1 1 3 2 4 4 5 3 6 6 3 3 5 3 6 1 1 4 1 1 2 1 1
5 1 6 6 6 6 2 6 3 4 5 1 6 3 1 2 6 1 5 2 3 4 4 3 1 2 1 1 3 5 2 2 1 4 1 6 6
2 5 4 3 2 1 4 1 2 4 2 5 3 3 1 4 4 1 6 4 1 1 3 6 1 6]
Like randint(), it generates random integers from a discrete uniform distribution over the range given by the arguments. high and size can be omitted. The main difference is the range: if high is omitted, the range is [1, low], and if high is given, the range is [low, high]. Both include the upper limit, and when only low is specified the lower bound is 1 rather than 0. Note that recent NumPy versions mark random_integers as deprecated in favor of randint.
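Because the only difference is whether the upper limit is included, the same die roll can also be written with randint() by adding 1 to the upper bound. This is a sketch of that equivalence, not a drop-in replacement taken from the original article; the seed and the counting step are assumptions for the illustration.

```python
import numpy as np
import numpy.random as rd

rd.seed(0)  # fixed seed for reproducibility (an assumption for this sketch)

# Closed interval [1, 6]: randint excludes its upper bound, so pass high + 1.
dice = rd.randint(1, 6 + 1, size=10000)

counts = np.bincount(dice, minlength=7)[1:]   # how often each face 1..6 came up
print(dice.min(), dice.max())
print(counts)
```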
random_sample(size=None), random(size=None), ranf(size=None), sample(size=None) ###
x = np.random.random_sample((4,3))
print(x)
result
[[ 0.613437 0.38902499 0.91052787]
[ 0.80291265 0.81324739 0.06631052]
[ 0.62305967 0.44327718 0.2650803 ]
[ 0.76565352 0.42962876 0.40136025]]
As the heading suggests, there are four of these, but according to http://stackoverflow.com/questions/18829185/difference-between-various-numpy-random-functions they are all the same (the ones other than random_sample are aliases). Why so many? (laughs) The difference from rand() is in how the shape is specified: these take a single tuple, whereas rand() takes each dimension as a separate argument.
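The difference in calling convention is easiest to see side by side. The sketch below resets the seed before each call so that both functions start from the same generator state; the seed value is an arbitrary assumption.

```python
import numpy as np
import numpy.random as rd

rd.seed(1)
a = rd.rand(4, 3)              # each dimension is its own argument

rd.seed(1)
b = rd.random_sample((4, 3))   # the shape is passed as a single tuple

print(a.shape, b.shape)
same_values = np.array_equal(a, b)   # compare the two arrays element by element
print(same_values)
```

Apart from how the shape is passed, the two calls draw from the same uniform distribution on [0, 1).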
choice(a, size=None, replace=True, p=None) ###
x1 = rd.choice(5, 5, replace=False)  # equivalent to shuffling 0-4
print(x1)
x2 = rd.choice(5, 5, p=[0.1,0.1,0.1,0.1,0.6])  # high probability of getting 4
print(x2)
result
[1 4 2 3 0]
[4 4 4 4 2]
The signature of choice is choice(a, size=None, replace=True, p=None). If a is an integer, values are drawn from range(a). size is the number of random values to generate. replace is the distinctive part: think of it as sampling from range(a), and if False is specified, a drawn value is not put back, so the same value never appears twice. In that case an error occurs if a is smaller than size. p is also distinctive: instead of drawing uniformly, you can specify the probability of each value. An error occurs if the length of p does not match the number of candidates given by a.
As with the other functions so far, the result is returned as a NumPy ndarray rather than a standard Python list.
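The two error cases and the effect of p can be tried out directly. This is a small sketch under the assumption of a fixed seed; the flag names are invented for the illustration.

```python
import numpy as np
import numpy.random as rd

rd.seed(0)  # fixed seed for reproducibility (an assumption for this sketch)

# Without replacement, asking for more values than exist raises an error.
try:
    rd.choice(5, 6, replace=False)
    oversample_failed = False
except ValueError:
    oversample_failed = True

# p needs exactly one probability per candidate value.
try:
    rd.choice(5, 3, p=[0.5, 0.5])
    bad_p_failed = False
except ValueError:
    bad_p_failed = True

# With the weights from the example above, 4 should make up about 60% of the draws.
draws = rd.choice(5, 10000, p=[0.1, 0.1, 0.1, 0.1, 0.6])
share_of_4 = np.mean(draws == 4)
print(oversample_failed, bad_p_failed, share_of_4)
```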
shuffle(x) ###
x = list(range(10))  # shuffle needs a mutable sequence, so build a list
rd.shuffle(x)
print(x)
result
[3, 4, 2, 5, 8, 9, 6, 1, 7, 0]
Shuffles the order of an array at random. Note that it modifies the array passed as the argument in place rather than returning the result.
permutation(x) ###
x1 = rd.permutation(10)
print(x1)
li = ['cat', 'dog', 'tiger', 'lion', 'elephant']
x2 = rd.permutation(li)
print(x2)
result
[4 0 6 5 3 8 7 1 9 2]
['elephant' 'tiger' 'lion' 'dog' 'cat']
If an int is given as the argument, range(x) is generated internally and shuffled. If a list is given, its elements are shuffled. The elements do not have to be numbers; a list of strings, for example, works as well. Unlike shuffle(), the argument itself is left unchanged and the shuffled result is returned as a new array.
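The practical difference between shuffle() and permutation() is what happens to the input. The sketch below applies both to the same list; the variable names are assumptions for the illustration.

```python
import numpy.random as rd

items = list(range(10))

shuffled_copy = rd.permutation(items)    # returns a new ndarray
unchanged = items == list(range(10))     # the input list keeps its order

ret = rd.shuffle(items)                  # reorders items in place
print(unchanged, ret)
print(items)
```

Since shuffle() returns None, writing items = rd.shuffle(items) would throw the list away.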
uniform(low=0.0, high=1.0, size=None) ###
x = rd.uniform(-2, 5, 30)
print(x)
result
[-1.79969471 0.6422639 4.36130597 -1.99694629 3.23979431 4.75933857
1.39738979 0.12817182 1.64040588 3.0256498 0.14997201 2.0023698
3.76051422 -1.80957115 -0.2320044 -1.82575799 1.26600285 -0.27668411
0.77422678 0.71193145 -1.42972204 4.62962696 -1.90378575 1.84045518
1.06136363 4.83948262 3.57364714 1.73556559 -0.97367223 3.84649039]
Generates random numbers from a uniform distribution. The difference from the uniform random number functions explained so far is that the range can be specified. The arguments are (low=0.0, high=1.0, size=None), and the range is the half-open interval [low, high), which excludes the upper limit.
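One way to picture uniform(low, high) is as rand() stretched to the width high - low and shifted by low. The sketch below compares the two, resetting the seed before each draw; the seed and variable names are assumptions for the illustration.

```python
import numpy as np
import numpy.random as rd

low, high = -2.0, 5.0

rd.seed(0)
a = rd.uniform(low, high, 30)

rd.seed(0)
b = (high - low) * rd.rand(30) + low   # stretch [0, 1) to width high - low, then shift

in_range = bool(np.all((a >= low) & (a < high)))   # every value lies in [low, high)
matches = bool(np.allclose(a, b))                  # both routes give the same numbers
print(in_range, matches)
```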
binomial(n, p, size=None) ###
x = rd.binomial(10, 0.5, 20)
print(x)
result
[5 4 5 5 4 3 8 3 6 6 3 4 5 1 5 7 6 4 2 6]
Generates random numbers from the binomial distribution, that is, the number of successes in n trials with success probability p. The plot below can be thought of as a histogram of the following experiment repeated 3000 times: toss a coin that lands heads with probability 0.5 a total of 30 times and record the number of heads.
x = rd.binomial(30, 0.5, 3000)
plt.hist(x, 17)
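The coin-toss reading of the binomial distribution can be reproduced by hand: simulate every toss and count the heads per experiment. This is a sketch under the assumption of a fixed seed; the names tosses and by_hand are invented for the illustration.

```python
import numpy.random as rd

rd.seed(0)  # fixed seed for reproducibility (an assumption for this sketch)

n, p, trials = 30, 0.5, 3000

direct = rd.binomial(n, p, trials)

# The same experiment by hand: n tosses per trial, heads when the draw is below p
tosses = rd.rand(trials, n) < p
by_hand = tosses.sum(axis=1)

print(direct.mean(), by_hand.mean())   # both should be near n * p = 15
```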
poisson(lam=1.0, size=None) ###
x = rd.poisson(30, 20)
print(x)
result
[25 31 38 20 36 29 28 31 22 31 27 24 24 26 32 42 27 20 30 31]
Generates random numbers from the Poisson distribution for an event that occurs lam times per unit time on average. Taking ad clicks as an example, the code above corresponds to an ad that is clicked 30 times per hour on average.
The plot below can be thought of as a histogram of 1000 trials (that is, 1000 hours of data) for an ad that is clicked 5 times per hour on average.
x = rd.poisson(5, 1000)
plt.hist(x, 14)
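A well-known property of the Poisson distribution is that its mean and variance both equal lam. The sketch below checks this on a large sample; the seed and the sample size are assumptions for the illustration.

```python
import numpy.random as rd

rd.seed(0)  # fixed seed for reproducibility (an assumption for this sketch)

lam = 5
x = rd.poisson(lam, 100000)

mean, var = x.mean(), x.var()
print(mean, var)   # both should be close to lam
```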
hypergeometric(ngood, nbad, nsample, size=None) ###
ngood, nbad, nsamp = 90, 10, 10
x = rd.hypergeometric(ngood, nbad, nsamp, 100)
print(x)
print(np.average(x))
result
[ 9 10 8 9 8 7 7 9 10 7 10 9 9 8 9 9 9 9 8 10 5 10 9 9 9
9 9 10 10 8 10 9 9 9 7 9 9 10 10 7 9 9 10 10 8 9 10 10 8 10
10 9 9 10 9 10 8 9 9 9 8 9 10 9 10 10 10 9 9 9 10 9 8 10 7
7 10 10 9 10 10 9 10 9 7 9 9 8 8 10 7 8 9 10 9 9 10 9 8 10]
8.97
Generates random numbers from a hypergeometric distribution. For example, suppose there are ngood good products and nbad defective products; the function returns the number of good products obtained when nsamp items are drawn without replacement for a defect-rate inspection.
The plot below shows the number of good products obtained by sampling 20 items from a box of 200 products that contains 190 good and 10 defective products (a defect rate of 5%). It can be thought of as a histogram of the data from doing this for 3000 boxes, each containing exactly the same numbers of good and defective products.
ngood, nbad, nsamp = 190, 10, 20
x = rd.hypergeometric(ngood, nbad, nsamp, 3000)
plt.hist(x, 6)
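For the box example, the expected number of good products in a sample is nsamp * ngood / (ngood + nbad), which is 19 here. The sketch below compares that value with the simulated average; the seed is an assumption for the illustration.

```python
import numpy.random as rd

rd.seed(0)  # fixed seed for reproducibility (an assumption for this sketch)

ngood, nbad, nsamp = 190, 10, 20
x = rd.hypergeometric(ngood, nbad, nsamp, 3000)

expected = nsamp * ngood / float(ngood + nbad)   # theoretical mean of the distribution
print(x.mean(), expected)
```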
geometric(p, size=None) ###
x = rd.geometric(p=0.01, size=100)
print(x)
result
[294 36 25 18 171 24 145 280 132 15 65 88 180 103 34 105 3 34
111 143 5 26 204 27 1 24 442 213 25 93 97 28 80 93 6 189
90 31 213 13 124 50 110 47 45 66 21 1 88 79 332 80 32 19
17 2 38 62 121 136 175 81 115 82 35 136 49 810 302 31 147 207
80 125 33 53 32 98 189 4 766 72 68 10 23 233 14 21 61 362
179 56 13 55 2 48 41 54 39 279]
Generates random numbers from the geometric distribution. It returns the number of trials needed to get the first success when a trial with success probability p is repeated until it succeeds.
The plot below is for a trial with a 1% success probability that is repeated until it succeeds, recording the number of trials up to and including the success. It can be thought of as a histogram of that count collected 1000 times.
x = rd.geometric(p=0.01, size=1000)
plt.hist(x, 30)
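On average, a trial with success probability p needs 1 / p attempts, so about 100 attempts for p = 0.01. The sketch below checks this on a large sample; the seed and the sample size are assumptions for the illustration.

```python
import numpy.random as rd

rd.seed(0)  # fixed seed for reproducibility (an assumption for this sketch)

p = 0.01
x = rd.geometric(p, size=100000)

mean_trials = x.mean()
print(mean_trials, 1 / p)   # the sample mean should be near 1 / p
print(x.min())              # the count includes the successful trial, so it is at least 1
```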
normal(loc=0.0, scale=1.0, size=None) ###
x = np.random.normal(5, 2, 20)
print(x)
result
[-0.28713217 2.07791879 2.48991635 5.36918301 4.32797397 1.40568929
6.36821312 3.22562844 4.16203214 3.91913171 6.26830012 4.74572788
4.78666884 6.76617469 5.05386902 3.20053316 9.04530241 5.71373444
5.95406987 2.61879994]
Generates random numbers from the normal distribution, the most classic of all probability distributions. loc is the mean and scale is the standard deviation. The histogram is shown below.
x = np.random.normal(5, 2, 10000)
plt.hist(x,20)
Incidentally, random numbers from the chi-square distribution introduced below can be built by squaring and summing random numbers from this normal distribution.
# Generate random numbers from a normal distribution with mean 0 and standard deviation 1, and square them
x1 = np.random.normal(0, 1, 10000)**2
x2 = np.random.normal(0, 1, 10000)**2
x3 = np.random.normal(0, 1, 10000)**2
x4 = np.random.normal(0, 1, 10000)**2
x5 = np.random.normal(0, 1, 10000)**2
x6 = np.random.normal(0, 1, 10000)**2
# The sum of two squared standard normal random numbers follows a chi-square distribution with 2 degrees of freedom (blue graph)
plt.hist(x1+x2,20, color='b')
plt.show()
# The sum of three follows a chi-square distribution with 3 degrees of freedom (green graph)
plt.hist(x1+x2+x3,20, color='g')
plt.show()
# The sum of all six follows a chi-square distribution with 6 degrees of freedom (red graph)
plt.hist(x1+x2+x3+x4+x5+x6,20, color='r')
plt.show()
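The sum of k squared standard normal random numbers follows a chi-square distribution with k degrees of freedom, whose mean is k and whose variance is 2k. The sketch below builds such sums by hand for k = 3 and compares them with rd.chisquare; the seed, k, and the sample size are assumptions for the illustration.

```python
import numpy.random as rd

rd.seed(0)  # fixed seed for reproducibility (an assumption for this sketch)

k, n = 3, 100000

by_hand = (rd.randn(k, n) ** 2).sum(axis=0)   # sum of k squared standard normals
direct = rd.chisquare(k, n)                   # the library's chi-square sampler

print(by_hand.mean(), direct.mean())   # both should be near k
print(by_hand.var(), direct.var())     # both should be near 2 * k
```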
chisquare(df, size=None) ###
x = rd.chisquare(3, 20)
print(x)
result
[ 0.69372667 0.94576453 3.7221214 6.25174061 3.07001732 1.14520278
0.92011307 0.46210561 4.16801678 5.89167331 2.57532324 2.07169671
3.91118545 3.12737954 1.02127029 0.69982098 1.27009033 2.25570581
4.66501179 2.06312544]
Returns random numbers from a chi-square distribution with df degrees of freedom. As mentioned in the previous section, the chi-square distribution is the distribution followed by the sum of the squares of df independent random numbers from the standard normal distribution.
# Histograms of random numbers from chi-square distributions with 2, 5, and 20 degrees of freedom
for df, c in zip([2,5,20], "bgr"):
    x = rd.chisquare(df, 1000)
    plt.hist(x, 20, color=c)
plt.show()
f(dfnum, dfden, size=None) ###
x = rd.f(6, 28, 30)
print(x)
result
[ 0.54770358 0.90513244 1.32533065 0.75125196 1.000936 1.00622822
1.18431869 0.73399399 0.6237275 1.51806607 1.12040041 1.67777055
0.40309609 0.29640278 0.49408306 1.97680072 0.51474868 0.28782202
0.90206995 0.30968917 1.29931934 1.19406178 1.28635087 2.73510067
0.41310779 1.36155992 0.2887777 0.78830371 0.25557871 0.96761269]
Returns random numbers from an F distribution with the two degrees of freedom dfnum and dfden. The F distribution is the distribution of a ratio whose numerator and denominator are two independent chi-square random variables, each divided by its degrees of freedom. Because a chi-square variable is a sum of squared standardized values, it can be viewed as a kind of variance, which is why the F distribution is used to test whether two variances are equal.
The plot below shows histograms of random numbers from F distributions with degrees of freedom (1,4), (5,7), (10,10), and (40,50).
for df, c in zip([(1,4), (5,7), (10,10), (40,50)], "bgry"):
    x = rd.f(df[0], df[1], 1000)
    plt.hist(x, 100, color=c)
plt.show()
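The ratio definition can be tried out by dividing two chi-square samples, each scaled by its degrees of freedom. The sketch below does this for (10, 10) and compares the result with rd.f; for dfden > 2 the theoretical mean is dfden / (dfden - 2), which is 1.25 here. The seed and the sample size are assumptions for the illustration.

```python
import numpy as np
import numpy.random as rd

rd.seed(0)  # fixed seed for reproducibility (an assumption for this sketch)

dfnum, dfden, n = 10, 10, 100000

numerator = rd.chisquare(dfnum, n) / dfnum     # chi-square divided by its degrees of freedom
denominator = rd.chisquare(dfden, n) / dfden
by_hand = numerator / denominator

direct = rd.f(dfnum, dfden, n)

print(by_hand.mean(), direct.mean())           # both should be near 1.25
print(np.median(by_hand), np.median(direct))   # the medians should roughly agree as well
```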
exponential(scale=1.0, size=None) ###
lam = 0.1  # occurs 0.1 times per minute on average
x = rd.exponential(1./lam, size=20)
print(x)
result
[ 11.2642272 41.01507264 11.5756986 27.10318556 10.7079342
0.17961819 24.49974467 6.46388826 9.69390641 2.85354527
0.55508868 4.04772073 24.60029857 23.10866 19.83649067
12.12219301 10.24395203 0.16056754 8.9401544 8.86083473]
Returns random numbers from an exponential distribution with parameter lam, the average number of occurrences per unit time. When calling exponential, pass the reciprocal of lam as scale. The function returns how many units of time pass until the next occurrence of an event that happens lam times per unit time on average.
In other words, for an event that occurs 0.1 times per minute on average, a value of 3 means the next occurrence came 3 minutes later. The plot below is a histogram of random numbers from the exponential distribution with lam = 0.1.
lam = 0.1  # occurs 0.1 times per minute on average
x = rd.exponential(1./lam, size=10000)
plt.hist(x, 100)
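With lam = 0.1 per minute, the average wait is 1 / lam = 10 minutes, and the probability of waiting longer than 10 minutes is exp(-1), roughly 37%. The sketch below checks both on a large sample; the seed and the sample size are assumptions for the illustration.

```python
import numpy as np
import numpy.random as rd

rd.seed(0)  # fixed seed for reproducibility (an assumption for this sketch)

lam = 0.1   # occurs 0.1 times per minute on average
x = rd.exponential(1. / lam, size=100000)

mean_wait = x.mean()
frac_over_10 = np.mean(x > 10)      # share of waits longer than 10 minutes

print(mean_wait)                    # should be near 1 / lam = 10
print(frac_over_10, np.exp(-1))     # the two values should be close
```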
numpy reference site http://docs.scipy.org/doc/numpy/reference/routines.random.html