Weighted random.choice even on numpy v1.6 or below

Numpy on GAE / py

The Python standard environment of Google Cloud Platform (GCP) App Engine has various restrictions, but you can use numpy :+1:

Built-in third party library

However, the version is fixed at 1.6.1 :scream: (I believe the latest version is 1.13.1 as of August 31, 2017.)

As a result, various functions are unavailable. This time, the problem was that I could not use np.random.choice.

numpy.random.choice

This is a convenient random sampling function added in 1.7. Reference: numpy.random.choice

Basic function

Basically, it takes an array or an integer length and returns randomly drawn array elements or indices. The second argument (size) specifies how many values to draw. When size is omitted, a single value is drawn and the return value is an element rather than an array.

# Pass an array length to get back an index
>>> np.random.choice(5)
3
# With size specified, the indices come back as an array
>>> np.random.choice(5, 3)
array([3, 2, 4])

# Pass an array to get back an element
>>> np.random.choice(['alpha', 'beta', 'gamma'])
'alpha'
# With size specified, the drawn elements come back as an array
>>> np.random.choice(['alpha', 'beta', 'gamma'], 2)
array(['alpha', 'beta'],
      dtype='|S5')

In fact, size can also be a tuple to get multi-dimensional output (the input array itself must be 1-D). Bringing that to 1.6 as well looks like a hassle...

However, the basic behavior alone can be reproduced with np.random.randint. Since randint lets you specify both the range and the output size, it covers the index-selection part of choice.
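As a rough illustration of that idea, the basic behavior could be sketched as follows. The name basic_choice is hypothetical (it is not a numpy function), and only uniform sampling with duplicates allowed is covered.

```python
import numpy as np


def basic_choice(a, size=None):
    """Hypothetical helper: uniform choice with duplicates, built on randint."""
    # An integer a stands for the index range 0..a-1
    values = np.arange(a) if isinstance(a, int) else np.asarray(a)
    # randint draws from [0, len(values)), which are exactly the valid indices
    idx = np.random.randint(0, len(values), size)
    return values[idx]


print(basic_choice(5, 3))
print(basic_choice(['alpha', 'beta', 'gamma'], 2))
```

With size left as None, randint returns a single index, so a single element comes back instead of an array.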

Optional feature: Duplicate control

By default, values drawn with a specified size may contain duplicates (sampling with replacement).

# Duplicates are allowed by default, so the same value may be chosen more than once.
>>> np.random.choice(5, 3)
array([0, 0, 0])
>>> np.random.choice(['alpha', 'beta', 'gamma'], 2)
array(['gamma', 'gamma'],
      dtype='|S5')

# replace=False prevents duplicates.
>>> np.random.choice(5, 3, replace=False)
array([0, 1, 4])
>>> np.random.choice(['alpha', 'beta', 'gamma'], 2, replace=False)
array(['gamma', 'alpha'],
      dtype='|S5')

With replace=False, an error occurs if size is larger than the length of the array being sampled.

What got me stuck this time is that sampling without duplicates does not seem to be available in np.random in 1.6 and below. Moreover, looping over randint until no values overlap could take a long time if you are unlucky. (I am not familiar with pseudo-random numbers, so maybe that is not the case...)

So this time I will use sample from Python's random module. random.sample draws a specified number of items from a sequence without duplicates. The results will not be exactly the same as numpy's, since the random number generation may differ, but...
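A minimal sketch of this approach, assuming the positions are drawn with random.sample and then used to index a numpy array (the variable names are only for illustration):

```python
import random

import numpy as np

values = np.asarray(['alpha', 'beta', 'gamma'])

# random.sample picks distinct positions, so no index appears twice
idx = random.sample(range(len(values)), 2)
picked = values[idx]
print(picked)

# Asking for more items than exist raises ValueError
try:
    random.sample(range(len(values)), 4)
except ValueError as exc:
    print("error:", exc)
```

Sampling positions from range(len(values)) rather than from the array itself keeps the result a numpy array after indexing.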

Optional features: Weighting

You can apply weights by passing an array p with the same length as the sampling target. p stands for probability, and its values must sum to 1. By default, the distribution is uniform.

# Weights bias the draw toward some values.
>>> np.random.choice(5, 3, p=[0.1, 0, 0.3, 0.6, 0])
array([3, 2, 3])
>>> np.random.choice(['alpha', 'beta', 'gamma'], 2, p=[0.1, 0.1, 0.8])
array(['gamma', 'gamma'],
      dtype='|S5')

# Not covered in this article, but p can be combined with replace
>>> np.random.choice(5, 3, p=[0.1, 0, 0.3, 0.6, 0], replace=False)
array([2, 3, 0])
>>> np.random.choice(['alpha', 'beta', 'gamma'], 2, p=[0.1, 0.1, 0.8], replace=False)
array(['gamma', 'alpha'],
      dtype='|S5')

Someone had asked about this kind of weighting on Stack Overflow, so I borrowed the answer from that thread.

How do I “randomly” select numbers with a specified bias toward a particular number

I do not fully understand it, but it appears that p is converted to a cumulative array, and uniformly distributed random numbers are then located within it to determine which indices are selected.
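To make the cumulative-array idea concrete, here is a small worked example. The uniform values are fixed by hand (an assumption made so that the result is reproducible); in practice they would come from np.random.sample.

```python
import numpy as np

p = np.asarray([0.1, 0.0, 0.3, 0.6, 0.0])
cdf = np.cumsum(p)  # approximately [0.1, 0.1, 0.4, 1.0, 1.0]

# Hand-picked stand-ins for uniform random numbers in [0, 1)
u = np.asarray([0.05, 0.25, 0.95])

# side='right' returns the first position whose cumulative value exceeds u
idx = cdf.searchsorted(u, side='right')
print(idx)  # [0 2 3]
```

Each index owns the interval between the previous cumulative value and its own, so an index with probability 0 (indices 1 and 4 here) owns an empty interval and is never selected.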

Implementation example

The conditional branching is rather clumsy, but an implementation would look something like this. Because Python's random is used for sampling without duplicates, there may be performance degradation or bias in the random numbers. Please let me know if there is a better implementation. (Actually, even the 1.6 series seems to have a fair number of APIs.)

The best thing is to use numpy 1.7 or above.

import random

import numpy as np


def numpy_choice(a, size=1, replace=True, p=None):
    # numpy 1.6 has no np.random.choice, so reproduce it here.
    # replace allows duplicates; p holds the weights.
    # An integer a becomes arange(a); anything else is converted to an array.
    values = np.arange(a) if isinstance(a, int) else np.asarray(a)
    if p is not None:
        # TODO: verify that p has the same length as a.
        # Combining p with replace=False is not supported yet (I happened not to need it).
        choiced = weighted_choice(values, p, size)
    else:
        length = len(values)
        if replace or size > length:
            # Duplicates allowed: use randint.
            # Note: size > length also ends up here instead of raising an error.
            idx = np.random.randint(0, length, size)
        else:
            # No duplicates: use Python's random.sample.
            idx = random.sample(range(length), size)
        choiced = values[idx]
    if size == 1 and len(choiced) == 1:
        # With size 1, return the element itself.
        return choiced[0]
    return choiced


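def weighted_choice(values, p, size=1):
    # Weighted choice, taken almost as-is from the Stack Overflow answer.
    values = np.asarray(values)

    # Cumulative weights as floats, normalized so the last entry is 1.
    cdf = np.cumsum(np.asarray(p, dtype=float))
    cdf /= cdf[-1]

    # Each uniform sample in [0, 1) falls into one interval of the cdf.
    uniform_samples = np.random.sample(size)
    idx = cdf.searchsorted(uniform_samples, side='right')
    sample = values[idx]

    return sample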

Issues

- Mixing random and np.random is not ideal (maybe using only random would be better)
- p is not validated
- p and replace cannot be used together
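For the p validation item, one possible direction is sketched below. The function name check_p and the tolerance value are assumptions made for illustration, and the checks only loosely follow what np.random.choice is described as requiring (same length as the target, values summing to 1).

```python
import numpy as np


def check_p(a_length, p, tol=1e-8):
    """Hypothetical validator for the p argument (not part of numpy)."""
    p = np.asarray(p, dtype=float)
    if p.ndim != 1 or len(p) != a_length:
        raise ValueError("p must be 1-D and the same length as a")
    if np.any(p < 0):
        raise ValueError("probabilities must be non-negative")
    if abs(p.sum() - 1.0) > tol:
        raise ValueError("probabilities must sum to 1")
    return p


print(check_p(3, [0.1, 0.1, 0.8]))
```

Calling something like this before the weighted branch would turn a mismatched p into an explicit error instead of a silently skewed result.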