I tried to simulate ad optimization using the bandit algorithm.

Introduction

The purpose of this entry is to understand the characteristics of each method of the bandit algorithm by performing simulations assuming actual use cases.

The following Qiita post is a frequently cited Japanese reference on bandit algorithms: http://qiita.com/yuku_t/items/6844aac6008911401b19

The following slides also introduce the details and features of each method, along with simple simulations: http://www.slideshare.net/greenmidori83/ss-28443892

Since the methods are explained very clearly in the material above, this entry will not explain them again.

Assumed use case

Suppose you are currently running an ad that has been displayed 10,000 times, with a click-through rate (CTR) of 1.2% and a cost of 60 yen per click. To find a banner that gets clicked more, you decide to try different banner patterns and create 10 new banners. Suppose two of these banners have a better CTR than the current one, three have the same CTR, and the remaining five have a lower CTR than the current one:

- 1.3%: 2 banners
- 1.2%: 3 banners
- 1.1%: 3 banners
- 1.0%: 2 banners

Why you want to increase your CTR

This is a digression from the main subject, but why do we want to increase the click rate, and why is it worth experimenting for an improvement of only 0.1%? You might think that increasing the CTR doesn't matter, since you are paid the same amount for every click. However, from the perspective of the party displaying the ad (the ad network or publisher), running an ad with a low click rate reduces revenue, so a low-CTR ad becomes difficult to get displayed.

An index called eCPM is often used here. eCPM is the expected revenue from displaying the ad 1,000 times, calculated as click rate x cost per click x 1,000. Ads with a high eCPM are more likely to be displayed, so increasing the click rate lets you send more users to your landing page at the same cost per click.

For example, in this case the click rate is 1.2% and the cost per click is 60 yen, so the eCPM is 720 yen. If the click rate increases by 0.1%, the eCPM becomes 780 yen. In that case, even at a cost per click of 56 yen the eCPM is still 728 yen, above the original 720 yen, which means that a 0.1% increase in click rate lets you pay about 4 yen less per click for the same amount of traffic to the landing page. If, say, 1% of visitors to that landing page buy the product, the cost of selling one item drops by about 400 yen. The details vary case by case, but increasing CTR in this way is a very important factor in selling things through web advertising.
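As a quick check of this arithmetic, here is a minimal sketch (my own helper, not part of the original code) that just evaluates eCPM = CTR x cost per click x 1,000 for the numbers above:

def ecpm(ctr, cpc):
    # eCPM: expected revenue per 1,000 impressions = CTR * cost per click * 1,000
    return ctr * cpc * 1000

print(ecpm(0.012, 60))  # current banner: 720.0 yen
print(ecpm(0.013, 60))  # CTR up by 0.1% at the same 60 yen per click: 780.0 yen
print(ecpm(0.013, 56))  # CTR up by 0.1% even at 56 yen per click: 728.0 yen, still above 720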

Simulation

The implementation was borrowed from the following

https://github.com/johnmyleswhite/BanditsBook

Let's take a look at Epsilon Greedy, Softmax, and UCB.

Epsilon Greedy

Epsilon Greedy is a method that, with a certain probability, deliberately chooses an option other than the one that currently looks best. First, let's set the exploration rate to about 10% and run 1,000 trials.
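For reference, the selection rule is roughly the following (a paraphrased sketch of the idea, not the BanditsBook code verbatim):

import random

def select_arm_sketch(epsilon, values):
    # With probability 1 - epsilon exploit the arm with the best estimated value,
    # otherwise explore by picking an arm uniformly at random
    if random.random() > epsilon:
        return values.index(max(values))
    return random.randrange(len(values))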

execfile("core.py")
arm_rates = [0.012, 0.013, 0.013, 0.012, 0.012, 0.012, 0.011, 0.011, 0.011, 0.01, 0.01]
arms = [BernoulliArm(r) for r in arm_rates]
counts = [10000, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
values = [0.012, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
algo1 = EpsilonGreedy(0.1, counts, values)
for t in range(1000):
    chosen_arm = algo1.select_arm()
    reward = arms[chosen_arm].draw()
    algo1.update(chosen_arm, reward)
|  | base | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Number of trials | 77 | 109 | 82 | 38 | 38 | 50 | 264 | 205 | 49 | 47 | 41 |
| Estimated CTR | 1.2% | 0.9% | 3.7% | 0% | 0% | 2.0% | 2.3% | 2.0% | 0% | 0% | 0% |

With 11 options, 1,000 trials tell us almost nothing. The result also varies greatly from run to run, since it depends on the random numbers generated.
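Incidentally, BanditsBook draws rewards with Python's standard random module, so fixing the seed before the loop should make a single run reproducible (an assumption about the library internals, but handy when comparing settings):

import random
random.seed(0)  # fix the random seed so a single simulation run is repeatable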

What if you try 10,000 times?

|  | base | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Number of trials | 1,321 | 824 | 481 | 623 | 1,827 | 562 | 1,026 | 968 | 452 | 1,206 | 710 |
| Estimated CTR | 1.2% | 1.2% | 0.6% | 1.2% | 1.3% | 0.7% | 1.2% | 1.1% | 0.9% | 1.2% | 1.3% |

It's not a terrible result, but it's hard to say that it's a good result.

Next, let's increase the exploration rate to about 50%. First, 1,000 trials:

|  | base | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Number of trials | 239 | 70 | 40 | 72 | 37 | 44 | 137 | 45 | 52 | 211 | 53 |
| Estimated CTR | 1.2% | 1.4% | 0% | 1.4% | 0% | 0% | 1.5% | 0% | 0% | 1.9% | 0% |

Compared with epsilon = 10%, the arms appear to be explored more evenly. Next, 10,000 trials:

|  | base | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Number of trials | 1,035 | 1,915 | 8,120 | 1,046 | 1,026 | 1,104 | 885 | 899 | 1,007 | 1,354 | 1,609 |
| Estimated CTR | 1.2% | 1.3% | 1.3% | 1.1% | 1.1% | 1.0% | 0.9% | 1.1% | 1.1% | 0.7% | 1.2% |

The best banner was found this time, but that may simply have been luck. You can also see that the other banners are tried more evenly than with epsilon = 10%. Since there is a large variation from run to run, let's evaluate by the eCPM averaged over many repeated runs.

eCPM is computed as follows:

def calc_ecpm(algo):
    # Impressions served during the simulation
    # (excluding the 10,000 prior impressions seeded into the first arm)
    total = sum(algo.counts) - 10000
    ecpm = 0.0
    for idx, count in enumerate(algo.counts):
        if idx == 0:
            count -= 10000  # remove the prior impressions of the current banner
        if count == 0:
            continue
        # Weight each arm's eCPM (CTR * 60 yen * 1,000) by its share of impressions
        ecpm += 1000 * 60 * algo.values[idx] * float(count)/total
    return ecpm
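For example, applying it to algo1 from the epsilon = 0.1 run above gives the eCPM realized over those 1,000 additional impressions:

print(calc_ecpm(algo1))  # eCPM in yen over the simulated impressions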

Now, sweeping the parameter from 0.1 to 0.9 in 0.1 increments and repeatedly running the 10,000-trial bandit for each value, the eCPM distribution looks like this.

import pandas

def run_eg(ep, counts, values, arms):
    # Copy counts/values so every run starts from the same prior state
    # (EpsilonGreedy mutates the lists it is given)
    algo = EpsilonGreedy(ep, list(counts), list(values))
    for t in xrange(10000):
        chosen_arm = algo.select_arm()
        reward = arms[chosen_arm].draw()
        algo.update(chosen_arm, reward)
    return calc_ecpm(algo)


def run():
    data = []
    arm_rates = [0.012, 0.013, 0.013, 0.012, 0.012, 0.012, 0.011, 0.011, 0.011, 0.01, 0.01]
    arms = [BernoulliArm(r) for r in arm_rates]
    counts = [10000, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    values = [0.012, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    for ep in xrange(1, 10):
        print ep * 0.1
        for _ in xrange(10000):
            data.append((ep*0.1, run_eg(ep*0.1, counts, values, arms)))
    return pandas.DataFrame(data, columns=['ep', 'ecpm'])

result = run()
result.boxplot(by="ep")

(Figure: boxplot of eCPM for each epsilon)

When epsilon is small, the results vary widely; as epsilon gets larger, the resulting eCPM tends to get smaller. Still, it basically stays above the original eCPM of 720 yen. Even such a simple algorithm seems to give sufficiently useful results.

Softmax

Softmax has a parameter, tau: the smaller it is, the more the algorithm trusts the value estimates it has already computed. Here we sweep tau over powers of two starting from 2^-5, running the 10,000-trial simulation repeatedly for each value.

def run_sm(tau, counts, values, arms):
    # Copy counts/values so every run starts from the same prior state
    algo = Softmax(tau, list(counts), list(values))
    for t in xrange(10000):
        chosen_arm = algo.select_arm()
        reward = arms[chosen_arm].draw()
        algo.update(chosen_arm, reward)
    return calc_ecpm(algo)


def run():
    data = []
    arm_rates = [0.012, 0.013, 0.013, 0.012, 0.012, 0.012, 0.011, 0.011, 0.011, 0.01, 0.01]
    arms = [BernoulliArm(r) for r in arm_rates]
    counts = [10000, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    values = [0.012, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    for i in xrange(10):
        tau = 2**(i - 5)  # tau = 2^-5, 2^-4, ..., 2^4
        print tau
        for _ in xrange(10000):
            data.append((tau, run_sm(tau, counts, values, arms)))
    return pandas.DataFrame(data, columns=['tau', 'ecpm'])

result = run()
result.boxplot(by="tau")

(Figure: boxplot of eCPM for each tau)

The results are worse than the original eCPM! This is because Softmax keeps drawing the bad arms with a certain probability no matter how many trials have passed. When using Softmax, it seems necessary to cut off the bad options quickly.
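To see why, here is a small illustration of my own (not from the original post) of the selection probabilities that the standard softmax weighting exp(value / tau) assigns when the value estimates equal the true CTRs. Because the CTRs differ by only a few tenths of a percent, the probabilities stay nearly uniform even for the smallest tau tried here:

import math

def softmax_probs(values, tau):
    # Standard softmax weighting: p_i is proportional to exp(value_i / tau)
    weights = [math.exp(v / tau) for v in values]
    total = sum(weights)
    return [w / total for w in weights]

ctrs = [0.012, 0.013, 0.013, 0.012, 0.012, 0.012, 0.011, 0.011, 0.011, 0.01, 0.01]
for tau in [2**-5, 1.0, 2**4]:
    probs = softmax_probs(ctrs, tau)
    print("tau=%s best arm p=%.3f worst arm p=%.3f" % (tau, max(probs), min(probs)))
# Even at tau = 2^-5 the best and worst arms are picked with probability
# roughly 0.105 vs 0.095, barely different from the uniform 1/11 = 0.091,
# so the bad banners keep getting impressions.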

UCB

Finally, UCB. UCB1 has no parameters, so we just run it.
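For reference (not from the original post): UCB1 needs no tuning because its arm score adds an exploration bonus that shrinks automatically as an arm accumulates trials. A minimal sketch of the standard UCB1 score:

import math

def ucb1_score(value, count, total_count):
    # Estimated value plus an exploration bonus that grows slowly with the total
    # number of trials and shrinks as this particular arm is tried more often
    return value + math.sqrt(2 * math.log(total_count) / count)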

def run_ucb(counts, values, arms):
    # Copy counts/values so every run starts from the same prior state
    algo = UCB1(list(counts), list(values))
    for t in xrange(10000):
        chosen_arm = algo.select_arm()
        reward = arms[chosen_arm].draw()
        algo.update(chosen_arm, reward)
    return calc_ecpm(algo)


def run():
    data = []
    arm_rates = [0.012, 0.013, 0.013, 0.012, 0.012, 0.012, 0.011, 0.011, 0.011, 0.01, 0.01]
    arms = [BernoulliArm(r) for r in arm_rates]
    counts = [10000, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    values = [0.012, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    for _ in xrange(10000):
        data.append(run_ucb(counts, values, arms))
    return pandas.DataFrame(data, columns=['ecpm'])

result = run()

(Figure: distribution of eCPM for UCB1)

You can see that UCB1 explores without dragging the eCPM down. Its eCPM is somewhat lower than with Epsilon Greedy, but this is probably a trade-off against how quickly it explores.

Comparison and consideration of each method

I'd like to write this up later. I also want to compare the methods with A/B testing.

Summary

This entry does not settle which method is better. There are many more things to consider if you want to explore further use cases, but I hope it helps build some understanding of what the bandit algorithm is.

The bandit algorithm is characterized by being able to explore while making use of the data assets you have already accumulated. One of its benefits is that it can explore without putting the business at risk by dragging down important indicators such as eCPM. That is why I ran this kind of simulation: I wanted to see what benefit you get when data assets already exist, rather than whether the algorithm eventually makes the right choice. I hope you find it helpful, and if you have any opinions, please feel free to comment!
