I will do bingo at the company's year-end party. You were asked to develop a system for bingo. However, it has the following requirements (fiction).
-"7 prizes" (→ 7 winners) ―― "Please keep the ratio of winners between men and women as constant as possible." ―― “Please keep the ratio of winners in each department as constant as possible” -"Set the winning probability (weight) of young people, veterans, and officers to 1, 2, and 3."
(Reference: Similar story)
Combinatorial optimization allows you to find combinations that take into account various conditions. Winners are selected and arranged in order according to the winning probability. According to the order, the $ i $ th score is $ 2 ^ {1-i} $. By maximizing the sum of the scores, we will select the winners according to the winning probability as much as possible. The ratio of men and women and the ratio of departments are subtracted from the objective function by penalizing the amount that deviates from the ratio.
Load the participant information into pandas.DataFrame (if the code can't be executed, download the CSV from the URL and load it).
import numpy as np, pandas as pd, urllib
from pulp import lpDot, lpSum
from ortoolpy import model_max, addbinvars, addvars, addvals
url = 'https://www.dropbox.com/s/refo0vfj5wbmv2h/bingo.csv?dl=1'
with urllib.request.urlopen(url) as fp:
df = pd.read_csv(fp)
Each row of DataFrame (df
) is a participant.
The winning probability (Rate
column) is used as a ratio to win in order with numpy.random.choice
. Calculate the score with that order as the index and make it into the Score
column.
Optimization maximizes the sum of the scores.
Add a column Var
of variables that indicate whether or not you have won.
Also, pena1
is a penalty that deviates from the ratio of men and women. Let pena2
be the amount of penalty that deviates from the affiliation ratio.
nn = len(df) #Number of people
rnd = np.random.choice(df.index, nn, False, df.Rate / df.Rate.sum())
df['Score'] = pd.Series(2.0**-np.arange(nn), index=rnd) #Score
addbinvars(df) #Added variable (whether winning or not) as Var column
pena1, pena2 = addvars(2) #Two penalties for deviating from the ratio
print(df)
Name | IsMale | Div | Rate | Score | Var | |
---|---|---|---|---|---|---|
0 | Hidehito Asakura | True | General Affairs Department | 2 | 0.000488 | v000001 |
1 | Ryo Miyagawa | True | Accounting department | 1 | 0.001953 | v000002 |
2 | Kenji Furuhashi | True | General Affairs Department | 2 | 0.000977 | v000003 |
3 | Motomura sentence | False | General Affairs Department | 3 | 0.000015 | v000004 |
4 | Keiji Otsubo | True | Accounting department | 1 | 0.000002 | v000005 |
5 | Kazutaka Hatakeyama | True | General Affairs Department | 2 | 0.000031 | v000006 |
6 | Wakana Takeuchi | False | Accounting department | 2 | 0.000008 | v000007 |
7 | Yoda Kanae | False | Human Resources Department | 2 | 0.015625 | v000008 |
8 | Keisuke Furuta | True | Human Resources Department | 3 | 0.125000 | v000009 |
9 | Kei Ishikawa | True | Human Resources Department | 1 | 0.000061 | v000010 |
10 | Eriko Tominaga | False | Human Resources Department | 2 | 1.000000 | v000011 |
11 | Tobita Sato | False | General Affairs Department | 1 | 0.003906 | v000012 |
12 | Seito Kamiyama | True | Human Resources Department | 2 | 0.250000 | v000013 |
13 | Ryuichiro Tabata | True | General Affairs Department | 3 | 0.500000 | v000014 |
14 | Rie Kamei | False | Accounting department | 1 | 0.000004 | v000015 |
15 | Sone Haruka | False | General Affairs Department | 1 | 0.000244 | v000016 |
16 | Kasuga Kiyoshiro | True | Accounting department | 2 | 0.007812 | v000017 |
17 | Akisa Nakai | False | Accounting department | 3 | 0.062500 | v000018 |
18 | Koho Kaneshiro | False | Human Resources Department | 2 | 0.031250 | v000019 |
19 | Yoshikatsu Kakinuma | True | General Affairs Department | 1 | 0.000122 | v000020 |
Check the number of people by gender and department.
print(df.groupby('IsMale').Div.value_counts())
IsMale Div
False Human Resources Department 3
Accounting Department 3
General Affairs Department 3
True General Affairs Department 5
Human Resources Department 3
Accounting Department 3
Name: Div, dtype: int64
--Purpose function: Sum of scores --Number of people x (Amount of gender ratio deviation + Department ratio deviation amount) --Constraints --The number of winners is 7 ――The ratio by gender is constant --Constant ratio by department
ns = 7 #Number of winners
m = model_max() #Mathematical model
m += lpDot(df.Score, df.Var) - nn * (pena1 + pena2) #Objective function
m += lpSum(df.Var) == ns #Number of winners
for _, gr in df.groupby('IsMale'): #Keep the ratio by gender constant
m += lpSum(gr.Var) <= ns * len(gr) / nn + pena1
for _, gr in df.groupby('Div'): #Keep the ratio by department constant
m += lpSum(gr.Var) <= ns * len(gr) / nn + pena2
m.solve() #Solution
addvals(df) #Add result as Val column
print(df[df.Val > 0]) #Winner
Name | IsMale | Div | Rate | Score | Var | Val | |
---|---|---|---|---|---|---|---|
10 | Eriko Tominaga | False | Human Resources Department | 2 | 1.000000 | v000011 | 1.0 |
11 | Tobita Sato | False | General Affairs Department | 1 | 0.003906 | v000012 | 1.0 |
12 | Seito Kamiyama | True | Human Resources Department | 2 | 0.250000 | v000013 | 1.0 |
13 | Ryuichiro Tabata | True | General Affairs Department | 3 | 0.500000 | v000014 | 1.0 |
16 | Kasuga Kiyoshiro | True | Accounting department | 2 | 0.007812 | v000017 | 1.0 |
17 | Akisa Nakai | False | Accounting department | 3 | 0.062500 | v000018 | 1.0 |
19 | Yoshikatsu Kakinuma | True | General Affairs Department | 1 | 0.000122 | v000020 | 1.0 |
Winners have been decided. Let's look at the number of people by gender and department.
print(df[df.Val > 0].groupby('IsMale').Div.value_counts())
IsMale Div
False Human Resources Department 1
Accounting Department 1
General Affairs Department 1
True General Affairs Department 2
Human Resources Department 1
Accounting Department 1
Name: Div, dtype: int64
Looks good.
--Combinatorial optimization is a method for finding combinations that satisfy various conditions.
--The order of winning is decided according to the winning probability, and the score is [1, 0.5, 0.25, ...]
for each order. By maximizing the sum of these scores, we will try to select the winners in a predetermined order.
(Personally, I hope there is no demand for this article)
that's all
Recommended Posts