** You **, the executive committee member of the research presentation of the academic society, has decided to create a program for the research presentation.
--There were applications for 60 presentations. --Each presentation has one or two related ** keywords **. ――The presentations of the four people will be combined into one session, and a total of 15 sessions will be created. --Each session has one session ** theme **. --The presentation in the session shall have the theme as a keyword. ――You have to decide the session theme for each application and divide it into 15 sessions.
--Each keyword of all presenters is a candidate for selection. --For the keyword, it is assumed that the degree of relevance to the announcement is set by (0-1) [^ 1]. -** We will maximize ** the "sum of relevance of selected candidates" **. --Each presentation must be ** assigned to one of the sessions **. ――For that purpose, some candidates will have an arbitrary category as a dummy for each presentation. --The relevance of any category is very small (-10).
[^ 1]: For example, it is possible to calculate with Word2Vec.
You can solve the above problem by using Combinatorial optimization. Let's formulate it.
Maximize td> | $ \ sum_i {Relevance_i x_i} $ td> | Sum of relevance of assigned candidates td> tr> |
Variables td> | $ x_i \ in \ {0, 1 \} $ td> | $ x_i $: $ i $ Select th candidate Whether td> tr> |
$ y_j \ in Integer greater than or equal to 0 $ td> | $ y_j $: $ j $ Number of sessions in the th category td> tr> | |
Constraints td> | $ \ sum_j {y_j} = 15 $ td> | Total number of sessions is 15 td> tr> |
$ \ sum_ {i \ in F_h} {x_i} = 1 ~ ~ ~ \ forall h \ in H $ td> | Each presentation is assigned exactly one keyword tr> td> tr> | |
$ \ sum_ {i \ in G_k} {x_i} \ le 4 y_j ~ ~ ~ \ forall k \ in C $ td> | The number of presentations in each category is less than the number of frames strong> td> tr> |
However, $ H $ is a set of presenters, $ F_h $ is a set of candidates for presenter $ h $, $ C $ is a set of categories, and $ G_k $ is a set of candidates for category $ k $.
Let's make the application table.
python3
import numpy as np, pandas as pd
from pulp import *
np.random.seed(3)
nu = 4 #Number of presentations per session
nr = 60 #Number of presenters
cat = 'Communication Medical Logistics Electric Power Civil Engineering Physical Chemistry Geometric Algebra Earth Science Biology'.split() #category
ns = nr / nu #Number of sessions
dat = [(i, j, np.random.rand()) for i in range(nr)
for j in np.random.choice(cat, np.random.randint(1, 3), replace=False)]
dat.extend([(i, 'Any', -10) for i in range(nr)]) # Anyカテゴリの追加
a = pd.DataFrame(dat, columns=['Presenter', 'category', 'Relevance']) #Candidate
a['vx'] = [LpVariable('vx%d'%i, cat=LpBinary) for i in a.index] #Which line to choose
print(a[:3])
Presenter th> | Category th> | Relevance th> | vx | |
---|---|---|---|---|
0 | 0 | Physics td> | 0.207243 | vx0 |
1 | 1 | power td> | 0.492636 | vx1 |
2 | 1 | creatures td> | 0.913301 | vx2 |
vx is the column that corresponds to the variable $ x $.
Let's tabulate the categories.
python3
b = pd.DataFrame(cat, columns=['name'])
b['vy'] = [LpVariable('vy%d'%j, cat=LpInteger, lowBound=0) for j in b.index] #Number of sessions
print(b[:3])
Name th> | vy | |
---|---|---|
0 | Communication td> | vy0 |
1 | Medical td> | vy1 |
2 | Logistics td> | vy2 |
vy is the column that corresponds to the variable $ y $.
Formulate and solve, and see the assignments that have become arbitrary categories.
python3
m = LpProblem(sense=LpMaximize)
m += lpDot(a.Relevance, a.vx)
m += lpSum(b.vy) == ns #Total number of sessions is equal
for i in range(nr):
m += lpSum(a.vx[a.Presenter==i]) == 1 # Presenterは1カテゴリを選ぶ
for _, r in b.iterrows():
m += lpSum(a.vx[a.category==r.name]) <= r.vy * nu #Announcement is less than the number of frames
m.solve()
a['rx'] = a.vx.apply(value) #Allocation result
b['ry'] = b.vy.apply(value) #Session count result
print(a[(a.rx > 0)&(a.category=='Any')])
Presenter th> | Category th> | Relevance th> | vx | rx | |
---|---|---|---|---|---|
117 | 26 | optional td> | -10.0 | vx117 | 1.0 |
rx is the column that corresponds to the result of the variable $ x $. Presenter "26" has assigned it to any category.
Let's look at the number of presentations for each category.
python3
print(a.category[(a.rx>0)].value_counts())
Category th> | |
---|---|
Communication th> | 12 |
Power th> | 8 |
Physics th> | 8 |
Biology th> | 7 |
Geometry th> | 4 |
Civil engineering th> | 4 |
Logistics th> | 4 |
Medical th> | 4 |
Earth science th> | 4 |
Chemistry th> | 4 |
Optional th> | 1 |
The biology category isn't a multiple of 4, so the crazy 26 would be in here.
that's all
Recommended Posts