In my business, I develop software using combinatorial optimization technology (for example, minimizing transportation costs in logistics). In the past, I used to create optimization models using C ++ and C #, but nowadays I often use Python. Here, we will introduce Python in optimization.
The reasons for using Python are as follows.
--Easy to understand. Since the model by mathematical formula and the model by Python are close, you can concentrate on more essential description and create a model that is easy to maintain.
--Short description amount is required. Compared to C ++ etc., the size of the program is a fraction. --Learning cost is small. It has a simple grammar and few reserved words. --Can be completed with Python. Since it is a general-purpose language, processing for various purposes can be described in almost all Python. For example, you can get data from the web, aggregate it, analyze it, optimize it, visualize it, and so on, all in Python. --There are many libraries. The package community site https://pypi.python.org/pypi alone has about 90,000 packages published. Many other packages are also available on https://github.com/ and https://anaconda.org/. --Can be executed in various environments. There are various operating systems such as Windows, Mac, and Linux, and processing systems such as Cython, Pypy, and IronPython. --Many optimization software supports Python. There are many optimization software, both paid and free, but many are available from Python.
Python is said to run slower than compiler languages such as C ++. However, in optimization, Python is mainly used for model creation (modeling), and dedicated software (solver) written in C ++ etc. is used for execution of optimization algorithms. Therefore, even if you use Python for optimization, the execution time does not matter much.
Optimizing modeling mainly uses PuLP and pandas packages.
--PuLP is a package of mathematical modeling and pandas is a package of data analysis. ――Pandas is suitable for handling data that can be represented in a table among the data included in the model, and can describe complicated processing in an easy-to-understand manner. Also, pandas uses numpy internally. --Numpy uses a highly optimized linear algebra library written in C or Fortran to efficiently calculate matrix calculations.
To solve a mathematical optimization problem, do the following steps:
--Create a mathematical model in the modeler --Call out the solver and get a solution
Solver is software that takes a mathematical model as input, solves the mathematical model, and outputs the value (solution) of a variable.
PuLP is software created by the COIN project and will be a modeler. With PuLP, you can use various solvers such as CBC, Gurobi, and GLPK. By default, CBC is used. When you install PuLP, CBC will be installed at the same time.
--CBC: COIN Project Free Solver (short for COIN-OR Branch and Cut) https://projects.coin-or.org/Cbc --Gurobi: High-performance commercial solver http://www.gurobi.com/ --GLPK: GNU free solver www.gnu.org/software/glpk/
The problem that PuLP can handle is the mixed integer optimization problem. The mixed integer optimization problem is a kind of mathematical optimization problem and has the following features.
--Represented using continuous (real) variables and integer variables --The objective function and constraints are linear expressions
If you want to find out more details, please refer to the reference site.
Consider the following problem.
problem
I want to create a lot of chemical products X and Y that can be synthesized from materials A and B.
To make 1kg of X, you need 1kg of A and 3kg of B.
To make 1kg of Y, you need 2kg of A and 1kg of B.
The price of both X and Y is 100 yen per kg.
To maximize the sum of the prices of X and Y when material A weighs only 16 kg and B weighs only 18 kg
Find out how many X and Y should be created.
The problem is expressed by a mathematical model as follows. Representing a mathematical model with an expression is called formalization.
Let's model this with PuLP and solve it.
python3
from pulp import *
m = LpProblem(sense=LpMaximize) #Mathematical model
x = LpVariable('x', lowBound=0) #variable
y = LpVariable('y', lowBound=0) #variable
m += 100 * x + 100 * y #Objective function
m += x + 2 * y <= 16 #Upper limit constraint for material A
m += 3 * x + y <= 18 #Upper limit constraint for material B
m.solve() #Run solver
print(value(x), value(y)) # 4, 6
The following is a brief explanation in order.
from pulp import *
For minimization problems: m = LpPrblem () For maximization problems: m = LpProblem (sense = LpMaximize)
Continuous variable: x = LpVariable (variable name, lowBound = 0) 0-1 Variable: x = LpVariable (variable name, cat = LpBinary) List of continuous variables: x = [LpVariable (i-th variable name, lowBound = 0) for i in range (n)] Variable names must be different
m + = expression
m + = expression == expression m + = expression <= expression m + = expression> = expression
2 * x + 3 * y - 5
Sum: lpSum (list of variables) Inner product: lpDot (list of coefficients, list of variables)
m.solve()
value (variable), value (expression), value (m.objective)
Combining PuLP and pandas and managing variables (LpVariable) in a pandas table (DataFrame) makes the formulation easier to understand.
Let's take the transportation optimization problem as an example.
I want to transport parts from warehouses to factories. I would like to find a plan that minimizes transportation costs.
--I want to determine the amount of transportation from the warehouse group to the factory group → Variable --I want to minimize transportation costs → Objective function ――The amount of goods that can be shipped from each warehouse is less than the amount that can be supplied → Restrictions ――Delivery to each factory is more than demand → Restrictions
Transportation costs td> | Assembly factory td> tr> | |||||
F1 td> | F2 td> | F3 td> | F4 td> | Supply td> tr> | ||
Warehouse td> | W1 td> | 10 td> | 10 td> | 11 td> | 17 td> | 35 td> tr> |
W2 | 16 | 19 | 12 | 14 | 41 | |
W3 | 15 | 12 | 14 | 12 | 42 | |
td> | Demand td> | 28 td> | 29 td> | 31 td> | 25 td> tr> |
Set the required parameters. (The numbers are the same as in the previous table)
python3
import numpy as np, pandas as pd
from itertools import product
from pulp import *
np.random.seed(1)
nw, nf = 3, 4
pr = list(product(range(nw),range(nf)))
Supply= np.random.randint(30, 50, nw)
demand= np.random.randint(20, 40, nf)
Shipping costs= np.random.randint(10, 20, (nw,nf))
Variables are accessed by subscripts.
python3
m1 = LpProblem()
v1 = {(i,j):LpVariable('v%d_%d'%(i,j), lowBound=0) for i,j in pr}
m1 += lpSum(Shipping costs[i][j] * v1[i,j] for i,j in pr)
for i in range(nw):
m1 += lpSum(v1[i,j] for j in range(nf)) <=Supply[i]
for j in range(nf):
m1 += lpSum(v1[i,j] for i in range(nw)) >=demand[j]
m1.solve()
{k:value(x) for k,x in v1.items() if value(x) > 0}
>>>
{(0, 0): 28.0,
(0, 1): 7.0,
(1, 2): 31.0,
(1, 3): 5.0,
(2, 1): 22.0,
(2, 3): 20.0}
Variables can be accessed by table attributes. First, let's create a table.
python3
a = pd.DataFrame([(i,j) for i, j in pr], columns=['Warehouse', 'factory'])
a['Shipping costs'] = Shipping costs.flatten()
a[:3]
th> | Warehouse th> | Factory th> | Transportation Costs th> tr> |
---|---|---|---|
0 | 0 | 0 | 10 |
1 | 0 | 1 | 10 |
2 | 0 | 2 | 11 |
Let's make a mathematical model in the same way.
python3
m2 = LpProblem()
a['Var'] = [LpVariable('v%d'%i, lowBound=0) for i in a.index]
m2 += lpDot(a.Shipping costs, a.Var)
for k, v in a.groupby('Warehouse'):
m2 += lpSum(v.Var) <=Supply[k]
for k, v in a.groupby('factory'):
m2 += lpSum(v.Var) >=demand[k]
m2.solve()
a['Val'] = a.Var.apply(value)
a[a.Val > 0]
Warehouse th> | Factory th> | Shipping costs th> | Var | Val | |
---|---|---|---|---|---|
0 | 0 | 0 | 10 | v0 | 28.0 |
1 | 0 | 1 | 10 | v1 | 7.0 |
6 | 1 | 2 | 12 | v6 | 31.0 |
7 | 1 | 3 | 14 | v7 | 5.0 |
9 | 2 | 1 | 12 | v9 | 22.0 |
11 | 2 | 3 | 12 | v11 | 20.0 |
Expressions using subscripts had to remember what the subscripts meant. However, the combination of PuLP and pandas makes the mathematical model easier to understand, as shown below.
--You can use column names such as "warehouse" instead of just "i". --You can construct mathematical formulas using conditional expressions in pandas. (Reference Solving the N Queen problem with combinatorial optimization) --You can use convenient functions of pandas (groupby, etc.).
--Qiita article -Use combinatorial optimization -Sum of variables in mathematical model -Power of combinatorial optimization solver -Investigate duality issues -Mathematical optimization for free work with Python + PuLP -Mathematical Optimization Modeler (PuLP) Cheat Sheet (Python) --Example of combination of PuLP and pandas -[Visualization of data by prefecture (4 color problem)](http://qiita.com/Tsutomu-KKE@github/items/6d17889ba47357e44131#4%E8%89%B2%E5%95%8F%E9%A1% 8C) -Let's decide the date course by combinatorial optimization -Solving N Queen problem with combinatorial optimization -Let's find out the schedule of the tournament that makes you sick -Creating a conference program with combinatorial optimization -Maximize restaurant sales by combinatorial optimization -Optimize road installation -Sudoku is combined and solved optimally -Consider menus by combinatorial optimization --Articles other than Qiita -Combinatorial optimization (Professor Matsui) (PDF 2 pages) -Expected solution to large-scale combinatorial optimization problem (Professor Umetani) (PDF page 51) --Books -"Can be used from today! Combinatorial optimization" -"Business Analytics in Python Language" -"Aspects of Modeling (Series: Optimized Modeling)" --Solver related -PuLP documentation -Memo of Integer Programming (Miyashiro-sensei) -Introduction to Formulation by Integer Programming -Introduction to Integer Programming Solver -Mathematical optimization by ZIMPL language and SCIP - Gurobi Optimizer
that's all
Recommended Posts