Introduction

In my business, I develop software using combinatorial optimization technology (for example, minimizing transportation costs in logistics). In the past, I used to create optimization models using C ++ and C #, but nowadays I often use Python. Here, we will introduce Python in optimization.

Benefits of Python

The reasons for using Python are as follows.

--Easy to understand. Since the model by mathematical formula and the model by Python are close, you can concentrate on more essential description and create a model that is easy to maintain.

--Short description amount is required. Compared to C ++ etc., the size of the program is a fraction. --Learning cost is small. It has a simple grammar and few reserved words. --Can be completed with Python. Since it is a general-purpose language, processing for various purposes can be described in almost all Python. For example, you can get data from the web, aggregate it, analyze it, optimize it, visualize it, and so on, all in Python. --There are many libraries. The package community site https://pypi.python.org/pypi alone has about 90,000 packages published. Many other packages are also available on https://github.com/ and https://anaconda.org/. --Can be executed in various environments. There are various operating systems such as Windows, Mac, and Linux, and processing systems such as Cython, Pypy, and IronPython. --Many optimization software supports Python. There are many optimization software, both paid and free, but many are available from Python.

Python is said to run slower than compiler languages such as C ++. However, in optimization, Python is mainly used for model creation (modeling), and dedicated software (solver) written in C ++ etc. is used for execution of optimization algorithms. Therefore, even if you use Python for optimization, the execution time does not matter much.

Optimizing modeling mainly uses PuLP and pandas packages.

--PuLP is a package of mathematical modeling and pandas is a package of data analysis. ――Pandas is suitable for handling data that can be represented in a table among the data included in the model, and can describe complicated processing in an easy-to-understand manner. Also, pandas uses numpy internally. --Numpy uses a highly optimized linear algebra library written in C or Fortran to efficiently calculate matrix calculations.

About PuLP

To solve a mathematical optimization problem, do the following steps:

--Create a mathematical model in the modeler --Call out the solver and get a solution

Solver is software that takes a mathematical model as input, solves the mathematical model, and outputs the value (solution) of a variable.

PuLP is software created by the COIN project and will be a modeler. With PuLP, you can use various solvers such as CBC, Gurobi, and GLPK. By default, CBC is used. When you install PuLP, CBC will be installed at the same time.

--CBC: COIN Project Free Solver (short for COIN-OR Branch and Cut) https://projects.coin-or.org/Cbc --Gurobi: High-performance commercial solver http://www.gurobi.com/ --GLPK: GNU free solver www.gnu.org/software/glpk/

The problem that PuLP can handle is the mixed integer optimization problem. The mixed integer optimization problem is a kind of mathematical optimization problem and has the following features.

--Represented using continuous (real) variables and integer variables --The objective function and constraints are linear expressions

If you want to find out more details, please refer to the reference site.

How to use PuLP

Consider the following problem.

`problem`


I want to create a lot of chemical products X and Y that can be synthesized from materials A and B.
To make 1kg of X, you need 1kg of A and 3kg of B.
To make 1kg of Y, you need 2kg of A and 1kg of B.
The price of both X and Y is 100 yen per kg.
To maximize the sum of the prices of X and Y when material A weighs only 16 kg and B weighs only 18 kg
Find out how many X and Y should be created.

The problem is expressed by a mathematical model as follows. Representing a mathematical model with an expression is called formalization.

Let's model this with PuLP and solve it.

`python3`


from pulp import *
m = LpProblem(sense=LpMaximize) #Mathematical model
x = LpVariable('x', lowBound=0) #variable
y = LpVariable('y', lowBound=0) #variable
m += 100 * x + 100 * y #Objective function
m += x + 2 * y <= 16 #Upper limit constraint for material A
m += 3 * x + y <= 18 #Upper limit constraint for material B
m.solve() #Run solver
print(value(x), value(y)) # 4, 6

The following is a brief explanation in order.

Package import

from pulp import *

Creating a mathematical model

For minimization problems: m = LpPrblem () For maximization problems: m = LpProblem (sense = LpMaximize)

Creating variables

Continuous variable: x = LpVariable (variable name, lowBound = 0) 0-1 Variable: x = LpVariable (variable name, cat = LpBinary) List of continuous variables: x = [LpVariable (i-th variable name, lowBound = 0) for i in range (n)] Variable names must be different

Objective function setting

m + = expression

Add constraints

m + = expression == expression m + = expression <= expression m + = expression> = expression

Expression example

2 * x + 3 * y - 5

Sum: lpSum (list of variables) Inner product: lpDot (list of coefficients, list of variables)

Run solver

m.solve()

Values of variables, expressions and objective functions

value (variable), value (expression), value (m.objective)

About the combination of PuLP and pandas

Combining PuLP and pandas and managing variables (LpVariable) in a pandas table (DataFrame) makes the formulation easier to understand.

Let's take the transportation optimization problem as an example.

Transport optimization problem

I want to transport parts from warehouses to factories. I would like to find a plan that minimizes transportation costs.

--I want to determine the amount of transportation from the warehouse group to the factory group → Variable --I want to minimize transportation costs → Objective function ――The amount of goods that can be shipped from each warehouse is less than the amount that can be supplied → Restrictions ――Delivery to each factory is more than demand → Restrictions

Transportation costs		Assembly factory
Transportation costs		F1	F2	F3	F4	Supply
Warehouse	W1	10	10	11	17	35
	W2	16	19	12	14	41
	W3	15	12	14	12	42
	Demand	28	29	31	25

Parameter setting

Set the required parameters. (The numbers are the same as in the previous table)

`python3`


import numpy as np, pandas as pd
from itertools import product
from pulp import *
np.random.seed(1)
nw, nf = 3, 4
pr = list(product(range(nw),range(nf)))
Supply= np.random.randint(30, 50, nw)
demand= np.random.randint(20, 40, nf)
Shipping costs= np.random.randint(10, 20, (nw,nf))

Mathematical model without pandas

Variables are accessed by subscripts.

`python3`


m1 = LpProblem()
v1 = {(i,j):LpVariable('v%d_%d'%(i,j), lowBound=0) for i,j in pr}
m1 += lpSum(Shipping costs[i][j] * v1[i,j] for i,j in pr)
for i in range(nw):
    m1 += lpSum(v1[i,j] for j in range(nf)) <=Supply[i]
for j in range(nf):
    m1 += lpSum(v1[i,j] for i in range(nw)) >=demand[j]
m1.solve()
{k:value(x) for k,x in v1.items() if value(x) > 0}
>>>
{(0, 0): 28.0,
 (0, 1): 7.0,
 (1, 2): 31.0,
 (1, 3): 5.0,
 (2, 1): 22.0,
 (2, 3): 20.0}

Mathematical model using pandas

Variables can be accessed by table attributes. First, let's create a table.

`python3`


a = pd.DataFrame([(i,j) for i, j in pr], columns=['Warehouse', 'factory'])
a['Shipping costs'] = Shipping costs.flatten()
a[:3]

	Factory	Transportation Costs
0	0	10
1	1	10
2	2	11

Let's make a mathematical model in the same way.

`python3`


m2 = LpProblem()
a['Var'] = [LpVariable('v%d'%i, lowBound=0) for i in a.index]
m2 += lpDot(a.Shipping costs, a.Var)
for k, v in a.groupby('Warehouse'):
    m2 += lpSum(v.Var) <=Supply[k]
for k, v in a.groupby('factory'):
    m2 += lpSum(v.Var) >=demand[k]
m2.solve()
a['Val'] = a.Var.apply(value)
a[a.Val > 0]

	Warehouse	Factory	Shipping costs	Var	Val
0	0	0	10	v0	28.0
1	0	1	10	v1	7.0
6	1	2	12	v6	31.0
7	1	3	14	v7	5.0
9	2	1	12	v9	22.0
11	2	3	12	v11	20.0

Expressions using subscripts had to remember what the subscripts meant. However, the combination of PuLP and pandas makes the mathematical model easier to understand, as shown below.

--You can use column names such as "warehouse" instead of just "i". --You can construct mathematical formulas using conditional expressions in pandas. (Reference Solving the N Queen problem with combinatorial optimization) --You can use convenient functions of pandas (groupby, etc.).

Reference site

that's all