Mathematicatical programming is a method of maximizing (minimizing) a target value called an objective function under a certain constraint. Among them, linear programming (LP) means that the objective function and the constraint expression can be expressed by a linear expression (an expression of the form $ \ sum_ia_ix_i + b $) for variables. Linear programming is one of the simplest mathematical programming methods, but it has a rich library for solvers and modeling, providing an easy-to-use environment. In particular, companies often use monetary indicators such as profits and costs as objective functions to optimize controllable variables such as production and transportation.
In this article, I took one exercise from the following books that summarize the modeling techniques of mathematical optimization, formulated it, and implemented it in PuLP, the modeling language of Python, to find the optimum solution. I would like to come.
Model Building in Mathematical Programming https://www.amazon.co.jp/dp/1118443330/
In addition, this exercise is for the homework of reading in the Mathematical Optimization Group of TFUG. It is one. Anyone can join the TFUG group, so if you're interested, you might want to take a look at Slack!
This article has the following structure. After a brief explanation of how to model, we will introduce the problem setting of the example, and then model, implement, and verify it.
--Modeling flow --Problem setting --Modeling --Column 1: Arbitrary modeling --Model implementation --Model validation --Column 2: Mathematical optimization model test
There is a columnar section in the middle, so it's okay to skip it, but I'm writing about a topic that I personally find interesting, so please read it.
In general, to model some plan mathematically, you need to define two things:
Take the production plan of a factory as an example. First, the objective function represents the cost or benefit you want to optimize. For example, when producing in factories A and B, assume that the production cost can be expressed as:
So, next, as a constraint expression, we impose a condition on the variable to avoid such a situation. For example, the constraint "must produce more than XX" mentioned earlier can be expressed as follows.
In addition, if it is a production plan, there is often a limit to the production capacity of the factory, and there is often an upper limit on the production volume. Constraint expressions are mathematical expressions that express these business restrictions and rules.
In addition, the following information is required to define the objective function and constraint expression.
--Set --A collection of elements to be optimized. --Example: Factory, target period (month or date), etc. --Parameters --Numerical values used to calculate objective functions and constraint expressions. --Example: Production unit price, production volume upper limit, demand volume, etc. --Variables ――The numerical value you want to actually optimize. --Example: Production volume, transportation volume, etc.
If you want to actually model from scratch, personally, I think it's best to aim to define the objective function first. When trying to define an objective function, it is necessary to organize the necessary information in the process, which leads to the determination of sets, parameters, and variables. After that, for each variable, identify the necessary constraints and express them as expressions one by one.
In the following, we will introduce the problem setting of the example and actually model it.
This time, we will deal with food production planning. Specifically, from January to June, it is said that food is made by purchasing oil as a raw material every month, refining and mixing it (margarine or something ...?). In this production plan, we aim to maximize profits in the end by adjusting the purchase amount and consumption amount of each raw material oil each month.
In this issue, in order to think about objective functions and constraint expressions, we need to think about the costs and constraints related to the following factors.
--Inventory --Purchase --Purification
I will explain each situation in detail.
Inventories of feedstock increase with new purchases and decrease with consumption in food production.
Each feedstock can be kept in stock for up to $ 1000 [t] $. However, there is a storage cost of $ 5 [£ / t] $ per month.
Also, for any feedstock, inventory starts at $ 500 [t] $ and ends with $ 500 [t] $ or more.
The above oils will be purchased from January to June at the following unit prices.
Month | VEG 1 | VEG 2 | OIL 1 | OIL 2 | OIL 3 |
---|---|---|---|---|---|
Jan | 110 | 120 | 130 | 110 | 115 |
Feb | 130 | 130 | 110 | 90 | 115 |
Mar | 110 | 140 | 130 | 100 | 95 |
Apr | 120 | 110 | 120 | 120 | 125 |
May | 100 | 120 | 150 | 110 | 105 |
Jun | 90 | 100 | 140 | 80 | 135 |
There are no particular restrictions on the amount you can purchase.
Of the amount purchased and the amount originally in stock, a certain amount is subjected to the refining process. In the process of refining, there shall be no particular loss, and the cost of refining shall be negligible.
There are two types of feedstock, vegetable and non-vegetable, and the refining process is said to be carried out on different refining lines for each type of oil. The upper limit of the amount of purification is set as follows.
type | Upper limit of purification amount[t/Month] |
---|---|
Botanical | 200 |
Non-vegetable | 250 |
The types of raw material oil are as follows.
name | type |
---|---|
VEG 1 | Botanical |
VEG 2 | Botanical |
OIL 1 | Non-vegetable |
OIL 2 | Non-vegetable |
OIL 3 | Non-vegetable |
By mixing all the refined oils, the food to be shipped that month is completed. Again, there is no particular loss, so the total weight of the mixed feedstock is the amount of food produced.
In addition, each raw material oil has the hardness shown in the table below. The hardness of the food produced is averaged according to the consumption of raw material oil, and the hardness of the food must be between $ 3 $ and $ 6 $.
name | hardness |
---|---|
VEG1 | 8.8 |
VEG2 | 6.1 |
OIL1 | 2.0 |
OIL2 | 4.2 |
OIL3 | 5.0 |
It seems that the food produced can be sold out within the month. In other words, the production volume becomes the sales volume as it is. The unit sales price is fixed at $ 150 [£ / t] $ regardless of the month.
From now on, we will actually model the problem setting explained earlier. In addition, I wrote earlier that "it is better to decide the set and parameters while defining the objective function", but in the explanation it is easier to write the set, parameters and variables first, so I do so. ..
In a mathematical optimization model, a set is a subscript of a parameter or variable. For this issue, we will consider the following four sets:
** Raw oil **
** Target period (month) **
** Purification line **
** Purification target **
Since it is defined for each purification line, it is a set of sets (family of sets) as a whole.
I'm giving it amakudari here, but keep in mind that the way sets are defined is not unique. Depending on how the set is defined, the versatility of the model may be reduced. We'll talk more about this in the Constraint Expressions section.
Next, we will define the parameters used to calculate the objective function and constraint expression.
name | Subscript | Range | Description |
---|---|---|---|
Purchase unit price of raw material oil. | |||
- | Storage unit price of raw material oil. | ||
- | Food unit sales price. | ||
Initial inventory of feedstock. | |||
The lower limit of the final inventory of feedstock. | |||
- | Upper limit of raw material oil inventory. | ||
Hardness of raw material oil. | |||
Upper limit of purification amount. |
uc is an abbreviation for unit cost, which means unit price, and lb / ub is an abbreviation for lower / upper bound, respectively, which means upper and lower limits.
Similarly, define the variables to be optimized. It is desirable to use nouns for variable names, but if they are too long, the formula will be difficult to read, so some verbs are used.
name | Subscript | Range | Description |
---|---|---|---|
Purchase amount of raw material oil. | |||
Amount of raw material oil used. | |||
Food production. | |||
Monthly inventory of raw material oil. | |||
End-of-month inventory of raw material oil. |
For inventory, it is often sufficient to declare only one of $ \ text {opening_stock} $ and $ \ text {closing_stock} $. There are also personal tastes around here.
Now that we have the necessary information, let's define the objective function. This time, I wanted to maximize the profit for the entire period.
Next, write down the constraint expression. The simple upper and lower limits of variables are described as ranges in the "Variables" section.
The monthly inventory for January matches the initial inventory.
The end-of-month inventory in June shall be above the final inventory limit.
Monthly inventory after February matches end-of-month inventory for the previous month.
The end-of-month inventory is the amount obtained by adding the purchase amount to the monthly inventory of the month and subtracting the consumption amount.
The monthly production is equal to the sum of the oil consumption.
The part that says, "The hardness of the food produced is averaged according to the consumption of raw material oil, and the hardness of the food must be between $ 3 $ and $ 6
Furthermore, in the original formula, the behavior when the production amount is $ 0 $ is suspicious, but in the converted formula, both the consumption amount and the production amount are $ 0 $, so it can be seen that the constraint is satisfied. With this problem setting, it is possible that production will not be performed, so there is no problem.
In this production plan, there was an upper limit on the amount of refining for each refining line. Expressed as an expression, it can be written as follows.
Information about the purification line can be expressed differently. For example, instead of $ \ text {USED \ _OILS} $, define the parameter $ \ text {ref \ _line} $, which means the refinement line for each feedstock, as follows:
Subscript | value |
---|---|
VEG1 | VEG |
VEG1 | VEG |
OIL1 | NONVEG |
OIL2 | NONVEG |
OIL3 | NONVEG |
This is almost the same as the [Table of types of raw material oil mentioned in the problem statement](# oil-type), so it may be more natural to use this. Using this parameter, the upper limit of purification constraint can be expressed as:
What we are doing is the same as before, we just add up the consumption of the raw oil to be refined and limit it to the upper limit. With this model, you can get the same result with this data.
However, there is room for consideration as to which is preferable.
As a test, consider the case where a line called $ \ text {VEGNEW} $ is added to refine $ \ text {VEG1} $ and $ \ text {VEG2}
\text{USED_OILS}_{VEG} &= \{\text{VEG1}, \text{VEG2}, \text{VEG3}\}, \
\text{USED_OILS}_{VEGNEW} &= \{\text{VEG1}, \text{VEG2}\}, \
\text{USED_OILS}_{NONVEG} &= \{\text{OIL1}, \text{OIL2}\}.
\end{aligned}
$$
But what if you had a formulation that used $ \ text {ref \ _line} $? As you can see from the [Table above](# ref-line), this parameter assumes that ** one refining line is always set for one feedstock **. Therefore, the situation where $ \ text {VEG1} $ belongs to both $ \ text {VEG} $ and $ \ text {VEGNEW} $ cannot be represented in the model.
As you can see, there is a wide range of ways to have versatility, even if there is only one way to define sets and parameters. A poor understanding of the problem can lead to versatility (or failure) in the wrong direction, making it difficult to modify the model later (cf. Premature abstraction (cf. premature abstraction). premature abstraction).
Not all changes can be foreseen in advance, and the assumptions themselves may change due to law revisions, etc., but I realize that arbitrary assumptions are mixed in with the way these casual sets are defined. I think that is important. Also, when actually modeling, it is essential to deepen the understanding of the business with the help of users (domain experts).
We will implement the above model with Python + PuLP. In PuLP, sets and parameters can use any Python data structure, so it's enough to use something that's easy to use. In the following, we will introduce the definition of variables, the definition of objective functions, and the definition of constraint expressions.
The PuLP variables are declared as follows:
#Name Subscript Lower limit Upper limit Type (continuous or integer)
buy = LpVariable.dicts("buy", (OILS, TIME_IDX), lowBound=0, upBound=None, cat='Continuous')
By the way, although it is not mentioned in the document, it seems that it can be declared in the form of a matrix.
buy = LpVariable.matrix("buy", (OILS, TIME_IDX), lowBound=0, upBound=None, cat='Continuous')
First, the objective function is expressed using the defined variables and parameters.
#Calculation of objective function
total_sales = lpSum(produce[t] * sell_uc for t in TIME_IDX)
total_buy_cost = lpSum(buy[oil][t] * buy_uc[oil][t] for t in TIME_IDX for oil in OILS)
total_stock_cost = lpSum(closing_stock[oil][t] * stock_uc for t in TIME_IDX for oil in OILS)
total_cost = total_buy_cost + total_stock_cost
total_profit = total_sales - total_cost #Objective function
You can then define the linear programming problem LpProblem
and set it by adding the objective function.
#Model definition and objective function settings
model = LpProblem("Food manufacture 1", LpMaximize)
model += total_profit
You can also use methods to set the objective function as follows:
model.setObjective(total_profit)
If you declare more than one objective function, only the last one will be used.
Constraint expressions can be declared by adding them to model
, just like the objective function.
#Production balance
for t in TIME_IDX:
model += produce[t] == lpSum(use[oil][t] for oil in OILS)
It is written in the same way as the objective function, but it seems to be well identified from the class name. It can also be defined by a method.
model.addConstraint(produce[t] == lpSum(use[oil][t] for oil in OILS))
Although omitted so far, both the objective function and the constraint expression can be named individually.
model += produce[t] == lpSum(use[oil][t] for oil in OILS), "Production balance"
The entire code of the model implemented this time is summarized below.
import numpy as np
import pandas as pd
from pulp import LpProblem, LpMaximize, LpVariable, lpSum
#Definition of set
TIME_IDX = [1, 2, 3, 4, 5, 6]
OILS = ['VEG1', 'VEG2', 'OIL1', 'OIL2', 'OIL3']
REF_LINES = ['VEG', 'NONVEG']
USED_OILS = {
'VEG': ['VEG1', 'VEG2'],
'NONVEG': ['OIL1', 'OIL2', 'OIL3']
}
#Parameter setting
sell_uc = 150
stock_uc = 5
stock_ub = 1000
stock_init = 500
stock_final_lb = 500
prod_ub = {'VEG': 200, 'NONVEG': 250}
hardness_lb = 3
hardness_ub = 6
hardness = {'VEG1': 8.8, 'VEG2': 6.1, 'OIL1': 2.0, 'OIL2': 4.2, 'OIL3': 5.0}
buy_uc = {
'VEG1': {1: 110, 2: 130, 3: 110, 4: 120, 5: 100, 6: 90},
'VEG2': {1: 120, 2: 130, 3: 140, 4: 110, 5: 120, 6: 100},
'OIL1': {1: 130, 2: 110, 3: 130, 4: 120, 5: 150, 6: 140},
'OIL2': {1: 110, 2: 90, 3: 100, 4: 120, 5: 110, 6: 80},
'OIL3': {1: 115, 2: 115, 3: 95, 4: 125, 5: 105, 6: 135}
}
#Variable definition
buy = LpVariable.dicts("buy", (OILS, TIME_IDX), lowBound=0)
use = LpVariable.dicts("use", (OILS, TIME_IDX), lowBound=0)
produce = LpVariable.dicts("produce", TIME_IDX, lowBound=0)
opening_stock = LpVariable.dicts("opening_stock", (OILS, TIME_IDX), lowBound=0, upBound=stock_ub)
closing_stock = LpVariable.dicts("closing_stock", (OILS, TIME_IDX), lowBound=0, upBound=stock_ub)
#Calculation of objective function
total_sales = lpSum(produce[t] * sell_uc for t in TIME_IDX)
total_buy_cost = lpSum(buy[oil][t] * buy_uc[oil][t] for t in TIME_IDX for oil in OILS)
total_stock_cost = lpSum(closing_stock[oil][t] * stock_uc for t in TIME_IDX for oil in OILS)
total_cost = total_buy_cost + total_stock_cost
total_profit = total_sales - total_cost
#Model definition and objective function settings
model = LpProblem("Food manufacture 1", LpMaximize)
model += total_profit
#Constraint expression
#Initial inventory, final inventory
for oil in OILS:
model += opening_stock[oil][TIME_IDX[0]] == stock_init
model += closing_stock[oil][TIME_IDX[-1]] >= stock_final_lb
#Regarding each month
for t in TIME_IDX:
#Inventory balance
for oil in OILS:
if t != TIME_IDX[0]:
model += opening_stock[oil][t] == closing_stock[oil][t - 1]
model += closing_stock[oil][t] == opening_stock[oil][t] + buy[oil][t] - use[oil][t]
#Sales volume balance
model += produce[t] == lpSum(use[oil][t] for oil in OILS)
#hardness
total_hardness = lpSum(hardness[oil] * use[oil][t] for oil in OILS)
model += total_hardness <= hardness_ub * produce[t]
model += total_hardness >= hardness_lb * produce[t]
#Upper limit of purification amount
for line in REF_LINES:
total_prod_amount = lpSum(use[oil][t] for oil in USED_OILS[line])
model += total_prod_amount <= prod_ub[line]
model.solve()
print(model.objective.value()) #result: 107842.59264500001
Running the above code will give you the value 107842.59264500001
as the value of the objective function. This is the same as what was written in the original text, so it seems good to think that the optimal solution is being sought.
In the previous section, the same optimum value as the text was found. However, when actually using mathematical optimization at work, it is often the case that neither the optimum value nor the optimum solution is known, and it is rare to be able to "match the answers" like this time. Therefore, we verify the validity of the results by qualitatively analyzing the model.
The qualitative analysis here is to make a prediction about the value to be taken by the optimum solution from the definition of the model and compare it with the actual calculation result. For example, this time the first and last inventory (the lower limit) is fixed, and it costs according to the amount of inventory every month, so it is profitable to consume the inventory first and recover the inventory from the middle. Will be higher. "
Now, let's actually display the optimal solution and confirm this hypothesis. The optimal solution for end-of-term inventory can be obtained as follows:
closing_stock_value = {oil: {t: closing_stock[oil][t].value() for t in TIME_IDX} for oil in OILS}
Let's plot with the following code.
df = pd.DataFrame(closing_stock_value, index=TIME_IDX, columns=OILS)
df.plot()
With the exception of OIL1, which is not used at all, all of them have dropped from the initial inventory once and then recovered, confirming that the optimization results seem reasonable.
We tested only one hypothesis here, but in practice we make as many such reasonable assumptions as possible to test the optimization results. And if the optimal solution is as good as all the hypotheses, it seems that it can be implemented correctly. Conversely, if the optimal solution does not meet the hypothesis, then either the model implementation or the hypothesis is wrong. This is very useful as it gives you the opportunity to modify the model itself and gain a better understanding of the model.
By the way, some people may feel uncomfortable with the verification method I wrote earlier. If you're a software engineer, you might wonder, "If you're given constraints one by one, why not write unit tests for each one?" That's right, and if you can write it, you should write it.
However, mathematical optimization models are ** huge ** sets of ** tightly coupled ** constraints, and constraint-level unit tests are often difficult. At first glance, the constraint formulas mentioned earlier seem to be separate, and in fact they depend on each other in the model. For example, if you reduce the upper limit of the amount of purification, you will not be able to meet the hardness constraint. Of course, it is possible to test the model itself as if it were a huge function, but it is difficult to know the equivalence class and boundary values, and eventually we come back to "discovery hypothesis and verification". (Although it can be automated).
In fact, looking at the open source mathematical optimization solver test code, it seems that we have only solved a few toy problems that are easy to understand and have not been very rigorously tested (model testing and solver). Although it may be different in the test).
https://github.com/SCIP-Interfaces/CSIP/blob/6d24c495750c7927d1d9b1bae0e906d9c54fa439/test/test.c#L57-L63
I also searched for research and literature on mathematical optimization tests, but I couldn't find anything like that. There is a possibility that you just missed it, so if you have an opinion such as "This is good", I would appreciate it if you could comment.
For the time being, the difficulty of testing around here seems to be close to the problem of testing machine learning models that the MLSE community is working on, so it may be better to follow the trends of this community.
Challenges for machine learning systems toward continuous improvement https://www.slideshare.net/chezou/challenges-for-machine-learning-systems-toward-continuous-improvement
By the way, it doesn't matter, but I feel like I can't say anything when I see the following article, which seems to be using mathematical optimization, burning up and making comments like "Test it". (I'm not a related person ...).
It should be "a few seconds with AI" ... Nursery school selection, work after consecutive holidays Saitama City --Mainichi Shimbun https://mainichi.jp/articles/20200204/k00/00m/040/176000c
Even if you do your best to verify it with the method described so far, it is a headache because the model may become unbounded or infeasible due to unexpected data. Spicy: dog:
So, I modeled the production plan as a linear programming method, actually solved it with PuLP, and tried to verify the result. I intend to write not only a simple explanation of modeling, but also my own thoughts on modeling based on a little practical experience.
If you are starting to practice modeling, you may want to change this model in your own way. For example, you can set upper and lower limits for each type of oil purchased, or set it so that it loses 1% in the refining process.
I really wanted to make it a series and solve all the exercises, but I was exhausted: joy: If you just solve it, it's still difficult to write an article.
So, if you like this article, please subscribe! (YouTuber style).
Recommended Posts