The Yhat Blog

machine learning, data science, engineering

Decision Making Under Uncertainty: An Introduction to Robust Optimization (Part 1)

Measuring the Impact of Uncertainty

Data analytics is a process that incorporates data into building models that help us make decisions. These decisions should add value to our business. Working with suboptimal models, messy data, or insufficient information is a reality of practical analytics. Optimization techniques can help us see better outcomes in imperfect, real-world scenarios.

Robust optimization (RO) is a tool that helps us improve our decisions in uncertain scenarios by allowing us to add uncertainty that is present in a problem directly to a model. In this series of posts, I will introduce the idea of robust optimization and its philosophy. While I will use a very simple example to motivate and present its methodology, it is important to note that there is much more to learn about RO and I hope these posts motivate you to explore RO and think about the cost of uncertainty.

A Simple Production Model

We begin with a simple example that can be found in any introductory book on optimization. Imagine you are a production manager at a furniture company:

You make only two products: chairs and tables. Producing a single unit of chairs or tables (these might be batches of many chairs or tables) requires two resources: lumber and labor. Your goal is to maximize the number of chairs and tables produced, given the limited resources available for production.

Looking at the resources required table below, you must decide how to assign these resources to maximize your output. For simplicity we'll say it is fine to produce a fraction of a unit of chairs or tables (remember, they could be batches).

Resources Required (on average)
$$\begin{array}{r|cc} & \text{Lumber} & \text{Labor}\\ \hline \text{Chairs} & 1 & 2\\ \text{Tables} & 2 & 1\\ \hline \text{Total Available} & 6 & 6\\ \end{array}$$

The production assignment is not flexible. If things go wrong (say you run out of resources), you cannot change the production plan and the production line must be stopped (which imposes a huge cost, maybe your job!). So we have two decisions to make, which we represent using these decision variables:

$$x_1 = \text{number of chairs produced}$$ $$x_2 = \text{number of tables produced}$$

The optimization problem can be formulated as follows:

$$\begin{array}{rl} \max & z=x_1+x_2\\ \text{s.t.} & 2x_1+x_2 \leq 6\\ & x_1+2x_2 \leq 6\\ & x_1,x_2 \geq 0 \end{array}$$

This is a very simple optimization problem known as a deterministic linear program (LP). We could simply guess the value of the optimal solution to this problem, but here we'll use Python and PuLP to model and solve it.

import pulp

# We are trying to maximize output
production = pulp.LpProblem('Production planning', pulp.LpMaximize)

# Our decision variables are continuous and >= 0
chairs = pulp.LpVariable('chairs', 0)
tables = pulp.LpVariable('tables', 0)

# We want to maximize production
z = pulp.LpVariable('z', 0)
production += z == chairs + tables
production.setObjective(z)

# Resources used for each product can't exceed what's available
production += 2*chairs + 1*tables <= 6 # labor
production += 1*chairs + 2*tables <= 6 # lumber

# Solving the model
assert production.solve() == pulp.LpStatusOptimal

print 'Objective value: ', pulp.value(z)
print 'Number of chairs:', pulp.value(chairs)
print 'Number of tables:', pulp.value(tables)


Output:

Objective value:  4.0
Number of chairs: 2.0
Number of tables: 2.0


So we should produce 2 units of chairs and 2 units of tables, for a total of 4 units of product.

That was a simple demonstration of how optimization can help us make decisions.

But what if our estimates for resource usage are wrong?

Our production plan might not even be possible. For instance, if the average amount of labor required by a unit of chairs is 2.1 instead of 2, our plan can't be implemented and the production line stops.

Our model above uses the average resource consumption for each product to make a production plan. That is, we modeled this problem as a deterministic model and assumed that the resource consumption coefficient is exactly known. In reality, there are many sources of uncertainty in production planning. Consider the following cases:

• Our workers don't build chairs or tables at the same pace
• Human error increases the work required to make our product
• Lumber does not have the same quality from one piece of wood to another
• If a leg breaks during the sawing process, we have to replace it (more lumber)
• Some pieces have blemishes and need to be replaced

Using average resource consumption coefficients is a great approach if there is not much variability involved in resource usage. On the other hand, small deviations in resource usage can cause a production plan to become infeasible. In other words, this plan is not very robust with regard to changes in resource consumption.

The Impact of Uncertainty

Let's say we collect more data on the resource consumption of our products. Let $\check a_{ij}$ be the true amount of resource $i$ required to produce product $j$. We assume that the nominal value for the resource consumption coefficient is denoted by $\bar a_{ij}$, (average resource consumption). In addition, we observe that the resource consumption coefficient can deviate from its nominal value as much as $\hat a_{ij}$. Therefore, we can say this coefficient is an uncertain parameter that belongs to the known set (range) $\check a_{ij} \in [\bar a_{ij} - \hat a_{ij},\bar a_{ij} + \hat a_{ij}]$

For this problem, the nominal values and possible deviations are:

• Amount of labor used for one unit of chairs: $2 \pm 1$
• Amount of labor used for on unit of tables: $1 \pm 0.5$
• Amount of lumber used for one unit of chairs: $1 \pm 0.8$
• Amount of lumber used for on unit of tables: $2 \pm 1.5$

Now, knowing the form of uncertainty that exists in resource consumption, how do we change our plan?

Nominal case

One way to approach this problem is to stick with the nominal values (averages). Using the nominal values will give us the deterministic problem that we have already solved. The problem with the nominal case is that it is inherently risk-neutral. It does not consider the variations in uncertain parameters at all.

Optimistic Case

Perhaps you are very optimistic about your production process. You assume every uncertain parameter will have its best possible realization (in this case the $\bar a_{ij}−\hat a_{ij},\forall i,j$). This may not be the wisest thing to do, but if the plan works you'll do very well!

Optimism in practice

By assuming that the resource consumption coefficients will have their lowest possible value, you plan to produce the greatest number of products. Let's see what our optimistic plan actually is.

import pulp

production = pulp.LpProblem('Production planning', pulp.LpMaximize)

chairs = pulp.LpVariable('chairs', 0)
tables = pulp.LpVariable('tables', 0)

z = pulp.LpVariable('z', 0)
production += z == chairs + tables
production.setObjective(z)

production += 1.0*chairs + 0.5*tables <= 6 # labor
production += 0.2*chairs + 0.5*tables <= 6 # lumber

assert production.solve() == pulp.LpStatusOptimal

print 'Objective value: ', pulp.value(z)
print 'Number of chairs:', pulp.value(chairs)
print 'Number of tables:', pulp.value(tables)


Output:

Objective value:  12.0
Number of chairs: 0.0
Number of tables: 12.0


You can produce three times more products if you assume the lowest possible resource consumption! Nicely done. You have been promoted to Assistant Managing Vice President of Operations! Now let's assume each coefficient follows a uniform discrete distribution of 10 realizations including the both ends of the range and coefficients are independent of each other. The probability of the optimistic case is $p=\frac{1}{10} \times\frac{1}{10} \times\frac{1}{10} \times\frac{1}{10} \times=0.0001$! So as a decision maker, you are setting yourself up for failure. But if you are a "risk-embracing" decision maker, you might hit the jackpot...

Pessimistic Case

Another approach to this problem is to be pessimistic about the process. Being "too careful" about making production plans means that you will assume that the resource consumption coefficients will take their highest possible value.

The resulting production plan is very (overly?) conservative. However no matter what the realization of uncertainty is, there is no chance of infeasibility and consequently having to stop the production line. In order to see what the optimal solution is, we need to replace the values for $a_{ij},\forall i,j \text{ with } \bar a_{ij} + \hat a_{ij},\forall i,j$. The optimization will look like this:

import pulp

production = pulp.LpProblem('Production planning', pulp.LpMaximize)

chairs = pulp.LpVariable('chairs', 0)
tables = pulp.LpVariable('tables', 0)

z = pulp.LpVariable('z', 0)
production += z == chairs + tables
production.setObjective(z)

production += 3.0*chairs + 1.5*tables <= 6 # labor
production += 1.8*chairs + 3.5*tables <= 6 # lumber

assert production.solve() == pulp.LpStatusOptimal

print 'Objective value: ', pulp.value(z)
print 'Number of chairs:', pulp.value(chairs)
print 'Number of tables:', pulp.value(tables)


Output:

Objective value:  2.4615385
Number of chairs: 1.5384615
Number of tables: 0.92307692


This plan produces fewer products (remember we assumed we can produce a fraction of unit). Certainly this method will provide a production plan for you, but you may think it is "too conservative", since you are assuming everything regarding your uncertain parameters will take their worst possible value. This plan will always be feasible and the probability of having to stop the production line due to lack of resources is zero. But your production is also much lower than the nominal case (almost 61.5% of the nominal production level), and your boss may not like this situation either...

You could test all the possible combinations of the uncertain parameters for the production problem, and choose the combination that makes you "happiest." But these coefficients can take any value within the given range, and the number of combinations that they can take on is infinite. In the case of having 10 discrete values, there are total of 10,000 combinations for this simple problem to be checked. For a bigger and more difficult problem this is computationally prohibitive.

The last thing you want to do is keep these guys waiting.

Up Next

This is the part 1 of 2 posts covering covering robust optimization. In the next post, I will explain how robust optimization helps us manage this complicated decision-making issues and go into further depth in some applied applications.

Visiting Data Scientist Program

Saba is the first (of hopefully many) member of the Yhat Visiting Data Scientist program. The program allows for researchers to write about topics they are passionate about and expose thousands of interested readers to research areas they might otherwise never know about.

Reach out to Saba on Twitter or email him at sneyshab [at] gmu.edu

To read more about the program head over to Yhat webpage.

Our Products

Rodeo: a native Python editor built for doing data science on your desktop.

ScienceOps: deploy predictive models in production applications without IT.

Yhat (pronounced Y-hat) provides data science solutions that let data scientists deploy and integrate predictive models into applications without IT or custom coding.