Interior Point Methods for Solving Linear Programs

1. Introduction & Problem Statement

Interior-Point Methods (IPMs) are approximation algorithms for solving linear programs. Seen as a framework, IPMs underlie the fastest known algorithms for the famous maximum-flow problem on graphs, and although they have numerical stability problems in practice, they are the current best framework for optimizing general convex functions to high accuracy, in both theory and practice.

These notes first present the general framework of Interior-Point Methods and how it works; then the concept of barriers is formalized to present the Lee-Sidford barrier, which is used to achieve runtime improvements. The notes focus on understanding the theoretical concepts of the method rather than on the details of the runtime. The concepts are in my opinion super interesting and very smart!

Given primal variable $x \in \mathbb{R}^n$, objective $c^\top x$ with $c \in \mathbb{R}^n$, and constraints $Ax \ge b$ with $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$, our goal is to solve

$$\min_{Ax \ge b} c^\top x \qquad (P)$$

approximately, up to $\epsilon$-error. This is our constrained optimization problem, and the constraints define a convex subset $\Omega = \{x : Ax \ge b\}$ of $\mathbb{R}^n$ which we call the polytope.

2. Framework Intuition

Unlike other methods for solving linear programs, like the Simplex Method, which stay on the border of the polytope, in the Interior-Point Method we want to start inside the polytope and away from the border, and slowly move towards the border (where the optimum lies), always working with a feasible point.

2.1 How We Do It

To stay away from the border we massage our constrained optimization problem into an unconstrained one. We introduce a convex function $\phi$ that we will add to our objective, such that as we get closer to the border, the value of the function explodes. This penalizes points near the border, so we never leave the polytope. Such a function is called a barrier, and one simple example is the log barrier $\phi(x) = -\sum_{i=1}^m \log(a_i^\top x - b_i)$, where $a_i^\top$ are the rows of $A$. We can now add to our objective function the term $t\,\phi(x)$, where $t > 0$ is a scalar. $\phi$ ensures that we stay in the feasible region, and $t$ acts as a regularization weight such that when $t$ is small, we are close enough to the original problem that we can get a satisfactory approximation.
The modified problem statement is now

$$x^*_t = \arg\min_x f_t(x), \qquad f_t(x) := c^\top x + t\,\phi(x),$$

and we will look at solving this (form of) problem now. Note that this is a family of problems, because the problem changes as $t$ changes. The set of optimum points $\{x^*_t : t > 0\}$ of $f_t$ is called the central path. And as said before, intuitively, $x^*_t \to x^*$ when $t \to 0$, where $x^*$ is the optimum of $(P)$. So the idea of the algorithm is that we will follow the central path going in the direction of $t \to 0$ (thus decreasing $t$).
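To make this concrete, here is a minimal sketch (Python/NumPy, illustrative names) of the log barrier for $\{x : Ax \ge b\}$ and of the penalized objective above; it is a sketch of the idea, not an optimized implementation.

```python
import numpy as np

def log_barrier(A, b, x):
    """phi(x) = -sum_i log(a_i^T x - b_i): finite strictly inside the
    polytope, and it explodes to +infinity as x approaches the border."""
    s = A @ x - b              # slacks; all positive iff x is strictly feasible
    if np.any(s <= 0):
        return np.inf          # on or outside the border
    return -np.sum(np.log(s))

def penalized_objective(A, b, c, t, x):
    """f_t(x) = c^T x + t * phi(x); for small t this is close to c^T x.
    (Dividing by t gives the equivalent form c^T x / t + phi(x) used
    from Section 3 on.)"""
    return c @ x + t * log_barrier(A, b, x)
```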

The algorithm proceeds iteratively as follows:

  • Initialization: start at $t = t_0$, find an initial feasible point $x$ "close enough" to $x^*_{t_0}$.
  • At each iteration, move $x$ "close enough" to $x^*_t$ for the current $t$, and then decrease $t$ multiplicatively: $t \leftarrow (1-h)t$ for some $h \in (0,1)$,
  • return $x$ once $t$ is small enough (achieved $\epsilon$-error)

How we measure "closeness" will be defined more formally later. Achieving $\epsilon$-error depends on how small $t$ is.

From this general intuition, two main questions arise:

  • How close should $x$ be to $x^*_t$ at each iteration? This will control the per-iteration cost.
  • How much can we decrease $t$ at each iteration? This will control the number of iterations.

2.2 Some More Intuition

Before going to the next section, I give more detail on the above two questions. Theorems and lemmas defined later will clarify this part, so if things don't get clearer while reading, you can continue to the next sections.
There's a region that $x$ has to lie in to be able to move closer to $x^*_t$. If $x$ is not in this region, it might be impossible (or too costly) to later move it closer to $x^*_t$. So when changing $t$ to $t'$, we change the target to $x^*_{t'}$ and thus we change the region we're looking at: $x$ might be in the region of $x^*_t$ but it might not be in the region of $x^*_{t'}$. That's why we first move $x$ closer to $x^*_t$, and because of geometric dependencies between $x^*_t$ and $x^*_{t'}$, $x$ is then also close enough so that it is in the region of $x^*_{t'}$. Also, if we decrease $t$ too fast ($h$ too high), then we might pay too much in moving closer to $x^*_{t'}$, because the region of $x^*_{t'}$ is even farther away.
These are subtleties that will be handled by lemmas in the more formal sections. But for now here are the two more detailed questions:

  • At each iteration, how close should $x$ be to $x^*_t$ in order to lie inside the region of $x^*_{t'}$? This will control the per-iteration cost.
  • How much can we decrease $t$ at each iteration? This will control the number of iterations.

Finally, recall that we always want to be close to $x^*_t$ because as we decrease $t$, $x^*_t$ approaches $x^*$.

(Figure: Central Path Following)

The above image illustrates the process of following the central path. The dotted circles are the regions of $x^*_t$ and $x^*_{t'}$. The per-iteration cost is represented by the grey arrows.

3. General Framework

As mentioned before, we start by softening

$$\min_{x \in \Omega} c^\top x \qquad (P)$$

where $\Omega$ is a subset of $\mathbb{R}^n$, into

$$\min_x f_t(x), \qquad f_t(x) := \frac{c^\top x}{t} + \phi(x),$$

where $t > 0$ and $\phi$ is a convex function such that $\phi(x) \to \infty$ as $x \to \partial\Omega$. We call $\phi$ a barrier function for $\Omega$. (Minimizing $\frac{c^\top x}{t} + \phi(x)$ is the same as minimizing $c^\top x + t\,\phi(x)$ after dividing by $t$; this normalization is more convenient for the analysis because $\nabla^2 f_t = \nabla^2\phi$.)

Definition 1: The central path is defined as

$$\mathcal{C} := \{x^*_t : t > 0\}, \qquad x^*_t := \arg\min_x f_t(x).$$

The framework is as follows:

  1. Find a feasible $x$ close to $x^*_{t_0}$
  2. While $t$ is not small enough:
    • Move $x$ closer to $x^*_t$
    • $t \leftarrow (1-h)t$, $h \in (0,1)$
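In code, the framework is just a loop; here is an illustrative Python skeleton, where `newton_step`, `t0`, `h`, and `t_min` are placeholders that the rest of this section makes precise.

```python
def ipm_framework(x, t0, h, t_min, newton_step):
    """Generic central-path-following loop (illustrative skeleton)."""
    t = t0
    while t > t_min:           # "t is not small enough"
        x = newton_step(x, t)  # move x closer to x_t^*
        t = (1 - h) * t        # decrease t multiplicatively
    return x
```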

The goal of IPMs is to start with a point close to the central path, and to follow it until we get a good approximation of the original problem. We do not present how to get the initial point, but there exist multiple methods for getting an initial point efficiently, one of which can be found in Reference 2.
Let's now show that this framework works!

3.1 Newton Step and Distance Measure

To move $x$ closer to $x^*_t$, we use Newton steps:

$$x \leftarrow x + n_t(x), \qquad n_t(x) := -(\nabla^2 f_t(x))^{-1}\,\nabla f_t(x).$$

Since Newton's method uses the Hessian and performs better when changes in $x$ don't change the Hessian too much, we are motivated to make the assumption that the Hessian is Lipschitz (a smoothness property). This is why we introduce the definition of self-concordance in the next subsection.
Note that the norm of $n_t(x)$ is called the step size.
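For the log barrier on $\{x : Ax \ge b\}$ these quantities have standard closed forms: with slacks $s = Ax - b$, $\nabla\phi(x) = -A^\top s^{-1}$ (entrywise inverse) and $\nabla^2\phi(x) = A^\top \mathrm{diag}(s^{-2})\, A$. A Newton step can therefore be sketched as follows (Python/NumPy; a sketch, not an optimized implementation):

```python
import numpy as np

def newton_step(A, b, c, t, x):
    """One Newton step on f_t(x) = c^T x / t + phi(x), phi the log barrier."""
    s = A @ x - b                        # slacks, positive inside the polytope
    g = c / t - A.T @ (1 / s)            # grad f_t(x) = c/t + grad phi(x)
    H = A.T @ ((1 / s**2)[:, None] * A)  # hess f_t(x) = hess phi(x)
    return x - np.linalg.solve(H, g)     # x + n_t(x)
```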

To study the performance of a Newton iteration, we need to define how we measure the distance of $x$ to $x^*_t$. Since the smaller the step size, the closer we are to the optimum (because $\nabla f_t(x)$ is small), we define the distance in terms of the size of the step in the Hessian norm.

Definition 2: Given a positive-definite matrix $M$, define $\|v\|_M := \sqrt{v^\top M v}$. The distance of $x$ to $x^*_t$ is defined as follows:

$$\delta_t(x) := \|n_t(x)\|_{\nabla^2 f_t(x)}.$$

The Hessian norm of the step is used to account for the number of Newton steps needed for $x$ to converge to $x^*_t$. This makes sense algorithmically because the algorithm is faster if few Newton steps are made.
Note that $\delta_t(x)$ is also $\|\nabla f_t(x)\|_{(\nabla^2 f_t(x))^{-1}}$ (this can be derived by a simple calculation). We will mainly use this form instead of the one given in the definition, but I really wanted to point out that the definition comes from a motivation to analyze Newton steps.
Also note that $\delta_t(x) = 0$ if and only if $x = x^*_t$.
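As a sanity check of that simple calculation, the snippet below (illustrative) verifies numerically that $\|n_t(x)\|_{\nabla^2 f_t(x)} = \|\nabla f_t(x)\|_{(\nabla^2 f_t(x))^{-1}}$, with a random positive-definite matrix playing the role of the Hessian:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
B = rng.standard_normal((n, n))
H = B @ B.T + n * np.eye(n)          # a positive-definite "Hessian"
g = rng.standard_normal(n)           # a "gradient"

step = -np.linalg.solve(H, g)                    # Newton step n_t(x)
delta_step = np.sqrt(step @ H @ step)            # ||n_t(x)||_H
delta_grad = np.sqrt(g @ np.linalg.solve(H, g))  # ||grad f_t(x)||_{H^{-1}}
assert np.isclose(delta_step, delta_grad)
```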

3.2 Self-Concordance: Enabling Convergence

Since we're using Newton steps, remember the convenient assumption that the Hessian should be Lipschitz. Let's define more formally the smoothness property we want.

Definition 3: A convex function $\phi$ is said to be self-concordant if $\forall x \in \mathrm{dom}\,\phi$, $\forall h \in \mathbb{R}^n$, we have

$$|D^3\phi(x)[h,h,h]| \;\le\; 2\,\big(D^2\phi(x)[h,h]\big)^{3/2}.$$

$D^k\phi(x)[h_1,\dots,h_k]$ is the $k$-th directional derivative of $\phi$ at $x$ in directions $h_1,\dots,h_k$.
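For intuition, the 1-D log barrier $\phi(x) = -\log(x)$ satisfies this with equality: $\phi''(x) = 1/x^2$ and $\phi'''(x) = -2/x^3$, so $|\phi'''(x)| = 2(\phi''(x))^{3/2}$. A tiny numerical check (illustrative):

```python
import numpy as np

# Self-concordance of phi(x) = -log(x): |phi'''| <= 2 (phi'')^{3/2},
# which for this phi holds with equality.
for x in np.linspace(0.1, 5.0, 50):
    d2 = 1.0 / x**2     # phi''(x)
    d3 = -2.0 / x**3    # phi'''(x)
    assert abs(d3) <= 2.0 * d2**1.5 + 1e-12
```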

Integrating the local inequality that we have from the definition, we get the following inequality, which says that locally the Hessian of a self-concordant function doesn't change too fast:

Lemma 1: Let $\phi$ be a self-concordant function. Then $\forall x \in \mathrm{dom}\,\phi$ and $\forall y$ s.t. $r := \|y - x\|_{\nabla^2\phi(x)} < 1$,

$$(1-r)^2\, \nabla^2\phi(x) \;\preceq\; \nabla^2\phi(y) \;\preceq\; \frac{1}{(1-r)^2}\, \nabla^2\phi(x).$$

Proof. See Appendix (Self-concordance is necessary to prove this lemma)

When we change $t$ to $t'$, we want the new distance $\delta_{t'}(x)$ to still be small in order to make progress. This is what self-concordance enables. It says that locally the Hessian doesn't change too much, so it gives assurance that our Newton step won't diverge or make jumps and stop progressing.
The following lemma uses this, and says that the self-concordance condition on $\phi$ guarantees convergence when we try to move closer to $x^*_t$, provided that $x$ was already close enough. This lemma is one of the main tools that enables the algorithm to proceed iteratively and converge towards the optimum.

Lemma 2: Let $\phi$ be a self-concordant barrier function. Suppose that $\delta_t(x) < 1$. Then, after one Newton step of the form $x^+ = x + n_t(x)$, we have

$$\delta_t(x^+) \;\le\; \left(\frac{\delta_t(x)}{1 - \delta_t(x)}\right)^2.$$

And moving using the Newton method results in moving (quadratically) closer to $x^*_t$.

Proof. See Appendix (Lemma 1. is necessary to prove this lemma)
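To see the lemma in action, here is a small 1-D check (illustrative): take $f(x) = 2x - \log(x) - \log(1-x)$, a linear term plus the log barrier of $(0,1)$ playing the role of $f_t$, and verify the bound after one Newton step.

```python
import numpy as np

def grad(x):  return 2 - 1/x + 1/(1 - x)
def hess(x):  return 1/x**2 + 1/(1 - x)**2
def delta(x): return abs(grad(x)) / np.sqrt(hess(x))   # Newton decrement

x = 0.4
d = delta(x)                            # ~0.39 < 1, so Lemma 2 applies
x_plus = x - grad(x) / hess(x)          # one Newton step
assert delta(x_plus) <= (d / (1 - d))**2
print(delta(x_plus), (d / (1 - d))**2)  # ~0.08 <= ~0.40
```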

3.3 Following the Central Path

Now, changing $t$ must be done carefully, because we want $h$ to be as big as possible (to minimize the number of iterations), while still maintaining our invariant of Lemma 2. The following lemma gives an upper bound on the amount that the distance can change as we change $t$, and suggests a maximum amount by which $t$ can change so that the premise of Lemma 2. still holds in the next iteration.

Lemma 3: Let $\phi$ be a self-concordant barrier function additionally satisfying $\|\nabla\phi(x)\|_{(\nabla^2\phi(x))^{-1}} \le \sqrt{\nu}$ for all $x$ (the second barrier property, formalized in Definition 4 below), and suppose $t$ is updated following $t' = (1-h)t$. Then we have that

$$\delta_{t'}(x) \;\le\; \delta_t(x) + \frac{h}{1-h}\left(\delta_t(x) + \sqrt{\nu}\right).$$

Proof. See Appendix

We can now determine how big $h$ can be. For this, let's make our framework more quantitative:

  1. Find a feasible $x$ such that $\delta_{t_0}(x) \le \frac{1}{4}$
  2. While $t$ is not small enough:
    • $x \leftarrow x + n_t(x)$ (this decreases $\delta_t(x)$ to at most $\frac{1}{9}$)
    • $t \leftarrow (1-h)t$

Assume that $\delta_t(x) \le \frac{1}{4}$. Then, according to Lemma 2., after a single Newton step, $\delta_t(x^+) \le \left(\frac{1/4}{3/4}\right)^2 = \frac{1}{9}$. To maintain $\delta_{t'}(x^+) \le \frac{1}{4}$ for the next iteration, we want

$$\frac{1}{9} + \frac{h}{1-h}\left(\frac{1}{9} + \sqrt{\nu}\right) \;\le\; \frac{1}{4}.$$

So we need $h = O(1/\sqrt{\nu})$; for example, $h = \frac{1}{10\sqrt{\nu}}$ satisfies this for every $\nu \ge 1$. Taking $h$ of this order maximizes the change of $t$.
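A quick numerical check (illustrative) that this choice of $h$ restores the invariant:

```python
import numpy as np

# After a Newton step delta <= 1/9; Lemma 3 with h = 1/(10 sqrt(nu))
# must bring us back under the invariant delta <= 1/4.
for nu in [1, 4, 100, 10**6]:
    h = 1 / (10 * np.sqrt(nu))
    delta_next = 1/9 + h / (1 - h) * (1/9 + np.sqrt(nu))
    assert delta_next <= 1/4
```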
Notice that the amount by which we change $t$ depends on $\nu$. The lower this value, the fewer iterations we will have to do. We can now state the second property of self-concordance, and fully define $\nu$-self-concordance.

Definition 4: Let $\Omega$ be a convex subset of $\mathbb{R}^n$. We say that $\phi$ is a $\nu$-self-concordant barrier on $\Omega$ if

  • $|D^3\phi(x)[h,h,h]| \le 2\,(D^2\phi(x)[h,h])^{3/2}$ for all $x \in \Omega$ and $h \in \mathbb{R}^n$ (Definition 3.)

  • $\|\nabla\phi(x)\|^2_{(\nabla^2\phi(x))^{-1}} \le \nu$ for all $x \in \Omega$
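For example, the log barrier on $\{x : Ax \ge b\}$ with $m$ constraints is known to be an $m$-self-concordant barrier, so $\nu = m$. (The second condition holds because $\nabla\phi(x)^\top(\nabla^2\phi(x))^{-1}\nabla\phi(x)$ is the squared norm of the orthogonal projection of the all-ones vector onto the column space of $\mathrm{diag}(s^{-1})A$, which is at most $m$.) The snippet below (illustrative) checks the second condition numerically at a random strictly feasible point:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 20, 5
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)
b = A @ x - rng.uniform(0.1, 1.0, m)   # pick b so that x is strictly feasible

s = A @ x - b                          # positive slacks
g = -A.T @ (1 / s)                     # grad phi(x)
H = A.T @ ((1 / s**2)[:, None] * A)    # hess phi(x)
nu_x = g @ np.linalg.solve(H, g)       # ||grad phi||^2 in inverse-Hessian norm
assert nu_x <= m + 1e-8
print(f"||grad phi||^2 = {nu_x:.3f} <= m = {m}")
```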

3.4 Termination Condition

Finally, it remains to determine how small $t$ needs to be to achieve $\epsilon$-error. One can upper bound the error using the duality gap of the problem.

Lemma 4: Let $\phi$ be a $\nu$-self-concordant barrier on $\Omega$, and let $x^*$ be the optimum of $(P)$. We have that

$$c^\top x^*_t - c^\top x^* \;\le\; \nu t.$$

Proof. See Appendix

Our $x$ stays within constant distance of $x^*_t$ ($\delta_t(x) \le \frac{1}{4}$), so to have $c^\top x - c^\top x^* \le 2\nu t$ only a constant number more Newton iterations are needed. Hence, to get an $\epsilon$-error we can stop once $t \le \frac{\epsilon}{2\nu}$. We can now give the final version of our framework.

Final IPM framework

  1. Find a feasible $x$ such that $\delta_{t_0}(x) \le \frac{1}{4}$
  2. While $t > \frac{\epsilon}{2\nu}$:
    • $x \leftarrow x + n_t(x)$ (by Lemma 2., $\delta_t(x) \le \frac{1}{9}$ afterwards)
    • $t \leftarrow (1-h)t$, $h = \frac{1}{10\sqrt{\nu}}$
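Putting everything together, here is a minimal, self-contained sketch of the final framework (Python/NumPy), instantiated with the log barrier ($\nu = m$) on a toy LP over the unit box. Since we did not cover initialization, the starting point and $t_0$ are chosen by hand so that $\delta_{t_0}(x) \le \frac{1}{4}$ holds; at the analytic center $(0.5, 0.5)$ of the box with $t_0 = 2$ one gets $\delta_{t_0}(x) = \frac{1}{4}$ exactly.

```python
import numpy as np

def solve_lp_ipm(A, b, c, x, t, eps):
    """Short-step IPM with the log barrier on {x : Ax >= b} (sketch)."""
    m = A.shape[0]
    nu = m                                   # self-concordance of the log barrier
    h = 1 / (10 * np.sqrt(nu))               # step from the analysis above
    while t > eps / (2 * nu):                # termination condition (Lemma 4)
        s = A @ x - b                        # slacks stay positive along the path
        g = c / t - A.T @ (1 / s)            # grad f_t(x)
        H = A.T @ ((1 / s**2)[:, None] * A)  # hess f_t(x)
        x = x - np.linalg.solve(H, g)        # Newton step (Lemma 2)
        t = (1 - h) * t                      # decrease t (Lemma 3)
    return x

# Toy LP: min x1 + x2 over the box 0 <= x1, x2 <= 1 (optimum at (0, 0)).
A = np.array([[1., 0.], [0., 1.], [-1., 0.], [0., -1.]])
b = np.array([0., 0., -1., -1.])
c = np.array([1., 1.])
x = solve_lp_ipm(A, b, c, x=np.array([0.5, 0.5]), t=2.0, eps=1e-3)
print(x)  # close to (0, 0)
```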

With this framework complete we can now state a main IPM theorem regarding performance.

3.5 Main IPM Theorem

Theorem 1: Let $\Omega$ be the polytope defined by the constraints of $(P)$. Given a $\nu$-self-concordant barrier $\phi$ on $\Omega$, we can solve $(P)$ up to $\epsilon$-error in $O\!\left(\sqrt{\nu}\,\log\frac{\nu t_0}{\epsilon}\right)$ iterations.

Proof. We need to show that we reach $\epsilon$-error in $O\!\left(\sqrt{\nu}\,\log\frac{\nu t_0}{\epsilon}\right)$ Newton steps and updates of $t$.
We start at $t = t_0$, and at each iteration we multiply $t$ by a factor $1 - \frac{1}{10\sqrt{\nu}}$. Let $k$ be the number of iterations. We want

$$t_0\left(1 - \frac{1}{10\sqrt{\nu}}\right)^{k} \;\le\; \frac{\epsilon}{2\nu}.$$

Solving this (using $-\log(1-u) \ge u$) gives $k = O\!\left(\sqrt{\nu}\,\log\frac{\nu t_0}{\epsilon}\right)$.

Discussion

The performance of the IPM framework we've given depends on $\nu$, the self-concordance of the barrier function we choose. This sort of modularizes research on the subject and makes it possible to improve performance by concentrating on finding a better (smaller $\nu$) barrier function.
This is the topic of another note, which concentrates on the Lee-Sidford barrier, the current best barrier function that is efficiently computable.

References

  1. Y. T. Lee, A. Sidford - Solving Linear Programs with sqrt(rank) linear system solves
  2. Lecture notes from the famous Yin Tat Lee: CSE 599: Interplay between Convex Optimization and Geometry

Appendix

Proof Lemma 1.

Let $x_s = x + s(y-x)$ with $s \in [0,1]$, and write $r = \|y-x\|_{\nabla^2\phi(x)}$. For a fixed direction $g$, let $\psi(s) = g^\top \nabla^2\phi(x_s)\,g$. Then we have that $\psi'(s) = D^3\phi(x_s)[g, g, y-x]$. By self-concordance (in its generalized form, a standard consequence of Definition 3) we have

$$|\psi'(s)| = |D^3\phi(x_s)[g, g, y-x]| \;\le\; 2\,\big(g^\top \nabla^2\phi(x_s)\,g\big)\,\|y-x\|_{\nabla^2\phi(x_s)}.$$

For $g = y-x$, we have $|\psi'(s)| \le 2\,\psi(s)^{3/2}$. Hence we have $\left|\frac{d}{ds}\,\psi(s)^{-1/2}\right| \le 1$. Integrating both sides on $[0,s]$ we have

$$\psi(0)^{-1/2} - s \;\le\; \psi(s)^{-1/2} \;\le\; \psi(0)^{-1/2} + s.$$

Rearranging it gives

$$\|y-x\|_{\nabla^2\phi(x_s)} = \psi(s)^{1/2} \;\le\; \frac{r}{1 - sr}.$$

For general $g$, we can obtain

$$|\psi'(s)| \;\le\; 2\,\psi(s)\,\frac{r}{1 - sr}.$$

Rearranging gives

$$\left|\frac{d}{ds}\log\psi(s)\right| \;\le\; \frac{2r}{1 - sr}.$$

Integrating from $0$ to $1$ gives $(1-r)^2\,\psi(0) \le \psi(1) \le (1-r)^{-2}\,\psi(0)$ for every direction $g$, which is the result.


Proof Lemma 2.

Write $H(\cdot) = \nabla^2 f_t(\cdot)$, $n = n_t(x)$, $x^+ = x + n$, and $\delta = \delta_t(x) = \|n\|_{H(x)}$. Lemma 1. shows that

$$H(x^+) \;\succeq\; (1-\delta)^2\, H(x),$$

and hence

$$\delta_t(x^+) = \|\nabla f_t(x^+)\|_{H(x^+)^{-1}} \;\le\; \frac{1}{1-\delta}\,\|\nabla f_t(x^+)\|_{H(x)^{-1}}. \qquad (*)$$

To bound $\|\nabla f_t(x^+)\|_{H(x)^{-1}}$, we calculate that

$$\nabla f_t(x^+) = \nabla f_t(x) + \int_0^1 H(x+sn)\,n\; ds = \int_0^1 \big(H(x+sn) - H(x)\big)\,n\; ds, \qquad (**)$$

using $\nabla f_t(x) = -H(x)\,n$. For the first term in the bracket, we use Lemma 1. to get that

$$-\alpha(s)\,H(x) \;\preceq\; H(x+sn) - H(x) \;\preceq\; \alpha(s)\,H(x), \qquad \alpha(s) := \frac{1}{(1-s\delta)^2} - 1.$$

Therefore we have

$$\big\|\big(H(x+sn) - H(x)\big)\,n\big\|_{H(x)^{-1}} \;\le\; \alpha(s)\,\|n\|_{H(x)} = \alpha(s)\,\delta.$$

Putting it into $(**)$ gives

$$\|\nabla f_t(x^+)\|_{H(x)^{-1}} \;\le\; \delta \int_0^1 \alpha(s)\, ds = \frac{\delta^2}{1-\delta}.$$

Using this inequality with $(*)$ gives the result.


Proof Lemma 3.

Since $\nabla^2 f_{t'}(x) = \nabla^2 f_t(x) = \nabla^2\phi(x)$ and

$$\nabla f_{t'}(x) = \frac{c}{t'} + \nabla\phi(x) = \nabla f_t(x) + \Big(\frac{1}{t'} - \frac{1}{t}\Big)c,$$

taking the norm, using the triangle inequality and using that $\frac{1}{t'} - \frac{1}{t} = \frac{h}{(1-h)t}$ and

$$\Big\|\frac{c}{t}\Big\|_{(\nabla^2\phi(x))^{-1}} \le \|\nabla f_t(x)\|_{(\nabla^2\phi(x))^{-1}} + \|\nabla\phi(x)\|_{(\nabla^2\phi(x))^{-1}} \le \delta_t(x) + \sqrt{\nu},$$

we get

$$\delta_{t'}(x) \;\le\; \delta_t(x) + \frac{h}{1-h}\big(\delta_t(x) + \sqrt{\nu}\big).$$


Proof Lemma 4.

We state a first lemma that will help us in the proof.

Lemma: Let $\phi$ be a $\nu$-self-concordant barrier, and let $\Omega$ be the polytope defined by the constraints of $(P)$. Then, for any $x, y \in \Omega$,

$$\nabla\phi(x)^\top (y - x) \;\le\; \nu.$$

Proof. Let $\psi(s) = \nabla\phi(x_s)^\top(y-x)$ where $x_s = x + s(y-x)$, $s \in [0,1]$. Then

$$\psi'(s) = (y-x)^\top \nabla^2\phi(x_s)\,(y-x) \;\ge\; 0.$$

We also have that (by Cauchy-Schwarz in the $\nabla^2\phi(x_s)$-norm and the second property of Definition 4)

$$\psi(s)^2 \;\le\; \|\nabla\phi(x_s)\|^2_{(\nabla^2\phi(x_s))^{-1}}\,\|y-x\|^2_{\nabla^2\phi(x_s)} \;\le\; \nu\,\psi'(s).$$

Combining the two, we get $\psi(s)^2 \le \nu\,\psi'(s)$ with $\psi$ non-decreasing.

  • If $\psi(0) \le 0$, we are done because $\nu > 0$ and our inequality is satisfied.
  • If $\psi(0) > 0$, then $\psi$ is increasing in $s$, because $\phi$ is convex (look at the monotonicity of $\psi$ given the sign of $\psi'$). Thus $\psi(s) \ge \psi(0) > 0$ for all $s$.
    Furthermore, integrating the inequality $\frac{\psi'(s)}{\psi(s)^2} \ge \frac{1}{\nu}$ over $[0,1]$, we have that $\frac{1}{\psi(0)} - \frac{1}{\psi(1)} \ge \frac{1}{\nu}$. Thus, $\frac{1}{\psi(0)} \ge \frac{1}{\nu}$ and $\nabla\phi(x)^\top(y-x) = \psi(0) \le \nu$, which gives the target inequality.

Now we can use this lemma to prove Lemma 4.
At the optimum $x^*_t$ of $f_t$ we have $\nabla f_t(x^*_t) = \frac{c}{t} + \nabla\phi(x^*_t) = 0$, i.e., $c = -t\,\nabla\phi(x^*_t)$. Therefore we have

$$c^\top x^*_t - c^\top x^* = t\,\nabla\phi(x^*_t)^\top(x^* - x^*_t) \;\le\; t\,\nu,$$

which completes the proof.
