IPM Part 2: The Lee-Sidford Barrier

  • These notes are based on my semester research project in the Theoretical Computer Science Lab at EPFL, where I was supervised by Prof. Kapralov and Kshiteej Sheth.
  • To understand the context in which the objects presented here are used, read my notes on Interior Point Methods.

1. Warm-Up: The Logarithmic Barrier

One simple barrier that one can consider for linear constraints is the logarithmic barrier:

Definition 1: Logarithmic barrier - Let $P = \{x \in \mathbb{R}^n : Ax \ge b\}$ be an $n$-dimensional polytope given by the $m$ constraints $a_i^\top x \ge b_i$, $i = 1, \dots, m$. The logarithmic barrier is defined as

$$\phi(x) = -\sum_{i=1}^m \ln\big(a_i^\top x - b_i\big).$$
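
To make the definition concrete, here is a small NumPy sketch (my own addition, not from the original notes) of the barrier together with its gradient $-A^\top S_x^{-1}\mathbf{1}$ and Hessian $A^\top S_x^{-2}A$, where $S_x$ is the diagonal matrix of slacks; the helper name `log_barrier` and the toy box example are mine.

```python
import numpy as np

def log_barrier(A, b, x):
    """Value, gradient and Hessian of phi(x) = -sum_i ln(a_i^T x - b_i)."""
    s = A @ x - b                                # slacks s_i(x), must be > 0
    value = -np.sum(np.log(s))
    grad = -A.T @ (1.0 / s)                      # -A^T S_x^{-1} 1
    hess = A.T @ ((1.0 / s**2)[:, None] * A)     # A^T S_x^{-2} A
    return value, grad, hess

# Toy example: the box [-1, 1]^2 written as Ax >= b.
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = -np.ones(4)
val, g, H = log_barrier(A, b, np.array([0.2, -0.3]))
```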

To study its self-concordance, we start by proving the following lemma:

Lemma 1: If $f_1$ and $f_2$ are $\nu_1$- and $\nu_2$-self-concordant barriers on $K_1$ and $K_2$ respectively, then $f_1 + f_2$ is a $(\nu_1 + \nu_2)$-self-concordant barrier on $K_1 \cap K_2$.

Proof. The two properties of $\nu$-self-concordance (Definition 4) are verified by combining the corresponding inequalities for $f_1$ and $f_2$: for the first property one uses $a^{3/2} + b^{3/2} \le (a+b)^{3/2}$ for $a, b \ge 0$, and for the second property the triangle inequality followed by Cauchy–Schwarz, $\sqrt{\nu_1 a} + \sqrt{\nu_2 b} \le \sqrt{(\nu_1+\nu_2)(a+b)}$.


Lemma 2: The logarithmic barrier $\phi$ is $m$-self-concordant.


Proof. Take any $x$ in the interior of $P$ and any direction $h \in \mathbb{R}^n$. Let $s_i(x) = a_i^\top x - b_i$ and $\phi_i(x) = -\ln(s_i(x))$. Then

$$D\phi_i(x)[h] = -\frac{a_i^\top h}{s_i(x)}, \qquad D^2\phi_i(x)[h,h] = \frac{(a_i^\top h)^2}{s_i(x)^2}, \qquad D^3\phi_i(x)[h,h,h] = -2\,\frac{(a_i^\top h)^3}{s_i(x)^3},$$

so that

$$\big|D^3\phi_i(x)[h,h,h]\big| = 2\left(\frac{(a_i^\top h)^2}{s_i(x)^2}\right)^{3/2} = 2\big(D^2\phi_i(x)[h,h]\big)^{3/2},$$

and we just proved the first property of self-concordance for $\phi_i$.
Now,

$$\big|D\phi_i(x)[h]\big| = \frac{|a_i^\top h|}{s_i(x)} = \big(D^2\phi_i(x)[h,h]\big)^{1/2}.$$

Thus, $\phi_i$ is 1-self-concordant, for any $i$.

Since $\phi = \sum_{i=1}^m \phi_i$ and by Lemma 1, $\phi$ is $m$-self-concordant.
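
As a quick numerical sanity check (my own addition), the two identities used in the proof can be verified along a line $t \mapsto x + th$: the $i$-th term of the barrier restricts to $f(t) = -\ln(s + t\alpha)$ with $s = s_i(x)$ and $\alpha = a_i^\top h$.

```python
import numpy as np

# f(t) = -ln(s + t*alpha): check |f'''(0)| = 2 f''(0)^{3/2} and |f'(0)| = f''(0)^{1/2}.
s, alpha = 1.7, -0.4          # an arbitrary slack s_i(x) > 0 and directional term a_i^T h
f1 = -alpha / s               # f'(0)
f2 = alpha**2 / s**2          # f''(0)
f3 = -2 * alpha**3 / s**3     # f'''(0)
assert np.isclose(abs(f3), 2 * f2**1.5)
assert np.isclose(abs(f1), np.sqrt(f2))
```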

From our IPM framework, we conclude that using the logarithmic barrier yields an $O(\sqrt{m}\log(1/\varepsilon))$-iteration algorithm for solving Linear Programs. The main problem is that $m$ can get exponential in $n$, so this can become very inefficient, for example when the LP has a lot of repeated (or very similar) constraints. The next logical step is to try to reweight the constraints, and give less weight to the ones that are repeated.

2. The Weighted Logarithmic Barrier

We are looking for a barrier that behaves like the logarithmic barrier, but with a weight $w_i > 0$ on each constraint, the weights being determined later. We thus define

$$\phi_w(x) = -\sum_{i=1}^m w_i \ln\big(s_i(x)\big), \qquad s_i(x) = a_i^\top x - b_i,$$

and we will study its self-concordance. Throughout, we write $S_x = \mathrm{diag}(s(x))$, $W = \mathrm{diag}(w)$ and $A_x = S_x^{-1}A$.
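
For concreteness, here is a short NumPy sketch (mine, not from the notes) of the gradient $-A_x^\top w$ and Hessian $A_x^\top W A_x$ of $\phi_w$, as computed at the start of the proof of Lemma 3 below; the helper name `weighted_log_barrier` is an assumption.

```python
import numpy as np

def weighted_log_barrier(A, b, x, w):
    """Gradient and Hessian of phi_w(x) = -sum_i w_i ln(a_i^T x - b_i)."""
    s = A @ x - b                        # slacks s_i(x), must be > 0
    Ax = A / s[:, None]                  # A_x = S_x^{-1} A
    grad = -Ax.T @ w                     # -A_x^T w
    hess = Ax.T @ (w[:, None] * Ax)      # A_x^T W A_x
    return grad, hess
```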

Lemma 3: For every $x$ in the interior of $P$ and every $h \in \mathbb{R}^n$, we have

$$\big|D\phi_w(x)[h]\big| \le \Big(\sum_{i=1}^m w_i\Big)^{1/2}\big(D^2\phi_w(x)[h,h]\big)^{1/2},$$

and

$$\big|D^3\phi_w(x)[h,h,h]\big| \le 2\,\max_{i}\sqrt{\frac{\sigma_i(W^{1/2}A_x)}{w_i}}\;\big(D^2\phi_w(x)[h,h]\big)^{3/2},$$

where $\sigma_i(M)$ is the leverage score of the $i$-th row of a matrix $M$, namely

$$\sigma_i(M) = \big(M(M^\top M)^{-1}M^\top\big)_{ii}.$$
Proof. We have that $\nabla\phi_w(x) = -A_x^\top w$ and $\nabla^2\phi_w(x) = A_x^\top W A_x$. Hence, by Cauchy–Schwarz,

$$\big(D\phi_w(x)[h]\big)^2 = \big(w^\top A_x h\big)^2 \le \Big(w^\top A_x\big(A_x^\top W A_x\big)^{-1}A_x^\top w\Big)\cdot\Big(h^\top A_x^\top W A_x\, h\Big).$$

But $P = W^{1/2}A_x\big(A_x^\top W A_x\big)^{-1}A_x^\top W^{1/2}$ is an orthogonal projection matrix, so its operator norm is at most $1$. Writing $w = W^{1/2}\big(W^{1/2}\mathbf{1}\big)$, we can then conclude that

$$\big(D\phi_w(x)[h]\big)^2 \le \big(W^{1/2}\mathbf{1}\big)^\top P\,\big(W^{1/2}\mathbf{1}\big)\cdot D^2\phi_w(x)[h,h] \le \Big(\sum_{i=1}^m w_i\Big)\, D^2\phi_w(x)[h,h].$$

For the second part of the proof, recall that $s_i(x) = a_i^\top x - b_i$ is the $i$-th slack. We have

$$D^3\phi_w(x)[h,h,h] = -2\sum_{i=1}^m w_i\,\frac{(a_i^\top h)^3}{s_i(x)^3} = -2\sum_{i=1}^m w_i\,(A_x h)_i^3.$$

Furthermore, by Cauchy–Schwarz again, for every $i$,

$$(A_x h)_i^2 = \frac{1}{w_i}\big(W^{1/2}A_x h\big)_i^2 \le \frac{\sigma_i(W^{1/2}A_x)}{w_i}\; h^\top A_x^\top W A_x\, h.$$

Hence, we have

$$\big|D^3\phi_w(x)[h,h,h]\big| \le 2\,\max_i\big|(A_x h)_i\big|\; D^2\phi_w(x)[h,h] \le 2\,\max_i\sqrt{\frac{\sigma_i(W^{1/2}A_x)}{w_i}}\;\big(D^2\phi_w(x)[h,h]\big)^{3/2}.$$

Rescaling by a constant (replacing $\phi_w$ by $M^2\phi_w$ with $M = \max_i\sqrt{\sigma_i(W^{1/2}A_x)/w_i}$, which restores the factor $2$ in the first property), we get that $M^2\phi_w$ is self-concordant, which gives us a self-concordance factor of

$$\nu = \Big(\max_{i}\frac{\sigma_i(W^{1/2}A_x)}{w_i}\Big)\sum_{i=1}^m w_i.$$

It depends on $x$, but if $w_i = \sigma_i(W^{1/2}A_x)$ for every $i$, note that we have $\nu = \sum_{i=1}^m \sigma_i(W^{1/2}A_x) = n$ because of the properties of leverage scores (they lie in $[0,1]$ and sum to the rank of the matrix).
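
The "properties of leverage scores" used above are easy to check numerically: they lie in $[0,1]$ and sum to the rank. Here is a hedged sketch (the helper name `leverage_scores` is mine).

```python
import numpy as np

def leverage_scores(M):
    """sigma_i(M) = [M (M^T M)^{-1} M^T]_{ii}, via a thin QR (M has full column rank)."""
    Q, _ = np.linalg.qr(M)           # M = QR, so M (M^T M)^{-1} M^T = Q Q^T
    return np.sum(Q**2, axis=1)      # diagonal of Q Q^T = squared row norms of Q

rng = np.random.default_rng(0)
M = rng.standard_normal((50, 5))     # full column rank with probability 1
sigma = leverage_scores(M)
assert np.all((sigma >= 0) & (sigma <= 1 + 1e-12))
assert np.isclose(sigma.sum(), 5)    # leverage scores sum to the rank
```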

Hence, with this intuition, we can see how one can try to look for an (almost) $n$-self-concordant barrier. The next section will present the actual barrier found by Lee and Sidford, and give some details about it.

3. The Lee-Sidford Barrier

As seen in the previous section, we want to weight the $i$-th constraint with a weight $w_i$ such that $w_i = \sigma_i(W^{1/2}A_x)$. This corresponds exactly to the definition of the $\ell_\infty$ Lewis weights of $A_x$ (recall that the $\ell_q$ Lewis weights of a matrix $M$ are the weights $w$ satisfying $w_i = \sigma_i(W^{1/2 - 1/q}M)$ for all $i$). Because of their recursive nature, it is very hard to compute them exactly, so one can try to relax this condition by taking $w$ to be the vector of $\ell_q$ Lewis weights of $A_x$, with $q$ large enough.
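
As an illustration of how Lewis weights can be computed in the easy regime, here is a sketch of the classical fixed-point iteration of Cohen and Peng (the helper name `lewis_weights` is mine); it is only known to converge for $q < 4$, and it is not the scheme Lee and Sidford use for large $q$.

```python
import numpy as np

def lewis_weights(M, q, iters=200):
    """Approximate l_q Lewis weights of M, i.e. the fixed point w_i = sigma_i(W^{1/2-1/q} M).

    Uses the iteration w_i <- (m_i^T (M^T W^{1-2/q} M)^{-1} m_i)^{q/2} on the rows m_i of M,
    which is only known to converge for q < 4; illustrative only.
    """
    m, n = M.shape
    w = np.ones(m)
    for _ in range(iters):
        G = M.T @ ((w ** (1.0 - 2.0 / q))[:, None] * M)       # M^T W^{1-2/q} M
        tau = np.einsum('ij,ij->i', M @ np.linalg.inv(G), M)  # m_i^T G^{-1} m_i
        w = tau ** (q / 2.0)
    return w

# Like leverage scores, Lewis weights sum to the rank at the fixed point.
rng = np.random.default_rng(1)
M = rng.standard_normal((50, 5))
w = lewis_weights(M, q=3)
assert np.isclose(w.sum(), 5, atol=1e-3)
```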

Lee and Sidford introduced the following barrier (stated here up to normalization constants):

$$\psi_q(x) = \max_{w \in \mathbb{R}^m_{>0}} \;\ln\det\!\big(A_x^\top W^{1-2/q} A_x\big) - \Big(1 - \frac{2}{q}\Big)\sum_{i=1}^m w_i,$$

whose maximizer is exactly the vector of $\ell_q$ Lewis weights of $A_x$, and proved the following result:


Theorem 1: If $A$ has full rank, then for any $q \ge 2$, $\psi_q$ is an $O(n \cdot \mathrm{poly}(q))$-self-concordant barrier. In particular, for $q = \Theta(\log m)$, $\psi_q$ is an $\tilde{O}(n)$-self-concordant barrier.


The proof is lengthy and can be found in Reference 1, but one insight that might be interesting is the following (proof omitted):

Lemma 4: We have

$$\nabla^2\psi_q(x) \approx A_x^\top W_x A_x,$$

where the approximation is spectral (up to factors depending only on $q$), $W_x = \mathrm{diag}(w_x)$, and $w_{x,i}$ is the $i$-th $\ell_q$ Lewis weight of $A_x$.

This lemma is insightful because it shows that the Hessian of $\psi_q$ is spectrally close to $A_x^\top W_x A_x$, which is the Hessian of the logarithmic barrier reweighted by the Lewis weights, and this justifies why the Lee-Sidford barrier behaves as wanted.
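
In that spirit, here is a small usage sketch (my own, and it assumes the `lewis_weights` helper from the previous snippet is in scope) of the reweighted Hessian $A_x^\top W_x A_x$ that Lemma 4 says is a good proxy for $\nabla^2\psi_q(x)$.

```python
import numpy as np

# Box [-1, 1]^2 again, written as Ax >= b, and an interior point x.
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = -np.ones(4)
x = np.array([0.2, -0.3])

s = A @ x - b
Ax = A / s[:, None]                      # A_x = S_x^{-1} A
w_x = lewis_weights(Ax, q=3)             # q = 3 keeps the simple iteration convergent;
                                         # Lee-Sidford need q = Theta(log m)
H_proxy = Ax.T @ (w_x[:, None] * Ax)     # A_x^T W_x A_x, a proxy for the Hessian of psi_q
```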

4. Putting it all together

We have exhibited an $\tilde{O}(n)$-self-concordant barrier, which can be used to get an $\tilde{O}(\sqrt{n}\log(1/\varepsilon))$-iteration algorithm to solve Linear Programs. One important thing to study now is the per-iteration cost, in particular the cost of computing the barrier. The Lee-Sidford barrier is very costly to compute because it involves determinants and Lewis weights. Not being careful about how one computes it might ruin all our previous efforts.

The idea is to compute the barrier iteratively: the broad picture is that, when moving from an iterate $x$ to the next iterate $x'$, if $x$ and $x'$ are close enough, one can estimate the Lewis weights $w_{x'}$ from $w_x$ with satisfactory precision, because the map $x \mapsto w_x$ is smooth enough. Furthermore, computing this estimate of $w_{x'}$ from $w_x$ only takes a logarithmic number of linear system solves. We can now state the final result of Lee and Sidford:


Theorem 2: Given an interior point of a Linear Program $\min_{Ax \ge b} c^\top x$, there exists an algorithm that outputs a feasible point $x$ such that $c^\top x \le \mathrm{OPT} + \varepsilon$ with constant probability, in $\tilde{O}\big(\sqrt{n}\,\log(1/\varepsilon)\cdot\mathcal{T}\big)$ time, where $\mathcal{T}$ is the work needed to compute $\big(A^\top D A\big)^{-1} v$ for a positive diagonal matrix $D$ and a vector $v$.
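
To make the primitive behind $\mathcal{T}$ concrete, here is a dense, purely illustrative solve of $(A^\top D A)\,y = v$ via Cholesky (the helper name `solve_normal_equation` is mine; actual LP solvers exploit the structure of $A$ and amortize these solves across iterations).

```python
import numpy as np

def solve_normal_equation(A, d, v):
    """Solve (A^T D A) y = v with D = diag(d), d > 0, assuming A has full column rank."""
    G = A.T @ (d[:, None] * A)       # A^T D A, symmetric positive definite
    L = np.linalg.cholesky(G)        # G = L L^T
    z = np.linalg.solve(L, v)        # solve L z = v
    return np.linalg.solve(L.T, z)   # solve L^T y = z
```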


Depending on the nature of the Linear Program, $\mathcal{T}$ might vary, and with state-of-the-art techniques this leads to solving a Linear Program approximately in roughly $\tilde{O}\big(\sqrt{n}\,(\mathrm{nnz}(A) + n^{\omega})\log(1/\varepsilon)\big)$ time, where $\omega$ is the matrix multiplication constant.

References

  1. Y. T. Lee, A. Sidford - Solving Linear Programs with sqrt(rank) linear system solves