Convex optimization

Matthieu R Bloch

Constrained optimization

  • General form of constrained optimization problem \[\min_{\bfx\in\bbR^d} f(\bfx)\text{ such that }\begin{cases}g_i(\bfx)\leq 0\quad\forall i\in\intseq{1}{m}\\h_j(\bfx)=0\quad\forall j\in\intseq{1}{p}\end{cases}\]
    • \(f\) is called the objective function
    • \(g_i(\bfx)\) is called an inequality constraint
    • \(h_j(\bfx)\) is called an equality constraint
    • if \(\bfx\) satisfies all the constraints, we say that it is feasible.
  • A constrained optimization problem is convex if \(f\) is convex, the \(g_i\)’s are convex, and the \(h_j\)’s are affine

  • Turn the constrained optimization problem into an unconstrained one using the Lagrangian \[\calL(\bfx,\boldsymbol{\lambda},\boldsymbol{\mu})\eqdef f(\bfx)+\sum_{i=1}^m\lambda_i g_i(\bfx)+\sum_{j=1}^p\mu_jh_j(\bfx)\text{ with }\boldsymbol{\lambda}\geq \boldsymbol{0}\]
    • \(\boldsymbol{\lambda}\eqdef [\lambda_1,\cdots,\lambda_m]^\intercal\) and \(\boldsymbol{\mu}\eqdef [\mu_1,\cdots,\mu_p]^\intercal\) are dual variables or Lagrange multipliers
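  • A simple illustrative example (one-dimensional, chosen only for concreteness): the convex problem \[\min_{x\in\bbR} x^2\text{ such that }1-x\leq 0\] has Lagrangian \(\calL(x,\lambda)= x^2+\lambda(1-x)\) with \(\lambda\geq 0\)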

Primal and Dual problems

  • The Lagrange dual function is \[\calL_D(\boldsymbol{\lambda},\boldsymbol{\mu})=\min_{\bfx}\calL(\bfx,\boldsymbol{\lambda},\boldsymbol{\mu})\]

  • The dual optimization problem is \[\max_{\boldsymbol{\lambda}\geq \boldsymbol{0},\boldsymbol{\mu}} \calL_D(\boldsymbol{\lambda},\boldsymbol{\mu}) = \max_{\boldsymbol{\lambda}\geq \boldsymbol{0},\boldsymbol{\mu}}\min_{\bfx}\calL(\bfx,\boldsymbol{\lambda},\boldsymbol{\mu})\]

  • The primal function is \[\calL_P(\bfx)\eqdef \max_{\boldsymbol{\lambda}\geq\boldsymbol{0},\boldsymbol{\mu}}\calL(\bfx,\boldsymbol{\lambda},\boldsymbol{\mu})\]

  • The primal optimization problem is \[\min_{\bfx}\calL_P(\bfx) = \min_{\bfx}\max_{\boldsymbol{\lambda}\geq\boldsymbol{0},\boldsymbol{\mu}}\calL(\bfx,\boldsymbol{\lambda},\boldsymbol{\mu})\]
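  • For the illustrative example above, \(\min_x\calL(x,\lambda)\) is attained at \(x=\lambda/2\), so \[\calL_D(\lambda)=\lambda-\frac{\lambda^2}{4};\] the dual problem \(\max_{\lambda\geq 0}\calL_D(\lambda)\) is solved by \(\lambda^*=2\) with \(\calL_D(\lambda^*)=1\), which matches the primal optimum \(\min_{x\geq 1}x^2=1\)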

Primal and Dual problems

  • The (Lagrangian) dual function is concave: for each fixed \(\bfx\), \(\calL(\bfx,\boldsymbol{\lambda},\boldsymbol{\mu})\) is affine in \((\boldsymbol{\lambda},\boldsymbol{\mu})\), and a pointwise minimum of affine functions is concave.

  • Let \(\calF\) denote the feasible set of the original problem. Then \(\min_\bfx \calL_P(\bfx) = \min_{\bfx\in\calF}f(\bfx)\).
    • Indeed, \(\calL_P(\bfx)=f(\bfx)\) when \(\bfx\) is feasible (the maximum is attained at \(\boldsymbol{\lambda}=\boldsymbol{0}\)), while \(\calL_P(\bfx)=+\infty\) when \(\bfx\) violates a constraint (the corresponding multiplier can grow without bound).

Weak duality

  • \[d^*\eqdef \max_{\boldsymbol{\lambda}\geq \boldsymbol{0},\boldsymbol{\mu}}\min_{\bfx}\calL(\bfx,\boldsymbol{\lambda},\boldsymbol{\mu}) \leq p^*\eqdef \min_{\bfx}\max_{\boldsymbol{\lambda}\geq \boldsymbol{0},\boldsymbol{\mu}}\calL(\bfx,\boldsymbol{\lambda},\boldsymbol{\mu})\]
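  • This is the max-min inequality: for all \(\boldsymbol{\lambda}\geq\boldsymbol{0}\), \(\boldsymbol{\mu}\), and \(\bfx\), \[\calL_D(\boldsymbol{\lambda},\boldsymbol{\mu})=\min_{\bfx'}\calL(\bfx',\boldsymbol{\lambda},\boldsymbol{\mu})\leq\calL(\bfx,\boldsymbol{\lambda},\boldsymbol{\mu})\leq\max_{\boldsymbol{\lambda}'\geq\boldsymbol{0},\boldsymbol{\mu}'}\calL(\bfx,\boldsymbol{\lambda}',\boldsymbol{\mu}')=\calL_P(\bfx),\] so maximizing the left side over \((\boldsymbol{\lambda},\boldsymbol{\mu})\) and minimizing the right side over \(\bfx\) yields \(d^*\leq p^*\)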

  • We can solve the dual (a concave maximization problem) and obtain a lower bound on the optimal value of the primal

  • The gap \(p^*-d^*\geq 0\) is called the duality gap

  • Strong duality is the situation in which \(p^*-d^*=0\)

  • Question: how are the solutions of the primal and dual problems related?
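  • Convex solvers compute both at once; below is a minimal sketch using the cvxpy library (the small quadratic problem is chosen only for illustration), where the reported dual variable certifies a zero duality gap

``` python
# Minimal sketch (assumes the cvxpy package); the problem below is illustrative:
#   minimize ||x||^2  subject to  1 - sum(x) <= 0   (convex, Slater holds)
import cvxpy as cp

d = 3
x = cp.Variable(d)
ineq = cp.sum(x) >= 1                 # same as the constraint 1 - sum(x) <= 0
problem = cp.Problem(cp.Minimize(cp.sum_squares(x)), [ineq])
p_star = problem.solve()

lam = ineq.dual_value                 # Lagrange multiplier lambda >= 0
print("p* =", p_star)                 # 1/d, attained at x = (1/d, ..., 1/d)
print("lambda* =", lam)               # 2/d; zero duality gap, so d* = p*
```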

Karush-Kuhn-Tucker conditions

  • Assume \(f\), \(\{g_i\}\), \(\{h_j\}\) are all differentiable
  • Consider a primal point \(\bfx\) and dual variables \((\boldsymbol{\lambda}, \boldsymbol{\mu})\); the KKT conditions are:

  • Stationarity: \[0 = \nabla f(\bfx)+\sum_{i=1}^m\lambda_i\nabla g_i(\bfx)+\sum_{j=1}^p \mu_j\nabla h_j(\bfx)\]

  • Primal feasibility: \[\forall i\in\intseq{1}{m}\quad g_i(\bfx)\leq 0\qquad \forall j\in\intseq{1}{p}\quad h_j(\bfx)=0\]

  • Dual feasibility: \[\forall i\in\intseq{1}{m}\quad \lambda_i\geq 0\]

  • Complementary slackness: \[\forall i\in\intseq{1}{m}\quad \lambda_ig_i(\bfx)= 0\]
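  • For the illustrative example \(\min_x x^2\) such that \(1-x\leq 0\), the KKT conditions read \[2x-\lambda=0,\qquad 1-x\leq 0,\qquad \lambda\geq 0,\qquad \lambda(1-x)=0,\] and are satisfied only at \(x^*=1\), \(\lambda^*=2\), matching the primal/dual solutions found earlier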

KKT conditions: necessity and sufficiency

  • If \(\bfx^*\) and \((\boldsymbol{\lambda}^*,\boldsymbol{\mu}^*)\) are primal and dual solutions with zero duality gap, then \(\bfx^*\) and \((\boldsymbol{\lambda}^*,\boldsymbol{\mu}^*)\) satisfy the KKT conditions.

  • If the original problem is convex and \(\tilde{\bfx}\) and \((\tilde{\boldsymbol{\lambda}},\tilde{\boldsymbol{\mu}})\) satisfy the KKT conditions, then \(\tilde{\bfx}\) is primal optimal, \((\tilde{\boldsymbol{\lambda}},\tilde{\boldsymbol{\mu}})\) is dual optimal, and the duality gap is zero.

  • If the primal problem is convex, we usually have strong duality
    • Constraint qualifications are required: mild conditions beyond convexity
    • Example: Slater’s condition requires a strictly feasible point in the relative interior of the domain \(\calD\): \[ \exists \bfx\in\text{relint}(\calD)\text{ s.t. } \forall i\quad g_i(\bfx)<0\qquad \forall j\quad h_j(\bfx)=0. \]
  • If a constrained optimization problem is differentiable, convex, and satisfies Slater’s condition
    • KKT conditions are necessary and sufficient for primal/dual optimality (with zero duality gap)
    • we can use the KKT conditions to find a solution to our optimization problem
  • We’re in luck: the optimal soft-margin hyperplane falls in this category!
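  • A minimal numerical sketch of that problem (using the cvxpy library; the synthetic data and the penalty \(C\) are assumptions made for illustration), showing that the solver returns the hyperplane together with the KKT multipliers:

``` python
# Minimal sketch of the soft-margin hyperplane (assumes cvxpy and numpy;
# the synthetic data and the value of C are arbitrary illustrative choices).
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, d, C = 40, 2, 1.0
X = np.vstack([rng.normal(+1, 1, (n // 2, d)),   # class +1
               rng.normal(-1, 1, (n // 2, d))])  # class -1
y = np.hstack([np.ones(n // 2), -np.ones(n // 2)])

w, b = cp.Variable(d), cp.Variable()
xi = cp.Variable(n)                               # slack variables
margin = cp.multiply(y, X @ w + b) >= 1 - xi      # inequality constraints
problem = cp.Problem(
    cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi)),
    [margin, xi >= 0],
)
problem.solve()

lam = margin.dual_value                           # KKT multipliers lambda_i
# Complementary slackness: lambda_i > 0 only where the margin constraint
# is active, i.e., at the support vectors.
support = np.flatnonzero(lam > 1e-6)
print("support vectors:", support)
```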