A Proximal Modified Quasi-Newton Method for Nonsmooth Regularized Optimization

  We develop R2N, a modified quasi-Newton method for minimizing the sum of a
\(\mathcal{C}^1\) function \(f\) and a lower semi-continuous prox-bounded \(h\).
  Both \(f\) and \(h\) may be nonconvex.
  At each iteration, our method computes a step by minimizing the sum of a
quadratic model of \(f\), a model of \(h\), and an adaptive quadratic
regularization term.
  A step may be computed by way of a variant of the proximal-gradient method.
  An advantage of R2N over competing trust-region methods is that proximal
operators do not involve an extra trust-region indicator.
  We also develop the variant R2DH, in which the model Hessian is diagonal,
which allows us to compute a step without relying on a subproblem solver when
\(h\) is separable.
  R2DH can be used as standalone solver, but also as subproblem solver inside
R2N\@.
  We describe non-monotone variants of both R2N and R2DH\@.
  Global convergence of a first-order stationarity measure to zero holds without
relying on local Lipschitz continuity of \(\nabla f\), while allowing model
Hessians to grow unbounded, an assumption particularly relevant to quasi-Newton
models.
  Under Lipschitz-continuity of \(\nabla f\), we establish a tight worst-case
evaluation complexity bound of \(O(1 / \epsilon^{2/(1 - p)})\) to bring said
measure below \(\epsilon > 0\), where \(0 \leq p < 1\) controls the growth of
model Hessians.
  Specifically, the latter must not diverge faster than \(|\mathcal{S}_k|^p\),
where \(\mathcal{S}_k\) is the set of successful iterations up to iteration
\(k\).
  When \(p = 1\), we establish the tight exponential complexity bound \(O(\exp(c
\epsilon^{-2}))\) where \(c > 0\) is a constant.
  We describe our Julia implementation and report numerical experience on a
classic basis-pursuit problem, an image denoising problem, a minimum-rank matrix
completion problem, and a nonlinear support vector machine.
  In particular, the minimum-rank problem cannot be solved directly at this time
by a trust-region approach as corresponding proximal operators are not known
analytically.