4.1 Algorithms Based on the Cauchy Point

The Cauchy Point

It is enough for purposes of global convergence to find an approximate solution \(p_k\) that lies within the trust region and gives a sufficient reduction in the model. The sufficient reduction can be quantified in terms of the Cauchy point, denoted by \(p_k^c\).

Algorithm 4.2 (Steihaug’s Algorithm, Cauchy Point Calculation).
Find the vector \(p_k^s\) that solves a linear version of the trust-region subproblem, given by
\(p_k^s = \text{argmin}_{p \in \mathbb{R}^n} f_k + g_k^\top p \;\;\; \text{s.t. } \lVert p \rVert \leq \delta_k\)
Calculate the scalar \(\tau_k > 0\) that minimizes \(m_k(\tau p_k^s)\) subject to satisfying the trust-region bound, that is
\(\tau_k = \text{argmin}_{\tau \geq 0} m_k(\tau p_k^s) \;\;\; \text{s.t. } \lVert \tau p_k^s \rVert \leq \delta_k\)
Set \(p_k^c = \tau_kp_k^s\).

The closed-form solution to the first subproblem is simply

\[p_k^s = - \frac{\delta_k}{\lVert g_k \rVert} g_k\]

By considering \(g_k^\top B_kg_k \leq 0\) and \(g_k^\top B_kg_k > 0\) separately, we have

\[p_k^c = -\tau_k \frac{\delta_k}{\lVert g_k \rVert} g_k\]

where

\[\begin{split}\tau_k = \begin{cases} 1 & \text{if } g_k^\top B_kg_k \leq 0 \\ \min(\lVert g_k \rVert^3/(\delta_k g_k^\top B_kg_k), 1) & \text{otherwise} \end{cases}\end{split}\]

The Cauchy step \(p_k^c\) is inexpensive to calculate and is of crucial importance in deciding if an approximate solution of the trust-region subproblem is acceptable. Specifically, a trust-region method will be globally convergent if its steps \(p_k\) give a reduction in the \(m_k\) that is at least some fixed positive multiple of the decrease attained by the Cauchy step.

Improving on the Cauchy Point

The Cauchy point is simply the steepest descent method with a particular choice of step lengths. Steepest descent methods perform poorly even if an optimal step length is used at each iteration. Now we focus on the trust-region subproblem

\[\min_{p \in \mathbb{R}^n} m(p) = f + g^\top p + \frac{1}{2}p^\top Bp \;\;\; \text{s.t. } \lVert p \rVert \leq \delta\]

and we denote the solution by \(p^*(\delta)\).

The Dogleg method

The dogleg method an be used when \(B\) is positive definite. The unconstrained minimizer of \(m\) is \(p^B = -B^{-1}g\), so we have

\[p^*(\delta) = p^B \;\;\; \text{when } \delta \geq \lVert p^B \rVert\]

When \(delta\) is small relative to \(p^B\), the restriction \(\lVert p \rVert \leq \delta\) ensures that the quadratic term in \(m\) has little effect on the solution, and we can get an approximation to \(p(\delta)\) by omitting the qudaratic term and writing

\[p^*(\delta) \approx - \delta \frac{g}{\lVert g \rVert} \;\;\; \text{when } \delta \text{ is small}\]

For intermediate values of \(\delta\), the solution \(p^*(\delta)\) typically follows a curved trajectory like the one in the figure below.

_images/fig4-3.png

The dogleg method finds an approximate solution by replacing the curved trajectory for \(p^*(\delta)\) with a path consisting of two line segments. The first line segment runs from the origin to the minimizer of \(m\) along the steepest descent direction, which is

\[p^U = - \frac{g^\top g}{g^\top Bg} g\]

while the second line segment runs from \(p^U\) to \(p^B\). We denote this trajectory by \(\tilde{p}(\tau)\) for \(\tau \in [0, 2]\), where

\[\begin{split}\tilde{p}(\tau) = \begin{cases} \tau p^U & 0 \leq \tau \leq 1 \\ p^U + (\tau - 1)(p^B - p^U) \end{cases}\end{split}\]

The dogleg method chooses \(p\) to minimize the model \(m\) along this path.

Lemma 4.2. Let \(B\) be positive definite, then 1. \(\lVert \tilde{p}(\tau) \rVert\) is an increasing function of \(\tau\) 2. \(m(\tilde{p}(\tau))\) is a decreasing function of \(\tau\)

It follows that the chosen value \(p\) will be at \(p^B\) if \(\lVert p^B \rVert \leq \delta\), otherwise at the point of intersection of the dogleg and the trust-region boundary.

When the exact Hessian \(\nabla^2 f(x_k)\) is available, we find the Newton-dogleg step. We conclude that the Newton-dogleg method is most appropriate when the objective function is convex (when \(\nabla^2 f(x_k)\) is always positive semidefinite).

Two-Dimensional Subspace Minimization

The dogleg method can be made slightly more complicated by widening the search for \(p\) to the entire two-dimensional subspace spanned by \(p^U\) and \(p^B\). The subproblem is repalced by

\[\min_p m(p) = f + g^\top p + \frac{1}{2} p^\top Bp \;\;\; \text{s.t. } \lVert p \rVert \leq \delta, p \in \text{span}[g, B^{-1}g]\]

It can be reduced to finding the roots of a fourth degree polynomial. Clearly the Cauchy point \(p^C\) is feasible, resulting in global convergence of the algorithm.