3.5 Step-Length Selection Algorithms¶

We now consider techniques for finding a minimum of the one-dimensional function

\[\phi(\alpha) = f(x_k + \alpha p_k)\]

or for simply finding a step length \(\alpha_k\) satisfying one of the termination conditions described in Section 3.1. We assume that \(p_k\) is a descent direction, so that our search can be confined to positive values of \(\alpha\).

If \(f\) is convex quadratic, \(f(x) = \frac{1}{2} x^\top Qx - b^\top x\), its one-dimensional minimizer is given by

\[\alpha_k = -\frac{\nabla f_k^\top p_k}{p_k^\top Qp_k}\]

For general nonlinear functions, it is necessary to use an iterative procedure. The line search procedure deserves particular attention because it has a major impact on the robustness and efficiency of all nonlinear optimization methods.

All line serach procedures require an initial estimate \(\alpha_0\) and generate a sequence \(\{\alpha_i\}\). Typical procedures consist of two phases: a bracketing phase that finds an interval \([\bar{a}, \bar{b}]\) containing acceptable step lengths, and a selection phase that zooms in to locate the final step length.

Interpolation¶

We begin by describing a line search procedure that aims to find a value of \(\alpha\) that satisfies the sufficient decrease condition, without being “too small”.

We can write the sufficient decrease condition as

\[\phi(\alpha_k) \leq \phi(0) + c_1\alpha_k \phi'(0)\]

Suppose that the initial guess \(\alpha_0\) is given. If we have

\[\phi(\alpha_0) \leq \phi(0) + c_1\alpha_0\phi'(0)\]

this step length satisfies the condition, and we terminate the search. Otherwise we know that the interval \([0, \alpha_0]\) contains acceptable step lengths. We form a quadratic approximation \(\phi_q(\alpha)\) to \(\phi\) by interpolating \(\phi(0)\), \(\phi'(0)\), and \(\phi(\alpha_0)\) to obtain

\[\phi_q(\alpha) = \left( \frac{\phi(\alpha_0) - \phi(0) - \alpha_0\phi'(0)}{\alpha_0^2} \right) \alpha^2 + \phi'(0)\alpha + \phi(0)\]

The new trial value \(\alpha_1\) is defined as the minimizer of this quadratic, given by

\[\alpha_1 = - \frac{\phi'(0)\alpha_0^2}{2[\phi(\alpha_0) - \phi(0) - \phi'(0)\alpha_0]}\]

We terminate the search if the sufficient decrease condition is satisfied. Otherwise we construct a cubic function that interpolates the four pieces of information. This process of interpolating with a cubic function is repeated if necessary. If any \(\alpha_i\) is either too close to its predecessor \(\alpha_{i-1}\) or else too smaller than \(\alpha_{i-1}\), we reset \(\alpha_i = \alpha_{i-1}/2\).

This strategy above assumes that derivative values are significantly more expenseive to compute than function values. Cubic interpolation provides a good model for functions with significant changes of curvature, and usually produces a quadratic rate of convergence to the minimizing values of \(\alpha\).

3.5 Step-Length Selection Algorithms¶

Interpolation¶

Initial Step Lengths¶

A Line Search Algorithm for The Wolfe Conditions¶