3.5 Step-Length Selection Algorithms¶
We now consider techniques for finding a minimum of the one-dimensional function
or for simply finding a step length \(\alpha_k\) satisfying one of the termination conditions described in Section 3.1. We assume that \(p_k\) is a descent direction, so that our search can be confined to positive values of \(\alpha\).
If \(f\) is convex quadratic, \(f(x) = \frac{1}{2} x^\top Qx - b^\top x\), its one-dimensional minimizer is given by
For general nonlinear functions, it is necessary to use an iterative procedure. The line search procedure deserves particular attention because it has a major impact on the robustness and efficiency of all nonlinear optimization methods.
All line serach procedures require an initial estimate \(\alpha_0\) and generate a sequence \(\{\alpha_i\}\). Typical procedures consist of two phases: a bracketing phase that finds an interval \([\bar{a}, \bar{b}]\) containing acceptable step lengths, and a selection phase that zooms in to locate the final step length.
Interpolation¶
We begin by describing a line search procedure that aims to find a value of \(\alpha\) that satisfies the sufficient decrease condition, without being “too small”.
We can write the sufficient decrease condition as
Suppose that the initial guess \(\alpha_0\) is given. If we have
this step length satisfies the condition, and we terminate the search. Otherwise we know that the interval \([0, \alpha_0]\) contains acceptable step lengths. We form a quadratic approximation \(\phi_q(\alpha)\) to \(\phi\) by interpolating \(\phi(0)\), \(\phi'(0)\), and \(\phi(\alpha_0)\) to obtain
The new trial value \(\alpha_1\) is defined as the minimizer of this quadratic, given by
We terminate the search if the sufficient decrease condition is satisfied. Otherwise we construct a cubic function that interpolates the four pieces of information. This process of interpolating with a cubic function is repeated if necessary. If any \(\alpha_i\) is either too close to its predecessor \(\alpha_{i-1}\) or else too smaller than \(\alpha_{i-1}\), we reset \(\alpha_i = \alpha_{i-1}/2\).
This strategy above assumes that derivative values are significantly more expenseive to compute than function values. Cubic interpolation provides a good model for functions with significant changes of curvature, and usually produces a quadratic rate of convergence to the minimizing values of \(\alpha\).