3.2 Convergence of Line Search Methods

We disucss requirements on the search direction in this section, focusing on the angle \(\theta_k\) between \(p_k\) and the steepest descent direction \(-\nabla f_k\), defined by

\[\cos \theta_k = \frac{-\nabla f_k^\top p_k}{\lVert \nabla f_k \rVert \lVert p_k \rVert}\]

The following theorem, due to Zoutendijk, quantifies the effort of properly chosen step lengths \(\alpha_k\), and shows that the steepest descent method is globally convergent. For other algorithms, it describes how far \(p_k\) can deviate from the steepest descent direction and still produce a globally convergent iteration.

Theorem 3.2. Consider any iteration of the form

\[x_{k+1} = x_k + \alpha_kp_k\]

where \(p_k\) is a descent direction and \(\alpha_k\) satisfies the Wolfe conditions. Suppose that \(f\) is bounded below in \(\mathbb{R}\) and that \(f\) is continuously differentiable in an open set \(\mathcal{N}\) containing the level set \(\mathcal{L} \stackrel{\text{def}}{=} \{x: f(x) \leq f(x_0)\}\), where \(x_0\) is the starting point of the iteration. Assuming also that the gradient \(\nabla f\) is Lipshitz continuous on \(\mathcal{N}\), that is, there exists a constant \(L > 0\) such that

\[\lVert \nabla f(x) - \nabla f(\tilde{x}) \leq L \lVert x - \tilde{x} \rVert, \;\;\; \text{for all } x, \tilde{x} \in \mathcal{N}\]

Then

\[\sum_{k \geq 0}^\infty \cos^2 \theta_k \lVert \nabla f_k \rVert^2 < \infty\]

Similar results to this theorem holds when the Goldstein conditions or strong Wolfe conditions are used. For all these strategies, the step length selection implies this inequality, which we call the Zoutendijk condition.

The Zoutendijk condition implies that

\[\cos^2 \theta_k \lVert \nabla f_k \rVert^2 \to 0\]

Warning

How to show this?

This limit can be used to derive global convergence results for line serach algortihms.

If the angle \(\theta_k\) is bounded away from \(\pi/2\), there is a positive constant \(\delta\) such that

\[\cos \theta_k \geq \delta > 0, \;\;\; \text{for all } k\]

It follows immediately that

\[\lim_{k \to \infty} \lVert \nabla f_k \rVert = 0\]

We use the term globally convergent to refer to algorithms for which this property is satisfied. For line serach methods of the general form, this limit is the strongest global convergence result that can be obtained. Only by making additional requirements on the search direction \(p_k\) – by introducing negative curvature information from the Hessian \(\nabla^2 f(x_k)\) – can we strengthen these results to include convergence to a local minimum.

Consider now the Newton-like method and assume that the matrices \(\mathbf{B}_k\) are positive definite with a uniformly bounded condition number. That is, there is a constant \(M\) such that

\[\lVert \mathbf{B}_k \rVert \lVert \mathbf{B}_k^{-1} \rVert \leq M, \;\;\; \text{for all } k\]

It follows that

\[\cos \theta_k \geq 1/M\]

Thus we have

\[\lim_{k \to \infty} \lVert \nabla f_k \rVert = 0\]

Therefore, we have shown that Newton and quasi-Newton methods are globally convergent if the matrices \(\mathbf{B}_k\) have a bounded condition number and are positive definite, and if the step lengths satisfy the Wolfe conditions.

For some algorithms, such as conjugate gradient methods, we will be able to prove the limit, but only the weaker result

\[\lim_{k \to \infty} \inf \lVert \nabla f_k \rVert = 0\]

In other words, just a subsequence of the gradient norms \(\lVert \nabla f_{k_j} \rVert\) converges to zero. This can be proved using Zoutendijk’s condition by contradiction. Therefore, for any algorithm that has a steepest descent step every \(m\) th iteration, we can prove global convergence.