6.2 The SR1 Method

There is in fact a simpler rank-1 update for \(B_k\) that maintains symmetry of the matrix and allows it to satisfy the secant equation. However, this symmetric-rank-1, or SR1, update does not guarantee that the updated matrix maintains positive definiteness.

The symmetric rank-1 update has the general form

\[B_{k+1} = B_k + \sigma vv^\top\]

where \(\sigma = \pm 1\) and \(\sigma\) and \(v\) are chosen so that \(B_{k+1}\) satisfies the secant equation \(y_k = B_{k+1}s_k\). By substituting into this equation, we obtain

\[y_k = B_ks_k + [\sigma v^\top s_k] v\]

Since \(\sigma v^\top s_k\) is a scalar, \(v\) must be a multiple of \(y_k - B_ks_k\). Let \(v = \delta (y_k - B_ks_k)\). It follows that

\[(y_k - B_ks_k) = \sigma \delta^2 \left[ s_k^\top (y_k - B_ks_k) \right](y_k - B_ks_k)\]

This equation is satisfied if and only if we choose

\[\sigma = \text{sign} \left[s_k^\top (y_k - B_ks_k)\right], \;\;\; \delta = \pm \left\lvert s_k^\top (y_k - B_ks_k) \right\rvert^{-1/2}\]

Hence we have shown that the only symmetric rank-1 updating formula that satisfies the secant equation is given by

\[\text{(SR1)} \;\;\; B_{k+1} = B_k + \frac{(y_k - B_ks_k)(y_k - B_ks_k)^\top}{(y_k - B_ks_k)^\top s_k}\]

By applying the Sherman-Morrison formula, we obtain the updating formula for the inverse Hessian approximation \(H_k\):

\[\text{(SR1)} \;\;\; H_{k+1} = H_k + \frac{(s_k - H_ky_K)(s_k - H_ky_k)^\top}{(s_k - H_ky_k)^\top y_k}\]

The main drawback of SR1 updating is that the denominator can vanish. We see that there are three cases:

  1. If \((y_k - B_ks_k)^\top s_k \neq 0\), then there is a unique rank-one updating formula satisfying the secant equation as given above.

  2. If \(y_k = B_ks_k\), then the only updating formulat satisfying the secant equation is \(B_{k+1} = B_k\).

  3. If \(y_k \neq B_ks_k\) and \((y_k - B_ks_k)^\top s_k = 0\), then there is no symmetric rank-one updating formula satisfying the secant equation.

The third cases suggests that rank-one updating does not provide enough freedom to develop a matrix with all the desired characteristics, and that a rank-two correction is required.

Nevertheless, we are interested in the SR1 formula for the following reasons:

  1. A simple safeguard seems to adequately prevent the breakdown of the method and the occurrence of numerical instabilities.

  2. The matrices generated by the SR1 formula tends to be good approximations to the true Hessian matrix – often better than the BFGS approximations.

  3. In quasi-Netwon methods for constrained problems, or in methods for partially separable functions (Chapter 18 and 7), it may not be possible to impose the curvature condition \(y_k^\top s_k > 0\), and thus BFGS updating is not recommended. In these two settings, indefinite Hessian approximations are desirable insofar as they reflect indefiniteness in the true Hessian.

To prevent the SR1 from breaking down, we simply skip the update if the denominator is small. More specifically, the update is applied only if

\[\left\lvert s_k^\top (y_k - B_ks_k) \right\rvert \geq r \lVert s_k \rVert \lVert y_k - B_ks_k \rVert\]

where \(r \in (0, 1)\), say \(r = 10^{-8}\).

Now we give a formal description of an SR1 method using a trust-region framework.

Algorithm 6.2 (SR1 Trust-Region Method).
Given starting point \(x_0\), initial Hessian approximation \(B_0\), trust-region radius \(\delta_0\), convergence tolerance \(\epsilon > 0\), parameters \(\eta \in (0, 10^{-3})\) and \(r \in (0, 1)\);
\(k \leftarrow 0\);
while \(\lVert \nabla f_k \rVert > \epsilon\);
Compute \(s_k\) by solving the subproblem \(\min_s \nabla f_k^\top s + \frac{1}{2}s^\top B_k s\) subject to \(\lVert s \rVert \leq \delta_k\);
Compute \(y_k = \nabla f(x_k + s_k) - \nabla f_k\), \(\text{ared} = f_k - f(x_k + s_k)\), \(\text{pred} = - \left( \nabla f_k^\top s_k + \frac{1}{2}s_k^\top B_ks_k \right)\);
if \(\text{ared}/\text{pred} > \eta\)
\(x_{k+1} = x_k + s_k\);
else
\(x_{k+1} = x_k\);
end if
if \(\text{ared}/\text{pred} > 0.75\)
if \(\lVert s_k \rVert \leq 0.8 \delta_k\)
\(\delta_{k+1} = \delta_k\);
else
\(\delta_{k+1} = 2\delta_k\);
end if
else if \(0.1 \leq \text{ared}/\text{pred} \leq 0.75\)
\(\delta_{k+1} = \delta_k\);
else
\(\delta_{k+1} = 0.5\delta_k\);
end if
if \(\left\lvert s_k^\top (y_k - B_ks_k) \right\rvert \geq r \lVert s_k \rVert \lVert y_k - B_ks_k \rVert\) holds
Compute \(B_{k+1}\) with SR1;
else
\(B_{k+1} = B_k\);
end if
\(k \leftarrow k + 1\);
end while

To obtain a fast rate of convergence, it is important for the matrix \(B_k\) to be udpated even along a failed direction \(s_k\). The fact that the step was poor indicates that \(B_k\) is an adequate approximation of the true Hessian in that direction.

Properties of SR1 updating

SR1 updating can generate good Hessian approximations. We demonstrate this property by first examining a quadratic function. To examine the effects of the updates, we assume a uniform step lenght 1, that is,

\[p_k = -H_k\nabla f_k, \;\;\; x_{k+1} = x_k + p_k\]

It follows that \(p_k = s_k\).

Theorem 6.1. Suppose that \(f: \mathbb{R}^n \to \mathbb{R}\) is the strongly convex quadratic function \(f(x) = b^\top x + \frac{1}{2}x^\top Ax\), where \(A\) is symmetric positive definite. Then for any starting point \(x_0\) and any symmetric starting matrix \(H_0\), the iterates \(\{x_k\}\) generated by the SR1 method converge to the minimizer in at most \(n\) steps, provided that \((s_k - H_ky_k)^\top y_k \neq 0\) for all \(k\). Moreover, if \(n\) steps are performed, and if the search directions \(p_i\) are linearly independent, then \(H_n = A^{-1}\).

For general nonlinear functions, the SR1 update continues to generate good Hessian approximations under certain conditions.

Theorem 6.2. Suppose that \(f\) is twice continuously differentiable, and that its Hessian is bounded and Lipschtiz continuous in a neighborhood of a point \(x^*\). Let \(\{x_k\}\) be any sequence of iterates such that \(x_k \to x^*\) for some \(x^* \in \mathbb{R}^n\). Suppose in addition that the inequality \(\left\lvert s_k^\top (y_k - B_ks_k) \right\rvert \geq r \lVert s_k \rVert \lVert y_k - B_ks_k \rVert\) holds for all \(k\), for some \(r \in (0, 1)\), and that the steps \(s_k\) are uniformly linearly independent. Then the matrices \(B_k\) generated by the SR1 updating formula satisfy

\[\lim_{k \to \infty} \lVert B_k - \nabla^2 f(x^*) \rVert = 0\]