4 Trust-Region Methods¶

Trust-region methods define a region around the current iterate within which they trust the model to be an adquate representation of the objective function, and then choose the step to be the approximate minimizer of the model in this region.

The figure below illustrates the trust-region approach on a function \(f\) of two variables in which the current \(x_k\) and the minimizer \(x^*\) lie at opposite ends of a curved valley.

In this chapter, we will assume that the model function \(m_k\) is quadratic. Moreover, \(m_k\) is based on the Taylor-series expansion of \(f\) around \(x_k\), which is

\[f(x_k + p) = f_k + g_k^\top p + \frac{1}{2}p^\top \nabla^2 f(x_k + tp)p\]

where \(g_k = \nabla f(x_k)\) and \(t \in (0, 1)\). By using an approximation \(B_k\) to the Hessian in the second order term, \(m_k\) is defined as

\[m_k(p) = f_k + g_k^\top p + \frac{1}{2} p^\top B_k p\]

where \(B_k\) is some symmetric matrix. We emphasize the generality of the trust-region approach by assuming little about \(B_k\) except symmetry and uniform boundedness.

To obtain each step, we seek a solution of the subproblem

\[\min_{p \in \mathbb{R}^n} m_k(p) = f_k + g_k^\top p + \frac{1}{2} p^\top B_k p \;\;\; \text{s.t. } \lVert p \rVert \leq \delta_k\]

where \(\delta_k > 0\) is the trust-region radius. In any case, as described below, we need only an approximate solution to obtain convergence and good practical behavior.

We choose the trust-region radius \(m_k\) based on the agreement between the model function \(m_k\) and the object function \(f\) at previous iterations. Given a step \(p_k\) we define the ratio

\[\rho_k = \frac{f(x_k) - f(x_k + p_k)}{m_k(0) - m_k(p_k)}\]

the numerator is called the actual reduction and the denominator is the predicted reduction. If \(\rho_k\) is close to zero or negative, we shrink the trust region by reducing \(\delta_k\) at the next iteration.

Algorithm 4.1 (Trust Region).

Given \(\hat{\delta} > 0\), \(\delta_0 \in (0, \hat{\delta})\), and \(\eta \in \left[0, \frac{1}{4}\right)\):

for \(k = 0, 1, 2, \dots\)

Obtain \(p_k\) by (approximately) solving \(\min_{p \in \mathbb{R}^n} m_k(p) = f_k + g_k^\top p + \frac{1}{2} p^\top B_k p \;\;\; \text{s.t. } \lVert p \rVert \leq \delta_k\);

Evaluate \(\rho_k\);

if \(\rho_k < \frac{1}{4}\)

\(\delta_{k+1} = \frac{1}{4}\delta_k\)

else

if \(\rho_k > \frac{3}{4}\) and \(\lVert \rho_k \rVert = \delta_k\)

\(\delta_{k+1} = \min(2\delta_k, \hat{\delta})\)

else

\(\delta_{k+1} = \delta_k\);

if \(\rho_k > \eta\)

\(x_{k+1} = x_k + p_k\)

else

\(x_{k+1} = x_k\);

end for.

Here \(\hat{\delta}\) is an overall bound on the step lengths. We restate the trust-region subporblem as

\[\min_{p \in \mathbb{R}^n} m(p) \stackrel{\text{def}}{=} f + g^\top p + \frac{1}{2} p^\top Bp \;\;\; \text{s.t. } \lVert p \rVert \leq \delta\]

Theorem 4.1. The vector \(p^*\) is a global solution of the trust-region problem

\[\min_{p \in \mathbb{R}^n} m(p) = f + g^\top p + \frac{1}{2}p^\top Bp \;\;\; \text{s.t. } \lVert p \rVert \leq \delta\]

if and only if \(p^*\) is feasible and there is a scalar \(\lambda \geq 0\) such that the following conditions are satisfied:

\[\begin{split}(B + \lambda I)p^* & = -g \\ \lambda (\delta - \lVert p^* \rVert) & = 0 \\ (B + \lambda I) & \;\;\; \text{is positive semidefinite}\end{split}\]

The figure below shows the properties of this theorem with different \(\delta\). Note that

\[\lambda p^* = - Bp^* - g = - \nabla m (p^*)\]

Thus, when \(\lambda > 0\), the solution \(p^*\) is collinear with the negative gradient of \(m\) and normal to its contours.