## Preview text

KELLER VANDEBOGERT
1. Introduction Interpolation methods are a common approach to the more general area of line search for optimization. In the case of quadratic interpolation, the function’s critical value is bracketed, and a quadratic interpolant is ﬁtted to the arc contained in the interval. Then, the interpolant is minimized, and the new interval is determined based on the relation of the minimizer to the original endpoints of the interval. Put more formally, let x* maximize (or minimize) f (x). If x* is not easily found through analytic methods, then it is signiﬁcantly easier to bracket the interval over which this critical point occurs. Let q(x) denote the quadratic interpolant of f (x). Minimizing a quadratic function is trivial, and so the critical point of q is easily obtained. We then form a new bracketing interval by throwing away the ”worst” point, which for our purposes would be the point that is the largest or smallest, depending on whether we want to approximate a maximum or minimum. The process is then iterated until a desired precision is reached.
Date: September 3, 2017. 1

2

KELLER VANDEBOGERT

2. Methods of Interpolation

In practice there are 3 methods of interpolation. There are 2 types of 2-point interpolation methods, and a 3-point interpolation method. The 2-point methods require knowledge of the derivative of the function f in which we are interested in optimizing. The 3-point method does not require any derivatives, but of course requires an extra point. Intuitively, knowing f gives a better sense of the function’s behavior, and will hence provide a faster rate of convergence.

2.1. Method 1. Let q(x) denote the quadratic interpolant of f (x). Then, by deﬁnition:
q(x) = ax2 + bx + c
For a, b, c ∈ R. Then, we can ﬁnd our constants by bracketing the critical point of f , whose endpoints are x1 and x2. We have:

(2.1)

f (x1) = ax21 + bx1 + c = q(x1)

f (x2) = ax22 + bx2 + c = q(x2)

f (x1) = 2ax1 + b = q (x1)

This is a system of 3 equations with 3 unknowns, which are our constants. Let fi = f (xi), and fi = f (xi). Then we can solve (2.1) for a and b.

(2.2)

a = f1 − f2 − f1(x1 − x2) −(x1 − x2)2

b = f + 2 f1 − f2 − f1(x1 − x2) x

1

(x1 − x2)2

1

3

The minimizer of q is easily found to be −b/2a by setting q (x) = 0. From (2.2), our minimizer xmin can be found:

(2.3)

xmin = −2ab = x1 − 21 (fx1−− fx12−)ff21 1 x1−x2

This of course readily yields an explicit iteration formula by letting xmin = x3. We have from (2.3):

(2.4)

1 (xk−1 − xk)fk−1 xk+1 = xk−1 − 2 f − fk−1−fk
k−1 xk−1−xk

With (2.4), we generate xk+1 and compare it with the previous two points to ﬁnd our new bracketing interval, and then continue the iteration.

2.2. Method 2. This method is very similar to our ﬁrst method but instead generates q(x) by solving a diﬀerent set of equations. We have:

(2.5)

f (x1) = ax21 + bx1 + c = q(x1)

f (x1) = 2ax1 + b = q (x1)

f (x2) = 2ax2 + b = q (x2) We can of course solve these for a and b in the same way as in Section 2.1. We ﬁnd the explicit form of the minimizer of q(x) to be:

(2.6)

x1 − x2 xmin = x1 − f1 − f2 f1

In an identical manner to (2.4), we generate a recursion by deﬁning

xmin = x3, and we have:

4

KELLER VANDEBOGERT

(2.7)

xk−1 − xk

xk+1 = xk−1 − f − f fk−1

k−1

k

Which is commonly called the secant formula.

Remark 2.1. Letting xk−1 → xk in (2.7), and assuming that f (xk) exists, (2.7) becomes:

xk+1 = xk − ffk k
But this is precisely the iteration deﬁned by Newton’s method. This motivates calling (2.7) the secant method, because it is just Newton’s method with the secant approximation of f (xk) instead.
2.3. Method 3. Our third method is the 3 point method. Choose 3 points, 2 endpoints to bracket our critical point, and then a point within the interval as well.
Using the Lagrange Interpolation formula, we can easily ﬁnd our interpolant q(x). We have:

(2.8) q(x) = (x − x2)(x − x3) f1+ (x − x1)(x − x3) f2+ (x − x1)(x − x2) f3
(x1 − x2)(x1 − x3) (x2 − x1)(x2 − x3) (x3 − x1)(x3 − x2)
To ﬁnd the minimum, we of course take the derivative of (2.8) and
set it to 0. By doing this, we obtain:

1

1

(2.9) xmin = (x1 + x2) +

(f1 − f2)(f2 − f3)(f3 − f1)

2

2 (x2 − x3)f1 + (x3 − x1)f2 + (x1 − x2)f3

And the iteration scheme is easily deduced:

5

(2.10)

1

1

xk+2 = (xk−1+xk)+

(fk−1 − fk)(fk − fk+1)(fk+1 − fk−1)

2

2 (xk − xk+1)fk−1 + (xk+1 − xk−1)fk + (xk−1 − xk)fk+1

This method diﬀers slightly from the previous two methods, because

it is not as simple to determine the new bracketing interval. If xmin

lies between x1 and x3, then we want to compare the distance between

xmin and x2. If

|xmin − x2| <
Where is the prescribed tolerance, then we are done. Otherwise we create the new interval by checking which point is the largest or smallest, depending on the nature of our critical point.
If xmin lies outside of our bracketing interval [x1, x3], then we immediately create a new bracketing interval in the same way as we just described.

3. Convergence Rates In the beginning of Section 2, we made a comment on the convergence rates of the 2-point versus 3-point method. In this section, we intend to make this comment precise.
Deﬁnition 3.1. Let xn → x∗. Then, the sequence {xn} is said to converge to x* with order α if
||xn − x∗|| ≤ M ||xn−1 − x∗||α For some constant M ∈ R.

6

KELLER VANDEBOGERT

We will also need the following result, which we shall state without proof:

Lemma 3.2. Let f ∈ Cn+1[a, b], and let p be the polynomial of degree ≤ n that interpolates f at n + 1 distinct points x0, x1, ..., xn ∈ [a, b]. Then, to each x ∈ [a, b] there exists a ξx ∈ (a, b) such that

f (n+1)(ξx) k

f (x) − p(x) = (n + 1)!

(x − xi)

i=0

Where f (n)(x) denotes the nth derivative of f . This remainder term

will often be denoted Rn+1.

Theorem 3.3. Let f : R → R be 3 times continuously diﬀerentiable.
Let x* be such that f (x∗) = 0 and f (x∗) = 0. Then the sequence {xn}

generated by (2.7) converges to x∗ with order 1+2 5 .

Proof. We ﬁrst want to prove that (2.7) does indeed converge to our minimizer, x∗.
Let L(x) be the 2 point interpolant for f (x). Then, with Lemma 3.2, we have:

f (ξ) f (x) − L(x) = 2 (x − xk−1)(x − xk) Since xk+1 is generated by maximizing q(x), we have that L(xk+1) = 0. Thus,

f (ξ) f (xk+1) = 2 (xk+1 − xk−1)(xk+1 − xk) If we substitute the recursion for xk+1 given by (2.7) into the above equation, we can simplify this into:

7

(3.1)

1

(xk − xk−1)2

f (xk+1) = 2 f (ξ)fkfk−1 (f − f )2

k

k−1

We now want to take advantage of the Mean Value Theorem. We

have:

fk − fk−1 = f ξ0 xk − xk−1 Where ξ0 ∈ (xk, xk−1). Also note that since f (x∗) = 0, we have:

(3.2)

fi = f (xi) − f (x∗) = (xi − x∗)f (ξi)

For some ξi ∈ (xi, x∗), i = k − 1, k, k + 1. Using (3.2) and the previous line, we can rewrite (3.1) as so:

(3.3)

x − x∗ = 1 f (ξ)f (ξk)f (ξk−1) (x − x∗)(x − x∗)

k+1

2 f (ξk+1)[f (ξ0)]2 k

k−1

Now, let ei = |xi − x∗|, i = k − 1, k, k + 1. Find constants m1, m2,

M1, M2 such that

0 < m1 ≤ |f (x)| ≤ M1 0 < m2 ≤ |f (x)| ≤ M2 where x is any value in the bracketing interval considered. Then, we can bound (3.3):

(3.4)

m1m22

M1M22

2M23 ekek−1 ≤ ek+1 ≤ 2m32 ekek−1

However, using (3.3):

8

KELLER VANDEBOGERT

ek+1 = 1 f (ξ)f (ξk)f (ξk−1) ekek−1 2 f (ξk+1)[f (ξ0)]2 Now if we let each xi → x∗, each ξi will tend toward x∗ as well since they are contained within their respective bracketing intervals and f ,
f are continuous. Thus we see:

(3.5)

ek+1 → f (x∗) ekek−1 2f (x∗)

This is well-deﬁned since f (x∗) = 0 by assumption. Deﬁne M =

2ff ((xx∗∗)) , and using (3.4) and (3.5), we have that

(3.6)

ek+1 ≤ M ekek−1

In which case if δ = max{ek, ek−1},

ek+1 ≤ M δ2 → 0 So our sequence converges to x∗ when x0 = x1. Now we seek to determine the rate of convergence. To do this, we deﬁne yk = log(M ek). We then have, using (3.6):

yk+1 = yk + yk−1

This is of course a simple a recurrence relation. Let φ = 1+2 5 and √
φ¯ = 1−2 5 . We easily ﬁnd the closed form of yk to be:

(3.7)

yk = αφk + βφ¯k

Where α, β are to be determined from our initial conditions. Since φ¯ < 1, for k suﬃciently large, yk ≈ αφk. We then have, as k → ∞:

9

With this:

M ek+1 exp(αφk+1) M φeφ ≈ exp(αφkφ) → 1
k

|xk+1 − x∗| → M φ−1 |xk − x∗|φ
And with Deﬁnition 3.1, this implies that xk → x∗ with order φ =

1+2 5 .

Remark 3.4. We see that the secant method converges at a superlinear rate, whereas Newton’s method converges quadratically under suitable conditions.
We now seek to ﬁnd the rate of convergence of the 3-point method. The following result answers this question:
Theorem 3.5. Let f ∈ C4. Suppose that x∗ satisﬁes f (x∗) = 0 and f (x∗) = 0. Then the sequence {xk} generated by (2.10) converges to x∗ of order 1.32.
Proof. Similar to Theorem 3.3, we begin by writing our function f in the following form:

(3.8)

f (x) = L(x) + R3(x)

Where R3 in the third degree remainder term for the Lagrange interpolant. Now, since x∗ is a critical point, we clearly have that f (x∗) = 0. With this and (3.8):

10

KELLER VANDEBOGERT

L (x∗) + R3(x∗) = 0 L (x∗) can be easily computed, we have:

(3.9)

2x∗ − (x2 + x3)

2x∗ − (x1 + x3)

2x∗ − (x1 + x2)

f1 (x1 − x2)(x1 − x3) +f2 (x2 − x1)(x2 − x3) +f3 (x3 − x1)(x3 − x2) +R3(x ) = 0

We now use (3.9) to solve for x∗:

2x∗

f1 + f2 + f3

+R (x∗)

(x1 − x2)(x1 − x3) (x2 − x1)(x2 − x3) (x3 − x1)(x3 − x1)

3

= f1(x2 + x3) + f2(x1 + x3) + f3(x1 + x2) (x1 − x2)(x1 − x3) (x2 − x1)(x2 − x3) (x3 − x1)(x3 − x2)

Which then gives:

(3.10)

1

R3(x∗)

x =− 2

f1

+

f2

+

f3

(x1−x2)(x1−x3) (x2−x1)(x2−x3) (x3−x1)(x3−x1)

1 +
2

+ + f1(x2+x3)
(x1−x2)(x1−x3)

f2(x1+x3) (x2−x1)(x2−x3)

f3(x1+x2) (x3−x1)(x3−x2)

+ + f1
(x1−x2)(x1−x3)

f2 (x2−x1)(x2−x3)

f3 (x3−x1)(x3−x1)

However, it is easy to see that we can use (2.10) and rewrite it in an

awfully convenient form:

f1(x2+x3)

f2(x1+x3)

f3(x1+x2)

1 + + (x1−x2)(x1−x3) (x2−x1)(x2−x3) (x3−x1)(x3−x2)

x = 4

f1

f2

f3

2 + + (x1−x2)(x1−x3) (x2−x1)(x2−x3) (x3−x1)(x3−x1)

But this is precisely the rightmost term of (3.10), so we easily deduce:

(3.11)

1

R3(x∗)

x − x4 = − 2

f1

+

f2

+

f3

(x1−x2)(x1−x3) (x2−x1)(x2−x3) (x3−x1)(x3−x1) 