Posts The mean value theorem
Post
Cancel

The mean value theorem

Introduction

Derivatives are used across many different fields of engineering, physics and mathematics to analyze the ways that continuous quantities change. Although the definition of derivative that we talk about in calculus courses today works well, it wasn’t always so simple. Coming up with a way to talk about derivatives where we could both understand it intuitively as well as give ourselves the right machinery to prove useful results about them took centuries and a bunch of mathematical legwork that I thought would be worth exposing a small part of.

In particular, in this post I want to define derivatives and lead up to the proof of the Mean Value Theorem. It and its generalizations are some of the most important and useful results about derivatives that we have and I thought that proving the most elementary version would be accessible and somewhat fun!

(Some small amount of calculus is assumed, but it isn’t strictly necessary.)

Without further ado, here we go!

Defining the derivative

The first thing we need to do is come up with a rigorous and useful definition of a derivative. To help draw this intuitive picture, imagine the following. You have two points on the $x$-axis that are pretty close together. We’ll call one point $x$ and the other $c = x + \text{a little bit}$. If our curve is given by $f(x)$, we can denote the slope of the line through $(x,f(x))$ and $(c, f(c))$ as the change in $y$ values divided by the change in $x$ values (aka rise over run). In other words, the slope $m$ of the line would be given by

$$m = \frac{f(c) - f(x)}{c - x}.$$

This isn’t anything new. You’ve known how to calculate slopes since middle school. The question of slope becomes more difficult, however, when you try to modify the usual notion of slope of a line through two points to come up with an analogous description for the slope of tangent line to a curve at one point.

(Note: I will assume for the remainder of this post that you have some familiarity with limits and continuity. If you don’t, you might want to browse the web and review those briefly. Even if you don’t have the requisite background, I’ll do my best to explain the concepts in a non-technical, intuitive way.)

To do this, we have to use some stuff from introductory calculus. Intuitively, what we’re going to do is let “a little bit” get smaller and smaller, so that $x$ and $c$ get closer and closer together. As the distance between them gets smaller and smaller, the slope of the secant line through $(x,f(x))$ and $(c,f(c))$ approximates the slope of the tangent line at $x$. Put succinctly, the slope of the tangent line of a curve at a point $c$ is the limit of the slopes of secant lines between $(c,f(c))$ and points $(x, f(x))$ where $x$ gets arbitrarily close to $c$. Put mathematically

$$f’(c) := \lim_{x \to c} \frac{f(c) - f(x)}{c - x}.$$

We say that the function $f$ is differentiable at $c$ if the limit above exists. This idea of differentiability at a point can be extended naturally to a set of points.

In a typical real analysis course, after laying definitions down, you get familiar with the definitions by proving some of the usual facts about derivatives. These are things like $(f + g)’(c) = f’(c) + g’(c), (kf)’(c) = kf’(c)$ ($k$ is a constant and $f, g$ are differentiable at $c$). I’ll leave proving these and other facts like the power, product, quotient and chain rules as an exercise for the inclined reader in favor of venturing into more interesting territory.

In particular, I want to lead up to a proof of the mean value theorem, one of the most fundamental and useful facts about derivatives. It’s one of those theorems that looks sort of obvious when you just draw some pictures (a la intermediate value theorem), but it’s actually a pretty deep result. It and its generalizations are some of the the most useful tools mathematicians have had at their disposal to tackle derivatives since their inception in the 17th century. We’ll get to our main result via a few intermediate ones, starting with the interior extremum theorem (IET).

The interior extremum theorem

The interior extremum theorem tells us something about the connection between a function’s extreme values and points at which the derivative vanishes:

Interior Extremum Theorem: Let $f$ be differentiable on the open interval $(a,b)$. Then if $f$ attains a maximum (or a minimum) on $(a,b)$ at some point $c$, then $f’(c) = 0$.

Proof: Because $c$ is in the open interval $(a,b)$, we can find two sequences, $x_n$ and $y_n$, such that both converge to $c$ and $x_n < c < y_n$ for all $n$. Given these sequences, we have

$$f’(c) = \lim_{n \to \infty} \frac{f(c) - f(x_n)}{c - x_n} \geq 0,$$

because $f(c) - f(x_n)$ is nonnegative (because $f(c)$ is a maximum value of $f$ on $(a,b)$) and $c - x_n$ is positive for all $n$ (because $x_n < c$). On the other hand, we also see that

$$f’(c) = \lim_{n \to \infty} \frac{f(c) - f(y_n)}{c - y_n} \leq 0,$$

because the numerator is again nonnegative, but this time the denominator is negative for all $n$. Thus, we have $0 \leq f’(c) \leq 0, $ so $ f’(c) = 0$. QED.

Note that the converse is not necessarily true. It isn’t necessarily the case that if the derivative is 0 at a point $c$, then $f(c)$ is maximum or a minimum of $f$ on $(a,b)$. Consider the function $f(x) = x^3$ on the interval $(-1,1)$. By the power rule, $f’(x)=3x^2$ (and it is defined on $(-1,1)$). At $x = 0$,$ f’(x) = 0$, but $-1 = f(-1) < f(0) < f(1) = 1$, whence $f$ takes neither a minimum nor a maximum value at 0 even though the derivative vanishes there. Although its converse doesn’t hold, the IET does furnish us with a pretty powerful computational tool with which to solve optimization problems, the solutions to which often begin with “First take the derivative of the function you want to optimize and find the $x$ values at which it equals 0…”.

Rolle’s theorem

Next, I want to use the IET to prove another result that isn’t so hard to convince yourself of with pictures. It’s called Rolle’s theorem. It tells us that if f takes the same value at two ends of an interval and is differentiable on said interval, that there must be some point within the interval at which the derivative is zero. I actually think it’s worthwhile to draw some pictures and convince yourself intuitively that this result makes sense before you read the proof below:

Rolle’s Theorem: Let $f$ be continuous on $[a,b]$ and differentiable on $(a,b)$ with $f(a) = f(b)$. Then there is a point $c \in (a,b)$ such that $f’(c) = 0$.

Proof: If $f$ is constant on $(a,b)$, then $f’$ is identically zero on that interval, in which case there’s nothing to prove. If $f$ is non-constant on the interval, f takes on a maximum or minimum value within $(a,b)$. By applying IET, we’re done. QED.

The constant case above is simple. In the non-constant case, what we’re basically saying is that starting at $f(a)$, the value of $f$ rises (or falls) as the $x$ values move to the right. At a certain point, though, they need to start falling back down (rising back up) to the value $f$ took at $a$. The point at which this fall (rise) happens is the point we sought.

Mean value theorem

With these results, we are now equipped to prove the mean value theorem. We will do this by reducing it to a simple application of Rolle’s theorem.

Mean Value Theorem: Let $f$ be continuous on $[a,b]$ and differentiable on $(a,b)$. There is a point $c \in (a,b)$ such that $f’(c) = \frac{f(a) - f(b)}{a - b}$.

(Note that this is a more general version of Rolle’s theorem. Draw some pictures and convince yourself of the theorem before reading the proof below.)

Proof: Let’s first write down what the equation of the line through $(a, f(a))$ and $(b, f(b))$ would look like. On the one hand, we know that the slope of the line is $\frac{f(a) - f(b)}{a - b}$. On the other, because we are talking about a line, that the slope of the same line through any $(x, y(x))$ and $(a, y(a) = f(a))$ should be the same, so we have $\frac{f(a) - f(b)}{a - b} = \frac{y(x) - f(a)}{x - a}.$ We can get a general equation of the line by solving for f(x) (multiply both sides by $x - a$ and then add $f(a)$ to both sides). Doing so, we see that

$$y(x) = \frac{f(a) - f(b)}{a - b}(x - a) + f(a).$$

In order to transform this into an instance of Rolle’s theorem, what we’re going to do is build a new function $d$ that represents the differences between the curve and the line. Namely, we have

$$d(x) = f(x) - y(x) = f(x) - \biggr[ \frac{f(a) - f(b)}{a - b}(x - a) + f(a) \biggr].$$

Clearly, $d(a) = d(b) = 0$. By the rules about combining continuous functions, $d(x)$ is continuous on $[a,b]$ and by rules for combining differentiable functions, $d(x)$ is differentiable on $(a,b)$. This means that we can apply Rolle’s Theorem to $d(x)$ so that we have a $c \in (a,b)$ such that $d’(c) = 0$. That is, there is a $c$ such that $d’(c) = f’(c) - \frac{f(a) - f(b)}{a - b} = 0$. Rearranging a bit, we see that at that $c$, we have $f’(c) = \frac{f(a) - f(b)}{a - b}$, proving the theorem. QED.

Conclusion

Several of the important results (e.g. L’Hosptial’s Rule for solving limits) that you learn about later on in calculus and real analysis courses use the Mean Value Theorem as their driving force. I haven’t seen many theorems that are both simple to conceptually grasp and also fundamental building blocks of important areas of mathematics. I think the derivative is a phenomenal example of the power and necessity of good definitions and the type of ingenuity that appears all across mathematics, albeit more subtly sometimes.

This post is licensed under CC BY 4.0 by the author.