Stability and Genericity

Before I begin, I want to give credit where credit is due: much of the exposition (especially the proofs) of my last post was paraphrased from Guillemin and Pollack’s Differential Topology [1].  One of my favorites.

Okay, moving on.

We saw last time that Morse functions are pretty neat, and are abundant; “almost all” smooth functions are actually Morse functions. I’d like to take a minute to talk about this type of property (along with a related notion, “genericity”), as well as a notion called stability.  “Almost all” is usually a phrase one comes across in analysis (or, as we saw, fields that use analysis, of which there are tons), and it means “the set of ‘bad choices’ is a set of (some suitable) measure zero.”  “Genericity,” or “being generic,” is more or less the algebro-geometric counterpart to “almost all” (although it isn’t uncommon to use “generic” to mean “almost all”).  A property is generic if it holds on an open dense set.  In algebraic geometry, we’d usually say “the set of bad choices lies in a subset of strictly smaller dimension.”  Anyway, the basic idea is that such properties/objects are what you’d expect to find if you picked one “at random.”  For example: draw a curve on a piece of paper (and pretend it’s \mathbb{R}^2).  If you were to close your eyes and put your finger on the paper, you’d basically always miss the curve and land on blank space, illustrating that a generic point of \mathbb{R}^2 isn’t on the curve.

What do I mean by stability?  What is “stable?”  If you recall the sketch of a proof I gave last time for “almost all functions are Morse functions,” given a smooth function f: U \to \mathbb{R} (where U \subseteq \mathbb{R}^n is an open subset), we “deformed,” or “perturbed,” f into a Morse function, f_a := f + a_1 x_1 + \cdots + a_n x_n, by adding a generic linear form.  If I deform some smooth map f_0: M \to N to another map f_1 : M \to N, I’m invoking one of the fundamental operations in (differential) topology: homotopy.  We say f_0 and f_1 are (smoothly) homotopic, usually written f_0 \thicksim f_1, if there exists a smooth map F: M \times [0,1] \to N such that, for all x \in M, F(x,0) = f_0(x) and F(x,1) = f_1(x).  Smoothness of F then ensures that all the “in between” maps f_t(x) := F(x,t) are smooth as well.  Here’s a simple example to illustrate that this is really what we mean when we say f_0 is deformed to f_1.

[photo: f_0 being deformed through intermediate maps f_t into f_1]

Here’s a really, really nice description of this notion in [1]:

In the real world of sense perceptions and physical measurements, no continuous quantity or functional relationship is ever perfectly determined.  The only physically meaningful properties of a mapping, consequently, are those that remain valid when the map is slightly deformed.  Such properties are stable properties, and the collection of maps that possess a particular stable property may be referred to as a stable class of maps.  Specifically, a property is stable provided that whenever f_0: X \to Y possesses the property and f_t : X \to Y is a homotopy of f_0, then, for some \epsilon > 0, each f_t with t < \epsilon also possesses the property.

In this vein, the idea is that stable properties are “observable.”  These are the types of things we want to look for when playing around with functions.

I said before that the ideas of stability and genericity were related.  Suppose I want to find a Morse function f: M \to \mathbb{R}.  We know already that almost all smooth, real valued functions on M are Morse.  But what happens if I happen to pick a bad one?  Never fear; we deform f by adding some generic linear form \ell_a = \sum_{i=1}^n a_i x_i.  Moreover, by the genericity of the choice of \ell_a, we can pick good choices of “deformation vector” a = (a_1,\cdots,a_n) such that \| a \| is arbitrarily small.  Hence, even if we end up picking a bad function f, for any \epsilon > 0, we can find a Morse function f_a such that \|f - f_a\| < \epsilon.  This is a common occurrence for stable properties: even if you happen to find a bad function, there are arbitrarily close good functions (in the space of smooth maps with, say, the supremum norm).  Some of the most common stable properties for a smooth map f: M \to N are:

  •  local diffeomorphisms
  • immersions
  • submersions
  • maps transversal to a given submanifold Q \subseteq N
  • embeddings
  • diffeomorphisms.
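The perturbation story above can be made completely concrete in one variable.  The function f(x) = x^3 is not Morse (its critical point at 0 is degenerate, since f''(0) = 0), but f_a(x) = x^3 + ax is Morse for every a \neq 0, however small.  A minimal numerical sketch (function names are my own):

```python
import math

# f(x) = x^3 has a degenerate critical point at 0: f'(0) = f''(0) = 0.
# Perturbing by an arbitrarily small linear term repairs this:
# f_a(x) = x^3 + a*x is Morse for EVERY nonzero a.

def fa_prime(x, a):
    return 3 * x**2 + a       # first derivative of f_a

def fa_second(x, a):
    return 6 * x              # second derivative (independent of a)

def critical_points(a):
    """Critical points of f_a(x) = x^3 + a*x."""
    if a > 0:
        return []             # 3x^2 + a > 0 everywhere: no critical points
    if a == 0:
        return [0.0]          # the degenerate critical point of x^3
    r = math.sqrt(-a / 3.0)   # solutions of 3x^2 = -a
    return [-r, r]

# Any nonzero a, no matter how small, gives only nondegenerate critical points:
for a in (-1e-8, -0.5):
    for p in critical_points(a):
        assert abs(fa_prime(p, a)) < 1e-9 and fa_second(p, a) != 0.0
```

This is exactly the “bad functions have arbitrarily close good neighbors” phenomenon: \|f - f_a\| can be made as small as we like by shrinking a.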

Morse functions are also stable, with the caveat that we require our domain to be compact.  Let f be a Morse function on a compact manifold X, and let f_t be a homotopic family of functions with f_0 = f.  Then f_t is Morse for all t sufficiently small.

In upcoming posts, we’ll want to analyze the topology of a manifold by studying the level sets of Morse functions on it, and these notions of genericity and stability will ensure that such functions are never in short supply.

A Tour de Morse (theory)

Morse theory is amazing.  Very geometric, and more or less intuitive.  You don’t really explore it in detail until you’ve seen a fair bit of differential topology, but if you look closely, you start getting exposed to its core ideas as early as multivariate Calculus.

As is the fashion in modern geometry (specifically, algebraic geometry), we study geometric objects by studying the behavior of (appropriate classes of) functions on them.  But which functions?  In algebraic geometry, if you’ve got some nice affine variety, you’ve got a set of god-given functions to use: the coordinate ring of the variety.  In the affine case, this is a finitely generated, reduced (= no nilpotents) algebra over a field; basically, a quotient of a polynomial ring by a radical ideal.  Not too bad, quite manageable.

However, for a smooth manifold M, the class of smooth functions on M is really big.  To make it “worse,” the existence of bump functions makes it hard to obtain much cohomological info from the sheaf of smooth functions.  Of course, we can use differential forms to obtain geometric (cohomological) info; this is known as the de Rham cohomology of M, and it’s actually isomorphic to the singular cohomology of M, which is also isomorphic to the Čech cohomology of the constant sheaf \mathbb{R}_M.

Cue Morse functions.

But first (I lied), we need to recall some basic terminology.  Let f: M \to N be a smooth function between the smooth manifolds M and N.  For every point x \in M, f induces a linear transformation d_xf : T_x M \to T_{f(x)}N between tangent spaces.  We say that x is a regular point of f if the map d_xf is surjective (this means “f is a submersion at x”).  If d_xf is not surjective, we say x is a critical point of f.  Suppose f(x) = y.  We say y is a regular value of f if every x \in f^{-1}(y) is a regular point of f.  If this is not the case (i.e., some point in the preimage of y is a critical point), we say that y is a critical value of f.  If you’ve been good and remember your basic Calculus, regularity of a point/value tells us a lot (topologically) about M near x.  Via the Implicit Function Theorem, if y is a regular value, the set f^{-1}(y) is a smooth submanifold of M of pure codimension \dim N.  Similarly, if x is a regular point of f, there is an open neighborhood U of x in M such that f^{-1}(f(x)) \cap U is a smooth submanifold of M of pure codimension \dim N.

But what happens at critical points?  Critical values?  How much do we have to worry?  How abundant are they?  Fortunately, we have

Sard’s Theorem: the set of critical values of f: M \to N has measure zero in N.

So, “almost all” points of N are regular values of f.  But, let’s go deeper: what happens at critical points?

Okay, so this is where you start seeing this stuff in early Calculus.  Say we’ve got a smooth function f: \mathbb{R} \to \mathbb{R}, and we look at its graph, M, in \mathbb{R}^2.  One of the first things we investigate are the “tangent lines” to points on the graph; here, these are the tangent spaces to M.  Using this we can answer the question “where does f achieve extreme values?” Every Calc student knows (or, should know) that these can only happen (at smooth points of the domain of f) when the tangent line to f at some point x \in \mathbb{R} is “horizontal”, that is, when f'(x) = 0.  Equivalently,  when d_xf : T_x\mathbb{R} \to T_{f(x)}\mathbb{R} is not surjective (since in the one dim. case, d_xf(v) = f'(x)v, and d_xf is surjective iff f'(x) \neq 0).

But what about the second derivative?  After all, we said f was infinitely differentiable.  Hopefully these higher derivatives contain more information?

Of course, you already know the answer.  Suppose f'(x) = 0, but f''(x) \neq 0.  Well, it’s either going to be positive or negative.  If f''(x) > 0, then we know f has a local minimum at x.  If f''(x) < 0, then f has a local maximum at x.  Similarly, we’d say the graph is locally “concave up” in the former case, “concave down” in the latter.  Intuitively, the graph “looks like” the parabola y = \pm x^2 around x, depending on the sign.  We can’t really apply this analysis in the case where f''(x) = 0; for that, you’d need to use Taylor’s theorem to get more information about f at x.
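The one-variable second derivative test can be sketched in a few lines of code.  Here is a minimal illustration for f(x) = x^3 - 3x, whose critical points are x = \pm 1 (the function names are my own):

```python
# Second derivative test for f(x) = x^3 - 3x.
# Critical points: f'(x) = 3x^2 - 3 = 0, i.e. x = +1 and x = -1.

def f_prime(x):
    return 3 * x**2 - 3

def f_second(x):
    return 6 * x

def classify(x, tol=1e-9):
    """Classify a critical point of f via the sign of f''."""
    assert abs(f_prime(x)) < tol, "not a critical point"
    if f_second(x) > 0:
        return "local min"
    if f_second(x) < 0:
        return "local max"
    return "degenerate"   # test is inconclusive; Taylor's theorem needed

assert classify(1.0) == "local min"    # f''(1) = 6 > 0
assert classify(-1.0) == "local max"   # f''(-1) = -6 < 0
```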

It isn’t really until we start doing Calculus in several variables that we see the utility of this approach.  Let’s move up a dimension.  Let f: \mathbb{R}^2 \to \mathbb{R} be a smooth function, and let M = \{(x,y,f(x,y)) \mid x,y \in \mathbb{R}\} \subseteq \mathbb{R}^3 be the graph of f.  Suppose p = (x_0,y_0) is a critical point of f.  Recall the differential in this case is given by

d_pf(a,b) = a \frac{\partial f}{\partial x}(p) + b \frac{\partial f}{\partial y}(p)

and saying that p is a critical point of f means that \frac{\partial f}{\partial x}(p) = \frac{\partial f}{\partial y}(p) = 0.  Since we’ve got more than one variable, any kind of “second derivative test” is going to need information from all the second partial derivatives, in some way.  For example, how do we reinterpret the criterion f''(x) \neq 0 in this case?

I’ll save you the trouble and just say it: what we need to examine is something called the Hessian of f at p:

H(p) = \begin{pmatrix} \frac{\partial^2 f}{\partial x^2}(p) & \frac{\partial^2 f}{\partial x \partial y}(p) \\ \frac{\partial^2 f}{\partial y \partial x}(p) & \frac{\partial^2 f}{\partial y^2}(p) \end{pmatrix}

The Hessian of f at a point is just the matrix of second partials of f, arranged in a particular way.  (In the general case of \mathbb{R}^n, with coordinates (x_1,\cdots, x_n), the Hessian takes the form \left ( \frac{\partial^2 f}{\partial x_i \partial x_j} \right )_{1\leq i,j \leq n}.)  Requiring that f''(x) \neq 0 now becomes D(p) := \text{det}\, H(p) \neq 0, and in such a case, we say p is a nondegenerate critical point of f.  We say

  • p is a local minimum of f if D(p) > 0, and \frac{\partial^2 f}{\partial x^2}(p) > 0;
  • p is a local maximum of f if D(p) > 0, and \frac{\partial^2 f}{\partial x^2}(p) < 0; and
  • p is a saddle point of f if D(p) < 0.

Intuitively, this says that the graph of f locally looks like the paraboloid z= \pm (x^2 + y^2) in the first two cases (depending on the sign), and like the hyperbolic paraboloid (= “saddle”) z = x^2 - y^2 in the third case.
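The two-variable test above boils down to a couple of sign checks on the Hessian, which we can sketch directly (a minimal illustration; the function name is my own, and the second partials are passed in by hand):

```python
# Two-variable second derivative test at a critical point p,
# given the second partials fxx, fxy, fyy evaluated at p.

def classify_2d(fxx, fxy, fyy):
    """Classify a critical point via D = det H(p) and the sign of fxx."""
    D = fxx * fyy - fxy * fxy   # determinant of the (symmetric) Hessian
    if D == 0:
        return "degenerate"     # test is inconclusive
    if D < 0:
        return "saddle"
    return "local min" if fxx > 0 else "local max"

# z = x^2 + y^2 at the origin: fxx = fyy = 2, fxy = 0, D = 4 > 0
assert classify_2d(2, 0, 2) == "local min"
# z = -(x^2 + y^2): D = 4 > 0 but fxx < 0
assert classify_2d(-2, 0, -2) == "local max"
# z = x^2 - y^2 (the saddle): D = -4 < 0
assert classify_2d(2, 0, -2) == "saddle"
```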

But what do I mean “looks like”?  Is there a formal way to express this?  Of course, or I wouldn’t be talking about it.

Might as well do the general case: Let M be a smooth manifold of dimension n, f: M \to \mathbb{R} a smooth function. Let p \in M, and suppose that p is a nondegenerate critical point of f*.  Then, there is a smooth system of coordinates about p such that, in these coordinates, f may be written as

f(y_1,\cdots, y_n) = f(p) - \sum_{i= 1}^\lambda y_i^2 + \sum_{i = \lambda + 1}^n y_i^2

where 0 \leq \lambda \leq n is the index of f at p (= the number of negative eigenvalues of H(p)).  This result is known as the Morse Lemma, and it legitimizes our intuition from the previous examples.
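Since the index is just the number of negative eigenvalues of the Hessian, it’s easy to compute numerically.  A minimal sketch (the function name `morse_index` is my own), using the paraboloid and saddle examples from before:

```python
import numpy as np

def morse_index(H):
    """Index of a nondegenerate critical point:
    the number of negative eigenvalues of the (symmetric) Hessian H."""
    evals = np.linalg.eigvalsh(H)   # eigenvalues of a symmetric matrix
    assert all(abs(e) > 1e-12 for e in evals), "degenerate critical point"
    return int(sum(e < 0 for e in evals))

# z = x^2 + y^2 at the origin: Hessian diag(2, 2), index 0 (local min)
assert morse_index(np.diag([2.0, 2.0])) == 0
# z = x^2 - y^2 (saddle): Hessian diag(2, -2), index 1
assert morse_index(np.diag([2.0, -2.0])) == 1
# z = -(x^2 + y^2) (local max): index 2
assert morse_index(np.diag([-2.0, -2.0])) == 2
```

In the notation of the Morse Lemma, the index \lambda counts how many of the coordinate directions get a minus sign.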

*We had previously defined the Hessian of f at p within a given coordinate system.  As it turns out, “nondegeneracy” of a critical point is independent of coordinates, as is the index.*

Nondegeneracy of a critical point is basically the next best thing to requiring regularity of a point.  In addition to the Morse Lemma, nondegenerate critical points are isolated.  That is, at such a point p, we can find an open neighborhood U of p such that p is the only critical point of f|_U.  This isn’t even that hard to show: if (x_1,\cdots,x_n) are local coordinates about p, define a new function F (on the coordinate neighborhood) via

F(q) = (\frac{\partial f}{\partial x_1}(q),\cdots,\frac{\partial f}{\partial x_n}(q))

Since p is a critical point of f, F(p) = (0,\cdots,0) \in \mathbb{R}^n.  Moreover, the differential of F at p is exactly the Hessian of f at p, so nondegeneracy of p implies nonsingularity of d_pF.  Hence, by the Inverse Function Theorem, F carries some open neighborhood U of p in M diffeomorphically onto an open neighborhood of the origin in \mathbb{R}^n.  In particular, F is injective on U, so p is the only point of U at which F vanishes; that is, p is the only critical point of f inside U.

In keeping with all these definitions, we say a smooth function f: M \to \mathbb{R} is a Morse function if all its critical points are nondegenerate.  Some authors impose the additional requirements that every critical value have only one corresponding critical point, or that f be proper (= the preimage of a compact set is compact).  For now, I’ll stick to my original definition.

Morse functions are basically as good as it gets for our current approach: almost all level sets f^{-1}(c) are smooth submanifolds of M of codimension one, the bad points (= critical points) where our analysis fails are isolated, and even then, we know exactly what f looks like in an open neighborhood of a bad point.  But are Morse functions too good to be true?  Do we encounter them often?  As it turns out, just as with regular values, “almost all” smooth functions are Morse functions.  The core of the proof is (again) just Sard’s theorem.

Let’s just examine the case of a smooth function f from an open subset U \subseteq \mathbb{R}^n to \mathbb{R}.  Let (x_1,\cdots,x_n) be a choice of coordinates on U.  For a = (a_1,\cdots,a_n) \in \mathbb{R}^n, we define a smooth function

f_a := f + a_1x_1 + \cdots + a_n x_n

Theorem: No matter what the function f is, for almost all choices of a, f_a is a Morse function on U.

Again, we use the function F(q) = (\frac{\partial f}{\partial x_1}(q),\cdots,\frac{\partial f}{\partial x_n}(q)).  Then, the derivative of f_a at a point p is represented in these coordinates as

d_p f_a = \left( \frac{\partial f_a}{\partial x_1}(p),\cdots, \frac{\partial f_a}{\partial x_n}(p) \right) = F(p) + a

So, p is a critical point of f_a if and only if F(p) = -a.  Since f_a and f have the same second partials, the Hessian of f_a at p is the matrix d_p F.  If -a is a regular value of F, then whenever F(p) = -a, d_pF is nonsingular; consequently, every critical point of f_a is nondegenerate.  Sard’s theorem then implies that -a is a regular value of F for almost all a \in \mathbb{R}^n.
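This argument can be checked by hand in a small example.  Take f(x,y) = x^3 + y^3 on \mathbb{R}^2, so F(p) = (3x^2, 3y^2) and d_pF = \text{diag}(6x, 6y); the bad choices of a turn out to be exactly those with a_1 = 0 or a_2 = 0, a measure-zero set.  A minimal sketch (the function name is my own):

```python
import random

# For f(x, y) = x^3 + y^3:  F(p) = (3x^2, 3y^2),  d_pF = diag(6x, 6y).
# p is critical for f_a iff F(p) = -a, and nondegenerate iff both
# coordinates of p are nonzero, which fails exactly when a1 = 0 or a2 = 0.

def f_a_is_morse(a1, a2):
    """Is f_a(x, y) = x^3 + y^3 + a1*x + a2*y a Morse function?"""
    if a1 > 0 or a2 > 0:
        # one of the equations 3t^2 = -a_i has no solution,
        # so f_a has no critical points at all: vacuously Morse
        return True
    # otherwise the critical points have coordinates +/- sqrt(-a_i / 3),
    # and d_pF there is singular exactly when some a_i = 0
    return a1 != 0 and a2 != 0

assert not f_a_is_morse(0.0, -1.0)   # a bad (degenerate) choice of a

# ...but a random a is good with probability 1:
random.seed(0)
assert all(f_a_is_morse(random.uniform(-1, 1), random.uniform(-1, 1))
           for _ in range(1000))
```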

There’s so much more to talk about, but I’ve already rambled on for quite a bit.  Until next time.