What makes good mathematical writing? And what do we “understand” from proofs?

Being a complexity theorist, if you ask me what a mathematical proof is, my first answer is “something that convinces a verifier of the validity of a statement,” and this is the goal that I usually have in mind when writing up a paper: to make it clear, especially to myself, that the result is indeed correct.

With the right software environment, and with the right formal language to specify proofs, it might be possible to write proofs in a format that is readable not only by humans but by machines too, greatly increasing our confidence in the validity of what we do (and simplifying the job of referees); conceivably, automated tools might also produce automated proofs of the “it is clear that” steps, and beyond, and create the formal proof in an interactive way. A recent issue of the Notices of the AMS was devoted to formal proofs and related software tools. Gowers describes here and here what such computer-assisted theorem proving might look like.

A proof is easier to follow if it is broken up in a “modular” way, as a series of lemmas such that each lemma can be stated, proved and understood independently, and the lemmas combine in a simple way to give the main result. Then each lemma may similarly be broken up into simpler statements and so on.

I assumed that this was all there was to mathematical writing, until I taught an undergraduate algorithms course at Berkeley (CS170), and I could not understand the explanation of Dijkstra’s algorithm in CLR. To be sure, the presentation was extremely modular, and all proofs had been broken up into entirely elementary steps, but after reading the section in question, I had no idea what was going on. (And, if you are wondering, I *already knew* the algorithm.)

The point is that, although we sometimes *write* a proof with the main goal of establishing that a certain result is true, this is hardly the reason why we *read* proofs, unless we are a referee. We read proofs with the hope of *learning* something new, a technique that can be applied elsewhere, an insight about the nature of a certain mathematical object and so on. If a proof is too detailed, and excessively broken up into little lemmas, one can get lost in the details.

If this is the case, then what makes good mathematical writing? Taking again an analogy from complexity theory, a proof is “zero-knowledge” when, after having completed the process of verifying it, the verifier is not able to do anything that he could not already do before. This suggests a criterion for judging good writing: that after reading a well-written piece of mathematics, we are able to do something that we weren’t able to do before.

Gowers has advocated a style of mathematical writing in which a proof is presented without “magical” steps, so that every step is justified in a way that (with the hindsight of the person writing the exposition) makes it natural, and inevitable. If the most natural line of attack is wrong, then one discusses the relevant counterexamples. If a special case contains the essence of the problem but without distracting features, one looks at the special case first, and so on. (In a way, this is the opposite of the ethos of “proofs from the Book,” which emphasizes the magic and the unexplained.)

(Note also the “computational” difference: in a proof written for ease of verification, we break up the proof until each piece is easy to *verify,* but here we are breaking up a proof so that each piece is, by itself, easy to *generate*.)

Reading expositions written in this style can be hard, and the whole thing is not very useful if the writer’s idea of what is “natural” does not agree with the reader’s, but when it works, it is phenomenally rewarding.

Indeed, if we are writing with the goal of giving the reader “non-zero knowledge,” that is, the ability to do something new, then presenting the proof of an actual theorem may even be beside the point. This brings us to the Tricks Wiki, a repository of mathematical techniques which is coming online any day now, and which is specifically designed to extract pieces of mathematical lore that one normally learns by reading proofs (and “understanding” them), and to present them in an independent form.

It’s not clear that something like the Tricks Wiki would be needed in theoretical computer science, because I think that we do a reasonable job at making explicit the techniques that we use; as for Gowers’ notion of writing an exposition in which there is no magical step, I’d love to see some currently impenetrable piece of theoretical computer science be explained in this way.

I also wonder if it is possible to come up with precise definitions that capture (even very approximately) the notion of a proof being understandable, understood, deep, and so on, and whether philosophers have any insight into the matter.

It’s worth mentioning that Grothendieck had a similar line of thought regarding tricks (see Notices of the AMS, Vol 51 No 9, page 1046 at http://www.ams.org/notices/200409/index.html ):

“The main philosophy of Grothendieck was that mathematics should be reduced to a succession of small, natural steps,” Borel said. “As long as you have not been able to do this, you have not understood what is going on…. And his proof of Riemann-Roch used a trick, une astuce. So he didn’t like it, so he didn’t want to publish it…. He had many other things to do, and he was not interested in writing up this trick.”

“I also wonder if it is possible to come up with precise definitions that capture (even very approximately) the notion of a proof being understandable, understood, deep, and so on, and whether philosophers have any insight into the matter.”

Manuel Blum and I have thought about this question for some time and have thrown around several possible definitions. Here’s one example. At some point we had the idea that if an algorithm understands a proof P of a theorem T, then: for every small change T’ to the statement of T, if a small change to P suffices to prove T’, then the algorithm can prove T’ efficiently, given T and P as advice.

While this is a natural consequence of understanding a proof, it is not clear that it is sufficient for understanding, i.e., that passing this kind of test is enough to say one understands. Also, we could never quite agree on what kind of metric to use to measure a “small change.” If you think of the T’s and P’s as bit strings and use Hamming distance, an interesting notion arises (where the larger the “small change” is allowed to be, the more “understanding” the algorithm is said to have), but the original concept doesn’t seem to be captured very satisfactorily. It would be interesting if one could prove that a seemingly broader definition is actually equivalent to the “small change” notion.
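The Hamming-distance reading of “small change” is easy to make concrete. A minimal sketch (the bit-string encodings and names below are purely illustrative, not from the comment):

```python
def hamming(a: str, b: str) -> int:
    """Hamming distance between two equal-length bit strings."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))

def is_small_change(t: str, t_prime: str, k: int) -> bool:
    """A change counts as 'small' if at most k bit positions differ."""
    return hamming(t, t_prime) <= k

# Toy encodings of a theorem T and a perturbed theorem T'
T       = "101101"
T_prime = "101001"
print(hamming(T, T_prime))             # 1
print(is_small_change(T, T_prime, 2))  # True
```

Under this reading, an algorithm that “understands” P with parameter k must handle every T’ within Hamming distance k of T; the larger the k it can handle, the more it is said to understand.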

Since one’s understanding of a proof can improve over time (as one sees more examples of similar proofs), the notion may be better modeled by a learning framework.

“At some point we had the idea that if an algorithm understands a proof P of a theorem T, then: for every small change T’ to the statement of T, if a small change to P suffices to prove T’, then the algorithm can prove T’ efficiently, given T and P as advice.”

Two theorems might look very much like each other and still have very different meanings (and proofs), and two different-looking theorems might have almost the same meaning (and proof). So perhaps it would be better to say that an algorithm (or a human) understands a proof P of T if: for any given T’, if a small change to P suffices to prove T’, then the algorithm (or human) can prove T’ efficiently, given T and P as advice (T’ doesn’t have to look like T)?

An advantage of this definition is that it might be easier to measure the distance between two proofs than between two theorems, because proofs usually are longer and say more about the problem than the theorem itself: someone who doesn’t know anything about math might think that “x^2+y^2=z^2, x,y,z in N has infinitely many integral solutions” and “x^n+y^n=z^n, x,y,z,n in N, n>2 has no solutions” look like each other, but no one would say that the proofs look like each other!

Yes, we considered the “for all T’” version as well, and in fact most things we proved also work for the more general requirement. We thought that requirement may be too stringent. Of course, all this seems to depend (unfortunately) on both the chosen encodings of theorems and proofs and on the distance metric. A “robust” definition that is independent of these two choices (up to some negligible factors) would be really great, similar to how polynomial time is a robust notion. Perhaps a few reasonable axioms for the encoding scheme and distance metric would be good enough for this sort of definition.

I am a big fan of Gowers’ approach in mathematical writing but I have had strong disagreements with co-authors and others over it. I tend to be minimalist and try to introduce new definitions and concepts only as needed for the problem at hand. This makes it much easier to motivate the definitions and bring the reader into the flow. (If a more general definition is at least as simple so much the better – that is not what I am talking about.) On the other hand, if the generalization has no immediate utility then there is the temptation to say that the additional verbiage to state and prove it is not worth it. A narrower focus has sometimes caused me not to state such generalizations, to my later regret. As my co-authors argued, by stating and proving the results in as general a context as possible the paper becomes more valuable as a reference for later work.

An objection I’ve also heard is that presenting all that intuition limits the reader to the authors’ narrow way of thinking about the problem and therefore limits the reader’s creativity. A variant of this objection is that the intuition can be misleading and convince the reader of incorrect proofs more easily than ones that have to be verified line by line.

P.S. The approach you describe is common in textbooks like CLR and Kleinberg-Tardos: go with the “least provable unit” and accumulate the consequences. The difficulty there is that sometimes the essence of the big picture disappears (see the multi-page breadth-first search correctness proof in CLR, for example). The CLR proof of Dijkstra’s algorithm that you cite doesn’t fit that mold, but has a particularly egregious use of double negation. (It has not gotten better in CLRS.) The first time I read it, I was prepping for a class and had no other resources around me. It sufficiently confused me that for a while I could not reconstruct the much simpler direct proof.

If we are able to do something different after reading a proof, then our neurons must have changed their configuration (most effectively overnight).

It would be interesting to find out what kind of discursive/visual/intuitive explanation can go with a proof to elicit the “right” neurons in the “right” way.

I think that, first of all, there’s a key distinction to be made: on one hand, we have proofs; on the other hand, we have explanations; those are indeed two different classes. While it is clear that an explanation is actually a proof, a proof does not always explain what it proves.

I declare that “magical” mathematical arguments are proofs that are not explanations, while “natural” arguments actually are explanations.

Two examples.

By empirical observation we can “prove” that light is deflected in a gravitational field. Einstein’s theory of relativity, however, proves the same fact by explaining why that is going to happen when we make the observation. In the first case, the “prover” says: “things work that way, you must acknowledge it.” In the second case, the “prover” unveils the deep reason and lets you take a look “under the hood.”

The same is likely to happen in mathematics. For example, think about Galois theory – which explains why some polynomial equations admit a certain kind of solution and others do not – versus Abel’s proof.

So, I would say that explanations are “characterization theorems”, “structure theorems”. Hence, we would prefer a theorem about finite abelian groups to be proven as a consequence of the “classification theorem of finite abelian groups”: this way, it is likely to be explained rather than merely proved. Here, we would have a particular phenomenon seen as a part of the Whole, not as a matter of fact.

“So, I would say that explanations are ‘characterization theorems’, ‘structure theorems’. Hence, we would prefer a theorem about finite abelian groups to be proven as a consequence of the ‘classification theorem of finite abelian groups’: this way, it is likely to be explained rather than merely proved. Here, we would have a particular phenomenon seen as a part of the Whole, not as a matter of fact.”

My point of view is exactly the opposite: structure theorems rarely give any satisfying explanation of why things work. Instead, they lend themselves to case analysis.

Here’s a case study. Consider the theorem that all algebraically closed fields are infinite. The easiest proof (given enough knowledge) is to observe that we know the complete list of finite fields and not one of them is algebraically closed. This is certainly convincing, but to me it makes the result feel like an accident.

Here’s another proof. The number of monic irreducible polynomials over any field is infinite, by exactly the same argument as Euclid used to prove that there are infinitely many primes. If the field is algebraically closed, then all those irreducibles must be linear, so the field must be infinite.
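Written out, the transplanted Euclid argument runs as follows (a sketch, with $F$ standing for the field):

```latex
Given any finite list $p_1, \dots, p_n$ of monic irreducible polynomials
over a field $F$, consider
\[
  q(x) \;=\; p_1(x)\,p_2(x)\cdots p_n(x) \;+\; 1 .
\]
Since $\deg q \ge 1$, the polynomial $q$ has a monic irreducible factor,
and no $p_i$ divides $q$ (each leaves remainder $1$), so that factor is
not on the list: the monic irreducibles are infinite in number. If $F$ is
algebraically closed, every monic irreducible has the form $x - a$ with
$a \in F$, and distinct irreducibles give distinct $a$, so $F$ is infinite.
```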

This proof is more involved and uses weaker tools, but I feel that it gives a much clearer explanation of why the result has to be true. Structure theorems do have the advantage that you can apply them without thinking, which makes them more powerful tools. However, there are times when it is nice to think.

Pingback: About Mathematics « Combinatorics and more

Fred has an intensional view of “understanding,” which presupposes that there is some “right” way of proving a statement, and that one reaches a proper understanding by finding this right way. I prefer, as explained in the post, an extensional view of “understanding,” which looks at what a particular proof does for us, and what else we can do with it. Indeed, we often see the value of finding new proofs of statements that previously seemed completely “understood.” The new proof might suggest a generalization to a setting that was not reachable with the previous proof, and in this new setting might generate new questions, and so on.

Pingback: When is a Mathematical Proof Difficult? « in theory

Pingback: Thurston on Understanding and Collective Work in Mathematics « in theory

I have obviously arrived at this discussion too late… In case anybody is still listening, here is a link to a report that I wrote on the subject a few years ago: http://www.cs.ucy.ac.cy/~cakap/proofs-vs-explanations.pdf . (I warn that the writing is philosophical —but with some reference to computation— and rather verbose.) In short, the suggestion is this:

A _proof_ is an _explanation_ if it establishes a single correspondence between its own objects and relations, on the one side, and the objects and relations in the reader’s mind, on the other side. Between two proofs that are both explanations, the best one is the one whose correspondence is easiest (for the reader’s brain) to compute.

Some remarks:

1. Good proofs probably have many good properties. I was trying to capture only one: the intellectual satisfaction that they give us (the “Oh, I see” moment —as opposed to the puzzlement that bad proofs leave you with). It is such proofs that I call explanations.

Explanations (in the sense that I describe) may not really allow you to prove more theorems; i.e., they may be “little-knowledge”. Also, a proof whose steps are all natural may very well not be an explanation. I believe the report contains good simple examples.

2. Here “correspondence” means “analogy” (in the sense of Hofstadter) or “metaphor” (in the sense of Lakoff and Nunez). The “objects and relations in the reader’s mind” are the product of the reader’s physical experience so far. I have no rigorous language for these notions. (But see “Where Mathematics Comes From”, where Lakoff and Nunez start being more concrete.)

Being an explanation seems to be an extra-logical property of proofs. You can’t decide it by just looking at the proof. You must also look at the reader’s mind. If this is correct, then this may be a problem where cognitive sciences play the primary role.

3. My suggestion asks for a _single_ correspondence: _one_ correspondence that can guide the reader throughout the entire proof. I believe this explains why modularity may render a proof non-explanatory: the correspondences established by the proofs of the different lemmata may not be composed into a single correspondence that is valid throughout the entire proof.

4. There is already a rich discussion in the Philosophy of Science about explanation in the context of the physical sciences (“When does a scientific theory really explain the phenomena?”). The discussion about proofs may gain from that discussion.

One possible morale may be that any attempt to formally say anything about “good proofs” should first clarify which of the several different notions of “goodness” it is aiming at, then make sure any issues related to cognition can indeed be overcome mathematically.

Hope this helps.