You are currently browsing the monthly archive for December 2008.

Today, my favorite gift was to receive an announcement of the First International Conference on Mathematics is Inconsistent, devoted to professor Kamouna’s Bi-polarism theory. The theory proves that SAT is not NP-complete, and that the Continuum Hypothesis and the Axiom of Choice are false. It also reconciles General Relativity and Quantum Mechanics.

Merry Christmas!

[Update 12/27: a commenter points out this article in the popular press.]

What makes good mathematical writing? And what do we “understand” from proofs?

Being a complexity theorist, if you ask me what a mathematical proof is, my first answer is “something that convinces a verifier of the validity of a statement,” and this is the goal that I usually have in mind when writing up a paper: to make it clear, especially to myself, that the result is indeed correct.

With the right software environment, and with the right formal language to specify proofs, it might be possible to write proofs in a format that is not only readable by humans, but by machines too, greatly increasing our confidence in the validity of what we do (and simplifying the job of referees); conceivably automated tools might also produce automated proofs of the “it is clear that” steps, and beyond, and create the formal proof in an interactive way. A recent issue of the Notices of the AMS was devoted to formal proofs and related software tools. Gowers describes here and here what such a computer-assisted theorem-proving might look like.

A proof is easier to follow if it is broken up in a “modular” way, as a series of lemmas such that each lemma can be stated, proved and understood independently, and the lemmas combine in a simple way to give the main result. Then each lemma may similarly be broken up into simpler statements and so on.

I assumed that this was all there was to mathematical writing, until I taught an undergraduate algorithms course at Berkeley (CS170), and I could not understand the explanation of Dijkstra’s algorithm in CLR. To be sure, the presentation was extremely modular, and all proofs had been broken up into entirely elementary steps, but after reading the section in question, I had no idea what was going on. (And, if you are wondering, I already knew the algorithm.)

The point is that, although we sometimes write a proof with the main goal of establishing that a certain result is true, this is hardly the reason why we read proofs, unless we are a referee. We read proofs with the hope of learning something new, a technique that can be applied elsewhere, an insight about the nature of a certain mathematical object and so on. If a proof is too detailed, and excessively broken up into little lemmas, one can get lost in the details.

If this is the case, then what makes good mathematical writing? Taking again an analogy from complexity theory, a proof is “zero-knowledge” when, after having completed the process of verifying it, the verifier is not able to do anything that he wouldn’t be able to do before. This suggest a criterion for judging good writing: that after reading a well-written piece of mathematics, we are able to do something that we weren’t able to before.

Gowers has advocated a style of mathematical writing in which a proof is presented without “magical” steps, so that every step is justified in a way that (with the hindsight of the person writing the exposition) makes it natural, and inevitable. If the most natural line of attack is wrong, then one discusses the relevant counterexamples. If a special case contains the essence of the problem but without distracting features, one looks at the special case first, and so on. (In a way, this is the opposite of the ethos of “proofs from the Book,” which emphasizes the magic and the unexplained.)

(Note also the “computational” difference: in a proof written for ease of verification, we break up the proof until each piece is easy to verify, but here we are breaking up a proof so that each piece is, by itself, easy to generate.)

Reading expositions written in this style can be hard, and the whole thing is not very useful if the writer’s idea of what is “natural” does not agree with the reader’s, but when it works, it is phenomenally rewarding.

Indeed, if we are writing with the goal of giving the reader “non-zero knowledge,” that is the ability to do something new, then presenting the proof of an actual theorem may even be besides the point. This gets to the Tricks Wiki, a repository of mathematical techniques which is coming online any day now, and which is specifically designed to extract pieces of mathematical lore that one normally learns by reading proofs (and “understanding” them), and to present them in an independent form.

It’s not clear that something like the Tricks Wiki would be needed in theoretical computer science, because I think that we do a reasonable job at making explicit the techniques that we use; as for Gowers’ notion of writing an exposition in which there is no magical step, I’d love to see some currently impenetrable piece of theoretical computer science be explained in this way.

I also wonder if it is possible to come up with precise definitions that capture (even very approximately) the notion of a proof being understandable, understood, deep, and so on, and whether philosophers have any insight into the matter.

This Onion article was published in January, 2001. (via Penmachine)

We can all chuckle at the advertisement for the spa at the Wenjing hotel in Beijing

(Funny translations are not of course confined to East Asia. When I was in Lipari this summer, a sign in my hotel room instructed me to “keep quiet in case of fire.”)

But I think that as interest in China continues to grow in the West, we are going to see more and more incidents like the one suffered by the Max Plank Forschung. (Via Boing Boing.)

In the last post, I stated the following generalization of the weak regularity lemma:

Theorem (Low Complexity Approximation, TTV Version) Let $(X,\mu)$ be a probability space, $g:X \rightarrow [-1,1]$ a bounded function, $F$ a collection of bounded functions $f:X \rightarrow [ -1,1]$, and $\epsilon$ an approximation parameter.

Then there is a function $h: X \rightarrow [-1,1]$ such that

• $h$ has low complexity relative to $F$: there are $k= O( \epsilon^{-2})$ functions $f_i\in F$ and coefficients $c_i$ such that

$\displaystyle h(x) := \max \{ -1 , \min \{ 1, \sum_{i=1}^k c_i f_i (x) \}\}$

• $h$ is $\epsilon$-indistinguishable from $g$ by $F$, that is,

$\forall f.F \ \ \left| {\mathbb E}_{x \sim \mu} f(x) g(x) - {\mathbb E}_{x \sim \mu} f(x) h(x) \right| \leq \epsilon$

(Last time, I mentioned that our proof handled only boolean functions $f$; now we can handle arbitrary bounded functions, and with an “energy-decrease” style proof, this will appear in the next online revision of the paper.)

This seems to be a useful tool, limited only by one’s creativity in choosing the functions $F$ and then making use of the properties of $h$.

• if one takes $X$ to be the edges of a complete graph, and $F$ the set of indicator variables of cuts, then the existence of $h$ gives the weak regularity lemma of Frieze and Kannan; and
• if one takes $F$ to be the set of circuits of size at most $S(n)$, and normalizes $g$ and $h$ to be probability distributions, one gets that for every probability distribution $D$ of high entropy there is a (non-uniformly) efficiently samplable and computable distribution $M$ that is indistinguishable from $D$ by circuits of size $\leq S(n)$.

In this post I’ll show how to use it to give short proofs of the Hardcore Lemma of Impagliazzo and the Dense Model Theorem of Green, Tao and Ziegler. Both proofs also have, at least in hindsight, a sense of “inevitability,” meaning that given the Low-Complexity Approximation Theorem, and given what we want to prove, both proofs get to the point in a most economical and natural way.

1. The Impagliazzo Hardcore Lemma. We have already mentioned that if $g$ is “hard-on-average” for $F$, then $h$ cannot be an approximation in the sense of being close to $g$ on most inputs. What, then, about the points on which $g$ and $h$ differ? They form an Impagliazzo Hardcore Set for $g$, as described next.

Let $g: X \rightarrow \{-1,1\}$ be a function that is weakly hard on average for a class of algorithms $F$. Suppose, specifically, that for every algorithm $h$ of complexity $O(\epsilon^{-2}\delta^{-2})$ relative to $F$ we have

${\mathbb P} [ g(x) = h(x) ] \leq 1-\delta$

and, more generally, for fractional $h$, we have

$(1) \ \ \ {\mathbb E} | g(x) - h(x) | \geq 2\delta$

Then, construct an approximating function $h$ of complexity
$\leq \epsilon^{-2}\delta^{-2}$ relative to $F$ and such that $h$ and $g$ are $\epsilon\delta$-indistinguishable by $F$. Note that, even though $h$ is “indistinguishable” from $g$, it is also “far” from $g$, as in (1).

Define a probability distribution $H$ that assigns to point $x$ a probability proportional to $|g(x)-h(x)|$. (If $h$ were boolean, this would be the uniform distribution over points on which $h$ and $g$ differ.) Then this distribution is $\delta$-dense in the uniform distribution, meaning that every point has probability at most $(\delta |X|)^{-1}$. Observe also that we have

$g(x)\cdot |g(x)-h(x)| = g(x)-h(x)$

for every $x$, because $g(x)$ and $g(x)-h(x)$ have the same sign and $|g(x)|=1$, so we have

$\begin{array}{ll} \displaystyle {\mathbb E}_H g(x)f(x) & \displaystyle = \sum_x \frac {|g(x)-h(x)|}{\sum_y |g(y)-h(y)|} \cdot g(x)f(x) \\ & \displaystyle = \frac {|X|}{\sum_y |g(y)-h(y)|} \cdot {\mathbb E}_X |g(x)-h(x)|g(x)f(x) \\ & \displaystyle \leq \frac {1}{\delta} {\mathbb E}_X (g(x)-h(x))f(x) \\ & \displaystyle \leq \epsilon \end{array}$

and so $H$ is a hardcore distribution, because the above expression is equivalent to

${\mathbb P}_H [ g(x) = f(x) ] \leq \frac 12 + \frac \epsilon 2$

2. The Dense Model Theorem. Suppose that $R\subseteq X$ is a pseudorandom set with respect to functions that have bounded complexity relative to $F$, and let $D\subseteq R$ be a dense subset of $R$, $|D| = \delta |R|$.

To find a dense model of $D$, we take $g$ to be the characteristic function of $D$, and we let $h$ be the low-complexity approximation, but using the uniform distribution on $R$ as $\mu$.. Now suppose for simplicity that $h$ is boolean, and that $M$ is the set of inputs of $X$ on which $h$ is 1. We want to argue that $M$ is a dense model of $D$. By assuming without loss of generality that $F$ contains the all-one function, we get from the indistinguishability of $g$ and $h$ that

$\delta = {\mathbb E}_R g \approx {\mathbb E}_R h$

and from the pseudorandomness of $R$ we have

${\mathbb E}_R h \approx {\mathbb E}_X h = |M|/|X|$

and so $|M| \approx \delta |X|$ and $M$ is indeed dense.

For the indistinguishability of $M$ and $D$, take any function $f$, and observe that

${\mathbb E}_M f = \delta^{-1} {\mathbb E}_R fg \approx \delta^{-1} {\mathbb E}_R fh \approx \delta^{-1} {\mathbb E}_X fh \approx {\mathbb E}_M f$

where we use both the indistinguishability of $g$ and $h$ under distribution $R$, and the fact that the distributions $R$ and $X$ are indistinguishable by functions of bounded complexity.

This proof is appealingly intuitive, in the sense that if we expect $D$ to be indistinguishable from a large set, then when we try to approximate the characteristic function of $D$ we will end up with a low complexity function that is spread around much of $X$, thus defining a dense model. It also shows that “relative” versions of the Regularity Lemma, such as the Regularity Lemma for subgraphs of an expander, may be derived from the regular Lemma by the above argument. A disadvantage of the argument is that it does not establish the stronger form of the Dense Model Theorem suggested by Impagliazzo, in which there is no set $R$, but we require $D$ to have the “pseudo-density” requirement that for every low-complexity bounded function $f$,

${\mathbb E}_{x\in D} |f(x)| \leq \frac 1 {\delta} {\mathbb E}_{X} |f(x)| + \epsilon$

which follows immediately if $D$ has density $\delta$ in a pseudorandom set $R$, but that is a seemingly weaker property. (The relative Regularity Lemma in graphs had long be known to hold under such a pseudo-density assumption.)