You are currently browsing the monthly archive for November 2007.

Terry Tao points to a beautiful article written by Michael Harris for the Princeton Companion to Mathematics, titled *Why Mathematics, You Might Ask*.

The titular question is the point of departure for a fascinating discussion on the foundations of mathematics, on the philosophy of mathematics, on post-modernism, on the “anthropology” approach to social science studies of mathematics, and on what mathematicians think they are doing, and why.

In general, I find articles on philosophical issues in mathematics to be more readable and enlightening when written by mathematicians. Perhaps it’s just that they lack the sophistication of the working philosopher, a sophistication which I mistake for unreadability. But I also find that mathematicians tend to bring up issues that matter more to me.

For example, the metaphysical discussions on the “reality” of mathematical objects and the “truth” of theorems are all well and good, but the really interesting questions seem to be different ones.

The *formalist* view of mathematics, for example, according to which mathematics is the derivation of theorems from axioms via formal proofs, or as Hilbert apparently put it, “a game played according to certain simple rules with meaningless marks on paper,” does not begin to capture what mathematics is, just as “writing one sentence after another” does not capture what poetry is. (The analogy is due to Gian-Carlo Rota.) Indeed, one of the main fallacies that follow from taking the extreme formalist position as anything more than a self-deprecating joke is to consider mathematical work as *tautological*: that is, to see a mathematical theorem as *implicit* in the axioms, and so its proof as *not a discovery*. (Some of the comments in this thread relate to this point.) Plus, the view does not account for the difference between “recreational” mathematics and “real” mathematics, a difference that I don’t find easy to explain in a few words, probably because I don’t have a coherent view of what mathematics really *is*.

It’s not quite related, but I am reminded of a conversation I had a long time ago with Professor X about faculty candidate Y.

[Not an actual transcript, but close enough]

X: so what do you think of theory candidate Y?

Me: he is not a theory candidate.

X: but his results have no conceivable application.

Me: there is more to doing theory than proving useless theorems.

X: that’s interesting! Tell me more.

I enjoyed Harris’s suggestion that “ideas” are the basic units of mathematical work, and his semi-serious discussion of whether ideas “exist” and on their importance.

There are indeed a number of philosophical questions about mathematics that I think are extremely interesting and do not seem to figure prominently in the social studies of mathematics.

For example, and totally randomly:

- When are two proofs essentially the same, and when are they genuinely different?
- What makes a problem *interesting*? What is the role of *connections* in this determination?
- What makes a theorem *deep*?
- What does it mean when mathematicians say that a certain proof *explains* something, or when they say that it does not?

Different communities have different traditions for terminology. Mathematicians appropriate common words, like ring, field, scheme, ideal,… and the modern usage of these terms bears no connection to the everyday meaning of the words. Physicists have a lot of fun with their sparticles and their strangeness and charm and so on. Theoretical computer scientists, like the military and NASA, prefer acronyms.

We have some isolated examples of felicitous naming. *Expander*, for example, is great: it sounds right and it is suggestive of the technical meaning. *Extractor* is my favorite, combining a suggestion of the meaning with a vaguely threatening sound. I think it’s too bad that *seedless extractor* has come to pass, because it evokes some kind of device to get grape juice. (I was on the losing side that supported *deterministic extractor*.)

Unfortunate namings are of course more common. Not only is the complexity class PP embarrassing to pronounce, but its name, derived from Probabilistic Polynomial time, is a poor description of it. By analogy with #P and $\oplus$P, it should be called MajP.

I heard the story of a famous (and famously argumentative) computer scientist complaining to one of the authors of the PCP theorem about the term PCP, which stands for Probabilistically Checkable Proof. “I too can define a probabilistic checker for SAT certificates,” he supposedly said, “with probability half check the certificate, with probability half accept without looking at it.” The point being that the terminology emphasizes a shortcoming of the construction (the probabilistic verification) instead of the revolutionary feature (the constant query complexity). Personally, I would prefer *Locally Testable Proof*.

Of course we will never change the name of PP or PCP, and the seedless extractors are here to stay, but there is one terminology change for which I’d like to start a campaign.

Naor and Naor constructed in 1990 a pseudorandom generator whose output is secure against linear tests. They called such a generator $\epsilon$*-biased* if the distinguishing probability of every linear test is at most $\epsilon$. Such generators have proved to be extremely useful in a variety of applications, most recently in the Bogdanov-Viola construction of pseudorandom generators against degree-2 polynomials.

Shall we start calling such generators $\epsilon$-*un*biased? Seeing as it is the near lack of bias, rather than its presence, which is the defining feature of such generators?

(I know the reason for the Naor-Naor terminology: *zero-bias* generator makes perfect sense, while *zero-unbiased* makes no sense. But how about the fact that it is technically correct to say that the *uniform* distribution is $\frac {1}{10}$-biased?)
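For concreteness, here is a minimal Python sketch (function names are mine, not standard) of what the definition measures: the bias of a distribution with respect to a linear test, maximized by brute force over all nonzero tests. It confirms the pedantic point above: the uniform distribution has maximum bias $0$, so it is, technically, also $\frac{1}{10}$-biased.

```python
from itertools import product

def bias(dist, a):
    """|E[(-1)^<a,x>]| for a distribution given as a {bitstring: probability} dict."""
    e = sum(p * (-1) ** sum(ai & xi for ai, xi in zip(a, x)) for x, p in dist.items())
    return abs(e)

def max_bias(dist, n):
    """Largest bias over all nonzero linear tests a in {0,1}^n."""
    return max(bias(dist, a) for a in product([0, 1], repeat=n) if any(a))

n = 3
uniform = {x: 1 / 2 ** n for x in product([0, 1], repeat=n)}
print(max_bias(uniform, n))  # 0.0: the uniform distribution is 0-biased,
                             # hence (vacuously) also 1/10-biased
```

A point mass on the all-zero string, by contrast, has maximum bias $1$: every linear test outputs $0$ on it with probability one.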

In the Impagliazzo hard-core set theorem we are given a function $f$ such that every algorithm in a certain class makes errors at least a $\delta$ fraction of the times when given a random input. We think of $\delta$ as small, and so of $f$ as exhibiting a weak form of average-case complexity. We want to find a large set $H$ such that $f$ is average-case hard in a stronger sense when restricted to $H$. This stronger form of average-case complexity will be that no efficient algorithm can make noticeably fewer errors while computing $f$ on $H$ than a trivial algorithm that always outputs the same value regardless of the input. The formal statement of what we are trying to do (see also the discussion in this previous post) is:

**Impagliazzo Hard-Core Set Theorem, “Constructive Version”**

Let $f: \{ 0,1 \}^n \rightarrow \{ 0,1 \}$ be a boolean function, $s$ be a size parameter, and $\epsilon,\delta > 0$ be given. Then there is a size parameter $s' = s \cdot poly(\epsilon,\delta)$ such that the following happens.

Suppose that for every function $h: \{ 0,1 \}^n \rightarrow \{ 0,1 \}$ computable by a circuit of size $s$ we have

$\displaystyle \Pr_x [ h(x) = f(x) ] \leq 1 - \delta$

Then there is a set $H \subseteq \{0,1\}^n$ such that: (i) $H$ is recognizable by circuits of size at most $s$; (ii) $|H| \geq \delta 2^n$, and in fact the number of $x$ in $H$ such that $f(x)=0$ is at least $\frac{\delta}{2} 2^n$, and so is the number of $x$ in $H$ such that $f(x)=1$; and (iii) for every $h$ computable by a circuit of size $s'$,

$\displaystyle \Pr_{x \sim H} [ h(x) = f(x) ] \leq \max \{ \Pr_{x \sim H} [ f(x)=0 ] , \Pr_{x \sim H} [ f(x) = 1 ] \} + \epsilon$

Our approach will be to look for a “regular partition” of $\{0,1\}^n$. We shall construct a partition $P = (B_1,\ldots,B_m)$ of $\{0,1\}^n$ such that: (i) given $x$, we can efficiently compute the block $B_i$ that $x$ belongs to; (ii) the number $m$ of blocks does not depend on $n$; (iii) $f$ restricted to most blocks $B_i$ behaves like a random function of the same density. (By “density” of a function we mean the fraction of inputs on which the function evaluates to one.)

In particular, we will use the following form of (iii): for almost all the blocks $B_i$, no algorithm has advantage more than $\epsilon$ over a constant predictor in computing $f$ in $B_i$.

Let $H_0$ be the union of all majority-0 blocks (that is, of blocks $B_i$ such that $f$ takes the value 0 on a majority of elements of $B_i$) and let $H_1$ be the union of all majority-1 blocks.

I want to claim that no algorithm can do noticeably better on $H_0$ than the constant algorithm that always outputs 0. Indeed, we know that within (almost) all of the blocks that compose $H_0$ no algorithm can do noticeably better than the always-0 algorithm, so this must be true, for a stronger reason, for their union. The same is true for $H_1$, with reference to the constant algorithm that always outputs 1. Also, if the partition is efficiently computable, then (in a non-uniform setting) $H_0$ and $H_1$ are efficiently recognizable. It remains to argue that either $H_0$ or $H_1$ is large, and that $f$ is not completely unbalanced on it.

Recalling that we are in a non-uniform setting (where by “algorithms” we mean “circuits”) and that the partition is efficiently computable, the following is a well-defined efficient algorithm for attempting to compute $f$:

Algorithm. Local Majority

On input $x$:

determine the block $B_i$ that $x$ belongs to;

output $1$ if $\Pr_{z \sim B_i} [ f(z) = 1 ] \geq \frac 12$;

otherwise output 0

(The majority values of $f$ in the various blocks are just a set of $m$ bits that can be hard-wired into the circuit.)
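As a toy illustration, here is a minimal Python sketch (all names hypothetical, the target function and partition chosen arbitrarily) of the Local Majority idea: precompute and “hard-wire” the majority value of each block, then predict it. By construction, the algorithm errs exactly on the minority inputs of each block.

```python
from itertools import product
from collections import defaultdict

def local_majority(f, block_of, inputs):
    """Return a predictor that outputs the hard-wired majority value of f
    in the block of its input."""
    ones, size = defaultdict(int), defaultdict(int)
    for x in inputs:
        b = block_of(x)
        size[b] += 1
        ones[b] += f[x]
    maj = {b: 1 if 2 * ones[b] >= size[b] else 0 for b in size}
    return lambda x: maj[block_of(x)]

n = 4
inputs = list(product([0, 1], repeat=n))
f = {x: x[0] ^ (x[1] & x[2]) for x in inputs}   # a toy target function
block_of = lambda x: x[0]                        # partition by the first bit
predict = local_majority(f, block_of, inputs)

# the error set is exactly the union of the blocks' minority inputs
errors = [x for x in inputs if predict(x) != f[x]]
```

Here each of the two blocks has 2 minority inputs (those with $x_1 = x_2 = 1$), so the sketch makes mistakes on a $1/4$ fraction of the inputs.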

We assumed that every efficient algorithm must make at least a $\delta$ fraction of errors. The set of inputs on which the Local Majority algorithm makes mistakes is the union, over all blocks $B_i$, of the “minority inputs” of the block $B_i$. (If $b$ is the majority value of $f$ in a block $B_i$, then the “minority inputs” of $B_i$ are the set of inputs $x \in B_i$ such that $f(x) \neq b$.)

Let $M_0$ be the set of minority inputs (those where our algorithm makes a mistake) in $H_0$ and $M_1$ be the set of minority inputs in $H_1$. Then at least one of $M_0$ and $M_1$ must have size at least $\frac \delta 2 2^n$, because the size of their union is at least $\delta 2^n$. If $M_b$ has size at least $\frac \delta 2 2^n$, then $H_b$ has all the properties of the set $H$ we are looking for.

It remains to construct the partition. We describe an iterative process to construct it. We begin with the trivial partition $P = ( \{0,1\}^n )$. At a generic step of the construction, we have a partition $P = (B_1,\ldots,B_m)$, and we consider $H_0$, $H_1$, $M_0$ and $M_1$ as above. Let $b$ be such that $|M_b| \geq \frac \delta 2 2^n$. If there is no algorithm that has noticeable advantage over a constant predictor in computing $f$ on $H_b$, we are done. Otherwise, if there is such an algorithm $g$, we refine the partition by splitting each block according to the values that $g$ takes on the elements of the block.

After $k$ steps of this process, the partition has the following form: there are functions $g_1,\ldots,g_k$, and each of the (at most) $2^k$ blocks of the partition corresponds to a bit string $(b_1,\ldots,b_k)$ and contains all inputs $x$ such that $g_1(x) = b_1, \ldots, g_k(x) = b_k$. In particular, the partition is efficiently computable.

We need to argue that this process terminates after $poly(1/\epsilon,1/\delta)$ steps. To this end, we define a potential function that measures the “imbalance” of $f$ inside the blocks of the partition

$\displaystyle E(P) := \sum_i \frac{|B_i|}{2^n} \left( \Pr_{x \sim B_i} [ f(x) = 1 ] - \Pr_{x \sim B_i} [ f(x) = 0 ] \right)^2$

and we can show that this potential function increases by at least $poly(\epsilon,\delta)$ at each step of the iteration. Since the potential function can be at most 1, the bound on the number of iterations follows.
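The mechanics of the potential argument are easy to see on a toy example. Below is a minimal Python sketch (my notation, with an average-squared-bias potential; the paper's exact potential function may differ): refining the trivial partition by predictors correlated with $f$ strictly increases the potential, which can never exceed 1.

```python
from itertools import product

def potential(f, blocks):
    """Average over a random input of the squared bias of f inside its block."""
    N = sum(len(B) for B in blocks)
    total = 0.0
    for B in blocks:
        density = sum(f[x] for x in B) / len(B)
        total += (len(B) / N) * (2 * density - 1) ** 2
    return total

def refine(blocks, g):
    """Split every block according to the value of the predictor g."""
    out = []
    for B in blocks:
        for v in (0, 1):
            part = [x for x in B if g(x) == v]
            if part:
                out.append(part)
    return out

n = 4
inputs = list(product([0, 1], repeat=n))
f = {x: x[0] & x[1] for x in inputs}   # toy target function

P0 = [inputs]                          # trivial partition
P1 = refine(P0, lambda x: x[0])        # refine by a correlated predictor
P2 = refine(P1, lambda x: x[1])        # refine again
print(potential(f, P0), potential(f, P1), potential(f, P2))  # 0.25 0.5 1.0
```

After the second refinement $f$ is constant on every block, the potential reaches its maximum, and no further progress (hence no further distinguisher) is possible.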

A reader familiar with the proof of the Szemerédi Regularity Lemma will recognize the main ideas of iterative partitioning, of using a “counterexample” to the regularity property required of the final partition to do a refinement step, and of using a potential function argument to bound the number of refinement steps.

In which way can we see them as “finitary ergodic theoretic” techniques? As somebody who does not know anything about ergodic theory, I may not be in an ideal position to answer this question. But this kind of difficulty has not stopped me before, so I may attempt to answer this question in a future post.

The December issue of the Notices of the AMS is now available online, and it includes letters written by Oded Goldreich, Boaz Barak, Jonathan Katz, and Hugo Krawczyk in response to Neal Koblitz’s article which appeared in the September issue.

Despite this, the readers of the Notices remain the losers in this “controversy.” Koblitz’s petty personal attacks and straw-man arguments appeared in the same space that is usually reserved, in the Notices, for expository articles and obituaries of mathematicians. It is from those pages that I learned about the Kakeya problem and about the life of Grothendieck (who, I should clarify, is not dead, except perhaps in Erdős’ use of the word).

I find it strange enough that Koblitz would submit his piece to such a venue, but I find it mind-boggling that the editors would run it, as if they had commissioned Grothendieck’s biographical article from a disgruntled ex-lover, who would devote most of the article to fabricated claims about his personal hygiene.

I can only hope that the editors will soon run on those pages one or more expository articles on modern cryptography, not as rebuttals to Koblitz’s piece (which has already been discussed more than enough), but as a service to the readers.

And while I am on the subject of Notices articles, let me move on to this article on how to write papers.

All beginning graduate students find the process of doing research mystifying, and I do remember feeling that way. (Not that such feelings have changed much in the intervening years.) One begins with a sense of hopelessness: *how am I going to solve a problem that people who know much more than I do, and who are smarter than me, have not been able to solve?* Then a breakthrough comes, out of nowhere, and one wonders: how is this ever going to happen *again*? Finally it’s time to write up the results, and mathematical proofs definitely don’t write themselves, not to mention coherent and compelling introductory sections. I think it’s great when more experienced scholars take the time to write advice pieces that can help students navigate these difficulties. And the number of atrociously written papers in circulation suggests that such pieces are good not just for students, but for many other scholars as well.

But I find that advice on “how to publish,” rather than “how to write well” (like advice on “how to get a job” rather than “how to do research”), misses the point. (I am thinking of one of the few times I thought Lance Fortnow gave bad advice.) For this reason, I found the first section of the Notices article jarring, and the following line (even if it was meant as a joke) made me cringe:

I have written more than 150 articles myself. (…) I have never written an article and then been unable to publish it.

I think that this calls for an Umeshism in response.

The Impagliazzo hard-core set theorem is one of the bits of magic of complexity theory. Say you have a function $f$ such that every efficient algorithm makes errors at least a $\delta$ fraction of the times when computing $f$ on a random input. (We’ll think of $f$ as exhibiting a weak form of average-case complexity.) Clearly, different algorithms will fail on a different $\delta$ fraction of the inputs, and it seems that, intuitively, there should be functions for which no particular input is harder than any other particular input, per se. It’s just that whenever you try to come up with an algorithm, some set of mistakes, dependent on the algorithmic technique, will arise.

As a good example, think of the process of generating $f$ at random, by deciding for every input $x$ to set $f(x) = 1$ with probability $1 - \delta$ and $f(x) = 0$ with probability $\delta$. (Make the choices independently for different inputs.) With very high probability, every efficient algorithm fails with probability at least about $\delta$, but, if we look at any efficiently recognizable large set $H$, we see that $f$ takes the value 1 on approximately a $1-\delta$ fraction of the elements of $H$, and so the trivial algorithm that always outputs 1 has a pretty good success probability.
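This example is easy to simulate. Here is a quick Python sketch (parameters are my choice): a random $f$ with $\Pr[f(x)=1] = 1-\delta$, on which the always-1 algorithm succeeds on roughly a $1-\delta$ fraction of all inputs, but on exactly half of the balanced set built from the 0-inputs plus an equal-size sample of 1-inputs.

```python
import random
random.seed(0)

n, delta = 16, 0.1
N = 2 ** n
# a "biased random function": f(x)=1 with probability 1-delta, independently
f = [1 if random.random() > delta else 0 for _ in range(N)]

# the trivial always-1 algorithm is right on about a (1-delta) fraction of inputs
success_all = sum(f) / N

zeros = [x for x in range(N) if f[x] == 0]
ones = random.sample([x for x in range(N) if f[x] == 1], len(zeros))
H = zeros + ones                             # a balanced set of size about 2*delta*N
success_H = sum(f[x] for x in H) / len(H)    # exactly 1/2 by construction
```

On $H$ the always-1 algorithm does no better than a coin flip, which is the kind of concentrated hardness the theorem is after.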

Consider, however, the set $H$ of size $2 \delta 2^n$ that you get by taking the roughly $\delta 2^n$ inputs $x$ such that $f(x) = 0$ plus a random sample of equally many inputs $x$ such that $f(x) = 1$. Then we can see that no efficient algorithm can compute $f$ on much better than $\frac 12$ of the inputs of $H$. This is the highest form of average-case complexity for a boolean function: on such a set $H$ no algorithm does much better in computing $f$ than an algorithm that makes a random guess.

The Impagliazzo hard-core theorem states that it is always possible to find such a set where the average-case hardness is “concentrated.” Specifically, it states that if every efficient algorithm fails to compute $f$ on at least a $\delta$ fraction of inputs, then there is a set $H$ of size $\geq \delta 2^n$ such that every efficient algorithm fails to compute $f$ on at least a $\frac 12 - \epsilon$ fraction of the elements of $H$. This is true for every $\epsilon,\delta$, and if “efficient” is quantified as “circuits of size $s$” in the premise, then “efficient” is quantified as “circuits of size $s \cdot poly(\epsilon,\delta)$” in the conclusion.

The example of the biased random function given above implies that, if one wants to prove the theorem for arbitrary $f$, then the set $H$ cannot itself be efficiently computable. (The example does not forbid, however, that $H$ be efficiently computable given oracle access to $f$, or that a random element of $H$ be samplable given a sampler for the distribution $(x,f(x))$ for uniform $x$.)

A number of proofs of the hard-core set theorem are known, and connections have been found with the process of *boosting* in learning theory and with the construction and the decoding of certain error-correcting codes. Here is a precise statement.

**Impagliazzo Hard-Core Set Theorem**

Let $f: \{ 0,1 \}^n \rightarrow \{ 0,1 \}$ be a boolean function, $s$ be a size parameter, and $\epsilon,\delta > 0$ be given. Then there is a size parameter $s' = s \cdot poly(\epsilon,\delta)$ such that the following happens.

Suppose that for every function $h: \{ 0,1 \}^n \rightarrow \{ 0,1 \}$ computable by a circuit of size $s$ we have

$\displaystyle \Pr_x [ h(x) = f(x) ] \leq 1 - \delta$

Then there is a set $H$ of size $\geq \delta 2^n$ such that for every function $h$ computable by a circuit of size $s'$ we have

$\displaystyle \Pr_{x \sim H} [ h(x) = f(x) ] \leq \frac 12 + \epsilon$

Using the “finitary ergodic theoretic” approach of iterative partitioning, we (Omer Reingold, Madhur Tulsiani, Salil Vadhan and I) are able to prove the following variant.

**Impagliazzo Hard-Core Set Theorem, “Constructive Version”**

Let $f: \{ 0,1 \}^n \rightarrow \{ 0,1 \}$ be a boolean function, $s$ be a size parameter, and $\epsilon,\delta > 0$ be given. Then there is a size parameter $s' = s \cdot poly(\epsilon,\delta)$ such that the following happens.

Suppose that for every function $h: \{ 0,1 \}^n \rightarrow \{ 0,1 \}$ computable by a circuit of size $s$ we have

$\displaystyle \Pr_x [ h(x) = f(x) ] \leq 1 - \delta$

Then there is a set $H \subseteq \{0,1\}^n$ such that: (i) $H$ is recognizable by circuits of size at most $s$; (ii) $|H| \geq \delta 2^n$, and in fact the number of $x$ in $H$ such that $f(x)=0$ is at least $\frac{\delta}{2} 2^n$, and so is the number of $x$ in $H$ such that $f(x)=1$; and (iii) for every $h$ computable by a circuit of size $s'$,

$\displaystyle \Pr_{x \sim H} [ h(x) = f(x) ] \leq \max \{ \Pr_{x \sim H} [ f(x)=0 ] , \Pr_{x \sim H} [ f(x) = 1 ] \} + \epsilon$

The difference is that $H$ is now an efficiently recognizable set (which is good), but we are not able to derive the same strong average-case complexity of $f$ in $H$ (which, as discussed at the beginning, is impossible in general). Instead of proving that a “random guess algorithm” is near-optimal on $H$, we prove that a “fixed answer algorithm” is near-optimal on $H$. That is, instead of saying that no algorithm can do better than a random guess, we say that no algorithm can do better than either always outputting 0 or always outputting 1. Note that this conclusion is meaningless if $f$ is, say, always equal to 1 on $H$, but in our construction we have that $f$ is not exceedingly biased on $H$, and if $\epsilon$ is small compared to $\delta$, say, then the conclusion is quite non-trivial.

One can also find a set $H'$ with the same type of average-case complexity as in the original Impagliazzo result ~~by putting into $H'$ a size $\frac \delta 2 2^n$ sample of elements $x$ of $H$ such that $f(x) = 0$ and an equal size sample of elements of $H$ such that $f(x)$ equals 1. (Alternatively, put in all the elements of $H$ on which $f$ achieves its minority value in $H$, then add a random sample of as many elements achieving the majority value.)~~ Then we recover the original statement except that the loss in circuit size is exponential instead of polynomial. [Update: constructing $H'$ is somewhat more complicated than we originally thought; the details are in the paper.]

Coming up next, the proof of the “constructive hard core set theorem” and my attempt at explaining what the techniques have to do with “finitary ergodic theory.”

An amazing video set to Daft Punk’s *Harder, Better, Faster, Stronger*.

Don’t be discouraged by the slow first minute; it does get better, faster, and harder.

Doing the same with a different Daft Punk song, however, can be less impressive.

We want to prove that a dense subset of a pseudorandom set is indistinguishable from a truly dense set.

Here is an example of what this implies: take a pseudorandom generator of output length $n$, choose in an arbitrary way a 1% fraction of the possible seeds of the generator, and run the generator on a random seed from this restricted set; then the output of the generator is indistinguishable from being a random element of a set of size $\frac {2^n}{100}$.

(Technically, the theorem states the existence of a distribution of min-entropy $n - \log_2 100$, but one can also get the above statement by standard “rounding” techniques.)

As a slightly more general example, if you have a generator $G$ mapping a length-$t$ seed into an output of length $n$, and $Z$ is a distribution over seeds of min-entropy at least $t - d$, then $G(Z)$ is indistinguishable from a distribution of min-entropy $n - d$. (This, however, works only if $d = O(\log n)$.)

It’s time to give a formal statement. Recall that we say that a distribution $D$ is $\delta$-dense in a distribution $R$ if

$\displaystyle \forall x. \ \Pr [ D = x ] \leq \frac 1 \delta \cdot \Pr [ R = x ]$

(Of course I should say “random variable” instead of “distribution,” or write things differently, but we are between friends here.)
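In code, the definition is a pointwise check. Here is a Python sketch (the toy distributions are my choosing): the uniform distribution on half of a 4-element universe is $\frac 12$-dense in the uniform distribution on all of it, but not $0.9$-dense.

```python
def is_dense(A, B, delta, tol=1e-12):
    """True if Pr[A=x] <= (1/delta) * Pr[B=x] for every x, i.e. A is delta-dense in B."""
    return all(p <= B.get(x, 0.0) / delta + tol for x, p in A.items())

U = {x: 0.25 for x in "abcd"}   # uniform on {a,b,c,d}
D = {x: 0.5 for x in "ab"}      # uniform on {a,b}
print(is_dense(D, U, 0.5), is_dense(D, U, 0.9))  # True False
```

Intuitively, a $\delta$-dense $D$ is one you could obtain by conditioning $B$ on an event of probability at least $\delta$.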

We want to say that if $F$ is a class of tests, $R$ is pseudorandom according to a moderately larger class $F'$, and $D$ is $\delta$-dense in $R$, then there is a distribution $M$ that is indistinguishable from $D$ according to $F$ and that is $\delta$-dense in the uniform distribution.

The Green-Tao-Ziegler proof of this result becomes slightly easier in our setting of interest (where $F$ contains boolean functions) and gives the following statement:

Theorem (Green-Tao-Ziegler, Boolean Case)

Let $\Sigma$ be a finite set, $F$ be a class of functions $f: \Sigma \rightarrow \{0,1\}$, $R$ be a distribution over $\Sigma$, $D$ be a $\delta$-dense distribution in $R$, and $\epsilon > 0$ be given.

Suppose that for every distribution $M$ that is $\delta$-dense in the uniform distribution $U_\Sigma$ there is an $f \in F$ such that

$\displaystyle | \Pr [ f(D) = 1 ] - \Pr [ f(M) = 1 ] | \geq \epsilon$

Then there are $k$ and $\epsilon' > 0$, both depending only on $\epsilon$ and $\delta$, and a function $h$ of the form $h(x) = g(f_1(x),\ldots,f_k(x))$, where $f_1,\ldots,f_k \in F$ and $g: \{0,1\}^k \rightarrow \{0,1\}$, such that

$\displaystyle | \Pr [ h(R) = 1 ] - \Pr [ h(U_\Sigma) = 1 ] | \geq \epsilon'$

Readers should take a moment to convince themselves that the above statement is indeed saying that if $R$ is pseudorandom then $D$ has a model $M$, by equivalently saying that if no model $M$ exists then $R$ is not pseudorandom.

The problem with the above statement is that $g$ can be arbitrary and, in particular, it can have circuit complexity exponential in $k$, and hence in $1/\epsilon$ and $1/\delta$.

In our proof, instead, $g$ is a linear threshold function, realizable by a circuit of size $O(k)$. Another improvement is that our $\epsilon'$ is polynomial, rather than exponential, in $\epsilon$ and $\delta$.

Here is the proof by Omer Reingold, Madhur Tulsiani, Salil Vadhan, and me. Assume that $F$ is closed under complement (otherwise work with the closure of $F$); then the assumption of the theorem can be restated without absolute values:

for every $M$ that is $\delta$-dense in $U_\Sigma$ there is an $f \in F$ such that

$\displaystyle \Pr [ f(D) = 1 ] - \Pr [ f(M) = 1 ] \geq \epsilon$

We begin by finding a “universal distinguisher.”

Claim

There is a function $\bar f: \Sigma \rightarrow [0,1]$ which is a convex combination of functions from $F$ and such that for every $M$ that is $\delta$-dense in $U_\Sigma$,

$\displaystyle \mathop{\mathbb E} [ \bar f (D) ] - \mathop{\mathbb E} [ \bar f (M) ] \geq \epsilon$

This can be proved via the min-max theorem for two-player games or, equivalently, via duality of linear programming or, as an analyst would say, via the Hahn-Banach theorem.

Let now $S$ be the set of the $\delta |\Sigma|$ elements of $\Sigma$ where $\bar f$ is largest. We must have

$\displaystyle \mathop{\mathbb E} [ \bar f (D) ] - \mathop{\mathbb E} [ \bar f (U_S) ] \geq \epsilon \ \ \ \ \ (1)$

which implies that there must be a threshold $t$ such that

$\displaystyle \Pr [ \bar f (D) \geq t ] - \Pr [ \bar f (U_S) \geq t ] \geq \epsilon \ \ \ \ \ (2)$

So we have found a boolean distinguisher between $D$ and $U_S$. Next, we claim that the same distinguisher works between $R$ and $U_\Sigma$.

By the density assumption, we have

$\displaystyle \Pr [ \bar f (R) \geq t ] \geq \delta \Pr [ \bar f (D) \geq t ]$

and since $S$ contains exactly a $\delta$ fraction of $\Sigma$, and since the condition $\bar f (x) \geq t$ always fails outside of $S$ (why?), we then have

$\displaystyle \Pr [ \bar f (U_\Sigma) \geq t ] = \delta \Pr [ \bar f (U_S) \geq t ]$

and so

$\displaystyle \Pr [ \bar f (R) \geq t ] - \Pr [ \bar f (U_\Sigma) \geq t ] \geq \delta \epsilon \ \ \ \ \ (3)$

Now, it’s not clear what the complexity of $\bar f$ is: it could be a convex combination involving *all* the functions in $F$. However, by Chernoff bounds, there must be functions $f_1,\ldots,f_k$ in $F$, with $k = poly(1/\epsilon,1/\delta)$, such that $\bar f(x)$ is well approximated by $\frac 1k \sum_{i=1}^k f_i(x)$ for all $x$ but for an exceptional set having density less than, say, $\frac{\delta\epsilon}{10}$, according to both $U_\Sigma$ and $R$.
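The sampling step can be sketched in a few lines of Python (sizes, weights, and the deviation threshold are all arbitrary choices of mine): a convex combination of many boolean functions is approximated pointwise, with high probability, by the empirical average of a few functions sampled according to the weights.

```python
import random
random.seed(1)

universe = range(200)
m, k = 500, 2000
# m random boolean functions, each given as a table over the universe
funcs = [[random.randint(0, 1) for _ in universe] for _ in range(m)]
weights = [random.random() for _ in range(m)]
total = sum(weights)
weights = [w / total for w in weights]

# the convex combination f-bar, evaluated at every point
g = [sum(w * f[x] for w, f in zip(weights, funcs)) for x in universe]

# sample k functions according to the weights and average them
sample = random.choices(funcs, weights=weights, k=k)
g_hat = [sum(f[x] for f in sample) / k for x in universe]

# by Chernoff/Hoeffding bounds, the pointwise deviation is small
max_err = max(abs(a - b) for a, b in zip(g, g_hat))
```

In the actual proof one only needs the approximation to hold outside a small exceptional set, so $k$ can be taken polynomial in $1/\epsilon$ and $1/\delta$ independently of the size of $F$.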

Now $R$ and $U_\Sigma$ are distinguished by the predicate “$\frac 1k \sum_{i=1}^k f_i(x) \geq t$,” which is just a linear threshold function applied to a small set of functions from $F$, as promised.

Actually I have skipped an important step: outside of the exceptional set, $\frac 1k \sum_i f_i(x)$ is going to be *close* to $\bar f(x)$ but not identical, and this could lead to problems. For example, in (3) $\bar f$ might typically be larger than $t$ only by a tiny amount on $R$, and $\frac 1k \sum_i f_i$ might consistently underestimate $\bar f$. If so, $\Pr [ \frac 1k \sum_i f_i (R) \geq t ]$ could be a completely different quantity from $\Pr [ \bar f (R) \geq t ]$.

To remedy this problem, we note that, from (1), we can also derive the more “robust” distinguishing statement

$\displaystyle \Pr [ \bar f (D) \geq t + \frac \epsilon 2 ] - \Pr [ \bar f (U_S) \geq t ] \geq \frac \epsilon 2 \ \ \ \ \ (2')$

from which we get

$\displaystyle \Pr [ \bar f (R) \geq t + \frac \epsilon 2 ] - \Pr [ \bar f (U_\Sigma) \geq t ] \geq \frac {\delta \epsilon} 2 \ \ \ \ \ (3')$

And now we can be confident that even replacing $\bar f$ with an approximation we still get a distinguisher.

The statement needed in number-theoretic applications is stronger in a couple of ways. One is that we would like $F$ to contain bounded functions $f: \Sigma \rightarrow [0,1]$, rather than boolean-valued functions. Looking back at our proof, this makes no difference. The other is that we would like $h$ to be a function of the form $h(x) = \prod_{i=1}^k f_i(x)$ rather than a general composition $g(f_1(x),\ldots,f_k(x))$ of functions from $F$. This we can achieve by approximating a threshold function by a polynomial using the Weierstrass theorem, and then choosing the most distinguishing monomial. This gives a proof of the following statement, which is equivalent to Theorem 7.1 in the Tao-Ziegler paper.
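The polynomial-approximation step can be illustrated with Bernstein polynomials, one standard constructive proof of the Weierstrass theorem. A Python sketch (the degree is an arbitrary choice of mine): a threshold function on $[0,1]$ is approximated by a polynomial, accurately away from the jump.

```python
from math import comb

def bernstein(f, n):
    """Degree-n Bernstein polynomial of f: sum_k f(k/n) * C(n,k) * x^k (1-x)^(n-k)."""
    return lambda x: sum(f(k / n) * comb(n, k) * x ** k * (1 - x) ** (n - k)
                         for k in range(n + 1))

step = lambda t: 1.0 if t >= 0.5 else 0.0   # a threshold function on [0,1]
p = bernstein(step, 200)                    # a degree-200 polynomial approximation

# p is close to the step function away from the jump at 1/2
print(round(p(0.2), 3), round(p(0.8), 3))  # 0.0 1.0
```

Expanding such a polynomial in the values $f_1(x),\ldots,f_k(x)$ and keeping the most distinguishing monomial is what turns the threshold distinguisher into a product of functions from $F$, at the cost of the exponential loss mentioned below.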

Theorem (Green-Tao-Ziegler, General Case)

Let $\Sigma$ be a finite set, $F$ be a class of functions $f: \Sigma \rightarrow [0,1]$, $R$ be a distribution over $\Sigma$, $D$ be a $\delta$-dense distribution in $R$, and $\epsilon > 0$ be given.

Suppose that for every distribution $M$ that is $\delta$-dense in the uniform distribution $U_\Sigma$ there is an $f \in F$ such that

$\displaystyle | \mathop{\mathbb E} [ f(D) ] - \mathop{\mathbb E} [ f(M) ] | \geq \epsilon$

Then there are $k$ and $\epsilon' > 0$, both depending only on $\epsilon$ and $\delta$, and a function $h$ of the form $h(x) = \prod_{i=1}^k f_i(x)$, where $f_1,\ldots,f_k \in F$, such that

$\displaystyle | \mathop{\mathbb E} [ h(R) ] - \mathop{\mathbb E} [ h(U_\Sigma) ] | \geq \epsilon'$

In this case, we too lose an exponential factor. Our proof, however, has some interest even in the number-theoretic setting because it is somewhat simpler than and genuinely different from the original one.
