Expert, Shmexpert

About a week ago, the email server used by climate scientists at the University of East Anglia was hacked, and emails dating as far back as 1996 were posted online.

National Review, the conservative magazine, “asked its environmentalism experts to weigh in” on the implications of the leak. One of the experts is Henry Payne, billed as “an editorial writer and cartoonist with the Detroit News.” According to wikipedia, Mr. Payne is a history major.

As a computer scientist and Chinese food enthusiast, I believe I am more qualified than Mr. Payne to discuss climate change, and I hope National Review will consider asking me to write a piece on the subject next time.

Update: in the next episode of the series everybody is an expert, Sarah Palin writes an op-ed on climate change for the Washington Post. “I’ve always believed that policy should be based on sound science, not politics,” writes Sarah Palin. I am looking forward to zombie Ronald Reagan‘s take on the matter.

The Money Troubles of the University of California

Last week, at BATS, a colleague from a research lab was asking me what exactly is going on with the budget of the University of California. The conversation went like this:

“How much does the state contribute to the UC Budget?”

“About a third”

“And how much has the state contribution decreased this year”

“About 20%”

“So that’s about 6% of the total budget. How come they need to cut salaries by 8% and increase tuition by 32% in order to recover from a 6% budget cut?”

How come, indeed? Over the weekend I have tried to make sense of this question, but unfortunately it is very difficult to make sense of the UC budget, not to mention that the numbers in the 2008-09 budget document from the office of the president are quite different from the number in the 2008-09 audit.

Here is, however, what I have understood so far:
Continue reading

The Berkeley Way

On Thursday, the Regents of the University of California met at UCLA and voted to increase student fees (tuition) by about 32%. There will be a 15% increase this Spring, and another 15% increase in Fall 2010.

Friday, while the theoreticians where at the Bay Area Theory Symposium, was a day of protest on campus, which showed the best and the worst of Berkeley.

Protesters went around campus buildings pulling fire alarms (bad), resulting in my colleague David Tse teaching probability and combinatorics outside in the rain (very good — or as one of the youtube commenters put it, “epic WIN”):

A group of about 40 students tried to occupy some rooms of Wheeler hall, the English department building, and up to 2,000 students showed up outside the building on a cold (by California standards) and rainy day (very good).

A few days of occupation at Wheeler would have probably received ample news coverage and would have publicized the fundamental fact of the present situation: that the University of California, and U.C. Berkeley in particular, will lose its character of public university if the funding crisis persists, and that the budget cuts are not free money for the tax-payers; they will have real negative consequences of hundreds of thousands of middle-class families.

Yet, the administration (very bad) instead of supporting the students sent the campus police first, and then police in riot gear from the Berkeley police department and the Alameda county’s Sheriff’s force (and possibly Oakland police, whose presence is disputed) to arrest the would-be occupiers and beat up the crowd outside Wheeler.

It’s anybody’s guess what the Chancellor might have been thinking.

The Large Deviation of Fourwise Independent Random Variables

Suppose {X_1,\ldots,X_n} are mutually independent unbiased {\pm 1} random variables. Then we know everything about the distribution of

\displaystyle   | X_1 + \ldots + X_N | \ \ \ \ \ (1)

either by using the central limit theorem or by doing calculations by hand using binomial coefficients and Stirling’s approximation. In particular, we know that (1) takes the values {1,\ldots, 1/\sqrt N} with probability {\Theta(1/\sqrt N)} each, and so with constant probability (1) is at most {O(\sqrt N)}.

The last statement can be proved from scratch using only pairwise independence. We compute

\displaystyle  \mathop{\mathbb E} \left| \sum_i X_i \right|^2 = N

so that

\displaystyle  \mathop{\mathbb P} \left[ \left|\sum_i X_i \right| \geq c \cdot \sqrt N \right] = \mathop{\mathbb P} \left[ \left|\sum_i X_i \right|^2 \geq c^2 \cdot N \right] \leq \frac 1 {c^2}

It is also true that (1) is at least {\Omega(\sqrt N)} with constant probability, and this is trickier to prove.

First of all, note that a proof based on pairwise independence is not possible any more. If {(X_1,\ldots,X_N)} is a random row of an Hadamard matrix, then {\sum_i X_i = N} with probability {1/N}, and {\sum_i X_i =0} with probability {1-1/N}.

Happily, four-wise independence suffices.

Continue reading

Distinguishers from linear functions

In the last post we introduced the following problem: we are given a length-increasing function, the hardest case being a function {G: \{ 0,1 \}^{n-1} \rightarrow \{ 0,1 \}^n} whose output is one bit longer than the input, and we want to construct a generator {D} such that the advantage or distinguishing probability of {D}

\displaystyle   \left| \mathop{\mathbb P}_{z \in \{ 0,1 \}^{n-1}} [D(G(z)) =1 ] - \mathop{\mathbb P}_{x \in \{ 0,1 \}^{n}} [D(x) =1 ] \right| \ \ \ \ \ (1)

is as large as possible relative to the circuit complexity of {D}.

I will show how to achieve advantage {\epsilon} with a circuit of size {O(\epsilon^2 n 2^n)}. Getting rid of the suboptimal factor of {n} is a bit more complicated. These results are in this paper.

Continue reading

Efficiently Correlating with a Real-valued Function and Breaking PRGs

Suppose we have a length-increasing function {G: \{ 0,1 \}^{n-1} \rightarrow \{ 0,1 \}^n}, which we think of as a pseudorandom generator mapping a shorter seed into a longer output.

Then the distribution of {G(z)} for a random seed {z} is not uniform (in particular, it is concentrated on at most {2^{n-1}} of the {2^n} elements of {\{ 0,1 \}^n}). We say that a statistical test {D: \{ 0,1 \}^n \rightarrow \{ 0,1 \}} has advantage {\epsilon} in distinguishing the output of {G} from the uniform distribution if

\displaystyle   \left| \mathop{\mathbb P}_{z \in \{ 0,1 \}^{n-1}} [D(G(z)) =1 ] - \mathop{\mathbb P}_{x \in \{ 0,1 \}^{n}} [D(x) =1 ] \right| \geq \epsilon \ \ \ \ \ (1)

If the left-hand side of (1) is at most {\epsilon} for every {D} computable by a circuit of size {S}, then we say that {G} is {\epsilon}-pseudorandom against circuits of size {S}, or that it is an {(S,\epsilon)}-secure pseudorandom generator.

How secure can a pseudorandom generator possibly be? This question (if we make no assumption on the efficiency of {G}) is related to the question in the previous post on approximating a boolean function via small circuits. Both questions, in fact, are special cases of the question of how much an arbitrary real-valued function must correlate with functions computed by small circuits, which is answered in a new paper with Anindya De and Madhur Tulsiani.

Continue reading

Approximating a Boolean Function via Small Circuits

Suppose that {f: \{ 0,1 \}^n \rightarrow \{ 0,1 \}} is an arbitrary boolean function, and that we want to construct (or rather, show the existence of) a circuit {C} of size {\leq S} that approximates {f} as well as possible. How well can we hope to do?

Certainly, we can get a {O(1)}-size circuit {C} that computes {f} on {\geq \frac 12} of the inputs, by either choosing the circuit that always outputs 0 or the circuit that always outputs 1.

Also (ignoring {n^{O(1)}} factors), we can construct a circuit that has hardwired the values of {f} at {|S|} places, and then outputs always zero or always one (whichever is better) on the remaining inputs. This circuit will work on a

\displaystyle   \frac 12 + \frac {S}{2^n} \ \ \ \ \ (1)

fraction of inputs, and this seems like the best we can hope for.

We may then try to prove the optimality of the above construction by considering a random function {f}, using the Chernoff bound to show that, for a fixed circuit {C} of size {S}, {f} and {C} have agreement less than {1/2 + \epsilon} except with probability less than {e^{-\epsilon^2 2^n}}, and then taking a union bound over the roughly {2^S} circuits of size {S}. (It’s actually {2^{O(S \log S)}}, but we said we would ignore {n^{O(1)}}.) Unfortunately, this calculation only leads us to a proof that, in general, we cannot hope to compute a random function on more than a

\displaystyle  \frac 12 + \sqrt{\frac{S}{2^n}}  \ \ \ \ \ (2)

fraction of inputs using a circuit of size {S}. This is a disappointment because we were hoping to prove that (1) was tight.

If we think about it, however, we see that the bound in (2) is actually typically achievable for a random function: partition the {2^n} possible inputs into {S} blocks, each of size {2^n/S}. (Based, for example, on the value of the first {\log S} bits.) Now, the value of {f} in each block are just a sequence of {2^n/S} random bits, so, with constant probability, there is either an excess of {> \sqrt{2^n/S}} ones or an excess of {> \sqrt {2^n/S}} zeroes. For each block, hard-wire into the circuit the majority value of {f} among elements of the block. Clearly we are getting an {1/2 + \Omega(\sqrt{S/2^n})} fraction of inputs correct, and our circuit has size {S}.

What about general functions? Is it possible to do better than (1) for every function?

Continue reading