The Swift-Boating of Modern Cryptography

The September issue of the Notices of the AMS is out, and it contains an article by Neal Koblitz on modern cryptography, exposing themes he wrote about in previous articles.

Before I get to Koblitz’s thesis I should describe the context, as I see it.

Cryptography underwent two major revolutions in the 1970s and 1980s.

The notion of public key cryptography, invented by Diffie and Hellman (and earlier, but only in classified documents that didn’t enter the public domain for decades, by Ellis, Cocks, and Williamson) and made possible by Rivest, Shamir and Adleman, allowed parties that had never met in advance to share a secret key to communicate over an unsafe channel. Without this technology, buying and selling things online would be extremely inconvenient and companies like amazon and ebay would probably not exist.

The other revolution, started by the 1982 series of papers by Blum, Goldwasser, Micali and Yao, was the discovery that one could provide formal definitions of security for cryptographic problems, and that such definitions were achievable under complexity assumptions, albeit, initially, via slow constructions.

Indeed, the importance of the new definitions cannot be overstated, and, possibly for lack of accessible expositions, it has not been completely digested by all the security community. I remember, not too long ago, reading a paper on electronic elections, listing seven or more requirements that an election protocol should satisfy, and it was clear that the list was unwieldy, redundant, and, most importantly, incomplete. The modern definitional approach is instead to begin with a description of an ideal setting (where every person votes in a secret ballot, then all ballots are counted and the total tally is the only information that anybody gets) and then require that a protocol be such that anything an adversary can do in the protocol to alter the outcome or gain information, can also be done in the ideal setting. In particular, whatever outcome or information an attacker cannot gain in the ideal setting, it cannot be gained in the protocol either.

Some constructions developed by theoreticians and coming with a formal analysis are too inefficient to be used, but their development often leads to the discovery of general design principles such that, for example, public key encryption algorithms must be randomized, and should be designed so that it is not possible to construct a valid ciphertext in any other way than applying the encryption algorithm to a known message.

Indeed modern cryptography is the area of computer science where the distance between theory and practice is the least: one finds theoreticians who spend most of their time on impractical constructions designed to be “plausibility results” but who also sit on standards bodies and help create and assess widely used systems, whereas one is less likely to see the algorist preoccupied with $O(log n)$ approximation algorithms also working on commercial optimization packages. The important difference is that optimization algorithms can be validated in practice in a way that is impossible for cryptographic protocols, where one cannot anticipate what attacks will be used, and hence one should strive for security against all possible attacks which is possible, within an attack model, only via a formal analysis and reductions.

Koblitz points out that sometimes proofs contain mistakes, and that there can be attacks not covered by standard models. His reaction, however, is not that the community should be very careful about formal correctness, and explore (as is being done) new models that take into account timing attacks and other “grey box” accesses to the computations of the parties. Rather, he suggests doing away with proofs, and relying more on intuition. This is discussed in the second part of the paper (the first part is devoted to encryption schemes and factoring algorithms via elliptic curves, the “good” interaction between math and cryptography), through such rhetorical devices as non sequiturs, personal attacks, and petulance. The CRYPTO community’s typesetting abilities are not spared, nor is Oded Goldreich’s spelling.

It would seem hard to defend the idea that one is more likely to make a correct statement if the statement has no proof compared to having a proof, or that one can be secure against a wider class of attacks by relying on intuition rather than defining a class of attacks and establishing the security guarantee. It would be like having a draft-dodger compete in an election against a war hero, and having the latter be on the defensive about his military service, but sometimes strange things do happen.

(For the Italian readers, a better reference would be from Nanni Moretti’s Aprile: “Di qualcosa, D’Alema rispondi. Non ti far mettere in mezzo sulla giustizia proprio da Berlusconi! D’Alema, dì una cosa di sinistra, dì una cosa anche non di sinistra, di civiltà!”)

Update 9/1/07: you can now read letters to the editors by Oded Goldreich, Boaz Barak, and Jon Katz, and there are more online comments [here] and [here]

Update 9/5/07: Hugo Krawczyk has also written a letter to the editors of the Notices. The interested reader can compare what Koblitz said about Hugo’s work on HMQV to the actual HMQV paper.

More ways to prove unsatisfiability of random k-SAT

Earlier, here and here, we discussed the following problem: we pick at random a k-CNF formula with $n$ variables and $m$ clauses; if $m$ is at least $c_k n$, for a constant $c_k$, then we know that with high probability the formula is unsatisfiable; is there an algorithmic way of certifying this unsatisfiability?

One approach we discussed is a reduction to 2SAT, which works provided $m$ is at least of the order of $n^{k-1}$. What about sparser formulas?

Here is another possible reduction. Starting from the formula, construct an hypergraph that has 2n vertices and m hyperedges as follows. For every variable $x_i$ we have the two vertices $[x_i=0]$ and $[x_i=1]$, and, for every clause with $k$ variables, we have the hyperedge that connects the $k$ vertices corresponding to the unique assignment to those $k$ variables that contradicts the clause. For example, the clause
$(x_3 \vee \neg x_5 \vee x_6)$
corresponds to the hyperedge
$([x_3=0],[x_5=1],[x_6=0])$.

Now, if the formula is random, we have a random hypergraph. Also, if the formula is satisfiable we have an independent set of size $n$; half as big as the total number of vertices: just take the vertices consistent with the assignment. (An independent set is a set of vertices such that no hyperedge is completely contained in the set.) Certifying unsatisfiability of a random formula now reduces to certifying that a given random hypergraph has no large independent set.

Unfortunately, we don’t know of any algorithm for this problem, except for the case of graphs. As I may discuss in a future post, spectral methods allow us to certify that a given random graph with $n$ vertices and average degree $d$ has no independent set larger than about $n/O(\sqrt{d})$. By this, I mean that there is a definition of certificate that, when existing, always correctly proves such an upper bound, and when we pick at random a graph of average degree $d$ there is a high probability that a certificate proving an upper bound of $n/O(sqrt{d})$ to the size of the largest independent set exists and can be found efficiently.

It is too bad that the above reduction produces a graph only when we start from 2SAT, a problem for which we already know how to decide (and hence certify) satisfiability in polynomial time.

But, and here is a great idea of Goerdt and Krivelevich from 2000, we can reduce the problem of certifying non-existence of large independent sets in random hypergraps to the problem of certifying non-existence of large independent sets in random graphs.

Suppose we have an hypergraph with n vertices and m hyperedges, each involving 4 vertices. Construct now a graph with $n^2$ vertices, one for every pairs of vertices in the hypergraph, and for every hyperedge $(a,b,c,d)$ in the hypergraph create the edge $([a,b],[c,d])$ in the graph. (Assume for now that we choose the ordering of vertices at random, even if this means that we only achieve a randomized reduction. There are ways to make the reduction deterministic.) Now, if the hypergraph has an independent set of size $\geq t$ then clearly the graph has an independent set of size $\geq t^2$. Furthermore, if we started from a random hypergraph then we are getting a random graph. So if $m$ is of the order of $n^2$ we are able to refute the claim that there is an independent set of size $n/2$ in the hypergraph (by refuting the claim that there is an independent of size $n^2 /4$ in the graph).

In general, for even $k$, these ideas give a way of refuting a random $k$-SAT instance with $n$ variables and $n^{k/2}$ clauses.

(The original paper of Goerdt and Kirvelevich had an extra polylog term needed to make the spectral techniques work. But later more sophisticated analyses have removed the polylog bound, either by using slightly different reductions or by directly improving the bounds on the sparsity of random graphs for which one can certify the non-existence of large independent sets. See this paper by Feige and Ofek for the latter approach.)

Instead of thinking in terms of reductions to graph and hypergraph problems, one may directly see the method as associating a matrix to the formula and proving that certain properties of the matrix imply the unsatisfiability of the formula.

A generalization of this way of seeing the argument has led to an algorithm by Friedman, Goerdt and Krivelevich that certifies unsatisfiability of random kSAT instances with about $n^{k/2}$ clauses even if $k$ is odd. I think it would be interesting to have a combinatorial view of what happens in that algorithm.

This is the state of the art for algorithms that find certificates of unsatisfiability.

There is also some intuition for why $n^{1.5}$ is a natural barrier. The algorithmic techniques described so far are “no more powerful” than semidefinite programming: the standard semidefinite relaxation of Max 2SAT proves that a given 2SAT formula is unsatisfiable, whenever it is the case, and a standard semidefinite programming relaxation of independent set (the Lovasz theta function) proves with high probability that a random graph has no large independent set. It is conjectured, however, that no “simple” reduction of random 3SAT to a semidefinite programming problem yelds a refutation if the number of clauses is less than $n^{1.5}$. This has been verified by Feige and Ofek for a natural reduction.

Recently, Feige, Kim and Ofek have defined a new type of witness of unsatisfiability that is verifiable in polynomial time and that exists with high probability for formulas with about $n^{1.4}$ clauses. (It is not known, however, how to construct such witnesses in polynomial time given a formula.) As could be expected, their witness-verification algorithm employs something that we know how to do in polynomial time but that is very hard for semidefinite programs: verifying the unsatisfiability of a given linear system over GF(2).

Lies, Damn Lies, and the Number of Sexual Partners

A few days ago, Gina Kolata reported in the New York Times on the paradox of studies on sexual behavior consistently reporting (heterosexual) men having more sexual partners than women, with a recent US study reporting men having a median number of 7 partners and women a median number of 4. Contrary to what’s stated in the paper, this is not mathematically impossible (key word: median). It is however quite implausible, requiring a relatively small number of women to account for a large fraction of all men’s partners.

An answer to this paradox can be found in Truth and consequences: using the bogus pipeline to examine sex differences in self-reported sexuality, by Michele Alexander and Terry Fisher, Jorunal of Sex Research 40(1), February 2003.

In their study, a sample of men and women are each divided into three groups and asked to fill a survey on sexual behavior. People in one group filled the survey alone in a room with an open door, a researcher sitting outside, and after being told the study was not anonymous; people in a second group filled the survey in a room with a closed door and an explicit assurance of anonymity; people in a third group filled the survey attached to what they believe to be a working “lie detector.”

In the first group, women reported on average 2.6 partners, men 3.7. In the second group, it was women 3.4 and men 4.2. In the third group, it was women 4.4 and men 4.0.

(The study looks at several other quantities, and some of them have even wider variance in the three settings.)

So, not surprisingly given the sexual double standards in our culture, men and women lie about their sexual behavior (men overstate, women understate), and do less so in an anonymous setting or when the lie is likely to be discovered.

Here is the reporting of the first group put to music:

[Update 8/18/07: so many people must have emailed her about the median versus average issue in the article that Gina Kolata wrote a clarification. Strangely, she does not explain, for the rest of the readers, what the difference is and why it is possible, if unlikely, to have very different medians for men and women. The claim in the clarification, by the way, is still wrong: those 9.4% of women with 15 or more partners could be accounting for all the missing sex.]

Atle Selberg

Number theorist Atle Selberg died in Princeton on Monday, at age 90. One of his major results was the “Selberg formula” that led to an elementary proof of the prime number theorem, a Fields medal in 1950, and a famous controversy with Paul Erdos.