Having (non-rigorously) defined the Laplacian operator in manifolds in the previous post, we turn to the proof of the Cheeger inequality in manifolds, which we restate below.
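Theorem 1 (Cheeger’s inequality) Let $M$ be an $n$-dimensional smooth, compact, Riemannian manifold without boundary with metric $g$, let $L := -\mathrm{div} \nabla$ be the Laplace-Beltrami operator on $M$, let $0 = \lambda_1 \leq \lambda_2 \leq \cdots$ be the eigenvalues of $L$, and define the Cheeger constant of $M$ to be

$$ h(M) := \inf_{S \subseteq M \, : \, 0 < \mu(S) \leq \frac 12 \mu(M)} \frac{\mu_{n-1}(\partial S)}{\mu(S)} $$

where $\partial S$ is the boundary of $S$, $\mu$ is the $n$-dimensional measure, and $\mu_{n-1}$ is the $(n-1)$-dimensional measure defined using $g$. Then

$$ h(M) \leq 2 \sqrt{\lambda_2} \ \ \ (1) $$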
We begin by recalling the proof of the analogous result in graphs, and then we will repeat the same steps in the context of manifolds.
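Theorem 2 (Cheeger’s inequality in graphs) Let $G = (V,E)$ be a $d$-regular graph, $A$ be its adjacency matrix, $L := I - \frac 1d A$ be its normalized Laplacian matrix, $0 = \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n$ be the eigenvalues of $L$, and define $\mu(S) := d \cdot |S|$ for every subset of vertices $S \subseteq V$. Define the conductance of $G$ as

$$ h(G) := \min_{S \subseteq V \, : \, 0 < \mu(S) \leq \frac 12 \mu(V)} \frac{|\partial S|}{\mu(S)} $$

where $|\partial S|$ is the number of edges with one endpoint in $S$ and one endpoint in $V \setminus S$. Then

$$ h(G) \leq \sqrt{2 \lambda_2} \ \ \ (2) $$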
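As a quick sanity check of the graph inequality $h(G) \leq \sqrt{2\lambda_2}$ (an illustration added here, not part of the argument), the following Python sketch brute-forces the conductance of the 8-cycle and compares it with $\sqrt{2\lambda_2}$. It relies on the standard fact that for the $n$-cycle the vector $f_v = \cos(2\pi v/n)$ is an eigenvector of $I - \frac 1d A$ with eigenvalue $1 - \cos(2\pi/n)$; the code checks that eigenvector equation numerically rather than assuming it. The function names are made up for this example.

```python
import math
from itertools import combinations

def conductance_of_set(S, edges, d):
    """h(S) = (number of edges leaving S) / (d * |S|)."""
    S = set(S)
    boundary = sum(1 for u, v in edges if (u in S) != (v in S))
    return boundary / (d * len(S))

def graph_conductance(n, edges, d):
    """Brute force: minimum of h(S) over all S with 0 < |S| <= n/2."""
    return min(
        conductance_of_set(S, edges, d)
        for size in range(1, n // 2 + 1)
        for S in combinations(range(n), size)
    )

n, d = 8, 2
edges = [(v, (v + 1) % n) for v in range(n)]  # the 8-cycle, a 2-regular graph

# For the cycle, f_v = cos(2*pi*v/n) is an eigenvector of L = I - A/d
# with eigenvalue 1 - cos(2*pi/n), the second smallest eigenvalue.
lam2 = 1 - math.cos(2 * math.pi / n)
f = [math.cos(2 * math.pi * v / n) for v in range(n)]
Lf = [f[v] - (f[(v - 1) % n] + f[(v + 1) % n]) / d for v in range(n)]
residual = max(abs(Lf[v] - lam2 * f[v]) for v in range(n))

h = graph_conductance(n, edges, d)
print(f"h(G) = {h:.4f}, sqrt(2*lambda_2) = {math.sqrt(2 * lam2):.4f}")
```

Here the best set is an arc of four consecutive vertices, which has two boundary edges and volume $8$, so the conductance is $1/4$, comfortably below $\sqrt{2\lambda_2}$.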
1. Proof of the Cheeger inequality in graphs
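We will use the variational characterization of the eigenvalues of the Laplacian $L$ of a graph $G$:

$$ \lambda_1 = \min_{f \in \mathbb{R}^V, \ f \neq 0} \frac{f^T L f}{f^T f} \ \ \ (3) $$

and if $f_1$ is a minimizer in the above expression then

$$ \lambda_2 = \min_{f \neq 0 \, : \, f \perp f_1} \frac{f^T L f}{f^T f} $$

Following the definition of $L$ we see that

$$ f^T L f = \frac 1d \sum_{(u,v) \in E} |f_u - f_v|^2 $$

and so the minimum in (3) is 0, and it is achieved for $f = (1,\ldots,1)$. This means that

$$ \lambda_2 = \min_{f \neq 0 \, : \, \sum_v f_v = 0} \frac{\sum_{(u,v) \in E} |f_u - f_v|^2}{d \sum_v f_v^2} \ \ \ (4) $$

The expression in the right-hand side of (4) is an important one, and it is called the Rayleigh quotient of $f$, which we will denote by $R(f)$:

$$ R(f) := \frac{\sum_{(u,v) \in E} |f_u - f_v|^2}{d \sum_v f_v^2} $$

It is also useful to consider the variant of the Rayleigh quotient where there are no squares; this does not have a standard name, so let us call it the $\ell_1$ Rayleigh quotient and denote it by $R_1(g)$:

$$ R_1(g) := \frac{\sum_{(u,v) \in E} |g_u - g_v|}{d \sum_v |g_v|} $$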
The proof of the graph Cheeger inequality now continues with the proof of the following three facts.
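Lemma 3 (Rounding of $\ell_1$ embeddings) For every non-negative vector $g \in \mathbb{R}^V_{\geq 0}$, $g \neq 0$, there is a value $t \geq 0$ such that

$$ h(\{ v : g_v > t \}) \leq R_1(g) $$

where $h(S) := |\partial S| / \mu(S)$ denotes the conductance of a non-empty set of vertices $S$.

Lemma 4 (Embedding of $\ell_2^2$ into $\ell_1$) For every non-negative vector $f \in \mathbb{R}^V_{\geq 0}$, writing $f^2$ for the vector with coordinates $f_v^2$, we have

$$ R_1(f^2) \leq \sqrt{2 R(f)} $$

Lemma 5 (From an eigenvector to a non-negative vector) For every $f \in \mathbb{R}^V$ such that $\sum_v f_v = 0$ there is a non-negative $f' \in \mathbb{R}^V_{\geq 0}$ such that $\mu(\{ v : f'_v > 0 \}) \leq \frac 12 \mu(V)$ and such that

$$ R(f') \leq R(f) $$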
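Now let us start from a function $f$ that optimizes (4), so that $\sum_v f_v = 0$ and $R(f) = \lambda_2$, then apply Lemma 5 to find a function $f'$ such that the volume of the vertices having positive coordinates in $f'$ is at most $\frac 12 \mu(V)$ and such that $R(f') \leq \lambda_2$. Then consider the vector $g$ such that $g_v := (f'_v)^2$; by Lemma 4, we have $R_1(g) \leq \sqrt{2 \lambda_2}$, and by Lemma 3 there is a threshold $t$ such that the set $S := \{ v : g_v > t \}$ has conductance $|\partial S| / \mu(S) \leq \sqrt{2 \lambda_2}$. Since $S$ is a subset of the vertices having positive coordinates in $g$, we have $\mu(S) \leq \frac 12 \mu(V)$, and so

$$ h(G) \leq \sqrt{2 \lambda_2} $$

which is the Cheeger inequality for graphs. It remains to prove the three lemmas.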
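Proof: of Lemma 3. For each threshold $t \geq 0$, define the set

$$ S_t := \{ v : g_v > t \} $$

and write $h(S_t) := |\partial S_t| / \mu(S_t)$ for its conductance. The idea of the proof is that if we pick $t$ at random then the probability that an edge $(u,v)$ belongs to $\partial S_t$ is proportional to $|g_u - g_v|$ and the probability that $v \in S_t$ is proportional to $g_v$, so that the expected number of edges in $\partial S_t$ is proportional to the numerator of $R_1(g)$ and the expected number of vertices in $S_t$ is proportional to the denominator of $R_1(g)$; if $R_1(g)$ is small, it is not possible for $h(S_t)$ to always be large for every $t$.

To avoid having to normalize the range of $t$ to be between $0$ and $1$, instead of taking averages over a random choice of $t$, we will consider the integral over all values of $t$. We have

$$ \int_0^\infty |\partial S_t| \, dt = \sum_{(u,v) \in E} |g_u - g_v| $$

because we can write $|\partial S_t| = \sum_{(u,v) \in E} I_{u,v}(t)$, where $I_{u,v}(t) = 1$ if $(u,v) \in \partial S_t$ and $I_{u,v}(t) = 0$ otherwise, and we see that only the values of $t$ between $g_u$ and $g_v$ make $I_{u,v}(t) = 1$, so $\int_0^\infty I_{u,v}(t) \, dt = |g_u - g_v|$.

We also have

$$ \int_0^\infty \mu(S_t) \, dt = d \sum_v g_v $$

and if we denote by $t^*$ the threshold such that $h(S_{t^*})$ is smallest among all the $h(S_t)$, then

$$ \sum_{(u,v) \in E} |g_u - g_v| = \int_0^\infty |\partial S_t| \, dt \geq h(S_{t^*}) \int_0^\infty \mu(S_t) \, dt = h(S_{t^*}) \cdot d \sum_v g_v $$

so that

$$ h(S_{t^*}) \leq \frac{\sum_{(u,v) \in E} |g_u - g_v|}{d \sum_v g_v} = R_1(g) $$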
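The rounding step of Lemma 3 is constructive: given a non-negative vector on the vertices, it is enough to try the threshold sets and keep the one of smallest conductance. Here is a minimal Python sketch of that "sweep", under the conventions of this section ($d$-regular graph given as an edge list, volume of a set equal to $d$ times its size). The function names and the example vector are made up for illustration.

```python
def r1(g, edges, d):
    """l1 Rayleigh quotient: sum over edges of |g_u - g_v|, over d * sum_v |g_v|."""
    num = sum(abs(g[u] - g[v]) for u, v in edges)
    den = d * sum(abs(x) for x in g)
    return num / den

def conductance_of_set(S, edges, d):
    """(number of edges leaving S) / (d * |S|)."""
    S = set(S)
    boundary = sum(1 for u, v in edges if (u in S) != (v in S))
    return boundary / (d * len(S))

def best_threshold_cut(g, edges, d):
    """Return (conductance, set) for the best threshold set {v : g_v > t}.

    The set only changes when t crosses a value of g, so it is enough
    to try t = 0 and t equal to each distinct value of g.
    """
    best_h, best_S = float("inf"), None
    for t in sorted(set(g) | {0.0}):
        S = [v for v in range(len(g)) if g[v] > t]
        if not S:
            continue
        hS = conductance_of_set(S, edges, d)
        if hS < best_h:
            best_h, best_S = hS, S
    return best_h, best_S

n, d = 6, 2
edges = [(v, (v + 1) % n) for v in range(n)]  # the 6-cycle
g = [0.0, 0.0, 1.0, 3.0, 2.0, 0.0]            # an arbitrary non-negative vector

best_h, best_S = best_threshold_cut(g, edges, d)
print(best_h, best_S, r1(g, edges, d))
```

In this example the quotient is $6/12 = 1/2$ and the threshold $t = 0$ gives the set $\{2,3,4\}$ with conductance $2/6 = 1/3$, as the lemma promises.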
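Proof: of Lemma 4. Let us consider the numerator of $R_1(f^2)$; it is:

$$ \sum_{(u,v) \in E} |f_u^2 - f_v^2| = \sum_{(u,v) \in E} |f_u - f_v| \cdot (f_u + f_v) \leq \sqrt{\sum_{(u,v) \in E} |f_u - f_v|^2} \cdot \sqrt{\sum_{(u,v) \in E} (f_u + f_v)^2} $$

(we used Cauchy-Schwarz)

$$ \leq \sqrt{R(f) \cdot d \sum_v f_v^2} \cdot \sqrt{\sum_{(u,v) \in E} (2 f_u^2 + 2 f_v^2)} = \sqrt{R(f) \cdot d \sum_v f_v^2} \cdot \sqrt{2 d \sum_v f_v^2} = \sqrt{2 R(f)} \cdot d \sum_v f_v^2 $$

(we used the definition of $R$ and Cauchy-Schwarz again, together with the fact that every vertex belongs to $d$ edges)

And so

$$ R_1(f^2) = \frac{\sum_{(u,v) \in E} |f_u^2 - f_v^2|}{d \sum_v f_v^2} \leq \sqrt{2 R(f)} $$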
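To see the inequality between the two quotients on concrete numbers, the sketch below computes the Rayleigh quotient of a non-negative vector (squared differences over edges, divided by $d$ times the sum of squares) and the $\ell_1$ Rayleigh quotient of its coordinate-wise square, on a 6-cycle. The vector is an arbitrary choice made for this illustration, and the function names are not from the post.

```python
import math

def rayleigh(f, edges, d):
    """Sum over edges of (f_u - f_v)^2, over d * sum_v f_v^2."""
    num = sum((f[u] - f[v]) ** 2 for u, v in edges)
    den = d * sum(x * x for x in f)
    return num / den

def r1(g, edges, d):
    """Sum over edges of |g_u - g_v|, over d * sum_v |g_v|."""
    num = sum(abs(g[u] - g[v]) for u, v in edges)
    den = d * sum(abs(x) for x in g)
    return num / den

n, d = 6, 2
edges = [(v, (v + 1) % n) for v in range(n)]  # the 6-cycle
f = [0.0, 1.0, 2.0, 1.5, 0.5, 0.0]            # an arbitrary non-negative vector
f_squared = [x * x for x in f]

lhs = r1(f_squared, edges, d)                 # l1 quotient of f^2
rhs = math.sqrt(2 * rayleigh(f, edges, d))    # sqrt(2 R(f))
print(lhs, rhs)
```

For this vector the left-hand side is $8/15$ and $R(f) = 3.5/15$, so the bound holds with some room to spare.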
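Proof: of Lemma 5. Let $m$ be the median of $f$, and consider $\bar f$ defined as $\bar f_v := f_v - m$. We have

$$ R(\bar f) \leq R(f) $$

because the numerators of $R(\bar f)$ and $R(f)$ are the same (the additive term $-m$ cancels). The denominators are such that

$$ \sum_v \bar f_v^2 = \| f - m \cdot {\bf 1} \|^2 = \| f \|^2 + \| m \cdot {\bf 1} \|^2 \geq \| f \|^2 = \sum_v f_v^2 $$

because $f$ and the vector ${\bf 1} = (1,\ldots,1)$ are orthogonal, and so by Pythagoras’s theorem the length-squared of $\bar f = f - m \cdot {\bf 1}$ equals the length-squared of $f$ plus the length-squared of $m \cdot {\bf 1}$.

Let us define $f^+_v := \max \{ 0, \bar f_v \}$ and $f^-_v := \max \{ 0, - \bar f_v \}$ so that $\bar f = f^+ - f^-$. We use the following fact:

Fact 6 Let $a, b \in \mathbb{R}^V_{\geq 0}$ be disjointly supported non-negative vectors (“disjointly supported” means that they are non-zero on disjoint subsets of coordinates), then

$$ \min \{ R(a), R(b) \} \leq R(a - b) $$

Proof: The numerator of $R(a-b)$ is

$$ \sum_{(u,v) \in E} |a_u - b_u - a_v + b_v|^2 \geq \sum_{(u,v) \in E} |a_u - a_v|^2 + |b_u - b_v|^2 $$

(the cross term $-2 (a_u - a_v)(b_u - b_v)$ equals $2 a_u b_v + 2 a_v b_u \geq 0$, because $a_u b_u = a_v b_v = 0$) and, using orthogonality and Pythagoras’s theorem, the denominator of $R(a-b)$ is

$$ d \cdot \| a - b \|^2 = d \cdot \| a \|^2 + d \cdot \| b \|^2 $$

The fact now follows from the inequality

$$ \min \left\{ \frac{x_1}{y_1}, \frac{x_2}{y_2} \right\} \leq \frac{x_1 + x_2}{y_1 + y_2} $$

The lemma now follows by observing that $f^+$ and $f^-$ are non-negative and disjointly supported, so

$$ \min \{ R(f^+), R(f^-) \} \leq R(\bar f) \leq R(f) $$

and that both $f^+$ and $f^-$ have at most $|V|/2$ non-zero coordinates.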
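The construction in this proof (shift by a median, split into positive and negative parts, keep the better part) is short enough to write down. The Python sketch below is one way to do it, assuming a $d$-regular graph given as an edge list; it uses `statistics.median_low` so that the median is an actual entry of the vector, which guarantees that at most half of the entries lie strictly on either side. The function names and the test vector are illustrative only.

```python
import statistics

def rayleigh(f, edges, d):
    """Sum over edges of (f_u - f_v)^2, over d * sum_v f_v^2."""
    num = sum((f[u] - f[v]) ** 2 for u, v in edges)
    den = d * sum(x * x for x in f)
    return num / den

def nonnegative_from_mean_zero(f, edges, d):
    """Shift f by a median, split into positive/negative parts, and
    return the non-zero part with the smaller Rayleigh quotient."""
    m = statistics.median_low(f)
    shifted = [x - m for x in f]
    f_plus = [max(0.0, x) for x in shifted]
    f_minus = [max(0.0, -x) for x in shifted]
    candidates = [part for part in (f_plus, f_minus) if any(part)]
    return min(candidates, key=lambda part: rayleigh(part, edges, d))

n, d = 8, 2
edges = [(v, (v + 1) % n) for v in range(n)]          # the 8-cycle
f = [3.0, 1.0, -1.0, -2.0, -3.0, 0.0, 1.0, 1.0]       # entries sum to zero

f_prime = nonnegative_from_mean_zero(f, edges, d)
support = sum(1 for x in f_prime if x > 0)
print(f_prime, rayleigh(f_prime, edges, d), rayleigh(f, edges, d))
```

For this vector the median is $0$, the positive part wins with Rayleigh quotient $10/24$ against $R(f) = 24/52$, and its support has four of the eight vertices.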
2. Proof of the Cheeger inequality in manifolds
We will now translate the proof of the graph Cheeger inequality to the setting of manifolds.
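As you may remember, we started off by saying that $L$ is symmetric and so all its eigenvalues are real and they are given by the variational characterization. Now we are already in trouble because the operator $L$ on manifolds cannot be thought of as a matrix, so what does it mean for it to be symmetric? The consequence of symmetry that is exploited in the analysis of the spectrum of symmetric matrices is the fact that if $A$ is symmetric, then for every $x,y$ we have

$$ \langle x, A y \rangle = x^T A y = (A^T x)^T y = (A x)^T y = \langle A x, y \rangle $$

and the property $\langle x, A y \rangle = \langle A x, y \rangle$ makes no reference to coordinates, and it is well defined even for linear operators over infinite-dimensional spaces, provided that there is a notion of inner product. If we define the inner product

$$ \langle f, g \rangle := \int_M f(x) \cdot g(x) \, d\mu(x) $$

on functions $f,g : M \rightarrow \mathbb{R}$, and more generally

$$ \langle f, g \rangle := \int_M \langle f(x), g(x) \rangle_X \, d\mu(x) $$

for functions $f,g : M \rightarrow X$, where $X$ is a vector space with inner product $\langle \cdot, \cdot \rangle_X$, then we can say that an operator $A$ is self-adjoint if

$$ \langle f, A g \rangle = \langle A f, g \rangle $$

for all (appropriately restricted) functions $f,g$. If $M$ is compact, this property is true for the Laplacian, and, in particular, $-\mathrm{div}$ and $\nabla$ are adjoints of each other, that is, for every function $f$ and vector field $w$,

$$ \langle \nabla f, w \rangle = \langle f, - \mathrm{div} \, w \rangle $$

(The discrete analog would be that $-\mathrm{div}$ is the transpose of $\nabla$.)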
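Self-adjointness (and appropriate conditions on $L$) imply a version of the spectral theorem and of the variational characterization. In particular, all eigenvalues of $L$ are real, and if there is a minimum one then it is

$$ \lambda_1 = \min_{f : M \rightarrow \mathbb{R}} \frac{\langle f, L f \rangle}{\langle f, f \rangle} $$

and if $f_1$ is a minimizer of the above, then

$$ \lambda_2 = \min_{f : M \rightarrow \mathbb{R} \, : \, \langle f, f_1 \rangle = 0} \frac{\langle f, L f \rangle}{\langle f, f \rangle} $$

(The minimization is quantified over all functions that are square-integrable, and the minimum is achieved because if $M$ is compact then the space of such functions is also compact and the cost function that we are minimizing is continuous. In this post, whenever we talk about “all functions,” it should be understood that we are restricting to whatever space of functions makes sense in the context.)

From the property that $-\mathrm{div}$ and $\nabla$ are adjoint, we have

$$ \langle f, L f \rangle = \langle f, - \mathrm{div} \nabla f \rangle = \langle \nabla f, \nabla f \rangle $$

so

$$ \lambda_1 = \min_{f : M \rightarrow \mathbb{R}} \frac{\int_M \| \nabla f \|^2 \, d\mu}{\int_M f^2 \, d\mu} $$

where the Rayleigh quotient

$$ R(f) := \frac{\int_M \| \nabla f \|^2 \, d\mu}{\int_M f^2 \, d\mu} $$

is always non-negative, and it is zero for constant $f$, so we see that $\lambda_1 = 0$ and

$$ \lambda_2 = \min_{f : M \rightarrow \mathbb{R} \, : \, \int_M f \, d\mu = 0} R(f) $$

By analogy with the graph case, we define the “$\ell_1$ Rayleigh quotient”

$$ R_1(g) := \frac{\int_M \| \nabla g \| \, d\mu}{\int_M |g| \, d\mu} $$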
And we can prove the analogs of the lemmas that we proved for graphs.
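Lemma 7 (Rounding of $\ell_1$ embeddings) For every non-negative function $g : M \rightarrow \mathbb{R}_{\geq 0}$ there is a value $t \geq 0$ such that

$$ h(\{ x : g(x) > t \}) \leq R_1(g) $$

where the Cheeger constant of a subset $S \subseteq M$ of the manifold is

$$ h(S) := \frac{\mu_{n-1}(\partial S)}{\mu(S)} $$

Lemma 8 (Embedding of $\ell_2^2$ into $\ell_1$) For every non-negative function $f : M \rightarrow \mathbb{R}_{\geq 0}$, we have

$$ R_1(f^2) \leq 2 \sqrt{R(f)} $$

Lemma 9 (From an eigenfunction to a non-negative function) For every function $f : M \rightarrow \mathbb{R}$ such that $\int_M f \, d\mu = 0$ there is a non-negative $f' : M \rightarrow \mathbb{R}_{\geq 0}$ such that $\mu(\{ x : f'(x) > 0 \}) \leq \frac 12 \mu(M)$ and such that

$$ R(f') \leq R(f) $$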
Let us see the proof of these lemmas.
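Proof: of Lemma 7. For each threshold $t \geq 0$, define the set

$$ S_t := \{ x \in M : g(x) > t \} $$

Let $t^*$ be a threshold for which $h(S_t)$ is minimized. We will integrate the numerator and denominator of $h(S_t)$ over all $t$. The coarea formula for nonnegative functions is

$$ \int_M \| \nabla g \| \, d\mu = \int_0^\infty \mu_{n-1}(\partial S_t) \, dt $$

and we also have

$$ \int_M |g| \, d\mu = \int_0^\infty \mu(S_t) \, dt $$

which combine to

$$ \int_M \| \nabla g \| \, d\mu = \int_0^\infty h(S_t) \cdot \mu(S_t) \, dt \geq h(S_{t^*}) \int_0^\infty \mu(S_t) \, dt = h(S_{t^*}) \int_M |g| \, d\mu $$

so that

$$ h(S_{t^*}) \leq \frac{\int_M \| \nabla g \| \, d\mu}{\int_M |g| \, d\mu} = R_1(g) $$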
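As a small worked check of the two integral identities used in this proof (an example added for illustration, not part of the original argument), take the manifold to be the unit circle, parametrized by the angle $\theta \in [0, 2\pi)$ with arc length as the measure, and take the non-negative function $g(\theta) := 1 + \cos \theta$. Here the manifold is one-dimensional, so the boundary measure simply counts points. For $0 < t < 2$ the threshold set $\{ \theta : g(\theta) > t \}$ is an open arc of length $2 \arccos(t-1)$ with two boundary points, and it is empty for $t \geq 2$.

```latex
\int_0^{2\pi} |g'(\theta)| \, d\theta = \int_0^{2\pi} |\sin\theta| \, d\theta = 4
\qquad \text{and} \qquad
\int_0^{2} 2 \, dt = 4 ,
\\[4pt]
\int_0^{2\pi} g(\theta) \, d\theta = \int_0^{2\pi} (1 + \cos\theta) \, d\theta = 2\pi
\qquad \text{and} \qquad
\int_0^{2} 2 \arccos(t-1) \, dt = 2 \int_{-1}^{1} \arccos(s) \, ds = 2\pi .
```

So both sides of the coarea formula equal $4$, both sides of the second identity equal $2\pi$, and the $\ell_1$ Rayleigh quotient of $g$ is $4 / 2\pi = 2/\pi$. The threshold sets have ratio of boundary to volume equal to $1/\arccos(t-1)$, which for small $t$ is close to $1/\pi$, below $2/\pi$ as the lemma requires.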
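Proof: of Lemma 8. Let us consider the numerator of $R_1(f^2)$; it is:

$$ \int_M \| \nabla f^2 \| \, d\mu $$

We can apply the chain rule, and see that

$$ \nabla f^2 = 2 f \cdot \nabla f $$

which implies

$$ \int_M \| \nabla f^2 \| \, d\mu = \int_M 2 |f| \cdot \| \nabla f \| \, d\mu $$

and, after applying Cauchy-Schwarz,

$$ \int_M \| \nabla f^2 \| \, d\mu \leq 2 \sqrt{\int_M f^2 \, d\mu} \cdot \sqrt{\int_M \| \nabla f \|^2 \, d\mu} = 2 \sqrt{R(f)} \cdot \int_M f^2 \, d\mu $$

And so

$$ R_1(f^2) = \frac{\int_M \| \nabla f^2 \| \, d\mu}{\int_M f^2 \, d\mu} \leq 2 \sqrt{R(f)} $$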
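Proof: of Lemma 9. Let $m$ be a median of $f$, and consider $\bar f$ defined as $\bar f(x) := f(x) - m$. We have

$$ R(\bar f) \leq R(f) $$

because the numerators of $R(\bar f)$ and $R(f)$ are the same (the derivatives of functions that differ by a constant are identical) and the denominators are such that

$$ \int_M \bar f^2 \, d\mu = \int_M (f^2 - 2 m f + m^2) \, d\mu = \int_M f^2 \, d\mu + m^2 \mu(M) \geq \int_M f^2 \, d\mu $$

where we used the fact that the integral of $f$ is zero.

Let us define $f^+(x) := \max \{ 0, \bar f(x) \}$ and $f^-(x) := \max \{ 0, - \bar f(x) \}$ so that $\bar f = f^+ - f^-$. We use the following fact:

Fact 10 Let $a, b : M \rightarrow \mathbb{R}_{\geq 0}$ be disjointly supported non-negative functions (“disjointly supported” means that they are non-zero on disjoint subsets of inputs), then

$$ \min \{ R(a), R(b) \} \leq R(a - b) $$

Proof: We begin with the following observation: if $a$ is a non-negative function, and $a(x) = 0$, then $\nabla a(x) = 0$, because $x$ has to be a local minimum.

Consider the expression $\| \nabla (a-b) \|^2$ occurring in the numerator of $R(a-b)$. We have

$$ \| \nabla (a-b) \|^2 = \| \nabla a - \nabla b \|^2 = \| \nabla a \|^2 + \| \nabla b \|^2 - 2 \langle \nabla a, \nabla b \rangle $$

But

$$ \langle \nabla a, \nabla b \rangle = 0 $$

because for every $x$ at least one of $a(x)$ or $b(x)$ is zero, and so at least one of $\nabla a(x)$ or $\nabla b(x)$ is zero.

Using this fact, we have that the numerator of $R(a-b)$ is equal to the sum of the numerators of $R(a)$ and $R(b)$:

$$ \int_M \| \nabla (a-b) \|^2 \, d\mu = \int_M \| \nabla a \|^2 \, d\mu + \int_M \| \nabla b \|^2 \, d\mu $$

and the denominator of $R(a-b)$ is also the sum of the denominators of $R(a)$ and $R(b)$:

$$ \int_M (a-b)^2 \, d\mu = \int_M a^2 \, d\mu + \int_M b^2 \, d\mu $$

because $a(x) \cdot b(x) = 0$ for every $x$. The fact now follows from the inequality

$$ \min \left\{ \frac{x_1}{y_1}, \frac{x_2}{y_2} \right\} \leq \frac{x_1 + x_2}{y_1 + y_2} $$

The lemma now follows by observing that $f^+$ and $f^-$ are non-negative and disjointly supported, so

$$ \min \{ R(f^+), R(f^-) \} \leq R(\bar f) \leq R(f) $$

and that both $f^+$ and $f^-$ have a support of volume at most $\frac 12 \mu(M)$.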
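As a sanity check of the manifold statement (an example added here for illustration), consider the simplest compact manifold without boundary, the unit circle with arc length as the measure. The Laplace-Beltrami operator is minus the second derivative in the angle $\theta$, its eigenfunctions are $1, \cos k\theta, \sin k\theta$, and a set with finitely many boundary points and length at most $\pi$ has at least two boundary points, with a half circle achieving exactly two. Restricting attention to such sets, the computation is:

```latex
-\frac{d^2}{d\theta^2} \cos(k\theta) = k^2 \cos(k\theta)
\quad \Longrightarrow \quad
\lambda_1 = 0, \ \ \lambda_2 = 1 ,
\\[4pt]
h = \inf_{S \, : \, 0 < \mu(S) \leq \pi} \frac{\#(\partial S)}{\mu(S)} = \frac{2}{\pi} \approx 0.64
\ \leq \ 2 = 2 \sqrt{\lambda_2} .
```

The inequality holds with room to spare, which is consistent with the fact that Cheeger-type bounds are not tight on every manifold.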
If anybody is still reading, it is worth observing a couple of differences between the discrete proof and the continuous proof.
The $\ell_1$ Rayleigh quotient is defined slightly differently in the continuous case. It would correspond to defining it as

$$ \frac{\sum_u \sqrt{\sum_{v : (u,v) \in E} |f_u - f_v|^2}}{d \sum_v |f_v|} $$

in the discrete case.
If $a,b$ are disjointly supported and nonnegative, the sum of the numerators of the Rayleigh quotients $R(a)$ and $R(b)$ can be strictly smaller than the numerator of $R(a-b)$, while we always have equality in the continuous case. In the discrete case, the numerator of $R(a-b)$ can be up to twice the sum of the numerators of $R(a)$ and $R(b)$ (this fact is useful, but it did not come up in this proof), while again we have exact equality in the continuous case.
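The chain rule calculation

$$ \nabla f^2 = 2 f \cdot \nabla f $$

corresponds to the step

$$ f_u^2 - f_v^2 = (f_u + f_v) \cdot (f_u - f_v) $$

In the continuous case, $f_u$ and $f_v$ are “infinitesimally close”, so we can approximate $f_u + f_v$ by $2 f_u$.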
The foundations of cryptography were laid down in 1982, the annus mirabilis that saw the publications of the work of Blum and Micali on pseudorandom generators, of Goldwasser and Micali on rigorous definitions of encryption, and of Yao, who gave a more general definitional approach. The paper of Shafi Goldwasser and Silvio Micali, in particular, introduced the incredibly influential concept of indistinguishability of distributions, and the idea of defining security in terms of simulation of an ideal model in which the security requirements are self-evident. (For example, because in the ideal model an adversary is not able to access the channel that we use to send encrypted data.) Almost every definition of security in cryptography follows the simulation approach, which also guides proofs of security. Shafi and Silvio both went on to do foundational work in cryptography, complexity theory, and algorithms, including their work on zero knowledge, secure multiparty computation, and property testing.
So it was with much joy, this early morning in Japan, that I heard the news that Shafi Goldwasser and Silvio Micali have been named as recipients of this year’s Turing award.
Omer Reingold has more information about their work. With no offense to colleagues around my age and younger, Shafi and Silvio are also representative of a time when leading theoretical computer scientists were more interesting people. They both have incredible charisma.
My favorite memory of Shafi and Silvio is from the time I interviewed for a faculty job at MIT. Shafi was in the last weeks of her pregnancy and did not make an appointment to see me, but then the day of my interview she changed her mind and showed up in Silvio’s office halfway through my meeting with him.
Silvio had been looking at my schedule and was giving me advice on how to talk to various people. Shafi asked what we were talking about, and then proceeded to give the opposite of the advice that Silvio had been giving me. The two of them spent the rest of the meeting arguing with each other.
Oh man, not another election! Why do we have to choose our leaders? Isn’t that what we have the Supreme Court for?
– Homer Simpson
Nate Silver is now putting Barack Obama’s chance of reelection at around 85%, and he has been on the receiving end of considerable criticism from supporters of Mitt Romney. Some have criticized his statistical analysis by pointing out that he has a soft voice and he is not fat (wait, what? read for yourself – presumably the point is that Silver is gay and that gay people cannot be trusted with such manly pursuits as statistics), but the main point seems to be: if Romney wins the election then Silver and his models are completely discredited. (E.g. here.) This is like someone saying that a die has approximately an 83% probability of not turning up a 2, and others saying, if I roll a die and it turns up a 2, this whole “probability” thing that you speak of is discredited.
But still, when someone offers predictions in terms of probability, rather than simply stating that a certain outcome is more likely, how can we evaluate the quality of such predictions?
In the following let us assume that we have a sequence of $n$ binary events, and that each event $i$ has a probability $p_i$ of occurring as a $1$ and $1-p_i$ of occurring as $0$. A predictor gives out predicted probabilities $q_i$, and then events $e_i \in \{0,1\}$ happen. Now what? How would we score the predictions? Equivalently, how would we fairly compensate the predictor?
A simple way to “score” the prediction is to say that for each event we have a “penalty” that is $|e_i - q_i|$, or a score that is $1 - |e_i - q_i|$. For example, the prediction that the correct event happens with 100% probability gets a score of 1, but the prediction that the correct event happens with 85% probability gets a score of .85.
Unfortunately this scoring system is not “truthful,” that is, it does not encourage the predictor to tell us the true probabilities. For example suppose that a predictor has computed the probability of an event as 85% and is very confident in the accuracy of the model. Then, if he publishes the accurate prediction he is going to get a score of .85 with probability .85 and a score .15 with probability .15. So he is worse off than if he had published the prediction of the event happening with probability 100%, in which case the expected score is .85. In general, the scheme makes it always advantageous to round the probability to 0% or 100%.
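The arithmetic behind this example is easy to reproduce. The sketch below assumes the linear score described above (the score is the probability that the prediction assigned to the outcome that actually happened) and computes its expectation when the true probability of the event is `p` and the published prediction is `q`; the function name is made up for this illustration.

```python
def expected_linear_score(p, q):
    """Expected score when the event happens with true probability p
    and the published prediction is q: score q if it happens, 1 - q if not."""
    return p * q + (1 - p) * (1 - q)

honest = expected_linear_score(0.85, 0.85)    # publish the true probability
rounded = expected_linear_score(0.85, 1.0)    # round up to 100%

# The expectation is linear in q, so over a grid of possible predictions
# the maximum is attained at an endpoint.
grid = [k / 100 for k in range(101)]
best_q = max(grid, key=lambda q: expected_linear_score(0.85, q))

print(honest, rounded, best_q)
```

The honest prediction earns $.85 \cdot .85 + .15 \cdot .15 = .745$ in expectation, against $.85$ for the rounded one, and the best published value on the grid is $1$.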
Is there a truthful scoring system? I am not sure what the answer is.
If one is scoring multiple predictions of independent events, one can look at all the cases in which the prediction was, say, in the range of 80% to 90%, and see if indeed the event happened, say, a fraction between 75% and 95% of the times, and so on.
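One possible way to carry out this bucketing is sketched below in Python. The choice of ten equal-width buckets, the function name, and the sample data are all assumptions made for the illustration; the discussion that follows in the post about the arbitrariness of the discretization applies to exactly this kind of choice.

```python
def calibration_table(predictions, outcomes, num_buckets=10):
    """Group predictions into equal-width buckets. For each non-empty
    bucket index i (covering [i/num_buckets, (i+1)/num_buckets)), return
    (count, mean predicted probability, observed frequency of the event)."""
    buckets = [[] for _ in range(num_buckets)]
    for q, e in zip(predictions, outcomes):
        i = min(int(q * num_buckets), num_buckets - 1)  # q = 1.0 goes in the last bucket
        buckets[i].append((q, e))
    table = {}
    for i, items in enumerate(buckets):
        if items:
            mean_q = sum(q for q, _ in items) / len(items)
            freq = sum(e for _, e in items) / len(items)
            table[i] = (len(items), mean_q, freq)
    return table

# Made-up data: six predictions and what actually happened (1 = event occurred).
predictions = [0.85, 0.82, 0.88, 0.81, 0.60, 0.65]
outcomes = [1, 1, 0, 1, 1, 0]

table = calibration_table(predictions, outcomes)
for i, (count, mean_q, freq) in sorted(table.items()):
    print(f"bucket {i}: {count} predictions, mean prediction {mean_q:.2f}, observed {freq:.2f}")
```

With this data the 80% to 90% bucket holds four predictions of which three came true, and the 60% to 70% bucket holds two of which one came true; with so few events per bucket the observed frequencies are of course very noisy.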
One disadvantage of this approach is that it seems to require a discretization of the probabilities, which seems like an arbitrary choice and one that could affect the final score quite substantially. Is there a more elegant way to score multiple independent events without resorting to discretization? Can it be proved to be truthful?
Another observation is that such an approach is still not entirely truthful if it is applied to events that happen sequentially. Indeed, suppose that we have a series of, say, 10 events for which we predicted a 60% probability of a 1, and the event 1 happened 7 out of 10 times. Now we have to make a prediction of a new event, for which our model predicts a 10% probability. We may then want to publish a 60% prediction, because this will help even out the “bucket” of 60% predictions.
I don’t think that there is any way around the previous problem, though it seems clear that it would affect only a small fraction of the predictions. (The complexity theorists among the readers may remember similar ideas being used in a paper of Feigenbaum and Fortnow.)
Surely the task of scoring predictions must have been studied in countless papers, and the answers to the above questions must be well known, although I am not sure what are the right keywords to use to search for such work. In computer science, there are a lot of interesting results about using expert advice, but they are all concerned with how you score your own way of picking which expert to trust rather than the experts themselves. (This means that the predictions of the experts are not affected by the scoring system, unlike the setting discussed in this post.)
Please contribute ideas and references in the comments.
Last Fall, three Stanford classes were “offered online” for free: Andrew Ng’s machine learning class, Sebastian Thrun’s AI class, and Jennifer Widom’s database class. There had been interest and experiments in online free education for a long time, with the MITx initiative being a particularly significant one, but there were a few innovations in last year’s Stanford classes, and they probably contributed to their runaway success and six-digit enrollment.
One difference was that they did not post videos of the in-class lectures. There was, in fact, no in-class lecture. Instead, they taped short videos, rehearsed and edited, with the content of a standard 90-minute class broken down into four or so ten-minute videos. This is about the difference between taping a play and making a movie. Then the videos came with some forms of “interactivity” (quizzes that had to be answered to continue), and they were released at the rate at which the class progressed, so that there was a community of students watching the videos at the same time and able to answer each other’s questions in forums. Finally, the videos were used in the Stanford offerings of the classes: the students were instructed to watch the videos by themselves, and during the lecture time they would solve problems, or have discussions or have guest lectures and so on. (In K-12 education, this is called the “flipped classroom” model, in which students take lectures at home and solve homework in class, instead of the traditional other way around.)
In the past few months, there has been a lot of thinking, and a lot of acting, about the success of this experiment. Sebastian Thrun started a company called udacity to offer online courses “branded” by the company itself, and Daphne Koller and Andrew Ng started a company called coursera to provide a platform for universities to put their courses online, and, meanwhile, Harvard and Berkeley joined MIT to create edX.
At a time when the growth of higher education costs in the United States appears unsustainable, particularly in second-tier universities, and when the demand for high-quality higher education is exploding in the developing world, these projects have attracted a lot of interest.
While the discussion has been focused on the “summer blockbusters” of higher education, and what they should be like, who is going to produce them, how to make money from them, and so on, I would like to start a discussion on the “art house” side of things.
In universities all over the world, tens of thousands of my colleagues, after they have “served” their departments teaching large undergraduate classes and maybe a required graduate class, get to have fun teaching a research-oriented graduate class. Their hard-earned insights into problems about which they are the world’s leading experts, be it a particular organ of the fruit fly or a certain corner of the Langlands program, are distilled into a series of lectures featuring content that cannot be found anywhere else. All for the benefit of half a dozen or a dozen students.
If these research-oriented, hyper-specialized courses were available online, those courses might have an audience of 20 or 30 students, instead of 100,000+, but their aggregate effect on their research communities would be, I believe, very significant.
One could also imagine such courses being co-taught by people at different universities. For example, imagine James Lee and Assaf Naor co-teaching a course on metric embeddings and approximation algorithms: they would devise a lesson plan together, each would produce half of the videos, and then at both NYU and UW the students would watch the videos and meet in class for discussions and working on problems; meanwhile study groups would probably pop up in many theory groups, of students watching the videos and working on the problem sets together.
So someone should put a research-oriented graduate course online, and see what happens. This is all to say that I plan to teach my class on graph partitioning, expander graphs, and random walks online in Winter 2013. Wish me luck!
I would like to thank all those that contributed to the Turing Centennial series: all those who wrote posts, for sure; but also all those who spread the word about this, on blogs, twitter, facebook, and in real life; those who came to read them; and those who wrote lots of thoughtful comments. In a community where discussions over conference organizational issues or over the importance of matrix multiplication algorithms can become very acrimonious, I am impressed that we could have such a pleasant and troll-free conversation. This goes to show that in theory has not only the smartest and most handsome readers of the whole internet, as was well known, but also the nicest ones!
We will definitely do this again in 2054, to mark the centennial of Turing’s death.
A few days ago, I was very saddened to hear of the death of Sally Ride. A Stanford alumna, Sally Ride became the first American woman to travel in space, she served on both the investigative committees after the two Shuttle disasters, and dedicated the past decade to the goal of getting young kids, and girls in particular, interested in science and technology. She cofounded, and directed, a non-profit foundation to further these goals, and wrote several books. After her death, it was revealed that she had been in a 25-year relationship with another woman, who was also the coauthor of her books and a partner in her foundation.
I think it is significant that a person that certainly had a lot of courage, determination, willingness to defy stereotypes, and to be an inspiration for people like her, felt that she could not be publicly out during her life. (In interviews about their books, Ride and her partner Tam O’Shaughnessy referred to each other as “friends”.)
Let’s hope that in 2054 it’s not just computer science professors in the West that are comfortable being out, but also astronauts, movie stars, professional athletes, and so on.
[Leaving the best for last, here is Ashwin Nayak's post. Unlike the other posts in this series, Ashwin does not just talk about events, but he also gives us a view of his inner life at several critical times. What can I say to introduce such a beautiful essay? I got this: congratulations Ashwin! -- L.T.]
(Some names have been changed to protect privacy. Some events have been presented out of chronological order, to maintain continuity in the narrative. The unnamed friends in Waterloo are Kimia, Andrew, Anna-Marie, and Carl. I would like to thank them, Joe, Luca, and especially Harry for their feedback on a draft of this blog post. Harry offered meticulous comments, setting aside a myriad commitments. Most of all, I would like to thank my sisters and my parents for graciously agreeing to being included in this story.
For those not in theoretical computer science, FOCS is one of the flagship conferences on this subject. Luca is a professor of computer science at Stanford University, and Irit at Weizmann Institute of Science.
A prelude: I was born into a middle-class family from the South-West coast of India. I am the youngest of three siblings, and grew up in cities all over the country. My father served as an officer in the Indian army, and my mother taught in middle school until she switched to maintaining the household full-time. I went to IIT Kanpur for my undergraduate studies when I was 17. At 21, I moved half-way across the world to Berkeley, CA, for graduate studies. In 2002, after a few years of post-doctoral work in the US, I moved to Waterloo, ON, to take up a university faculty position.)
We were walking through art galleries in San Francisco when Luca brought up the Turing centenary events that were taking place around the world. None of the events celebrating his work referred to Turing’s homosexuality. Luca wondered whether the celebrations would be complete without revisiting this aspect of his life. As a response, he was thinking of having a series of guest blog posts by contemporary gay and lesbian computer scientists about their experiences as gay professionals. How would they compare with those in Turing’s times?
I wonder how much of my attention was on the art in the next few galleries. Would I write a post? What would I write? For me, sexuality is so deeply personal a matter that I’ve talked about it only with a handful of people. Why would I write about it publicly? Something Luca had said stuck in my mind: “The post could even be anonymous. That would be a statement in itself.” It took me back to my first relationship: I dated Mark for over three years and no one other than his friends knew. Times when I was on the verge of telling a friend about my relationships flashed by. I remembered the time I discussed with my immediate family why I would not get married (at least not the way they imagined). Times when students recognized me at events for gays and lesbians resurfaced, as did conversations with friends and colleagues grappling with openness. I would write a post, I told Luca.
That night, I got little sleep. Memories that I thought had slipped into oblivion loomed large.
[Rosario Gennaro is a cryptographer, and he has been at IBM for more than 15 years. (He must have started as a teen-ager.) On Monday, he will start his new job as a professor at the City College of New York and the Director of the Center for Algorithms and Interactive Scientific Software. In the middle of his move and of an internet-free vacation, Rosario found the time to write a guest post that goes in a quite different direction from the others. -- L.T.]
“David Hilbert … I suppose his name doesn’t mean much, if anything, to you? No, no? Well, there you are, you see? It’s the way of the world! People, never seem to hear about the really great mathematicians!”
The recent celebrations for Alan Turing’s centenary made me revisit the BBC movie of “Breaking the Code”, that amazing Broadway play, with a wonderful Derek Jacobi playing Alan Turing. You can see the most astonishing bit of this play here:
a 6-minute tour de force monologue explaining in lay terms Gödel's Theorem and Turing's discovery of undecidable problems.
The quote above is from the beginning of this monologue, and it made me reconsider the goal of this guest post that I had promised Luca for his blog. Yes, I could talk about my coming out and about how supportive the Theory community has been. Or I could support, by personal experience, Luca's comments on how graduate students who are gay and not out have an additional burden to carry. Imagine your doubts on being good enough to do research, as you embark on a Ph.D. program (well, I don't know if *you* had those doubts, but I surely had them!) and add to them the sense of not being "good enough" in general because you are gay.
But that's not what I decided to talk about. There is no question that Alan Turing's sexual orientation has played a huge role in the popularization of his figure and his work. "Breaking the Code" would not have been written if not for the unique personal story that accompanied Turing's exceptional contribution to Mathematics and Computer Science. Nor would NPR have run a story last month on the centenary. Neither Gödel nor Hilbert (both mentioned in the above monologue) got such treatment.
While I wish that being gay were a sufficient condition for being a celebrated mathematician in the news (reserve space for my profile in the next issue of the New Yorker please), I wonder if being queer in some form is necessary. What can we do, as a community, to make sure people know not only Turing, but also Hilbert, and Gödel, and Gauss? How can we make the Mathematics relevant, rather than the person? Can we get liberal arts majors, for example, to have a deep appreciation of the *ideas* and the *concepts* of Mathematics and Computer Science, even if they will never understand the proofs and the techniques? As I embark on an academic career after 16 years of research in a corporate lab, these questions have been occupying my mind. Others are wondering too …
Theoretical Computer Science, in my opinion, presents many opportunities on this front. Decidability, computational hardness, (pseudo)randomness … those are all concepts around which a philosophy class could be built. After all, as the fictional Turing says in the play, it's about telling right from wrong. I would love to develop such a class for liberal arts majors, and maybe the readers of "in theory" can help me by pointing me to similar classes that are already being taught somewhere. Yes, I am that lazy.
To finish off, being an opera queen (as any self-respecting homosexual should be) I have a not-so-secret wish to see "Breaking the Code" adapted into an opera. I think John Adams, whose work on physicist J. Robert Oppenheimer and the atomic bomb was mesmerizing, would be my top choice for a composer:
