You are currently browsing the category archive for the ‘math’ category.

A few weeks ago, the Proceedings of the National Academy of Science published an article on a study conducted by a group of Cornell researchers at Facebook. They picked about 600,000 users and then, for a week, a subset of them saw fewer “negative” posts (up to 90% were filtered) than they would otherwise see, a subset saw fewer “positive” posts (same), and a control group got a random subset.

After the week, the users in the “negative” group posted fewer, and more negative, posts, and those in the “positive” group posted more, and more positive, posts.

Posts were classified according to an algorithm called LIWC2007.

The study run contrary to a conventional wisdom that people find it depressing to see on Facebook good things happening to their friends.

The paper has caused considerable controversy for being a study with human subjects conducted without explicit consent. Every university, including of course Cornell, requires experiments involving people to be approved by a special committee, and participants must sign informed consent forms. Facebook maintains that the study is consistent with its terms of service. The highly respected privacy organization EPIC has filed a complaint with the FTC. (And they have been concerned with Facebook’s term of service for a long time.)

Here I would like to explore a different angle: almost everybody thinks that *observational* studies about human behavior can be done without informed consent. This means that if the Cornell scientists had run an analysis on old Facebook data, with no manipulation of the feed generation algorithm, there would not have been such a concern.

At the same time, the number of posts that are fit for the feed of a typical user vastly exceed what can fit in one screen, and so there are algorithms that pick a rather small subset of posts that are evaluated to be of higher relevance, according to some scoring function. Now suppose that, if N posts fit on the screen, the algorithm picks the 2N highest scoring posts, and then randomly picks half of them. This seems rather reasonable because the scoring function is going to be an approximation of relevance anyway.

The United States has roughly 130 million Facebook subscriber. Suppose that the typical user looks, in a week, at 200 posts, which seems reasonable (in our case, those would be a random subset of roughly 400 posts). According to the PNAS study, roughly 50% of the posts are positive and 25% are negative, so of the initial 400, roughly 200 are positive and 100 are negative. Let’s look at the 100,000 users for which the random sampling picked the fewest positive posts: we would be expecting roughly 3 standard deviations below the mean, so about 80 positive posts instead of the expected 100; the 100,000 users with the fewest negative posts would get about 35 instead of the expected 50.

This is much less variance than in the PNAS study, where they would have got, respectively, only 10 positive and only 5 negative, but it may have been enough to pick up a signal.

Apart from the calculations, which I probably got wrong anyway, what we have is that in the PNAS study they picked a subset of people and then they varied the distribution of posts, while in the second case you pick random posts for everybody and then you select the users with the most variance.

If you could arrange distributions so that *the distributions of posts seen by each users are the same*, would it really be correct to view one study as experimental and one as observational? If the PNAS study had filtered 20% instead of 90% of the positive/negative posts, would it have been ethical? Does it matter what is the *intention* when designing the randomized algorithm that selects posts? If Facebook were to introduce randomness in the scoring algorithm with the goal of later running observational studies would it be ethical? Would they need to let people opt out? I genuinely don’t know the answer to these questions, but I haven’t seen them discussed elsewhere.

I have always known that I don’t completely understand how fiat money works, however I have recently realized that I don’t understand it *at all*! Maybe some of my readers can clarify my confusion.

So let’s start a national economy from scratch, without international trade: so a group of people move to a deserted island, declare independence, the people have all kind of skills, they bring all kind of materials and machineries with them, the island has all kinds of natural resources and maybe a lottery system gives some people ownership of various plots of lands and mining rights.

Now some would-be entrepreneurs would like to begin hiring people with the right skills, buying or renting various stuff, and start some businesses. It seems that there is no loss of generality if I think of the entrepreneurs as just one person for the sake of what I want to think about. She is going to need to get a loan to start her business(es). Meanwhile, the new island government created a central bank, which issues theory dollars, or thollars, which are the currency of the island. The central bank “creates” thollars and lends them to banks, then the banks keep a fractional reserve and lend to the entrepreneur. Again, for the sake of what I want to say, there is no loss of generality if I identify the central bank and the other banks as just one entity.

So the central bank lends thollars to the entrepreneur, and she uses the money to start the business and being to pay people. At this point money starts circulating in the economy, and people will hire gardeners and babysitters, and lawyers, they will give to charities, they will hire computer science theory tutors for their kids, they will buy and sell houses to each others, and, crucially, will buy whatever goods and services the entrepreneur is selling. Now she is making a profit and she can pay back the loan to the central bank and invest in more … wait, *she can never pay back the loan!*

That’s because all the thollars in circulation are the ones that the central bank lent her, so there is no way that, as the money circulates, she can ever make more money that she owes!

Ok, so maybe people will also take loans to buy houses and stuff, so that’s more money that circulates, but is it really the case that, overall, it is impossible for everybody to be debt-free? That all the cash that any debt-free person has needs to be compensated by an equivalent amount of debt from other people? This is not how things seem to be in practice.

Now, clearly it is possible for everybody to have positive net worth, because, at the start, people have stuff and that stuff is worth money, but it seems strange that not everybody can be debt-free.

Maybe the problem is deflation? That if the entrepreneur borrowed money to create a business that creates new wealth (because it makes stuff that people find more valuable than the value of the raw material and the value of the work that went into it), but the amount of circulating money stays the same, then there is deflation, and her debt is spiraling out of control in deflation-adjusted terms, even if the interest rate is zero?

It seems that there are only two ways in which you can have everybody be debt-free and have a positive amount of cash: (i) the central bank starts buying stocks of private companies, (ii) the government runs a deficit, and the central bank buys government debt.

Is this correct? Normally, we think of people and companies being debt-free (or having more cash than debt) as ideal, and a government running no deficit as ideal, and usually central banks don’t buy stocks (they only buy bonds, which is formally equivalent to lending), so are these three conditions contradictory?

The Simons Institute for the Theory of Computing at Berkeley has started operations a couple of months ago, it has been off to a great start. This semester there is a program on applications of real analysis to computer science, in which I am involved, and one on “big data,” whose workshops have been having such high attendance that they had to be organized offsite.

The institute itself is housed in a beautiful circular three-story building, half of whose second floor is an open space with sofas, whiteboards, tall ceilings, big windows, and exposed pipes on the ceiling, for an added loft-like look. If it had a ball pit, a foosball table, and free sushi it would look like the offices of a startup.

Next year, there will be programs on spectral graph theory, on applications of algebraic geometry, and on information theory.

Junior people (senior graduate students, postdocs, and junior assistant professors who have received their PhD no longer than six years ago) can participate in the program of the institute as “fellows.” Information on the fellowships is at simons.berkeley.edu/fellows2014; the deadline to apply is December 15.

This year is the centennial of Paul Erdős’s birth. Erdős lived most of his adult life as a traveling mathematician, “couchsurfing,” as we would say now, from place to place and from mathematical conference to mathematical conference. He wrote more than 1,500 papers with more than 500 different coauthors, introduced the probabilistic method and was the defining figure of the “Hungarian approach” to combinatorics. He died at age 83 while attending a mathematical conference.

Last year, we celebrated the centennial of Alan Turing’s birth. Turing and Erdős have become such iconic figures both for the impact of their work and for the fascinating facts of their lives. I would like to argue that the cultural archetype through which we interpret their lives is that of the *saint*. It is clearly that of the *martyr saint* in the case of Turing, while Erdős gave up material possessions and devoted his life to others, traveling everywhere and “preaching” to everybody, much in the mold of Saint Francis.

(A comparison of the Turing centennial celebration and Erdős’s, and a look at the frescoes of Medieval Catholic churches will show which kind of saint people are more interested in.)

The first step to become a saint of the Catholic church is to establish that the person exhibited “heroic virtues,” which is a great expression. This is an archetype that is not restricted to religion: you see it occurring in communist propaganda (Stakhanov, Lei Feng) and in every civil rights movement.

Saints were the “celebrities” of the Middle Ages, those whose life people liked to talk about. But contemporary celebrities come from a totally different archetype, that of the Greek God. Greek (and Roman) gods were petty and jealous, they cheated on their spouses, they were terrible parents, but there were good stories to be told about them. We don’t want (at least, I don’t) to live the life of a saint, but thinking about them is certainly inspirational and it makes us think that if someone can be so much better than us, maybe we can be a little better ourself in the practice of “virtues”, whatever this may mean to us. And we don’t admire gods, but, well, it’s probably fun to be one.

As usual, I have lost track of what I was trying to say, but I think that it speaks well of the academic community that we are more interested in saints than in gods, I will close by saying that my favorite saint of complexity theory is Avi Wigderson, I will keep to myself who my favorite god of complexity theory is, and I will leave it to the readers to contribute their picks.

*[I have been asked by the office of public affairs of the Institute for Advanced Study to publicize the following press release. L.T.]*

April 1, 2013. For immediate release.

Cofounders Jean Bourgain and Peter Sarnak announce today the launch of *eXpandr*, a new venture that aims to become the world’s leading provider of expander graphs.

“We are excited about our mission to change the way the world uses expanders.” said CEO Guli Mars, who joined eXpandr after a distinguished career in several leading technology companies. “Expanders are vital to revenue-generating logarithms, and our technology will revolutionize a multi-billion dollar market.”

“Big data, disruption”, said Juan Raman, senior vice president for marketing. “Innovation, cloud computing”, Mr. Raman continued.

“Let p be a prime congruent to 1 modulo 4″ said Jean Bourgain, cofounder and senior vice-president for analytic number theory, “and consider the irreducible representations of PSL(2,p).”

*About the Institute for Advanced Study*. The Institute for Advanced Study is one of the world’s leading centers for theoretical research and intellectual inquiry. The Institute exists to encourage and support fundamental research in the sciences and humanities—the original, often speculative thinking that produces advances in knowledge that change the way we understand the world. Work at the Institute takes place in four Schools: Historical Studies, Mathematics, Natural Sciences and Social Science. It provides for the mentoring of scholars by a permanent Faculty of no more than 28, and it offers all who work there the freedom to undertake research that will make significant contributions in any of the broad range of fields in the sciences and humanities studied at the Institute.

The Institute, founded in 1930, is a private, independent academic institution located in Princeton, New Jersey. Its more than 6,000 former Members hold positions of intellectual and scientific leadership throughout the academic world. Some 33 Nobel Laureates and 38 out of 52 Fields Medalists, as well as many winners of the Wolf or MacArthur prizes, have been affiliated with the Institute.

*About eXpandr*. eXpandr aims to disrupt the way the world uses expander graphs, and to become the leading commercial provider of expanders. eXpandr received $2 million in angel investing and will launch its first product by Summer 2013.

Having (non-rigorously) defined the Laplacian operator in manifolds in the previous post, we turn to the proof of the Cheeger inequality in manifolds, which we restate below.

Theorem 1 (Cheeger’s inequality)Let be an -dimensional smooth, compact, Riemann manifold without boundary with metric , let be the Laplace-Beltrami operator on , let be the eigenvalues of , and define theCheeger constantof to bewhere the is the boundary of , is the -dimensional measure, and is -th dimensional measure defined using . Then

We begin by recalling the proof of the analogous result in graphs, and then we will repeat the same steps in the context of manifolds.

Theorem 2 (Cheeger’s inequality in graphs)Let be a -regular graph, be its adjacency matrix, be its normalized Laplacian matrix, be the eigenvalues of , and define for every subset of vertices . Define the conductance of aswhere is the number of edges with one endpoint in and one endpoint in . Then

**1. Proof the Cheeger inequality in graphs **

We will use the *variational characterization* of the eigenvalues of the Laplacian of a graph .

and if is a minimizer in the above expression then

Following the definition of we see that

and so the minimum in (3) is 0, and it is achieved for . This means that

The expression in the right-hand-side of (4) is an important one, and it is called the Rayleigh quotient of , which we will denote by :

It is also useful to consider the variant of the Rayleigh quotient where there are no squares; this does not have a standard name, so let us call it the Rayleigh quotient and denote it by :

The proof of the graph Cheeger inequality now continues with the proof of the following three facts.

Lemma 3 (Rounding of embeddings)For every non-negative vector there is a value such that

Lemma 4 (Embedding of into )For every non-negative vector , we have

Lemma 5 (From an eigenvector to a non-negative vector)For every such that there is a non-negative such that and such that

Now let us start from a function that optimizes (4), so that and , then apply Lemma 5 to find a function such that the volume of the vertices having positive coordinates in is at most and such that . Then consider the vector such that ; by Lemma 4, we have , and by Lemma 3 there is a threshold such that the set has conductance . Since is a subset of the vertices having positive coordinates in , we have , and so

which is the Cheeger inequality for graphs. It remains to prove the three lemmas.

*Proof:* of Lemma 3. For each threshold , define the set

The idea of the proof is that if we pick at random then the probability that an edge belongs to is proportional to and the probability that is proportional to , so that the expected number of edges in is proportional to the numerator of and the expected number of vertices in is proportional to the denominator of ; if is small, it is not possible for to always be large for every .

To avoid having to normalize the range of to be between and , instead of taking averages over a random choice of , we will consider the integral over all values of . We have

because we can write , where if and otherwise, and we see that only the values of between and make , so .

We also have

and if we denote by the threshold such that is smallest among all the , then

so that

*Proof:* of Lemma 3. Let us consider the numerator of ; it is:

(we used Cauchy-Swarz)

(we used the definition of and Cauchy-Swarz again)

And so

*Proof:* of Lemma 5. Let be the median of , and consider defined as . We have

because the numerators of and are the same (the additive term cancels). The denominators are such that

because and the vector are orthogonal, and so by Pythagoras’s theorem the length-squared of equals the length-squared of plus the length-squared of .

Let us define and so that . We use the following fact:

Fact 6Let be disjointly supported non-negative vectors (“disjointly supported” means that they are non-zero on disjoint subsets of coordinates), then

*Proof:* The numerator of is

and, using orthogonality and Pythagoras’s theorem, the denominator of is

The fact now follows from the inequality

The lemma now follows by observing that and are non-negative and disjointly supported, so

and that both and have at most non-zero coordinate.

**2. Proof of the Cheeger inequality in manifolds **

We will now translate the proof of the graph Cheeger inequality to the setting of manifolds.

As you may remember, we started off by saying that is symmetric and so all its eigenvalues are real and they are given by the variational characterization. Now we are already in trouble because the operator on manifolds cannot be thought of as a matrix, so what does it mean for it to be symmetric? The consequence of symmetry that is exploited in the analysis of the spectrum of symmetric matrices is the fact that if is symmetric, then for every we have

and the property makes no references to coordinates, and it is well defined even for linear operators over infinite-dimensional spaces, provided that there is a notion of inner product. If we the define the inner product

on functions , and more generally

for functions , where is a vector space with inner product , then we can say that an operator is *self-adjoint* if

for all (appropriately restricted) functions . If is compact, this property is true for the Laplacian, and, in particular, and are adjoints of each others, that is,

(The discrete analog would be that is the transpose of .)

Self-adjointness (and appropriate conditions on ) imply a version of the spectral theorem and of the variational characterization. In particular, all eigenvalues of are real, and if there is a minimum one then it is

and if is a minimizer of the above, then

(The minimization is quantified over all functions that are square-integrable, and the minimum is achieved because if is compact then the space of such functions is also compact and the cost function that we are minimizing is continuous. In this post, whenever we talk about “all functions,” it should be understood that we are restricting to whatever space of functions makes sense in the context.)

From the property that and are adjoint, we have

so

where the Rayleigh quotient

is always non-negative, and it is zero for constant , so we see that and

By analogy with the graph case, we define the “ Rayleigh quotient”

And we can prove the analogs of the lemmas that we proved for graphs.

Lemma 7 (Rounding of embeddings)For every non-negative function there is a value such that

where the Cheeger constant of a subset of the manifold is

Lemma 8 (Embedding of into )For every non-negative function , we have

Lemma 9 (From an eigenfunction to a non-negative function)For every function such that there is a non-negative such that and such that

Let us see the proof of these lemmas.

*Proof:* of Lemma 7. For each threshold , define the set

Let be a threshold for which is minimized

We will integrate the numerator and denominator of over all . The *coarea formula* for nonnegative functions is

and we also have

which combine to

so that

*Proof:* of Lemma 8. Let us consider the numerator of ; it is:

We can apply the chain rule, and see that

which implies

and, after applying Caucy-Swarz,

And so

*Proof:* of Lemma 9. Let be a median of , and consider defined as . We have

because the numerators of and are the same (the derivatives of functions that differ by a constant are identical) and the denominators are such that

where we used the fact the integral of is zero.

Let us define and so that . We use the following fact:

Fact 10Let be disjointly supported non-negative functions (“disjointly supported” means that they are non-zero on disjoint subsets of inputs), then

*Proof:* We begin with the following observation: if is a non-negative function, and , then , because has to be a local minimum.

Consider the expression occurring in the numerator of . We have

But

because for every at least one of or is zero, and so at least one of or is zero.

Using this fact, we have that the numerator of is equal to the sum of the numerators of and :

and the denominator of is also the sum of the denominators of and :

because for every . The fact now follows from the inequality

The lemma now follows by observing that and are non-negative and disjointly supported, so

and that both and have a support of volume at most .

If anybody is still reading, it is worth observing a couple of differences between the discrete proof and the continuous proof.

The Rayleigh quotient is defined slightly differently in the continuous case. It would correspond to defining it as

in the discrete case.

If are disjointly supported and nonnegative, the sum of the numerators of the Rayleigh quotients and can be strictly smaller than the numerator of , while we always have equality in the continuous case. In the discrete case, the sum of the numerators of and can be up to twice the numerator of (this fact is useful, but it did not come up in this proof), while again we have exact equality in the continuous case.

The chain rule calculation

corresponds to the step

In the continuous case, and are “infinitesimally close”, so we can approximate by .

## Recent Comments