Back to theory! Today I am writing in a vain attempt to understand this paper by my soon-to-be advisor Anup Rao. I chose it because it seemed interesting and was only 6 pages long! Little did I know that I was going to be slapped with some of the densest 6 pages I've ever encountered. The goal of this post is to sprinkle in as much intuition about what is going on as possible, and honestly to understand it better myself as I work through the explanation. There's also a nice talk by Anup, which offers a lot of intuition but doesn't have time to get into the actual details of the encoding. So here I'll try to give a bit more detail, while still keeping as much intuition as possible.
Finally, I'll mention that the incoming explanation was partly inspired by a recent video from Ryan O'Donnell explaining the famous Switching Lemma from complexity theory. I won't get into what the switching lemma is, but basically a key component of the proof is a clever encoding argument, which O'Donnell describes as leaving clues for a detective. I thought this was both enlightening and cute, so I will attempt something similar, since this proof also involves encodings!
The sunflower problem itself is super simple and natural to state, so the background is hopefully quite accessible. Consider a family $\mathcal{F}$ of $\ell$ sets, each of size $k$, where each set is taken from the universe $[n] := \{1, ..., n\}$. Now let's define a sunflower: a $p$-sunflower is a collection of $p$ sets whose pairwise intersections are all equal; this common intersection is called the core, and the leftover parts of the sets, the petals, are pairwise disjoint. The basic question is then: how large does $\ell$ need to be to guarantee that $\mathcal{F}$ contains a $p$-sunflower?
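To make the core-and-petals picture concrete, here is a tiny sketch (my own toy illustration, not from the paper) that checks whether a collection of sets forms a sunflower, i.e. whether every pairwise intersection equals the common core:

```python
from itertools import combinations

def is_sunflower(sets):
    """Return True if every pairwise intersection equals the common core."""
    sets = [frozenset(s) for s in sets]
    core = frozenset.intersection(*sets)
    return all(a & b == core for a, b in combinations(sets, 2))

# Core {1, 2} with pairwise-disjoint petals {3}, {4}, {5}: a 3-sunflower.
print(is_sunflower([{1, 2, 3}, {1, 2, 4}, {1, 2, 5}]))  # True
# Here the first two sets share {1, 2, 3}, but the core is only {1, 2}.
print(is_sunflower([{1, 2, 3}, {1, 2, 3, 4}, {1, 2, 5}]))  # False
```

Note that $p$ pairwise disjoint sets count as a sunflower with empty core, which will matter later.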
Now I will remark that beyond just being a beautifully simple question to state, the answer to this question has many applications within theoretical computer science. But we won't get into that here, and instead rely on just the beauty of it to motivate us throughout.
Some History: Erdős and Rado showed that if $\ell \geq (p-1)^k\cdot k!$ then there must exist a $p$-sunflower, and moreover, that there is a family of $(p-1)^k$ sets of size $k$ that does not contain a $p$-sunflower. They conjectured that the lower bound is essentially the truth: that $c_p^k$ sets, for some constant $c_p$ depending only on $p$, should already force a $p$-sunflower. But it wasn't until just a couple of years ago that substantial progress was made toward closing this gap. Alweiss, Lovett, Wu, and Zhang showed that any family with $\ell \geq (\log k)^k (p \log\log k)^{O(k)}$ sets must have a $p$-sunflower. The paper I am examining today slightly improved and simplified the argument to show that any family with $\ell \geq (\alpha p \log(pk))^k$ sets, for some absolute constant $\alpha$, must have a $p$-sunflower.
So first, the result of Erdős and Rado is simple enough for us to show very quickly, and more importantly it offers some intuition as to how subsequent works were able to improve it. So let's take a look! Recall that Erdős and Rado showed that there must be a $p$-sunflower if $\ell \geq (p-1)^k \cdot k!$.
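For completeness, here is (roughly) the classical argument. Induct on $k$. If $\mathcal{F}$ contains $p$ pairwise disjoint sets, those already form a $p$-sunflower with empty core and we are done. Otherwise, take a maximal collection of pairwise disjoint sets in $\mathcal{F}$; there are at most $p-1$ of them, so their union $Y$ has at most $(p-1)k$ elements, and by maximality every set in $\mathcal{F}$ intersects $Y$. By pigeonhole, some element $x \in Y$ lies in at least $$\frac{\ell}{(p-1)k} \geq \frac{(p-1)^k \cdot k!}{(p-1)k} = (p-1)^{k-1}\cdot (k-1)!$$ of the sets. Deleting $x$ from those sets gives a family of sets of size $k-1$ that is large enough to apply the inductive hypothesis; adding $x$ back into the core of the resulting sunflower gives a $p$-sunflower in $\mathcal{F}$.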
Ok, awesome! Where was the slack in this last proof? Well, it could be that more than one element is shared by a large portion of the sets in $\mathcal{F}$, which would allow for a much stronger induction step than just removing a single popular element. The key is to define a notion of how well-spread the family of sets is. If the family is well-spread, then we should be able to say that there are at least $p$ disjoint sets, and if it is not well-spread, then we should be able to apply our induction. Let's make this a bit more formal.
What we would like to do, and ultimately will do, is the following. Consider randomly partitioning the universe $[n]$ into $p$ buckets with $n/p$ elements each. What we'd like to show is that the probability that a single bucket completely contains one of the sets from our family is greater than $1-1/p$. Then, by a union bound and Markov's inequality, we could show that with positive probability, every bucket completely contains one of the sets of our family, immediately giving $p$ disjoint sets and proving the lemma above.
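Just to make the event we care about concrete, here is a quick toy simulation (the numbers and the family below are made up purely for illustration): shuffle $[n]$, cut it into $p$ buckets of size $n/p$, and ask whether every bucket completely contains some set of the family.

```python
import random

def every_bucket_contains_a_set(family, n, p):
    """Randomly partition {1, ..., n} into p equal buckets; return True if
    each bucket completely contains at least one set from the family."""
    elements = list(range(1, n + 1))
    random.shuffle(elements)
    size = n // p
    buckets = [set(elements[i * size:(i + 1) * size]) for i in range(p)]
    return all(any(s <= b for s in family) for b in buckets)

# Toy family of 2-element sets over [12], partitioned into p = 3 buckets.
family = [frozenset(s) for s in ({1, 2}, {3, 4}, {5, 6}, {7, 8}, {9, 10}, {11, 12})]
trials = 10_000
hits = sum(every_bucket_contains_a_set(family, 12, 3) for _ in range(trials))
print(f"all buckets covered in {hits / trials:.3f} of the trials")
```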
Now, that's a bit difficult, so consider just trying to bound $\Pr[|S_i \setminus W| < t]$ for some $S_i \in \mathcal{F}$ and a random $W$. We can do this by an encoding argument! Specifically, we will encode all of the pairs $(W, S_i)$ where $|S_i \setminus W| \geq t$; i.e., $W$ is not doing a very good job of containing $S_i$. Now, we are going to leave clues for a detective to recover our pair $(W, S_i)$, albeit in a somewhat odd way. This odd way is an attempt to take advantage of our assumption that our family is $r$-well-spread. Here are the clues we leave:
In some sense, as Rao points out in his talk, this can be thought of as an anti-concentration bound: we have strong concentration bounds that say that most of the time $W$ will contain half of any given $S_i$, but the above shows, using the well-spread assumption, that it's also extremely likely that $W$ contains almost all of one of the $S_i$'s. Interesting!
We start with a definition, which hopefully makes sense in light of the previous example. We saw that a given $S \in \mathcal{F}$ may not be mostly contained in $W$, but somehow there is another $S_j$, related to $S$ (being contained in $W \cup S$), which is mostly contained in $W$.
Ok, let me state the more technical version of the above lemma, now that we know why we need it.
Yuck, already too many variables. The point is, we are going to prove this by induction. So what we are going to do is assume that it works for $m-1$, and try to prove it for $m$ (ok, I just said what induction is). How are we going to do that?? Well, we are going to split the $W$ we have, which is of size $\kappa m n/r$, into two disjoint sets: $U$ of size $u = \kappa (m-1)n/r$ and $V$ of size $v = \kappa n/r$. Then, "all" we have to do is show that for any fixed $U$, we have $$\mathbb{E}_{V,X}[|\chi(X, W)|] \leq \frac{2}{3}\, \mathbb{E}_X[|\chi(X, U)|].$$ Ha, easy enough. Of course, without the $2/3$, the above is trivially true, since $U \subseteq W$ and enlarging the random set can only shrink the leftover $\chi$.
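As a sanity check on why this per-step bound is the thing to prove: if it holds at every step, then unrolling the induction gives $$\mathbb{E}[|\chi(X, W)|] \leq (2/3)^m\, \mathbb{E}[|\chi(X, \emptyset)|] \leq (2/3)^m \cdot k,$$ since $\chi(x, \emptyset)$ is always a subset of some $S_y$ and hence has size at most $k$. So each extra chunk of $\kappa n/r$ random elements shaves off another constant factor, at least morally; I'm glossing over the exact statement of the lemma here.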
Let's get a trivial case out of the way. Suppose that $\chi(x, U)$ is empty for some $x$. Recall that $\chi(x, U)$ is $S_y \setminus U$ for the $y$ minimizing $|S_y \setminus U|$ among those with $S_y \subseteq S_x \cup U$. So $\chi(x, U)$ being empty implies that there exists some $y$ such that $S_y \subseteq U$; but then $S_y$ is a valid choice for every $x'$ (since $S_y \subseteq U \subseteq S_{x'} \cup U$), so $\chi(x', U)$ is empty for every $x'$, and likewise $S_y \subseteq W$ gives the same for $W$. Hence $$\mathbb{E}[|\chi(X, W)|] = \mathbb{E}[|\chi(X, U)|] = 0,$$ and the bound we are trying to prove is trivial. So let's suppose that this is not the case, which means that for all $x$, $|\chi(x, U)| \geq 1$.
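Since $\chi$ is doing all the work in what follows, here is the definition written out as code (my own rendering; I index the family by $x \in \{0, \dots, \ell-1\}$ and return the actual leftover set):

```python
def chi(x, U, family):
    """chi(x, U): among all sets S_y contained in S_x ∪ U, pick one minimizing
    |S_y \\ U| and return the leftover S_y \\ U."""
    U = frozenset(U)
    allowed = family[x] | U
    candidates = [S - U for S in family if S <= allowed]
    return min(candidates, key=len)  # nonempty: S_x itself always qualifies

family = [frozenset(s) for s in ({1, 2, 3}, {1, 4, 5}, {2, 3, 6})]
print(chi(0, {6}, family))        # frozenset({2, 3}): S_2 ⊆ S_0 ∪ U, leftover {2, 3}
print(chi(2, {1, 2, 3}, family))  # frozenset(): S_0 ⊆ U, so the leftover is empty
```

The second call is exactly the trivial case above: once some set fits entirely inside $U$, every leftover is empty.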
I promised a detective, now I will deliver. What we are going to do is encode $(V, X)$, meaning we are going to leave clues such that any (mathematically well-versed) detective can recover the pair $(V, X)$. We assume the detective already knows $U$. First of all, how many possible pairs are there? Well, at least $r^k \binom{n-u}{v}$. In fact, exactly that many, and moreover, $(X, V)$ is uniformly distributed over all the possibilities. By Shannon's arguments, and via Huffman encoding, this means we can encode $(X, V)$ trivially with $\log \left( r^k \binom{n-u}{v} \right)$ bits. But this by itself is not useful to us, and does nothing to relate $\mathbb{E}[|\chi(X, W)|]$ with $\mathbb{E}[|\chi(X, U)|]$. Remarkably, what Rao was able to show is that we can encode $(X, V)$ for our detective using $$\log \left( r^k \binom{n-u}{v} \right) + a\cdot \mathbb{E}[|\chi(X, U)|] - b \cdot \mathbb{E}[|\chi(X, W)|]$$ bits in expectation, which by Shannon's theorem must be at least $\log \left( r^k \binom{n-u}{v} \right)$. Crucially, we have that $a/b \leq 2/3$, and this immediately implies the result!
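Spelling the punchline out, since it's the crux: $(X, V)$ is uniform over $r^k \binom{n-u}{v}$ outcomes, so its entropy is exactly $\log\left(r^k \binom{n-u}{v}\right)$, and no valid (prefix-free) encoding can have smaller expected length. Comparing that lower bound with the expected length of Rao's encoding gives $$\log\left(r^k \binom{n-u}{v}\right) \leq \log\left(r^k \binom{n-u}{v}\right) + a\cdot \mathbb{E}[|\chi(X, U)|] - b \cdot \mathbb{E}[|\chi(X, W)|],$$ which rearranges to $b\cdot\mathbb{E}[|\chi(X,W)|] \leq a\cdot\mathbb{E}[|\chi(X,U)|]$; since $a/b \leq 2/3$, this is exactly the bound $\mathbb{E}[|\chi(X,W)|] \leq \tfrac{2}{3}\,\mathbb{E}[|\chi(X,U)|]$ that we wanted.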
We proceed to encode $(X,V)$ in two different ways, depending on the situation.
Rough case 1: Given some $A \subseteq \chi(X,U)$, there are never too many $y$'s such that $A \subseteq \chi(y,U) \subseteq \chi(X,U) \cup V$. We can utilize this to cleverly encode $X$ with fewer bits than we otherwise might expect, after which the detective knows $\chi(X, U)$, and we just need a bit more information to determine $V$.
Rough case 2: case 1 is not true, so for some $A$ there are lots of possible $y$'s such that $A \subseteq \chi(y,U) \subseteq \chi(X,U) \cup V$. Rao shows that this means the number of possible $V$'s has gone down significantly, so in this case we can more cleverly encode $V$ after trivially encoding $X$.
Oh you still want more details? Alright, fine. The only slight lie above was that we only consider $\chi(y,U)$ of size $|\chi(X, U)|$ and $A$ of size $|\chi(X,W)|$. This makes sense because we ultimately need to have our encoding be in terms of $|\chi(X, U)|$ and $|\chi(X, W)|$.
So, we start by defining $$\tau(A, X, V) = \{ y \in [\ell]: A \subseteq \chi(y, U) \subseteq V \cup \chi(X, U) \text{ and } |\chi(y, U)| = |\chi(X, U)|\},$$ and $$\phi(X, V) = r^k (\rho v/n)^{|\chi(X,U)|}\cdot (vr/n)^{-|\chi(X,W)|}.$$ Yuck. The point is that, with the above, case 1 is just when $|\tau(A, X, V)| < \phi(X, V)$ for all $A \subseteq \chi(X, U)$ of size $|\chi(X,W)|$, and case 2 is, well, when that's not the case. Hopefully it's a bit easier to parse after seeing the rough idea above. Now we are ready to give our detective some clues!
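Before the clues, here is $\tau$ in the same style as the `chi` sketch above, just as an operational reading of the definition (not part of the actual proof):

```python
def tau(A, x, V, U, family):
    """Indices y with A ⊆ chi(y, U) ⊆ V ∪ chi(x, U) and |chi(y, U)| = |chi(x, U)|."""
    A, V = frozenset(A), frozenset(V)
    chi_x = chi(x, U, family)
    allowed = V | chi_x
    out = []
    for y in range(len(family)):
        chi_y = chi(y, U, family)
        if A <= chi_y <= allowed and len(chi_y) == len(chi_x):
            out.append(y)
    return out
```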
As before, we let the detective know we are in case 2 by making the first bit 1. Being in case 2 roughly means that there is some $A$ of size $|\chi(X,W)|$ such that $V$ contains a LOT of other $\chi(y,U)$'s (each of which also contains $A$). This is visualized in the picture below. At a high level, this will limit how many possibilities for $V$ we have.
The coolest part by far to me was this notion of using Shannon's information-theoretic bounds in such a unique way. Rather than encode the quantity of interest in a normal/canonical way, we do it in a rather weird way in order to get a relationship between the quantities we are interested in. I wonder if this sort of idea works in other settings!
The last thing I will mention is that a super simple modification to the above proof idea was able to give an optimal upper bound of $r = \alpha p \log(k)$ for some constant $\alpha$. Namely, the idea is simply to partition into $2p$ buckets instead of $p$ buckets, and show that with probability $>0$ at least half of them completely contain a set in the family. Surprisingly, this gets rid of the $\log(p)$ factor! See here for the details, although that's pretty much the entire idea.
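To see why "at least half" is enough (this is my reading of the trick, so take it as a sketch): if each of the $2p$ buckets fails to completely contain a set with probability less than $1/2$, then the expected number of failing buckets is less than $p$, and Markov's inequality gives $$\Pr[\#\{\text{failing buckets}\} \geq p] \leq \frac{\mathbb{E}[\#\{\text{failing buckets}\}]}{p} < 1.$$ So with positive probability at most $p-1$ buckets fail, meaning at least $p+1$ of the $2p$ buckets each completely contain a set, and any $p$ of those give the disjoint sets we need. The reason the $\log(p)$ disappears is that each bucket now only needs to succeed with a constant probability rather than with probability $1 - 1/p$.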