Another (sigh) delayed post. A moderately busy month, with four papers spanning sublinear graph algorithms and, of course, distribution testing.

A 0.51-Approximation of Maximum Matching in Sublinear $n^{1.5}$ Time by Sepideh Mahabadi, Mohammad Roghani, and Jakub Tarnawski (arXiv). The title says it all. This paper gives a 0.51-approximation for the maximum matching in, well, $O(n^{1.5})$ time. Here’s the back story. There are two kinds of sublinear maximum matching algorithms. The first kind (achieved after a long line of impressive results) gets $(1-\varepsilon)$-approximations in $O(n^{2 - \Omega_{\varepsilon}(1)})$ time. This means that in sublinear, $o(n^2)$, time one can get arbitrarily good approximations. The second kind beats the simple $0.5$-approximation (obtained from maximal matchings), but aims for running times closer to $O(n)$. Behnezhad-Roghani-Rubinstein-Saberi gave a $(0.5+\varepsilon)$-approximation algorithm running in $n^{1+\varepsilon}$ time. But the largest value of $\varepsilon$ possible in these results is tiny. Can one get a “significant” improvement over 0.5-approximations in time “significantly less” than $n^2$? The answer is yes, as the title explains.
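As an aside on where the 0.5 baseline comes from: any maximal matching has at least half as many edges as a maximum matching, since every edge of an optimal matching shares a vertex with some edge of the maximal one. Here is a minimal Python sketch of that greedy baseline, purely for illustration; it reads the whole edge list and is emphatically not the paper’s sublinear-time algorithm.

```python
def greedy_maximal_matching(edges):
    """Greedily build a maximal matching; any maximal matching
    is a 0.5-approximation to the maximum matching."""
    matched = set()   # vertices already covered by the matching
    matching = []
    for u, v in edges:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching

# Path 1-2-3-4: if the middle edge is scanned first, greedy outputs one edge,
# while the maximum matching {(1,2), (3,4)} has two -- the factor 1/2 is tight.
print(greedy_maximal_matching([(2, 3), (1, 2), (3, 4)]))   # [(2, 3)]
```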

Testing (Conditional) Mutual Information by Jan Seyfried, Sayantan Sen, and Marco Tomamichel (arXiv, ECCC). Consider a distribution over pairs $(A, B)$. The mutual information $I(A,B)$ of $A$ and $B$ is the minimum KL-divergence between the original distribution and any product distribution over the support of $A$ and $B$. So the mutual information is zero iff $(A,B)$ is a product distribution, or equivalently, $A$ and $B$ are independent. A natural distribution testing question is to distinguish $A$ and $B$ being independent from $I(A,B) > \varepsilon$. One can take this problem a step further. Suppose the distribution is over triples $(A,B,C)$. Can one then distinguish conditional independence ($I(A, B \mid C) = 0$) from substantial conditional dependence ($I(A, B \mid C) > \varepsilon$)? This problem was studied by (including our very own) Canonne-Diakonikolas-Kane-Stewart, and it is known that $O(d^{7/4})$ samples suffice. Here, $d$ denotes the maximum alphabet size of each coordinate, so the overall domain size is $d^3$. This paper gives a much more nuanced complexity that depends separately on the support size of each coordinate. It is interesting that each coordinate plays a different role in the final complexity.
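To make the quantity being tested concrete: the product distribution minimizing the KL-divergence is the product of the marginals, so $I(A,B) = D_{KL}(P_{AB} \,\|\, P_A \times P_B)$. Below is a small numpy sketch of that formula (illustrative only, not the paper’s tester).

```python
import numpy as np

def mutual_information(joint):
    """I(A,B) = KL(P_AB || P_A x P_B) for a joint probability matrix, in nats."""
    pa = joint.sum(axis=1, keepdims=True)   # marginal of A (column vector)
    pb = joint.sum(axis=0, keepdims=True)   # marginal of B (row vector)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (pa * pb)[mask])))

# Independent pair: I = 0.  Perfectly correlated pair: I = log 2.
print(mutual_information(np.array([[0.25, 0.25], [0.25, 0.25]])))  # ~0.0
print(mutual_information(np.array([[0.5, 0.0], [0.0, 0.5]])))      # ~0.693
```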

Tight simulation of a distribution using conditional samples by Tomer Adar (arXiv). Consider a distribution over $\{0,1\}^n$. Many distribution testing results would not yield useful algorithms for this setting, since the domain is exponentially large. Inspired by this setting, the subcube conditional model was introduced by Bhattacharyya-Chakraborty. In this model, we can generate samples conditioned on any subcube. The aim of this paper goes beyond property testing. Can we actually estimate specific values of the distribution with a few queries? Can we simulate the distribution, meaning “locally compute” a distribution that is close to the input? Specifically, given a domain point $x$, we want to compute a value $\mu(x)$ such that $\mu$ represents a distribution close to the input. This paper gives such a result, where each $\mu(x)$ can be computed with $O(n^2/\varepsilon^2)$ queries. The final distribution has KL-divergence $\varepsilon^2$ from the original.
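One way to see why subcube-conditional access is the natural tool here: by the chain rule, $\mu(x) = \prod_i \Pr[X_i = x_i \mid X_1 = x_1, \ldots, X_{i-1} = x_{i-1}]$, and each factor is a probability under a subcube-conditioned distribution. The toy sketch below estimates $\mu(x)$ this way; the `subcube_sampler` oracle and the fixed per-coordinate sample count are hypothetical placeholders, and this naive estimator is not the paper’s algorithm and does not achieve its query bound.

```python
import math

def estimate_point_mass(x, subcube_sampler, samples_per_coord=1000):
    """Estimate mu(x) = prod_i Pr[X_i = x_i | X_1..X_{i-1} = x_1..x_{i-1}]
    from subcube-conditional samples (illustrative toy, not the paper's method)."""
    log_mu = 0.0
    prefix = []
    for i, bit in enumerate(x):
        # subcube_sampler(prefix) is a hypothetical oracle returning one sample
        # from the input distribution, conditioned on the first len(prefix)
        # coordinates being fixed to `prefix`.
        hits = sum(subcube_sampler(prefix)[i] == bit
                   for _ in range(samples_per_coord))
        p = max(hits / samples_per_coord, 1e-9)   # crude smoothing to avoid log 0
        log_mu += math.log(p)
        prefix.append(bit)
    return math.exp(log_mu)
```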

Sampling and Identity-Testing Without Approximate Tensorization of Entropy by William Gay, William He, Nicholas Kocurek, and Ryan O’Donnell (arXiv). Again, consider a distribution $\mathcal{D}$ over a large product domain $\Sigma^n$. Our aim is identity testing, so we wish to determine whether $\mathcal{D} = \mathcal{D}'$ or $\|\mathcal{D} - \mathcal{D}'\| > \varepsilon$, where $\mathcal{D}'$ is some explicitly known distribution. The model is much weaker than subcube conditional access. Essentially, we can only get samples from subcubes of dimension (at least) $n-1$. Blanca-Chen-Štefankovič-Vigoda studied identity testing where $\mathcal{D}$ satisfies a condition called “approximate tensorization of entropy” (ATE). By Jensen’s inequality, if $\mathcal{D}$ is a product distribution, the entropy of $\mathcal{D}$ is at most the sum of the entropies of the marginal distributions. If the entropy of $\mathcal{D}$ is (say) at most a constant factor times this sum, we say it satisfies the ATE. Blanca-Chen-Štefankovič-Vigoda prove that $\widetilde{O}(n/\varepsilon)$ queries suffice for identity testing. This paper studies input distributions that are mixtures of (say, a constant number of) distributions satisfying ATE. Interestingly, these mixtures do not satisfy the ATE. But this paper shows that one can still get $\widetilde{O}(n/\varepsilon)$-query identity testers.
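To make the entropy comparison above concrete, here is a small numpy sketch that computes the entropy of a joint distribution and the sum of its marginal entropies; for a product distribution the two coincide. This is only the informal comparison from the paragraph above, not the precise ATE functional inequality used in the papers.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a probability array."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def marginal_entropy_sum(joint):
    """Sum of coordinate-wise marginal entropies of a joint array over Sigma^n."""
    n = joint.ndim
    return sum(entropy(joint.sum(axis=tuple(j for j in range(n) if j != i)))
               for i in range(n))

# Product distribution: joint entropy equals the sum of marginal entropies.
p = np.outer([0.7, 0.3], [0.5, 0.5])
print(entropy(p), marginal_entropy_sum(p))   # equal (~1.304, ~1.304)

# Correlated distribution: joint entropy is strictly smaller.
q = np.array([[0.5, 0.0], [0.0, 0.5]])
print(entropy(q), marginal_entropy_sum(q))   # log 2 < 2 log 2
```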

By Seshadhri
