Publications  —  Pritish Kamath

Authors are listed in alphabetical order, as is the tradition in Theory (exceptions marked by †).

Theses

  • Some Hardness Escalation Results in Computational Complexity Theory [pdf]
    Pritish Kamath
    Ph.D. Thesis (MIT), 2019
    Abstract. In this thesis, we prove new hardness escalation results in computational complexity theory: a phenomenon whereby hardness results against seemingly weak models of computation for any problem can be lifted, in a black-box manner, to much stronger models of computation by considering a simple gadget-composed version of the original problem. For any unsatisfiable CNF formula $F$ that is hard to refute in the Resolution proof system, we show that a gadget-composed version of $F$ is hard to refute in any proof system whose lines are computed by efficient communication protocols. This allows us to prove new lower bounds for:

    • Monotone Circuit Size. We get an exponential lower bound for an explicit monotone function computable by linear sized monotone span programs and also in (non-monotone) $\mathsf{NC}^2$.
    • Real Monotone Circuit Size. Our proof technique extends to real communication protocols, which yields similar lower bounds against real monotone circuits.
    • Cutting Planes Length. We get an exponential lower bound for an explicit CNF contradiction that is refutable with logarithmic Nullstellensatz degree.
    Finally, we describe an intimate connection between computational models and communication complexity analogs of the sub-classes of $\mathsf{TFNP}$, the class of all total search problems in $\mathsf{NP}$. We show that the communication analog of $\mathsf{PPA}_p$ captures span programs over $\mathbb{F}_p$ for any prime $p$. This complements previously known results that communication $\mathsf{FP}$ captures formulas (Karchmer--Wigderson, 1988) and that communication $\mathsf{PLS}$ captures circuits (Razborov, 1995).
  • Communication Complexity of Permutation-Invariant Functions [pdf]
    Pritish Kamath
    S.M. Thesis (MIT), 2015
    Abstract. Motivated by the quest for a broader understanding of communication complexity of simple functions, we introduce the class of "permutation-invariant" functions. A partial function $f:\{0,1\}^n \times \{0,1\}^n\to \{0,1,?\}$ is permutation-invariant if for every bijection $\pi:\{1,\ldots,n\} \to \{1,\ldots,n\}$ and every $\mathbf{x}, \mathbf{y} \in \{0,1\}^n$, it is the case that $f(\mathbf{x}, \mathbf{y}) = f(\mathbf{x}^{\pi}, \mathbf{y}^{\pi})$. Most of the commonly studied functions in communication complexity are permutation-invariant. For such functions, we present a simple complexity measure (computable in time polynomial in $n$ given an implicit description of $f$) that describes their communication complexity up to polynomial factors and up to an additive error that is logarithmic in the input size. This gives a coarse taxonomy of the communication complexity of simple functions.

    Our work highlights the role of the well-known lower bounds of functions such as Set-Disjointness and Indexing, while complementing them with the relatively lesser-known upper bounds for Gap-Inner-Product (from the sketching literature) and Sparse-Gap-Inner-Product (from the recent work of Canonne et al. [ITCS 2015]). We also present consequences to the study of communication complexity with imperfectly shared randomness where we show that for total permutation-invariant functions, imperfectly shared randomness results in only a polynomial blow-up in communication complexity after an additive $O(\log \log n)$ overhead.
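    A quick way to internalize the definition above is to test it mechanically. Below is a minimal Python sketch (ours, not from the thesis) that empirically checks permutation invariance of Set-Disjointness; a passing run is evidence, not a proof.

    ```python
    import random

    def set_disjointness(x, y):
        """DISJ(x, y) = 1 iff the supports of x and y do not intersect."""
        return int(all(not (xi and yi) for xi, yi in zip(x, y)))

    def looks_permutation_invariant(f, n, trials=1000):
        """Check f(x, y) == f(x^pi, y^pi) for random inputs and random
        bijections pi; the same pi is applied to both arguments."""
        for _ in range(trials):
            x = [random.randint(0, 1) for _ in range(n)]
            y = [random.randint(0, 1) for _ in range(n)]
            pi = list(range(n))
            random.shuffle(pi)
            if f(x, y) != f([x[i] for i in pi], [y[i] for i in pi]):
                return False
        return True

    print(looks_permutation_invariant(set_disjointness, n=16))  # True
    ```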

Publications and Pre-prints

  • Separating Computational and Statistical Differential Privacy (Under Plausible Assumptions) [pdf]
    Badih Ghazi, Rahul Ilango, Pritish Kamath, Ravi Kumar, Pasin Manurangsi
    Manuscript, 2023
    Abstract. ...
  • On Differentially Private Counting on Trees [pdf]
    Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Kewen Wu
    Manuscript, 2023
    Abstract. We study the problem of performing counting queries at different levels in hierarchical structures while preserving individuals' privacy. We propose a new error measure for this setting by considering a combination of multiplicative and additive approximation to the query results. We examine known mechanisms in differential privacy (DP) and prove their optimality in the pure-DP setting. In the approximate-DP setting, we design new algorithms achieving significant improvement over known ones.
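    For orientation, here is a minimal sketch (assuming a complete binary hierarchy and the standard Laplace-mechanism baseline, not the paper's improved algorithms) of pure-DP counting at every level of a tree; each individual appears in one node per level, so the sensitivity scales with the depth.

    ```python
    import numpy as np

    def noisy_tree_counts(leaf_counts, eps, seed=0):
        """Release counts at every level of a complete binary hierarchy.
        One individual contributes to one node per level, so the L1
        sensitivity of the whole release equals the number of levels."""
        rng = np.random.default_rng(seed)
        levels = [np.asarray(leaf_counts, dtype=float)]
        while len(levels[-1]) > 1:
            levels.append(levels[-1].reshape(-1, 2).sum(axis=1))  # parents
        scale = len(levels) / eps  # split the eps budget across levels
        return [lvl + rng.laplace(0.0, scale, size=lvl.shape) for lvl in levels]

    noisy = noisy_tree_counts([3, 1, 4, 1, 5, 9, 2, 6], eps=1.0)
    ```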
  • Regression with Label Differential Privacy [pdf]
    Badih Ghazi, Pritish Kamath, Ravi Kumar, Ethan Leeman, Pasin Manurangsi, Avinash Varadarajan, Chiyuan Zhang
    International Conference on Learning Representations (ICLR), 2023
    Abstract. We study the task of training regression models with the guarantee of label differential privacy (DP). Based on a global prior distribution on label values, which could be obtained privately, we derive a label DP randomization mechanism that is optimal under a given regression loss function. We prove that the optimal mechanism takes the form of a "randomized response on bins", and propose an efficient algorithm for finding the optimal bin values. We carry out a thorough experimental evaluation on several datasets demonstrating the efficacy of our algorithm.
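    To make the "randomized response on bins" form concrete, here is a hedged sketch of plain $k$-ary randomized response over a fixed set of bin values; the paper's actual contribution, choosing the bins optimally for the regression loss, is not reproduced here.

    ```python
    import numpy as np

    def rr_on_bins(label, bins, eps, rng=np.random.default_rng()):
        """Snap the label to its nearest bin, then report the true bin with
        probability e^eps / (e^eps + k - 1) and a uniformly random other
        bin otherwise; this is eps-DP with respect to the label."""
        k = len(bins)
        true_idx = int(np.argmin([abs(label - b) for b in bins]))
        if rng.random() < np.exp(eps) / (np.exp(eps) + k - 1):
            return bins[true_idx]
        others = [i for i in range(k) if i != true_idx]
        return bins[int(rng.choice(others))]

    private_label = rr_on_bins(3.7, bins=[0.0, 2.5, 5.0, 7.5, 10.0], eps=1.0)
    ```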
  • Private Ad Modeling with DP-SGD [pdf]
    Carson Denison, Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Krishna Giri Narra, Amer Sinha, Avinash Varadarajan, Chiyuan Zhang
    AAAI Workshop on Privacy-Preserving Artificial Intelligence (PPAI), 2023
    (Selected for Long Presentation)
    Abstract. A well-known algorithm in privacy-preserving ML is differentially private stochastic gradient descent (DP-SGD). While this algorithm has been evaluated on text and image data, it has not been previously applied to ads data, which are notorious for their high class imbalance and sparse gradient updates. In this work we apply DP-SGD to several ad modeling tasks including predicting click-through rates, conversion rates, and number of conversion events, and evaluate their privacy-utility trade-off on real-world datasets. Our work is the first to empirically demonstrate that DP-SGD can provide both privacy and utility for ad modeling tasks.
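    For readers new to DP-SGD (Abadi et al., 2016), a schematic numpy sketch of a single step follows; the `per_example_grads` input is assumed to be computed elsewhere.

    ```python
    import numpy as np

    def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_mult,
                    rng=np.random.default_rng()):
        """One DP-SGD update: clip each example's gradient to L2 norm
        clip_norm, average, then add Gaussian noise scaled to the clip."""
        clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
                   for g in per_example_grads]
        mean_grad = np.mean(clipped, axis=0)
        sigma = noise_mult * clip_norm / len(clipped)
        return params - lr * (mean_grad + rng.normal(0.0, sigma, mean_grad.shape))
    ```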
  • Anonymized Histograms in Intermediate Privacy Models [pdf]
    Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi
    Neural Information Processing Systems (NeurIPS), 2022
    Abstract. We study the problem of privately computing the anonymized histogram (a.k.a. unattributed histogram), which is defined as the histogram without item labels. Previous works have provided algorithms with $\ell_1$- and $\ell_2^2$-errors of $O_{\varepsilon}(\sqrt{n})$ in the central model of differential privacy (DP). In this work, we provide an algorithm with a nearly matching error guarantee of $\tilde{O}_{\varepsilon}(\sqrt{n})$ in the shuffle DP and pan-private models. Our algorithm is very simple: it just post-processes the discrete Laplace-noised histogram! Using this algorithm as a subroutine, we show applications in privately estimating symmetric properties of distributions such as entropy, support coverage, and support size.
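    The algorithm's simplicity can be conveyed in a few lines. The sketch below samples discrete Laplace noise as a difference of geometric variables and then applies one plausible post-processing (clamping at zero and sorting, since an anonymized histogram is just a multiset of counts); the paper's actual post-processing may differ.

    ```python
    import numpy as np

    def discrete_laplace(eps, size, rng=np.random.default_rng()):
        """The difference of two i.i.d. geometric variables is
        discrete-Laplace distributed with parameter e^{-eps}."""
        p = 1.0 - np.exp(-eps)
        return rng.geometric(p, size) - rng.geometric(p, size)

    def noisy_anonymized_histogram(counts, eps):
        """Noise each count, then post-process: clamp at zero and sort."""
        noisy = np.asarray(counts) + discrete_laplace(eps, len(counts))
        return np.sort(np.maximum(noisy, 0))[::-1]

    print(noisy_anonymized_histogram([40, 12, 12, 3, 1, 1, 1], eps=1.0))
    ```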
  • Private Isotonic Regression [pdf]
    Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi
    Neural Information Processing Systems (NeurIPS), 2022
    Abstract. In this paper, we consider the problem of differentially private (DP) algorithms for isotonic regression. For the most general problem of isotonic regression over a partially ordered set (poset) $\mathcal{X}$ and for any Lipschitz loss function, we obtain a pure-DP algorithm that, given $n$ input points, has an expected excess empirical risk of roughly $\mathrm{width}(\mathcal{X}) \cdot \log |\mathcal{X}|/n$, where $\mathrm{width}(\mathcal{X})$ is the width of the poset. In contrast, we also obtain a near-matching lower bound of roughly $(\mathrm{width}(\mathcal{X}) + \log |\mathcal{X}|)/n$, that holds even for approximate-DP algorithms. Moreover, we show that the above bounds are essentially the best that can be obtained without utilizing any further structure of the poset. In the special case of a totally ordered set and for $\ell_1$ and $\ell_2^2$ losses, our algorithm can be implemented in near-linear running time; we also provide extensions of this algorithm to the problem of private isotonic regression with additional structural constraints on the output function.
  • Understanding the Eluder Dimension [pdf]
    Gene Li, Pritish Kamath, Dylan J. Foster, Nathan Srebro
    Neural Information Processing Systems (NeurIPS), 2022
    Abstract. We provide new insights on eluder dimension, a complexity measure that has been extensively used to bound the regret of algorithms for online bandits and reinforcement learning with function approximation. First, we study the relationship between the eluder dimension for a function class and a generalized notion of rank, defined for any monotone "activation" $\sigma: \mathbb{R} \to \mathbb{R}$, which corresponds to the minimal dimension required to represent the class as a generalized linear model. It is known that when $\sigma$ has derivatives bounded away from $0$, $\sigma$-rank gives rise to an upper bound on eluder dimension for any function class; we show, however, that eluder dimension can be exponentially smaller than $\sigma$-rank. We also show that the condition on the derivative is necessary; namely, when $\sigma$ is the ReLU activation, the eluder dimension can be exponentially larger than $\sigma$-rank. For binary-valued function classes, we obtain a characterization of the eluder dimension in terms of star number and threshold dimension, quantities which are relevant in active learning and online learning respectively.
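    For reference, the standard definition that the abstract presumes, as we recall it from Russo and Van Roy (2013):

    ```latex
    A point $x$ is \emph{$\varepsilon$-independent} of $x_1, \dots, x_k$ with
    respect to $\mathcal{F}$ if there exist $f, f' \in \mathcal{F}$ with
    \[
      \sqrt{\textstyle\sum_{i=1}^{k} \bigl(f(x_i) - f'(x_i)\bigr)^2}
        \;\le\; \varepsilon
      \qquad \text{and yet} \qquad
      \lvert f(x) - f'(x) \rvert > \varepsilon .
    \]
    The eluder dimension $\dim_E(\mathcal{F}, \varepsilon)$ is the length of the
    longest sequence of points in which every element is
    $\varepsilon'$-independent of its predecessors for some
    $\varepsilon' \ge \varepsilon$.
    ```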
  • Do More Negative Samples Necessarily Hurt In Contrastive Learning? [pdf]
    Pranjal Awasthi, Nishanth Dikkala, Pritish Kamath
    International Conference on Machine Learning (ICML), 2022
    (Selected for Long Presentation)
    Abstract. Recent investigations in noise contrastive estimation suggest, both empirically as well as theoretically, that while having more "negative samples" in the contrastive loss improves downstream classification performance initially, beyond a threshold, it hurts downstream performance due to a "collision-coverage" trade-off. But is such a phenomenon inherent in contrastive learning? We show in a simple theoretical setting, where positive pairs are generated by sampling from the underlying latent class (introduced by Saunshi et al. (ICML 2019)), that the downstream performance of the representation optimizing the (population) contrastive loss in fact does not degrade with the number of negative samples. Along the way, we give a structural characterization of the optimal representation in our framework, for noise contrastive estimation. We also provide empirical support for our theoretical results on CIFAR-10 and CIFAR-100 datasets.
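    As a reference point for the loss in question, a minimal numpy sketch of the contrastive (NCE-style) objective with $k$ negative samples, taking similarity to be an inner product of representations (an assumption of this sketch):

    ```python
    import numpy as np

    def contrastive_loss(anchor, positive, negatives):
        """-log( e^{<a,p>} / (e^{<a,p>} + sum_i e^{<a,n_i>}) ): adding
        negatives shrinks the softmax target, the 'collision' effect."""
        logits = np.array([anchor @ positive] + [anchor @ n for n in negatives])
        logits -= logits.max()                     # numerical stability
        return -np.log(np.exp(logits[0]) / np.exp(logits).sum())

    rng = np.random.default_rng(0)
    a, p = rng.normal(size=8), rng.normal(size=8)
    loss = contrastive_loss(a, p, [rng.normal(size=8) for _ in range(16)])
    ```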
  • Faster Privacy Accounting via Evolving Discretization [pdf]
    Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi
    International Conference on Machine Learning (ICML), 2022
    Abstract. We introduce a new algorithm for numerical composition of privacy random variables, useful for computing the accurate differential privacy parameters for composition of mechanisms. Our algorithm achieves a running time and memory usage of $\mathrm{polylog}(k)$ for the task of self-composing a mechanism, from a broad class of mechanisms, $k$ times; this class, e.g., includes the sub-sampled Gaussian mechanism, which appears in the analysis of differentially private stochastic gradient descent. By comparison, recent work by Gopi et al. (NeurIPS 2021) obtained a running time of $\widetilde{O}(\sqrt{k})$ for the same task. Our approach extends to the case of composing $k$ different mechanisms in the same class, improving upon their running time and memory usage from $\widetilde{O}(k^{1.5})$ to $\widetilde{O}(k)$.
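    To see what is being computed, here is a toy sketch of self-composition of a discretized privacy-loss distribution: composing $k$ times is a $k$-fold convolution of the loss PMF, and exponentiation-by-squaring reduces this to $O(\log k)$ convolutions. (The sketch lets the support grow freely; managing the re-discretization between steps is the paper's contribution and is not attempted here.)

    ```python
    import numpy as np

    def self_compose(pmf, k):
        """k-fold self-composition of a privacy-loss PMF on a uniform grid:
        summing independent losses = convolving PMFs, and squaring cuts the
        number of convolutions from k to O(log k)."""
        result = np.array([1.0])              # identity: point mass at loss 0
        base = np.asarray(pmf, dtype=float)
        while k > 0:
            if k & 1:
                result = np.convolve(result, base)
            base = np.convolve(base, base)
            k >>= 1
        return result

    composed = self_compose([0.2, 0.5, 0.3], k=1000)
    ```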
  • Connect the Dots: Tighter Discrete Approximations of Privacy Loss Distributions [pdf]
    Vadym Doroshenko, Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi
    Privacy Enhancing Technologies Symposium (PETS), 2022
    Abstract. The privacy loss distribution (PLD) provides a tight characterization of the privacy loss of a mechanism in the context of differential privacy (DP). Recent work [Meiser et al. 2018, Koskela et al. 2020, 2021a, 2021b] has shown that PLD-based accounting allows for tighter $(\varepsilon, \delta)$-DP guarantees for many popular mechanisms compared to other known methods. A key question in PLD-based accounting is how to approximate any (potentially continuous) PLD with a PLD over any specified discrete support.

    We present a novel approach to this problem. Our approach supports both pessimistic estimation, which overestimates the hockey-stick divergence (i.e., $\delta$) for any value of $\varepsilon$, and optimistic estimation, which underestimates the hockey-stick divergence. Moreover, we show that our pessimistic estimate is the best possible among all pessimistic estimates. Furthermore, this discrete PLD, when used in compositions, yields pessimistic/optimistic estimates, respectively, of the hockey-stick divergence. Experimental evaluation shows that our approach can work with much larger discretization intervals than previous approaches while keeping a similar error bound, and yet gives a better approximation than the existing method of [Meiser et al. 2018].
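    A minimal sketch of the pessimistic direction (plain rounding, not the paper's tighter "connect the dots" construction): moving every privacy-loss value up to the next grid point can only increase $\delta(\varepsilon) = \mathbb{E}[(1 - e^{\varepsilon - L})_+]$, hence overestimates it for every $\varepsilon$.

    ```python
    import numpy as np

    def pessimistic_discretize(losses, probs, grid):
        """Round each privacy-loss value up to the nearest grid point >= it
        (the grid must cover the support). Since (1 - e^{eps - L})_+ is
        nondecreasing in L, the rounded PLD overestimates delta(eps)."""
        grid = np.asarray(grid, dtype=float)
        assert grid[-1] >= max(losses), "grid must cover the loss support"
        out = np.zeros(len(grid))
        for loss, p in zip(losses, probs):
            out[np.searchsorted(grid, loss, side="left")] += p
        return out  # probability mass on each grid point
    ```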
  • Circuits Resilient to Short-Circuit Errors [pdf]
    Klim Efremenko, Bernhard Haeupler, Yael Tauman Kalai, Pritish Kamath, Gillat Kol, Nicolas Resch, Raghuvansh Saxena
    Symposium on Theory of Computing (STOC), 2022
    Abstract. Given a Boolean circuit $C$, we wish to convert it to a circuit $C'$ that computes the same function as $C$ even if some of its gates suffer from adversarial short-circuit errors, i.e., their output is replaced by the value of one of their inputs [KLM97]. Can we design such a resilient circuit $C'$ whose size is roughly comparable to that of $C$? Prior work [KLR12, BEGY19] gave a positive answer for the special case where $C$ is a formula.

    We study the general case and show that any Boolean circuit $C$ of size $s$ can be converted to a new circuit $C'$ of quasi-polynomial size $s^{O(\log s)}$ that computes the same function as $C$ even if a $1/51$ fraction of the gates on any root-to-leaf path in $C'$ are short-circuited. Moreover, if the original circuit $C$ is a formula, the resilient circuit $C'$ is of near-linear size $s^{1+\varepsilon}$. The construction of our resilient circuits utilizes the connection between circuits and DAG-like communication protocols [Raz95, Pud10, Sok17], originally introduced in the context of proof complexity.
  • On the Power of Differentiable Learning versus PAC and SQ Learning [pdf]
    Emmanuel Abbe, Pritish Kamath, Eran Malach, Colin Sandon, Nathan Srebro
    Neural Information Processing Systems (NeurIPS), 2021
    (Selected for Spotlight presentation)
    Abstract. We study the power of learning via mini-batch stochastic gradient descent (SGD) on the population loss, and batch Gradient Descent (GD) on the empirical loss, of a differentiable model or neural network, and ask what learning problems can be learnt using these paradigms. We show that SGD and GD can always simulate learning with statistical queries (SQ), but their ability to go beyond that depends on the precision $\rho$ of the gradient calculations relative to the minibatch size $b$ (for SGD) and sample size $m$ (for GD). With fine enough precision relative to minibatch size, namely when $b\rho$ is small enough, SGD can go beyond SQ learning and simulate any sample-based learning algorithm and thus its learning power is equivalent to that of PAC learning; this extends prior work that achieved this result for $b=1$. Similarly, with fine enough precision relative to the sample size $m$, GD can also simulate any sample-based learning algorithm based on $m$ samples. In particular, with polynomially many bits of precision (i.e. when $\rho$ is exponentially small), SGD and GD can both simulate PAC learning regardless of the mini-batch size. On the other hand, when $b\rho^2$ is large enough, the power of SGD is equivalent to that of SQ learning.
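    The role of the precision parameter $\rho$ can be made concrete with a tiny sketch (our illustration of the setting, not the paper's formal model): the learner only ever sees gradients rounded to multiples of $\rho$.

    ```python
    import numpy as np

    def quantize(g, rho):
        """Round each coordinate to the nearest multiple of rho, modeling
        gradient computations carried out with precision rho."""
        return rho * np.round(np.asarray(g) / rho)

    def sgd_step_finite_precision(w, minibatch_grads, lr, rho):
        """Average a minibatch of per-example gradients and expose only a
        rho-precision version to the learner; when b*rho is small, fine
        sample-dependent fluctuations of the average survive rounding."""
        return w - lr * quantize(np.mean(minibatch_grads, axis=0), rho)
    ```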
  • Quantifying the Benefit of Using Differentiable Learning over Tangent Kernels [pdf]
    Eran Malach, Pritish Kamath, Emmanuel Abbe, Nathan Srebro
    International Conference on Machine Learning (ICML), 2021
    Abstract. We study the relative power of learning with gradient descent on differentiable models, such as neural networks, versus using the corresponding tangent kernels. We show that under certain conditions, gradient descent achieves small error only if a related tangent kernel method achieves a non-trivial advantage over random guessing (a.k.a. weak learning), though this advantage might be very small even when gradient descent can achieve arbitrarily high accuracy. Complementing this, we show that without these conditions, gradient descent can in fact learn with small error even when no kernel method, in particular using the tangent kernel, can achieve a non-trivial advantage over random guessing.
  • Does Invariant Risk Minimization Capture Invariance? [pdf]
    † Pritish Kamath, Akilesh Tangella, Danica J. Sutherland, Nathan Srebro
    Artificial Intelligence and Statistics Conference (AISTATS), 2021
    (Selected for Oral presentation)
    Abstract. We show that the Invariant Risk Minimization (IRM) formulation of Arjovsky et al. (2019) can fail to capture "natural" invariances, at least when used in its practical "linear" form, and even on very simple problems which directly follow the motivating examples for IRM. This can lead to worse generalization on new environments, even when compared to unconstrained ERM. The issue stems from a significant gap between the linear variant (as in their concrete method IRMv1) and the full non-linear IRM formulation. Additionally, even when capturing the "right" invariances, we show that it is possible for IRM to learn a sub-optimal predictor, due to the loss function not being invariant across environments. The issues arise even when measuring invariance on the population distributions, but are exacerbated by the fact that IRM is extremely fragile to sampling.
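    For context, the IRMv1 penalty of Arjovsky et al. (2019) that the abstract refers to is easy to state; below is a numpy sketch specialized to the squared loss, where the gradient with respect to the scalar multiplier $w$ at $w = 1$ has a closed form (the general version uses automatic differentiation).

    ```python
    import numpy as np

    def irmv1_objective(preds_by_env, labels_by_env, lam):
        """Sum of per-environment risks plus the IRMv1 penalty. For squared
        loss, R_e(w) = mean((w*f(x) - y)^2) over environment e, and
        dR_e/dw at w = 1 equals mean(2*(f(x) - y)*f(x))."""
        risk, penalty = 0.0, 0.0
        for f, y in zip(preds_by_env, labels_by_env):
            f, y = np.asarray(f), np.asarray(y)
            risk += np.mean((f - y) ** 2)
            penalty += np.mean(2.0 * (f - y) * f) ** 2  # squared grad at w=1
        return risk + lam * penalty
    ```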
  • Approximate is Good Enough: Probabilistic Variants of Dimensional and Margin Complexity [pdf]
    Pritish Kamath, Omar Montasser, Nathan Srebro
    Conference on Learning Theory (COLT), 2020
    Abstract. We present and study approximate notions of dimensional and margin complexity, which correspond to the minimal dimension or norm of an embedding required to approximate, rather than exactly represent, a given hypothesis class. We show that such notions are not only sufficient for learning using linear predictors or a kernel, but unlike the exact variants, are also necessary. Thus they are better suited for discussing limitations of linear or kernel methods.
  • Optimality of Correlated Sampling Strategies [pdf] [ToC version]
    Mohammad Bavarian, Badih Ghazi, Elad Haramaty, Pritish Kamath, Ronald L. Rivest, Madhu Sudan
    Theory of Computing (ToC), 2020
    Abstract. In the correlated sampling problem, two players are given probability distributions $P$ and $Q$, respectively, over the same finite set, with access to shared randomness. Without any communication, the two players are each required to output an element sampled according to their respective distributions, while trying to minimize the probability that their outputs disagree. A well-known strategy due to Kleinberg-Tardos and Holenstein, with a close variant (for a similar problem) due to Broder, solves this task with disagreement probability at most $2\delta/(1+\delta)$, where $\delta$ is the total variation distance between $P$ and $Q$. This strategy has been used in several different contexts, including sketching algorithms, approximation algorithms based on rounding linear programming relaxations, the study of parallel repetition, and cryptography.

    In this paper, we give a surprisingly simple proof that this strategy is essentially optimal. Specifically, for every $\delta \in (0,1)$, we show that any correlated sampling strategy incurs a disagreement probability of essentially $2\delta/(1+\delta)$ on some inputs $P$ and $Q$ with total variation distance at most $\delta$. This partially answers a recent question of Rivest.

    Our proof is based on studying a new problem that we call constrained agreement. Here, the two players are given subsets $A \subseteq [n]$ and $B \subseteq [n]$, respectively, and their goal is to output an element $i \in A$ and $j \in B$, respectively, while minimizing the probability that $i \ne j$. We prove tight bounds for this question, which in turn imply tight bounds for correlated sampling. Though we settle basic questions about the two problems, our formulation leads to more fine-grained questions that remain open.
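    The strategy whose optimality is being proved is short enough to state as code: both players scan a shared stream of uniform (element, height) proposals and output the first proposal that falls under their own distribution. A sketch, assuming the two players agree on the support and on the seed of the shared randomness:

    ```python
    import random

    def correlated_sample(dist, seed):
        """Kleinberg-Tardos / Holenstein correlated sampling: accept the
        first shared proposal (x, u) with u < dist[x]. Run with the same
        seed, two players holding P and Q disagree with probability at
        most 2*dTV(P, Q) / (1 + dTV(P, Q))."""
        rng = random.Random(seed)
        support = sorted(dist)  # both players must enumerate identically
        while True:
            x = support[rng.randrange(len(support))]
            if rng.random() < dist[x]:
                return x

    P = {"a": 0.5, "b": 0.5, "c": 0.0}
    Q = {"a": 0.4, "b": 0.4, "c": 0.2}
    alice, bob = correlated_sample(P, seed=7), correlated_sample(Q, seed=7)
    ```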
  • On the Complexity of Modulo-$q$ Arguments and the Chevalley-Warning Theorem [pdf]
    Mika Göös, Pritish Kamath, Katerina Sotiraki, Manolis Zampetakis
    Computational Complexity Conference (CCC), 2020
    Abstract. We study the search problem class $\mathsf{PPA}_q$ defined as a modulo-$q$ analog of the well-known polynomial parity argument class $\mathsf{PPA}$ introduced by Papadimitriou '94. Our first result shows that this class can be characterized in terms of $\mathsf{PPA}_p$ for prime $p$.

    Our main result is to establish that an explicit version of a search problem associated to the Chevalley--Warning theorem is complete for $\mathsf{PPA}_p$ for prime $p$. This problem is natural in that it does not explicitly involve circuits as part of the input. It is the first such complete problem for $\mathsf{PPA}_p$ when $p \ge 3$.

    Finally, we discuss connections between the Chevalley-Warning theorem and the well-studied short integer solution problem, and survey the structural properties of $\mathsf{PPA}_q$.
  • Limits on the Efficiency of (Ring) LWE based Non-Interactive Key Exchange [pdf]
    Siyao Guo, Pritish Kamath, Alon Rosen, Katerina Sotiraki
    Public Key Cryptography (PKC), 2020
    Abstract. $\mathsf{LWE}$-based key-exchange protocols lie at the heart of post-quantum public-key cryptography. However, all existing protocols either lack the non-interactive nature of Diffie-Hellman key exchange or a polynomial $\mathsf{LWE}$-modulus, resulting in unwanted efficiency overhead. We study the possibility of designing non-interactive $\mathsf{LWE}$-based protocols with polynomial $\mathsf{LWE}$-modulus. To this end,

    • We identify and formalize simple non-interactive and polynomial $\mathsf{LWE}$-modulus variants of existing protocols, where Alice and Bob simultaneously exchange one or more (ring) $\mathsf{LWE}$ samples with polynomial $\mathsf{LWE}$-modulus and then run individual key reconciliation functions to obtain the shared key.
    • We point out central barriers and show that such non-interactive key-exchange protocols are impossible if:

      1. The reconciliation functions first compute the inner product of the received $\mathsf{LWE}$ sample with their private $\mathsf{LWE}$ secret. This impossibility is information-theoretic.
      2. One of the reconciliation functions does not depend on the error of the transmitted $\mathsf{LWE}$ sample. This impossibility assumes the hardness of $\mathsf{LWE}$.
    • We give further evidence that progress in either direction, of giving an $\mathsf{LWE}$-based non-interactive key exchange protocol or of proving the impossibility of one, will lead to progress on some other well-studied questions in cryptography.
    Overall, our results show possibilities and challenges in designing simple (ring) $\mathsf{LWE}$-based non-interactive key exchange protocols.
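    To make the protocol template in the first bullet above concrete, here is a toy numpy sketch (illustrative parameters, no security claimed): each party publishes one LWE sample, takes an inner product with its own secret, and rounds to a bit. The small error terms are exactly what the reconciliation functions must absorb.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n, q = 64, 3329                     # toy parameters, not secure
    A = rng.integers(0, q, (n, n))      # public matrix

    def keygen():
        return rng.integers(-1, 2, n), rng.integers(-1, 2, n)  # small s, e

    sA, eA = keygen(); sB, eB = keygen()
    uA = (A @ sA + eA) % q              # Alice's public LWE sample
    uB = (A.T @ sB + eB) % q            # Bob's public LWE sample

    kA = int(sA @ uB % q)               # = sA^T A^T sB + sA^T eB  (mod q)
    kB = int(uA @ sB % q)               # = sA^T A^T sB + eA^T sB  (mod q)
    bitA = int(q // 4 <= kA < 3 * q // 4)   # round to a half of Z_q
    bitB = int(q // 4 <= kB < 3 * q // 4)
    # bitA == bitB unless the shared value lands near a rounding boundary.
    ```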
  • Adventures in Monotone Complexity and $\mathsf{TFNP}$ [pdf]
    Mika Göös, Pritish Kamath, Robert Robere, Dmitry Sokolov
    Innovations in Theoretical Computer Science (ITCS), 2019
    (invited talk at FOCS 2018 workshop on TFNP)
    Abstract.
    Separations: We introduce a monotone variant of $\mathrm{Xor}\text{-}\mathrm{SAT}$ and show it has exponential monotone circuit complexity. Since $\mathrm{Xor}\text{-}\mathrm{SAT}$ is in $\mathsf{NC}^2$, this improves qualitatively on the monotone vs. non-monotone separation of Tardos (1988). We also show that monotone span programs over $\mathbb{R}$ can be exponentially more powerful than over finite fields. These results can be interpreted as separating subclasses of $\mathsf{TFNP}$ in communication complexity.

    Characterizations: We show that the communication (resp. query) analogue of $\mathsf{PPA}$ (subclass of $\mathsf{TFNP}$) captures span programs over $\mathbb{F}_2$ (resp. Nullstellensatz degree over $\mathbb{F}_2$). Previously, it was known that communication $\mathsf{FP}$ captures formulas (Karchmer-Wigderson, 1988) and that communication $\mathsf{PLS}$ captures circuits (Razborov, 1995).
  • Bayesian Inference of Temporal Task Specifications from Demonstrations [pdf]
    Ankit Shah, Pritish Kamath, Shen Li, Julie Shah
    Neural Information Processing Systems (NeurIPS), 2018
    Abstract. When observing task demonstrations, human apprentices are able to identify whether a given task is executed correctly long before they gain expertise in actually performing that task. Prior research into learning from demonstrations (LfD) has failed to capture this notion of the acceptability of an execution; meanwhile, temporal logics provide a flexible language for expressing task specifications. Inspired by this, we present Bayesian specification inference, a probabilistic model for inferring task specification as a temporal logic formula. We incorporate methods from probabilistic programming to define our priors, along with a domain-independent likelihood function to enable sampling-based inference. We demonstrate the efficacy of our model for inferring specifications with over 90% similarity between the inferred specification and the ground truth, both within a synthetic domain and a real-world table setting task.
  • Monotone Circuit Lower Bounds from Resolution [pdf]
    Ankit Garg, Mika Göös, Pritish Kamath, Dmitry Sokolov
    Symposium on Theory of Computing (STOC), 2018
    Abstract. For any unsatisfiable CNF formula $F$ that is hard to refute in the Resolution proof system, we show that a gadget-composed version of $F$ is hard to refute in any proof system whose lines are computed by efficient communication protocols---or, equivalently, that a monotone function associated with $F$ has large monotone circuit complexity. Our result extends to monotone real circuits, which yields new lower bounds for the Cutting Planes proof system.
  • Dimension Reduction for Polynomials over Gaussian Space and Applications [shorter] [longer]
    Badih Ghazi, Pritish Kamath, Prasad Raghavendra
    Computational Complexity Conference (CCC), 2018
    Abstract. We introduce a new technique for reducing the dimension of the ambient space of low-degree polynomials in the Gaussian space while preserving their relative correlation structure. As an application, we obtain an explicit upper bound on the dimension of an $\varepsilon$-optimal noise-stable Gaussian partition. In fact, we address the more general problem of upper bounding the number of samples needed to $\varepsilon$-approximate any joint distribution that can be non-interactively simulated from a correlated Gaussian source. Our results significantly improve (from Ackermann-like to "merely" exponential) the upper bounds recently proved on the above problems by De, Mossel & Neeman (CCC 2017, SODA 2018 resp.), and imply decidability of the larger alphabet case of the gap non-interactive simulation problem posed by Ghazi, Kamath & Sudan (FOCS 2016).

    Our technique of dimension reduction for low-degree polynomials is simple and can be seen as a generalization of the Johnson-Lindenstrauss lemma and could be of independent interest.
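    Since the technique "can be seen as a generalization of the Johnson-Lindenstrauss lemma", it may help to recall the degree-1 case in code: a scaled Gaussian projection preserves inner products (hence correlations) up to roughly $1/\sqrt{k}$ relative error.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    d, k = 10_000, 400                       # ambient and target dimensions
    x = rng.normal(size=d)
    y = 0.8 * x + 0.6 * rng.normal(size=d)   # correlation ~0.8 with x

    G = rng.normal(size=(k, d)) / np.sqrt(k) # JL map: scaled Gaussian matrix
    xs, ys = G @ x, G @ y                    # E[<Gx, Gy>] = <x, y> exactly

    print(x @ y / d, xs @ ys / d)            # both close to 0.8
    ```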
  • Query-to-Communication Lifting for $\mathrm{P}^{\mathrm{NP}}$ [pdf]
    Mika Göös, Pritish Kamath, Toniann Pitassi, Thomas Watson
    Computational Complexity Conference (CCC), 2017
    Abstract. We prove that the $\mathrm{P}^{\mathrm{NP}}$-type query complexity (alternatively, decision list width) of any boolean function $f$ is quadratically related to the $\mathrm{P}^{\mathrm{NP}}$-type communication complexity of a lifted version of $f$. As an application, we show that a certain "product" lower bound method of Impagliazzo and Williams (CCC 2010) fails to capture $\mathrm{P}^{\mathrm{NP}}$ communication complexity up to polynomial factors, which answers a question of Papakonstantinou, Scheder, and Song (CCC 2014).
  • Improved Bounds for Universal One-Bit Compressive Sensing [pdf]
    Jayadev Acharya, Arnab Bhattacharyya, Pritish Kamath
    International Symposium on Information Theory (ISIT), 2017
    Abstract. Unlike compressive sensing where the measurement outputs are assumed to be real-valued and have infinite precision, in one-bit compressive sensing, measurements are quantized to one bit, their signs. In this work, we show how to recover the support of sparse high-dimensional vectors in the one-bit compressive sensing framework with an asymptotically near-optimal number of measurements. We also improve the bounds on the number of measurements for approximately recovering vectors from one-bit compressive sensing measurements. Our results are universal, namely the same measurement scheme works simultaneously for all sparse vectors.

    Our proof of optimality for support recovery is obtained by showing an equivalence between the task of support recovery using 1-bit compressive sensing and a well-studied combinatorial object known as Union Free Families.
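    As a reminder of the measurement model, a tiny numpy sketch of one-bit compressive sensing with a generic Gaussian design (one standard instantiation; not necessarily the measurement scheme of the paper), together with a simple correlation-based support heuristic:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n, m, k = 1000, 200, 5              # dimension, measurements, sparsity

    x = np.zeros(n)
    x[rng.choice(n, k, replace=False)] = rng.normal(size=k)  # k-sparse signal

    A = rng.normal(size=(m, n))         # Gaussian measurement matrix
    y = np.sign(A @ x)                  # one-bit measurements: signs only

    scores = np.abs(A.T @ y)            # correlate each coordinate with y
    support_estimate = set(np.argsort(scores)[-k:])
    ```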
  • Compression in a Distributed Setting [pdf]
    Badih Ghazi, Elad Haramaty, Pritish Kamath, Madhu Sudan
    Innovations in Theoretical Computer Science (ITCS), 2017
    Abstract. Motivated by an attempt to understand the formation and development of (human) language, we introduce a "distributed compression" problem. In our problem, a sequence of pairs of players from a set of $K$ players is chosen and tasked to communicate messages drawn from an unknown distribution $Q$. Arguably, languages are created and evolve to compress frequently occurring messages, and we focus on this aspect. The only knowledge that players have about the distribution $Q$ is from previously drawn samples, but these samples differ from player to player. The only common knowledge between the players is restricted to a common prior distribution $P$ and some constant number of bits of information (such as a learning algorithm). Letting $T_\varepsilon$ denote the number of iterations it would take for a typical player to obtain an $\varepsilon$-approximation to $Q$ in total variation distance, we ask whether $T_\varepsilon$ iterations suffice to compress the messages down roughly to their entropy and give a partial positive answer.

    We show that a natural uniform algorithm can compress the communication down to an average cost per message of $O(H(Q) + \log(D(P || Q) + O(1)))$ in $\widetilde{O}(T_\varepsilon)$ iterations while allowing for $O(\varepsilon)$-error, where $D(\cdot || \cdot)$ denotes the KL-divergence between distributions. For large divergences this compares favorably with the static algorithm that ignores all samples and compresses down to $H(Q) + D(P || Q)$ bits, while not requiring the $T_\varepsilon \cdot K$ iterations that it would take players to develop optimal but separate compressions for each pair of players. Along the way we introduce a "data-structural" view of the task of communicating with a natural language and show that our natural algorithm can also be implemented by an efficient data structure, whose storage is comparable to the storage requirements of $Q$ and whose query complexity is comparable to the lengths of the messages to be compressed. Our results give a plausible mathematical analogy to the mechanisms by which human languages get created and evolve, and in particular highlight the possibility of coordination towards a joint task (agreeing on a language) while engaging in distributed learning.
  • Decidability of Non-Interactive Simulation of Joint Distributions [pdf]
    Badih Ghazi, Pritish Kamath, Madhu Sudan
    Foundations of Computer Science (FOCS), 2016
    Abstract. We present decidability results for a sub-class of "non-interactive" simulation problems, a well-studied class of problems in information theory. A non-interactive simulation problem is specified by two distributions $P(x,y)$ and $Q(u,v)$: The goal is to determine if two players, Alice and Bob, that observe sequences $X^n$ and $Y^n$ respectively, where $\{(X_i, Y_i)\}_{i=1}^n$ are drawn i.i.d. from $P(x,y)$, can generate pairs $U$ and $V$ respectively (without communicating with each other) with a joint distribution that is arbitrarily close in total variation to $Q(u,v)$. Even when $P$ and $Q$ are extremely simple: e.g., $P$ is uniform on the triples $\{(0,0), (0,1), (1,0)\}$ and $Q$ is a "doubly symmetric binary source", i.e., $U$ and $V$ are uniform $\pm 1$ variables with correlation say $0.49$, it is open if $P$ can simulate $Q$.

    In this work, we show that whenever $P$ is a distribution on a finite domain and $Q$ is a $2 \times 2$ distribution, then the non-interactive simulation problem is decidable: specifically, given $\delta > 0$ the algorithm runs in time bounded by some function of $P$ and $\delta$ and either gives a non-interactive simulation protocol that is $\delta$-close to $Q$ or asserts that no protocol gets $O(\delta)$-close to $Q$. The main challenge to such a result is determining explicit (computable) convergence bounds on the number $n$ of samples that need to be drawn from $P(x,y)$ to get $\delta$-close to $Q$. We invoke contemporary results from the analysis of Boolean functions such as the invariance principle and a regularity lemma to obtain such explicit bounds.
  • Communication complexity of permutation-invariant functions [pdf]
    Badih Ghazi, Pritish Kamath, Madhu Sudan
    Symposium on Discrete Algorithms (SODA), 2016
    Abstract. Motivated by the quest for a broader understanding of upper bounds in communication complexity, at least for simple functions, we introduce the class of "permutation-invariant" functions. A partial function $f:\{0,1\}^n \times \{0,1\}^n\to \{0,1,?\}$ is permutation-invariant if for every bijection $\pi:\{1,\ldots,n\} \to \{1,\ldots,n\}$ and every $\mathbf{x}, \mathbf{y} \in \{0,1\}^n$, it is the case that $f(\mathbf{x}, \mathbf{y}) = f(\mathbf{x}^{\pi}, \mathbf{y}^{\pi})$. Most of the commonly studied functions in communication complexity are permutation-invariant. For such functions, we present a simple complexity measure (computable in time polynomial in $n$ given an implicit description of $f$) that describes their communication complexity up to polynomial factors and up to an additive error that is logarithmic in the input size. This gives a coarse taxonomy of the communication complexity of simple functions.

    Our work highlights the role of the well-known lower bounds of functions such as 'Set-Disjointness' and 'Indexing', while complementing them with the relatively lesser-known upper bounds for 'Gap-Inner-Product' (from the sketching literature) and 'Sparse-Gap-Inner-Product' (from the recent work of Canonne et al. [ITCS 2015]). We also present consequences to the study of communication complexity with imperfectly shared randomness where we show that for total permutation-invariant functions, imperfectly shared randomness results in only a polynomial blow-up in communication complexity after an additive $O(\log \log n)$ overhead.
  • Communication with partial noiseless feedback [pdf]
    Bernhard Haeupler, Pritish Kamath, Ameya Velingker
    International Workshop on Randomization and Computation (RANDOM), 2015
    Abstract. We introduce the notion of one-way communication schemes with partial noiseless feedback. In this setting, Alice wishes to communicate a message to Bob by using a communication scheme that involves sending a sequence of bits over a channel while receiving feedback bits from Bob for $\delta$ fraction of the transmissions. An adversary is allowed to corrupt up to a constant fraction of Alice's transmissions, while the feedback is always uncorrupted. Motivated by questions related to coding for interactive communication, we seek to determine the maximum error rate, as a function of $0 \le \delta \le 1$, such that Alice can send a message to Bob via some protocol with $\delta$ fraction of noiseless feedback. The case $\delta = 1$ corresponds to full feedback, in which the result of Berlekamp ['64] implies that the maximum tolerable error rate is $1/3$, while the case $\delta = 0$ corresponds to no feedback, in which the maximum tolerable error rate is $1/4$, achievable by use of a binary error-correcting code.

    In this work, we show that for any $\delta \in (0,1]$ and $\gamma \in [0, 1/3)$, there exists a randomized communication scheme with noiseless $\delta$-feedback, such that the probability of miscommunication is low, as long as no more than a $\gamma$ fraction of the rounds are corrupted. Moreover, we show that for any $\delta \in (0, 1]$ and $\gamma < f(\delta)$, there exists a deterministic communication scheme with noiseless $\delta$-feedback that always decodes correctly as long as no more than a $\gamma$ fraction of rounds are corrupted. Here $f$ is a monotonically increasing, piecewise linear, continuous function with $f(0) = 1/4$ and $f(1) = 1/3$. Also, the rate of communication in both cases is constant (dependent on $\delta$ and $\gamma$ but independent of the input length).
  • Arithmetic Circuits: A chasm at depth three [pdf] [FOCS slides] [FOCS video] [TCS+ video]
    Ankit Gupta, Pritish Kamath, Neeraj Kayal, Ramprasad Saptharishi
    Foundations of Computer Science (FOCS), 2013 (also in SICOMP and CACM)
    Abstract. We show that, over $\mathbb{Q}$, if an $n$-variate polynomial of degree $d = n^{O(1)}$ is computable by an arithmetic circuit of size $s$ (respectively by an algebraic branching program of size $s$) then it can also be computed by a depth three circuit (i.e. a $\Sigma \Pi \Sigma$-circuit) of size $\exp(O(\sqrt{d \log d \log n \log s}))$ (respectively of size $\exp(O(\sqrt{d \log n \log s}))$). In particular this yields a $\Sigma \Pi \Sigma$ circuit of size $\exp(O(\sqrt{d} \cdot \log d))$ computing the $d \times d$ determinant $\mathrm{Det}_d$. It also means that if we can prove a lower bound of $\exp(\omega(\sqrt{d} \cdot \log^{3/2} d))$ on the size of any $\Sigma \Pi \Sigma$-circuit computing the $d \times d$ permanent ${\rm Perm}_d$ then we get superpolynomial lower bounds for the size of any arithmetic circuit computing ${\rm Perm}_d$. We then give some further results pertaining to derandomizing polynomial identity testing and circuit lower bounds.

    The $\Sigma \Pi \Sigma $ circuits that we construct have the property that (some of) the intermediate polynomials have degree much higher than $d$. Indeed such a counterintuitive construction is unavoidable - it is known that in any $\Sigma \Pi \Sigma$ circuit $C$ computing either ${\rm Det}_d$ or ${\rm Perm}_d$, if every multiplication gate has fanin at most $d$ (or any constant multiple thereof) then $C$ must have size at least $\exp(\Omega(d))$.
  • Approaching the chasm at depth four [pdf]
    Ankit Gupta, Pritish Kamath, Neeraj Kayal, Ramprasad Saptharishi
    Conference on Computational Complexity (CCC), 2013 (also in J.ACM)
    (co-winner of the Best Paper Award)
    Abstract. Agrawal-Vinay (FOCS 2008), Koiran (TCS 2012) and Tavenas (MFCS 2012) have recently shown that an $\exp(\omega(\sqrt{n}\log n))$ lower bound for depth four homogeneous circuits computing the permanent with bottom layer of $\times$ gates having fanin bounded by $\sqrt{n}$ translates to a super-polynomial lower bound for general arithmetic circuits computing the permanent. Motivated by this, we examine the complexity of computing the permanent and determinant via such homogeneous depth four circuits with bounded bottom fanin.

    We show here that any homogeneous depth four arithmetic circuit with bottom fanin bounded by $\sqrt{n}$ computing the permanent (or the determinant) must be of size $\exp(\Omega(\sqrt{n}))$.

Before 2013 (Undergraduate Research)

  • Preservation under substructures modulo bounded cores [pdf]
    † Abhisekh Sankaran, Bharat Adsul, Vivek Madan, Pritish Kamath, Supratik Chakraborty
    International Workshop on Logic, Language, Information and Computation (WoLLIC), 2012
    Abstract. We investigate a model-theoretic property that generalizes the classical notion of preservation under substructures. We call this property preservation under substructures modulo bounded cores, and present a syntactic characterization via $\Sigma^0_2$ sentences for properties of arbitrary structures definable by FO sentences. Towards a sharper characterization, we conjecture that the count of existential quantifiers in the $\Sigma^0_2$ sentence equals the size of the smallest bounded core. We show that this conjecture holds for special fragments of FO and also over special classes of structures. We present a (not FO-definable) class of finite structures for which the conjecture fails, but for which the classical Łoś-Tarski preservation theorem holds. As a fallout of our studies, we obtain combinatorial proofs of the Łoś-Tarski theorem for some of the aforementioned cases.
  • Faster algorithms for alternating refinement relations [pdf]
    Krishnendu Chatterjee, Siddhesh Chaubal, Pritish Kamath
    Computer Science and Logic (CSL), 2012
    Abstract. One central issue in the formal design and analysis of reactive systems is the notion of refinement that asks whether all behaviors of the implementation are allowed by the specification. The local interpretation of behavior leads to the notion of simulation. Alternating transition systems (ATSs) provide a general model for composite reactive systems, and the simulation relation for ATSs is known as alternating simulation. The simulation relation for fair transition systems is called fair simulation. In this work our main contributions are as follows:

    1. We present an improved algorithm for fair simulation with Büchi fairness constraints; our algorithm requires $O(n^3 \cdot m)$ time as compared to the previously known $O(n^6)$-time algorithm, where $n$ is the number of states and $m$ is the number of transitions.
    2. We present a game-based algorithm for alternating simulation that requires $O(m^2)$ time as compared to the previously known $O((n \cdot m)^2)$-time algorithm, where $n$ is the number of states and $m$ is the size of the transition relation.
    3. We present an iterative algorithm for alternating simulation that matches the time complexity of the game-based algorithm, but is more space efficient than the game-based algorithm.
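    For orientation, here is the textbook fixed-point computation of (plain) simulation between finite transition systems, against which these faster algorithms should be read; this is the naive refinement procedure, not any of the improved algorithms above.

    ```python
    def simulation_relation(states, trans, label):
        """Greatest simulation: start from all label-consistent pairs and
        delete (s, t) whenever some move of s cannot be matched from t,
        until a fixed point. trans[s] is the set of successors of s."""
        rel = {(s, t) for s in states for t in states if label[s] == label[t]}
        changed = True
        while changed:
            changed = False
            for (s, t) in list(rel):
                if any(all((s2, t2) not in rel for t2 in trans[t])
                       for s2 in trans[s]):
                    rel.discard((s, t))
                    changed = True
        return rel

    states = {"a", "b", "c"}
    trans = {"a": {"b"}, "b": {"c"}, "c": set()}
    label = {"a": 0, "b": 1, "c": 0}
    sim = simulation_relation(states, trans, label)
    ```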
  • Using dominances for solving the protein family identification problem [pdf]
    Noël Malod-Dognin, Mathilde Le Boudic-Jamin, Pritish Kamath, Rumen Andonov
    Workshop on Algorithms in Bioinformatics (WABI), 2011
    Abstract. Identification of protein families is a computational biology challenge that needs efficient and reliable methods. Here we introduce the concept of dominance and propose a novel combined approach based on the Distance Alignment Search Tool (DAST), which relies on an exact algorithm with bounds. Our experiments show that this method successfully finds the most similar proteins in a set without solving all instances.