Publications — Pritish Kamath
Authors are in alphabetical order, as is the tradition in Theory (exceptions marked by †).
Click on any title to see the corresponding abstract!
Theses
-
Some Hardness Escalation Results in Computational Complexity Theory [pdf]
Pritish Kamath
Ph.D. Thesis (MIT), 2019
Abstract. In this thesis, we prove new hardness escalation results in computational complexity theory. Hardness escalation is a phenomenon whereby hardness results against seemingly weak models of computation, for any problem, can be lifted in a black-box manner to much stronger models of computation, by considering a simple gadget-composed version of the original problem.
For any unsatisfiable CNF formula $F$ that is hard to refute in the Resolution proof system, we show that a gadget-composed version of $F$ is hard to refute in any proof system whose lines are computed by efficient communication protocols. This allows us to prove new lower bounds for:
- Monotone Circuit Size. We get an exponential lower bound for an explicit monotone function computable by linear-sized monotone span programs and also in (non-monotone) $\mathsf{NC}^2$.
- Real Monotone Circuit Size. Our proof technique extends to real communication protocols, which yields similar lower bounds against real monotone circuits.
- Cutting Planes Length. We get an exponential lower bound for an explicit CNF contradiction that is refutable with logarithmic Nullstellensatz degree.
Finally, we describe an intimate connection between computational models and communication complexity analogs of the sub-classes of $\mathsf{TFNP}$, the class of all total search problems in $\mathsf{NP}$. We show that the communication analog of $\mathsf{PPA}_p$ captures span programs over $\mathbb{F}_p$ for any prime $p$. This complements previously known results that communication $\mathsf{FP}$ captures formulas (Karchmer--Wigderson, 1988) and that communication $\mathsf{PLS}$ captures circuits (Razborov, 1995).
-
Communication Complexity of Permutation-Invariant Functions [pdf]
Pritish Kamath
SM Thesis (MIT), 2015
Abstract. Motivated by the quest for a broader understanding of communication complexity of simple functions, we introduce the class of "permutation-invariant" functions. A partial function $f:\{0,1\}^n \times \{0,1\}^n\to \{0,1,?\}$ is permutation-invariant if for every bijection $\pi:\{1,\ldots,n\} \to \{1,\ldots,n\}$ and every $\mathbf{x}, \mathbf{y} \in \{0,1\}^n$, it is the case that $f(\mathbf{x}, \mathbf{y}) = f(\mathbf{x}^{\pi}, \mathbf{y}^{\pi})$. Most of the commonly studied functions in communication complexity are permutation-invariant. For such functions, we present a simple complexity measure (computable in time polynomial in $n$ given an implicit description of $f$) that describes their communication complexity up to polynomial factors and up to an additive error that is logarithmic in the input size. This gives a coarse taxonomy of the communication complexity of simple functions.
Our work highlights the role of the well-known lower bounds of functions such as Set-Disjointness and Indexing, while complementing them with the relatively lesser-known upper bounds for Gap-Inner-Product (from the sketching literature) and Sparse-Gap-Inner-Product (from the recent work of Canonne et al. [ITCS 2015]). We also present consequences to the study of communication complexity with imperfectly shared randomness where we show that for total permutation-invariant functions, imperfectly shared randomness results in only a polynomial blow-up in communication complexity after an additive $O(\log \log n)$ overhead.
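The invariance condition is easy to check mechanically. As an illustrative sketch (not part of the thesis), here is a brute-force test that Set-Disjointness is permutation-invariant; the function names and trial count are our choices for this example:

```python
import random

def disjointness(x, y):
    """Set-Disjointness: 1 iff the supports of x and y are disjoint."""
    return int(all(not (a and b) for a, b in zip(x, y)))

def is_permutation_invariant(f, n, trials=200):
    """Empirically test f(x, y) == f(x^pi, y^pi) on random inputs and permutations."""
    for _ in range(trials):
        x = [random.randint(0, 1) for _ in range(n)]
        y = [random.randint(0, 1) for _ in range(n)]
        pi = list(range(n))
        random.shuffle(pi)
        # x^pi permutes the coordinates of x by the bijection pi.
        if f(x, y) != f([x[i] for i in pi], [y[i] for i in pi]):
            return False
    return True
```

Since Set-Disjointness depends only on the multiset of coordinate pairs, the check always passes; a function like Indexing, which singles out specific coordinates, would fail it.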
Publications and Pre-prints
-
Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [pdf]
Lynn Chua,
Badih Ghazi,
Yangsibo Huang,
Pritish Kamath,
Ravi Kumar,
Pasin Manurangsi,
Amer Sinha,
Chulin Xie,
Chiyuan Zhang
Manuscript, 2024
Abstract. Large language models (LLMs) are typically multilingual due to pretraining on
diverse multilingual corpora. But can these models relate corresponding
concepts across languages, effectively being crosslingual? This study evaluates
six state-of-the-art LLMs on inherently crosslingual tasks. We observe that
while these models show promising surface-level crosslingual abilities on
machine translation and embedding space analyses, they struggle with deeper
crosslingual knowledge transfer, revealing a crosslingual knowledge barrier in
both general (MMLU benchmark) and domain-specific (Harry Potter quiz) contexts.
We observe that simple inference-time mitigation methods offer only limited
improvement. In contrast, we propose fine-tuning LLMs on
mixed-language data, which effectively reduces these gaps, even when using
out-of-domain datasets like WikiText. Our findings suggest the need for
explicit optimization to unlock the full crosslingual potential of LLMs. Our
code is publicly available at
https://github.com/google-research/crosslingual-knowledge-barriers.
-
Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning [pdf]
Lynn Chua,
Badih Ghazi,
Yangsibo Huang,
Pritish Kamath,
Daogao Liu,
Pasin Manurangsi,
Amer Sinha,
Chiyuan Zhang
Manuscript, 2024
Abstract. Large language models (LLMs) have emerged as powerful tools for tackling
complex tasks across diverse domains, but they also raise privacy concerns when
fine-tuned on sensitive data due to potential memorization. While differential
privacy (DP) offers a promising solution by ensuring models are "almost
indistinguishable" with or without any particular privacy unit, current
evaluations on LLMs mostly treat each example (text record) as the privacy
unit. This leads to uneven user privacy guarantees when contributions per user
vary. We therefore study user-level DP, motivated by applications where it is
necessary to ensure uniform privacy protection across users. We present a
systematic evaluation of user-level DP for LLM fine-tuning on natural language
generation tasks. Focusing on two mechanisms for achieving user-level DP
guarantees, Group Privacy and User-wise DP-SGD, we investigate design choices
like data selection strategies and parameter tuning for the best
privacy-utility tradeoff.
-
Differentially Private Optimization with Sparse Gradients [pdf]
Badih Ghazi,
Cristóbal Guzmán,
Pritish Kamath,
Ravi Kumar,
Pasin Manurangsi
Manuscript, 2024
Abstract. Motivated by applications of large embedding models, we study differentially
private (DP) optimization problems under sparsity of individual gradients. We
start with new near-optimal bounds for the classic mean estimation problem but
with sparse data, improving upon existing algorithms particularly for the
high-dimensional regime. Building on this, we obtain pure- and approximate-DP
algorithms with almost optimal rates for stochastic convex optimization with
sparse gradients; the former represents the first nearly dimension-independent
rates for this problem. Finally, we study the approximation of stationary
points for the empirical loss in approximate-DP optimization and obtain rates
that depend on sparsity instead of dimension, modulo polylogarithmic factors.
-
How Private are DP-SGD Implementations? [pdf]
Lynn Chua,
Badih Ghazi,
Pritish Kamath,
Ravi Kumar,
Pasin Manurangsi,
Amer Sinha,
Chiyuan Zhang
International Conference on Machine Learning (ICML), 2024
(Selected for Oral Presentation)
Abstract. We demonstrate a substantial gap between the privacy guarantees of the
Adaptive Batch Linear Queries (ABLQ) mechanism under different types of batch
sampling: (i) Shuffling, and (ii) Poisson subsampling; the typical analysis of
Differentially Private Stochastic Gradient Descent (DP-SGD) follows by
interpreting it as a post-processing of ABLQ. While shuffling-based DP-SGD is
more commonly used in practical implementations, it has not been amenable to
easy privacy analysis, either analytically or even numerically. On the other
hand, Poisson subsampling-based DP-SGD is challenging to scalably implement,
but has a well-understood privacy analysis, with multiple open-source
numerically tight privacy accountants available. This has led to the common
practice of deploying shuffling-based DP-SGD while reporting privacy
parameters computed via the analysis of the corresponding Poisson subsampling
version. Our result shows
that there can be a substantial gap between the privacy analysis when using the
two types of batch sampling, and thus advises caution in reporting privacy
parameters for DP-SGD.
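The two batch samplers being compared can be sketched as follows (a toy illustration with assumed parameter names, not the ABLQ analysis itself): shuffling places every example in exactly one batch per epoch, while Poisson subsampling includes each example in each batch independently.

```python
import random

def shuffle_batches(n, batch_size, rng):
    """Shuffling: permute the n examples once, then cut into fixed-size batches."""
    order = list(range(n))
    rng.shuffle(order)
    return [order[i:i + batch_size] for i in range(0, n, batch_size)]

def poisson_batches(n, q, steps, rng):
    """Poisson subsampling: each example joins each batch independently with
    probability q, so batch sizes are random."""
    return [[i for i in range(n) if rng.random() < q] for _ in range(steps)]
```

The privacy accountants mentioned above analyze the Poisson sampler; the paper shows the two samplers can have substantially different privacy guarantees.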
-
Individualized Privacy Accounting via Subsampling with Applications in Combinatorial Optimization [pdf]
Badih Ghazi,
Pritish Kamath,
Ravi Kumar,
Pasin Manurangsi,
Adam Sealfon
International Conference on Machine Learning (ICML), 2024
Abstract. In this work, we give a new technique for analyzing individualized privacy
accounting via the following simple observation: if an algorithm is one-sided
add-DP, then its subsampled variant satisfies two-sided DP. From this, we
obtain several improved algorithms for private combinatorial optimization
problems, including decomposable submodular maximization and set cover. Our
error guarantees are asymptotically tight and our algorithm satisfies pure-DP
while previously known algorithms (Gupta et al., 2010; Chaturvedi et al., 2021)
are approximate-DP. We also show an application of our technique beyond
combinatorial optimization by giving a pure-DP algorithm for the shifting heavy
hitter problem in a stream; previously, only an approximate-DP algorithm was
known (Kaplan et al., 2021; Cohen & Lyu, 2023).
-
On Convex Optimization with Semi-Sensitive Features [pdf]
Badih Ghazi,
Pritish Kamath,
Ravi Kumar,
Pasin Manurangsi,
Raghu Meka,
Chiyuan Zhang
Conference on Learning Theory (COLT), 2024
Abstract. We study the differentially private (DP) empirical risk minimization (ERM)
problem under the semi-sensitive DP setting where only some features are
sensitive. This generalizes the Label DP setting where only the label is
sensitive. We give improved upper and lower bounds on the excess risk for
DP-ERM. In particular, we show that the error only scales polylogarithmically
in terms of the sensitive domain size, improving upon previous results that
scale polynomially in the sensitive domain size (Ghazi et al., 2021).
-
Learning Neural Networks with Sparse Activations [pdf]
Pranjal Awasthi,
Nishanth Dikkala,
Pritish Kamath,
Raghu Meka
Conference on Learning Theory (COLT), 2024
Abstract. A core component present in many successful neural network architectures is
an MLP block of two fully connected layers with a non-linear activation in
between. An intriguing phenomenon observed empirically, including in
transformer architectures, is that, after training, the activations in the
hidden layer of this MLP block tend to be extremely sparse on any given input.
Unlike traditional forms of sparsity, where there are neurons/weights which can
be deleted from the network, this form of "dynamic" activation sparsity
appears to be harder to exploit to get more efficient networks. Motivated by
this, we initiate a formal study of PAC learnability of MLP layers that exhibit
activation sparsity. We present a variety of results showing that such classes
of functions do lead to provable computational and statistical advantages over
their non-sparse counterparts. Our hope is that a better theoretical
understanding of "sparsely activated" networks would lead to methods that
can exploit activation sparsity in practice.
-
Summary Reports Optimization in the Privacy Sandbox Attribution Reporting API [pdf]
Hidayet Aksu,
Badih Ghazi,
Pritish Kamath,
Ravi Kumar,
Pasin Manurangsi,
Adam Sealfon,
Avinash V Varadarajan
Proceedings of Privacy Enhancing Technologies (PoPETS), 2024
Abstract. The Privacy Sandbox Attribution Reporting API has been recently deployed by
Google Chrome to support the basic advertising functionality of attribution
reporting (aka conversion measurement) after deprecation of third-party
cookies. The API implements a collection of privacy-enhancing guardrails
including contribution bounding and noise injection. It also offers flexibility
for the analyst to allocate the contribution budget.
In this work, we present methods for optimizing the allocation of the
contribution budget for summary reports from the Attribution Reporting API. We
evaluate them on real-world datasets as well as on a synthetic data model that
we find to accurately capture real-world conversion data. Our results
demonstrate that optimizing the parameters that can be set by the analyst can
significantly improve the utility achieved by querying the API while satisfying
the same privacy bounds.
-
Training Differentially Private Ad Prediction Models with Semi-Sensitive Features [pdf]
Lynn Chua,
Qiliang Cui,
Badih Ghazi,
Charlie Harrison,
Pritish Kamath,
Walid Krichene,
Ravi Kumar,
Pasin Manurangsi,
Krishna Giri Narra,
Amer Sinha,
Avinash Varadarajan,
Chiyuan Zhang
Privacy-Preserving Artificial Intelligence (PPAI) workshop, 2024
Abstract. Motivated by problems arising in digital advertising, we introduce the task
of training differentially private (DP) machine learning models with
semi-sensitive features. In this setting, a subset of the features is known to
the attacker (and thus need not be protected) while the remaining features as
well as the label are unknown to the attacker and should be protected by the DP
guarantee. This task interpolates between training the model with full DP
(where the label and all features should be protected) or with label DP (where
all the features are considered known, and only the label should be protected).
We present a new algorithm for training DP models with semi-sensitive features.
Through an empirical evaluation on real ads datasets, we demonstrate that our
algorithm surpasses in utility the baselines of (i) DP stochastic gradient
descent (DP-SGD) run on all features (known and unknown), and (ii) a label DP
algorithm run only on the known features (while discarding the unknown ones).
-
LabelDP-Pro: Learning with Label Differential Privacy via Projections [pdf]
Badih Ghazi,
Yangsibo Huang,
Pritish Kamath,
Ravi Kumar,
Pasin Manurangsi,
Chiyuan Zhang
International Conference on Learning Representations (ICLR), 2024
Abstract. Label differentially private (label DP) algorithms seek to preserve the privacy of the labels in a training dataset in settings where the features are known to the adversary. In this work, we study a new family of label DP training algorithms. Unlike most prior label DP algorithms that have been based on label randomization, our algorithm naturally leverages the power of the central model of DP. It interleaves gradient projection operations with private stochastic gradient descent steps in order to improve the utility of the trained model while guaranteeing the privacy of the labels. We show that such projection-based algorithms can be made practical and that they improve on the state-of-the art for label DP training in the high-privacy regime. We complement our empirical evaluation with theoretical results shedding light on the efficacy of our method through the lens of bias-variance trade-offs.
-
User-Level Differential Privacy With Few Examples Per User [pdf]
Badih Ghazi,
Pritish Kamath,
Ravi Kumar,
Pasin Manurangsi,
Raghu Meka,
Chiyuan Zhang
Neural Information Processing Systems (NeurIPS), 2023
(Selected for Oral Presentation)
Abstract. Previous work on user-level differential privacy (DP) [Ghazi et al. NeurIPS
2021, Bun et al. STOC 2023] obtained generic algorithms that work for various
learning tasks. However, their focus was on the example-rich regime, where the
users have so many examples that each user could themselves solve the problem.
In this work we consider the example-scarce regime, where each user has only a
few examples, and obtain the following results:
1. For approximate-DP, we give a generic transformation of any item-level DP
algorithm to a user-level DP algorithm. Roughly speaking, the latter gives a
(multiplicative) savings of $O_{\varepsilon,\delta}(\sqrt{m})$ in terms of the
number of users required for achieving the same utility, where $m$ is the
number of examples per user. This algorithm, while recovering most known bounds
for specific problems, also gives new bounds, e.g., for PAC learning.
2. For pure-DP, we present a simple technique for adapting the exponential
mechanism [McSherry, Talwar FOCS 2007] to the user-level setting. This gives
new bounds for a variety of tasks, such as private PAC learning, hypothesis
selection, and distribution learning. For some of these problems, we show that
our bounds are near-optimal.
-
Optimal Unbiased Randomizers for Regression with Label Differential Privacy [pdf]
Ashwinkumar Badanidiyuru,
Badih Ghazi,
Pritish Kamath,
Ravi Kumar,
Ethan Leeman,
Pasin Manurangsi,
Avinash V Varadarajan,
Chiyuan Zhang
Neural Information Processing Systems (NeurIPS), 2023
Abstract. We propose a new family of label randomizers for training regression models
under the constraint of label differential privacy (DP). In particular, we
leverage the trade-offs between bias and variance to construct better label
randomizers depending on a privately estimated prior distribution over the
labels. We demonstrate that these randomizers achieve state-of-the-art
privacy-utility trade-offs on several datasets, highlighting the importance of
reducing bias when training neural networks with label DP. We also provide
theoretical results shedding light on the structural properties of the optimal
unbiased randomizers.
-
On Computing Pairwise Statistics with Local Differential Privacy [pdf]
Badih Ghazi,
Pritish Kamath,
Ravi Kumar,
Pasin Manurangsi,
Adam Sealfon
Neural Information Processing Systems (NeurIPS), 2023
Abstract. We study the problem of computing pairwise statistics, i.e., ones of the form
$\binom{n}{2}^{-1} \sum_{i \ne j} f(x_i, x_j)$, where $x_i$ denotes the input
to the $i$th user, with differential privacy (DP) in the local model. This
formulation captures important metrics such as Kendall's $\tau$ coefficient,
Area Under Curve, Gini's mean difference, Gini's entropy, etc. We give several
novel and generic algorithms for the problem, leveraging techniques from DP
algorithms for linear queries.
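As a point of reference, the (non-private) statistic being estimated can be computed directly from the definition. This sketch only illustrates the formulation, not the paper's local-DP algorithms, and the Gini instantiation is our choice of example:

```python
from math import comb

def pairwise_statistic(xs, f):
    """binom(n, 2)^{-1} * sum over ordered pairs i != j of f(x_i, x_j)."""
    n = len(xs)
    total = sum(f(xs[i], xs[j]) for i in range(n) for j in range(n) if i != j)
    return total / comb(n, 2)

# Gini's mean difference: f(a, b) = |a - b| / 2 (halved because both orders are summed).
gini = pairwise_statistic([1.0, 2.0, 4.0], lambda a, b: abs(a - b) / 2)  # 2.0
```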
-
Optimizing Hierarchical Queries for the Attribution Reporting API [pdf]
Matthew Dawson,
Badih Ghazi,
Pritish Kamath,
Kapil Kumar,
Ravi Kumar,
Bo Luan,
Pasin Manurangsi,
Nishanth Mundru,
Harikesh Nair,
Adam Sealfon,
Shengyu Zhu
AdKDD Workshop, 2023
Abstract. We study the task of performing hierarchical queries based on summary reports
from the Attribution Reporting API for ad conversion measurement. We
demonstrate that methods from optimization and differential privacy can help
cope with the noise introduced by privacy guardrails in the API. In particular,
we present algorithms for (i) denoising the API outputs and ensuring
consistency across different levels of the tree, and (ii) optimizing the
privacy budget across different levels of the tree. We provide an experimental
evaluation of the proposed algorithms on public datasets.
-
On User-Level Private Convex Optimization [pdf]
Badih Ghazi,
Pritish Kamath,
Ravi Kumar,
Raghu Meka,
Pasin Manurangsi,
Chiyuan Zhang
International Conference on Machine Learning (ICML), 2023
Abstract. We introduce a new mechanism for stochastic convex optimization (SCO) with
user-level differential privacy guarantees. The convergence rates of this
mechanism are similar to those in the prior work of Levy et al. (2021);
Narayanan et al. (2022), but with two important improvements. Our mechanism
does not require any smoothness assumptions on the loss. Furthermore, our
bounds are also the first where the minimum number of users needed for
user-level privacy has no dependence on the dimension and only a logarithmic
dependence on the desired excess error. The main idea underlying the new
mechanism is to show that the optimizers of strongly convex losses have low
local deletion sensitivity, along with an output perturbation method for
functions with low local deletion sensitivity, which could be of independent
interest.
-
Ticketed Learning-Unlearning Schemes [pdf]
Badih Ghazi,
Pritish Kamath,
Ravi Kumar,
Pasin Manurangsi,
Ayush Sekhari,
Chiyuan Zhang
Conference on Learning Theory (COLT), 2023
Abstract. We consider the learning-unlearning paradigm defined as follows. First, given
a dataset, the goal is to learn a good predictor, such as one minimizing a
certain loss. Subsequently, given any subset of examples that wish to be
unlearnt, the goal is to learn, without the knowledge of the original training
dataset, a good predictor that is identical to the predictor that would have
been produced when learning from scratch on the surviving examples.
We propose a new ticketed model for learning-unlearning wherein the learning
algorithm can send back additional information in the form of a small-sized
(encrypted) "ticket" to each participating training example, in addition to
retaining a small amount of "central" information for later. Subsequently,
the examples that wish to be unlearnt present their tickets to the unlearning
algorithm, which additionally uses the central information to return a new
predictor. We provide space-efficient ticketed learning-unlearning schemes for
a broad family of concept classes, including thresholds, parities,
intersection-closed classes, among others.
En route, we introduce the count-to-zero problem, where during unlearning,
the goal is to simply know if there are any examples that survived. We give a
ticketed learning-unlearning scheme for this problem that relies on the
construction of Sperner families with certain properties, which might be of
independent interest.
-
Towards Separating Computational and Statistical Differential Privacy [pdf]
Badih Ghazi,
Rahul Ilango,
Pritish Kamath,
Ravi Kumar,
Pasin Manurangsi
Foundations of Computer Science (FOCS), 2023
Abstract. Computational differential privacy (CDP) is a natural relaxation of the
standard notion of (statistical) differential privacy (SDP) proposed by Beimel,
Nissim, and Omri (CRYPTO 2008) and Mironov, Pandey, Reingold, and Vadhan
(CRYPTO 2009). In contrast to SDP, CDP only requires privacy guarantees to hold
against computationally-bounded adversaries rather than
computationally-unbounded statistical adversaries. Despite the question being
raised explicitly in several works (e.g., Bun, Chen, and Vadhan, TCC 2016), it
has remained tantalizingly open whether there is any task achievable with the
CDP notion but not the SDP notion. Even a candidate for such a task is unknown.
Indeed, it is even unclear what the truth could be!
In this work, we give the first construction of a task achievable with the
CDP notion but not the SDP notion, under the following strong but plausible
cryptographic assumptions: (1) Non-Interactive Witness Indistinguishable
Proofs, (2) Laconic Collision-Resistant Keyless Hash Functions, (3)
Differing-Inputs Obfuscation for Public-Coin Samplers. In particular, we
construct a task for which there exists an $\varepsilon$-CDP mechanism with
$\varepsilon = O(1)$ achieving $1-o(1)$ utility, but any $(\varepsilon,
\delta)$-SDP mechanism, including computationally-unbounded ones, that achieves
a constant utility must use either a super-constant $\varepsilon$ or an
inverse-polynomially large $\delta$.
To prove this, we introduce a new approach for showing that a mechanism
satisfies CDP: first we show that a mechanism is "private" against a certain
class of decision tree adversaries, and then we use cryptographic constructions
to "lift" this into privacy against computationally bounded adversaries. We
believe this approach could be useful to devise further tasks separating CDP
from SDP.
-
On Differentially Private Counting on Trees [pdf]
Badih Ghazi,
Pritish Kamath,
Ravi Kumar,
Pasin Manurangsi,
Kewen Wu
International Colloquium on Automata, Languages, and Programming (ICALP), 2023
Abstract. We study the problem of performing counting queries at different levels in
hierarchical structures while preserving individuals' privacy. Motivated by
applications, we propose a new error measure for this problem by considering a
combination of multiplicative and additive approximation to the query results.
We examine known mechanisms in differential privacy (DP) and prove their
optimality, under this measure, in the pure-DP setting. In the approximate-DP
setting, we design new algorithms achieving significant improvements over known
ones.
-
Regression with Label Differential Privacy [pdf]
Badih Ghazi,
Pritish Kamath,
Ravi Kumar,
Ethan Leeman,
Pasin Manurangsi,
Avinash Varadarajan,
Chiyuan Zhang
International Conference on Learning Representations (ICLR), 2023
Abstract. We study the task of training regression models with the guarantee of label differential privacy (DP). Based on a global prior distribution on label values, which could be obtained privately, we derive a label DP randomization mechanism that is optimal under a given regression loss function. We prove that the optimal mechanism takes the form of a "randomized response on bins", and propose an efficient algorithm for finding the optimal bin values. We carry out a thorough experimental evaluation on several datasets demonstrating the efficacy of our algorithm.
-
Private Ad Modeling with DP-SGD [pdf]
Carson Denison,
Badih Ghazi,
Pritish Kamath,
Ravi Kumar,
Pasin Manurangsi,
Krishna Giri Narra,
Amer Sinha,
Avinash V Varadarajan,
Chiyuan Zhang
AAAI Workshop on Privacy-Preserving Artificial Intelligence (PPAI), 2023
Abstract. A well-known algorithm in privacy-preserving ML is differentially private
stochastic gradient descent (DP-SGD). While this algorithm has been evaluated
on text and image data, it has not been previously applied to ads data, which
are notorious for their high class imbalance and sparse gradient updates. In
this work we apply DP-SGD to several ad modeling tasks including predicting
click-through rates, conversion rates, and number of conversion events, and
evaluate their privacy-utility trade-off on real-world datasets. Our work is
the first to empirically demonstrate that DP-SGD can provide both privacy and
utility for ad modeling tasks.
-
Anonymized Histograms in Intermediate Privacy Models [pdf]
Badih Ghazi,
Pritish Kamath,
Ravi Kumar,
Pasin Manurangsi
Neural Information Processing Systems (NeurIPS), 2022
Abstract. We study the problem of privately computing the anonymized histogram (a.k.a. unattributed histogram), which is defined as the histogram without item labels. Previous works have provided algorithms with $\ell_1$- and $\ell_2^2$-errors of $O_{\varepsilon}(\sqrt{n})$ in the central model of differential privacy (DP). In this work, we provide an algorithm with a nearly matching error guarantee of $\tilde{O}_{\varepsilon}(\sqrt{n})$ in the shuffle DP and pan-private models. Our algorithm is very simple: it just post-processes the discrete Laplace-noised histogram! Using this algorithm as a subroutine, we show applications in privately estimating symmetric properties of distributions such as entropy, support coverage, and support size.
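A toy version of the algorithm's noising step can be sketched as follows. This is only an illustration under assumptions of ours: the clip-and-sort post-processing here is a simplistic stand-in for the paper's actual post-processing, and the sampler and parameter names are hypothetical.

```python
import math
import random

def discrete_laplace(eps):
    """Sample from the discrete Laplace distribution, P(k) proportional to
    exp(-eps * |k|), as a difference of two i.i.d. geometric samples."""
    p = 1 - math.exp(-eps)
    def geom():
        # Number of failures before the first success of a Bernoulli(p) trial.
        u = 1.0 - random.random()  # u in (0, 1]
        return int(math.log(u) / math.log(1 - p))
    return geom() - geom()

def noisy_anonymized_histogram(counts, eps=1.0):
    """Noise each count, then post-process: clip to be nonnegative and sort,
    discarding labels (an anonymized histogram is the sorted multiset of counts)."""
    noisy = [max(0, c + discrete_laplace(eps)) for c in counts]
    return sorted(noisy, reverse=True)
```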
-
Private Isotonic Regression [pdf]
Badih Ghazi,
Pritish Kamath,
Ravi Kumar,
Pasin Manurangsi
Neural Information Processing Systems (NeurIPS), 2022
Abstract. In this paper, we consider the problem of differentially private (DP) algorithms for isotonic regression. For the most general problem of isotonic regression over a partially ordered set (poset) $\mathcal{X}$ and for any Lipschitz loss function, we obtain a pure-DP algorithm that, given $n$ input points, has an expected excess empirical risk of roughly $\mathrm{width}(\mathcal{X}) \cdot \log |\mathcal{X}|/n$, where $\mathrm{width}(\mathcal{X})$ is the width of the poset. In contrast, we also obtain a near-matching lower bound of roughly $(\mathrm{width}(\mathcal{X}) + \log |\mathcal{X}|)/n$, that holds even for approximate-DP algorithms. Moreover, we show that the above bounds are essentially the best that can be obtained without utilizing any further structure of the poset. In the special case of a totally ordered set and for $\ell_1$ and $\ell_2^2$ losses, our algorithm can be implemented in near-linear running time; we also provide extensions of this algorithm to the problem of private isotonic regression with additional structural constraints on the output function.
-
Understanding the Eluder Dimension [pdf]
† Gene Li,
Pritish Kamath,
Dylan J. Foster,
Nathan Srebro
Neural Information Processing Systems (NeurIPS), 2022
Abstract. We provide new insights on eluder dimension, a complexity measure that has been extensively used to bound the regret of algorithms for online bandits and reinforcement learning with function approximation. First, we study the relationship between the eluder dimension for a function class and a generalized notion of rank, defined for any monotone "activation" $\sigma: \mathbb{R} \to \mathbb{R}$, which corresponds to the minimal dimension required to represent the class as a generalized linear model. It is known that when $\sigma$ has derivatives bounded away from $0$, $\sigma$-rank gives rise to an upper bound on eluder dimension for any function class; we show however that eluder dimension can be exponentially smaller than $\sigma$-rank. We also show that the condition on the derivative is necessary; namely, when $\sigma$ is the relu activation, the eluder dimension can be exponentially larger than $\sigma$-rank. For binary-valued function classes, we obtain a characterization of the eluder dimension in terms of star number and threshold dimension, quantities which are relevant in active learning and online learning respectively.
-
Do More Negative Samples Necessarily Hurt In Contrastive Learning? [pdf]
Pranjal Awasthi,
Nishanth Dikkala,
Pritish Kamath
International Conference on Machine Learning (ICML), 2022
(Selected for Long Presentation)
Abstract. Recent investigations in noise contrastive estimation suggest, both empirically as well as theoretically, that while having more "negative samples" in the contrastive loss improves downstream classification performance initially, beyond a threshold, it hurts downstream performance due to a "collision-coverage" trade-off. But is such a phenomenon inherent in contrastive learning?
We show in a simple theoretical setting, where positive pairs are generated by sampling from the underlying latent class (introduced by Saunshi et al. (ICML 2019)), that the downstream performance of the representation optimizing the (population) contrastive loss in fact does not degrade with the number of negative samples. Along the way, we give a structural characterization of the optimal representation in our framework, for noise contrastive estimation. We also provide empirical support for our theoretical results on CIFAR-10 and CIFAR-100 datasets.
-
Faster Privacy Accounting via Evolving Discretization [pdf]
Badih Ghazi,
Pritish Kamath,
Ravi Kumar,
Pasin Manurangsi
International Conference on Machine Learning (ICML), 2022
Abstract. We introduce a new algorithm for numerical composition of privacy random variables, useful for computing the accurate differential privacy parameters for composition of mechanisms. Our algorithm achieves a running time and memory usage of $\mathrm{polylog}(k)$ for the task of self-composing a mechanism, from a broad class of mechanisms, $k$ times; this class, e.g., includes the sub-sampled Gaussian mechanism, that appears in the analysis of differentially private stochastic gradient descent.
By comparison, recent work by Gopi et al. (2021) obtained a running time of $\widetilde{O}(\sqrt{k})$ for the same task. Our approach extends to the case of composing $k$ different mechanisms in the same class, improving upon their running time and memory usage from $\widetilde{O}(k^{1.5})$ to $\widetilde{O}(k)$.
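For context, the baseline that such accountants speed up is plain convolution of the privacy loss distribution (PLD): privacy losses add under composition, so the $k$-fold PLD is a $k$-fold convolution. The sketch below is this naive baseline under an assumed uniform loss grid, not the paper's $\mathrm{polylog}(k)$ algorithm:

```python
import numpy as np

def self_compose(pmf, k):
    """k-fold composition of a PLD: convolve the loss PMF with itself k times."""
    out = np.array([1.0])
    for _ in range(k):
        out = np.convolve(out, pmf)
    return out

def delta_from_pld(pmf, losses, eps):
    """Hockey-stick divergence: delta(eps) = E[max(0, 1 - exp(eps - L))]."""
    return float(np.sum(pmf * np.maximum(0.0, 1.0 - np.exp(eps - losses))))

# Example: eps0-randomized response has a two-point PLD at losses {-eps0, +eps0}.
eps0, k = 1.0, 5
p = np.exp(eps0) / (1 + np.exp(eps0))
pmf_k = self_compose(np.array([1 - p, p]), k)
losses_k = -k * eps0 + 2 * eps0 * np.arange(pmf_k.size)  # uniform grid of step 2*eps0
```

Each convolution here costs time quadratic in the support size; the speedups discussed above come from smarter handling of exactly this step.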
-
Connect the Dots: Tighter Discrete Approximations of Privacy Loss Distributions [pdf]
Vadym Doroshenko,
Badih Ghazi,
Pritish Kamath,
Ravi Kumar,
Pasin Manurangsi
Privacy Enhancing Technologies Symposium (PETS), 2022
Abstract. The privacy loss distribution (PLD) provides a tight characterization of the privacy loss of a mechanism in the context of differential privacy (DP). Recent work [Meiser et al. 2018, Koskela et al. 2020, 2021a, 2021b] has shown that PLD-based accounting allows for tighter $(\varepsilon, \delta)$-DP guarantees for many popular mechanisms compared to other known methods. A key question in PLD-based accounting is how to approximate any (potentially continuous) PLD with a PLD over any specified discrete support.
We present a novel approach to this problem. Our approach supports both pessimistic estimation, which overestimates the hockey-stick divergence (i.e., $\delta$) for any value of $\varepsilon$, and optimistic estimation, which underestimates it. We show that our pessimistic estimate is the best possible among all pessimistic estimates. Moreover, the discrete PLD, when used in compositions, also yields pessimistic/optimistic estimates, respectively, of the hockey-stick divergence. Experimental evaluation shows that our approach can work with much larger discretization intervals than previous approaches while keeping a similar error bound, and yet gives a better approximation than the existing method of [Meiser et al. 2018].
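Concretely, once a PLD has been discretized to atoms $(\ell_i, p_i)$, the hockey-stick divergence is a simple sum: $\delta(\varepsilon) = \sum_i p_i \cdot \max(0, 1 - e^{\varepsilon - \ell_i})$. A minimal sketch of this evaluation, using an illustrative randomized-response PLD rather than the paper's discretization procedure:

```python
import numpy as np

def delta_from_pld(losses, probs, eps):
    """delta(eps) = sum_i p_i * max(0, 1 - exp(eps - l_i)) for a discrete
    privacy loss distribution with atoms (l_i, p_i)."""
    losses, probs = np.asarray(losses, float), np.asarray(probs, float)
    return float(np.sum(probs * np.maximum(0.0, 1.0 - np.exp(eps - losses))))

# randomized response with e^{eps0} = 2: losses +-ln 2 w.p. 2/3 and 1/3
eps0 = float(np.log(2))
d_at_0 = delta_from_pld([eps0, -eps0], [2 / 3, 1 / 3], 0.0)      # TV = 1/3
d_at_eps0 = delta_from_pld([eps0, -eps0], [2 / 3, 1 / 3], eps0)  # 0
```

The pessimistic/optimistic question addressed by the paper is how to pick discrete atoms so that this sum over- or under-estimates the true (continuous) divergence at every $\varepsilon$.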
-
Circuits Resilient to Short-Circuit Errors [pdf]
Klim Efremenko,
Bernhard Haeupler,
Yael Tauman Kalai,
Pritish Kamath,
Gillat Kol,
Nicolas Resch,
Raghuvansh Saxena
Symposium on Theory of Computing (STOC), 2022
Abstract. Given a Boolean circuit $C$, we wish to convert it to a circuit $C'$ that computes the same function as $C$ even if some of its gates suffer from adversarial short circuit errors, i.e., their output is replaced by the value of one of their inputs [KLM97]. Can we design such a resilient circuit $C'$ whose size is roughly comparable to that of $C$? Prior work [KLR12, BEGY19] gave a positive answer for the special case where $C$ is a formula.
We study the general case and show that any Boolean circuit $C$ of size $s$ can be converted to a new circuit $C'$ of quasi-polynomial size $s^{O(\log s)}$ that computes the same function as $C$ even if a $1/51$ fraction of the gates on any root-to-leaf path in $C'$ are short circuited. Moreover, if the original circuit $C$ is a formula, the resilient circuit $C'$ is of near-linear size $s^{1+\varepsilon}$. The construction of our resilient circuits utilizes the connection between circuits and DAG-like communication protocols [Raz95, Pud10, Sok17], originally introduced in the context of proof complexity.
-
On the Power of Differentiable Learning versus PAC and SQ Learning [pdf]
Emmanuel Abbe, Pritish Kamath, Eran Malach, Colin Sandon, Nathan Srebro
Neural Information Processing Systems (NeurIPS), 2021
(Selected for Spotlight presentation)
Abstract. We study the power of learning via mini-batch stochastic gradient descent (SGD) on the population loss, and batch Gradient Descent (GD) on the empirical loss, of a differentiable model or neural network, and ask what learning problems can be learnt using these paradigms. We show that SGD and GD can always simulate learning with statistical queries (SQ), but their ability to go beyond that depends on the precision $\rho$ of the gradient calculations relative to the minibatch size $b$ (for SGD) and sample size $m$ (for GD). With fine enough precision relative to minibatch size, namely when $b\rho$ is small enough, SGD can go beyond SQ learning and simulate any sample-based learning algorithm and thus its learning power is equivalent to that of PAC learning; this extends prior work that achieved this result for $b=1$. Similarly, with fine enough precision relative to the sample size $m$, GD can also simulate any sample-based learning algorithm based on $m$ samples. In particular, with polynomially many bits of precision (i.e. when $\rho$ is exponentially small), SGD and GD can both simulate PAC learning regardless of the mini-batch size. On the other hand, when $b\rho^2$ is large enough, the power of SGD is equivalent to that of SQ learning.
-
Quantifying the Benefit of Using Differentiable Learning over Tangent Kernels [pdf]
† Eran Malach, Pritish Kamath, Emmanuel Abbe, Nathan Srebro
International Conference on Machine Learning (ICML), 2021
Abstract. We study the relative power of learning with gradient descent on differentiable models, such as neural networks, versus using the corresponding tangent kernels. We show that under certain conditions, gradient descent achieves small error only if a related tangent kernel method achieves a non-trivial advantage over random guessing (a.k.a. weak learning), though this advantage might be very small even when gradient descent can achieve arbitrarily high accuracy. Complementing this, we show that without these conditions, gradient descent can in fact learn with small error even when no kernel method, in particular using the tangent kernel, can achieve a non-trivial advantage over random guessing.
-
Does Invariant Risk Minimization Capture Invariance? [pdf]
† Pritish Kamath, Akilesh Tangella, Danica J. Sutherland, Nathan Srebro
Artificial Intelligence and Statistics Conference (AISTATS), 2021
(Selected for Oral presentation)
Abstract. We show that the Invariant Risk Minimization (IRM) formulation of Arjovsky et al. (2019) can fail to capture "natural" invariances, at least when used in its practical "linear" form, and even on very simple problems which directly follow the motivating examples for IRM. This can lead to worse generalization on new environments, even when compared to unconstrained ERM. The issue stems from a significant gap between the linear variant (as in their concrete method IRMv1) and the full non-linear IRM formulation. Additionally, even when capturing the "right" invariances, we show that it is possible for IRM to learn a sub-optimal predictor, due to the loss function not being invariant across environments. The issues arise even when measuring invariance on the population distributions, but are exacerbated by the fact that IRM is extremely fragile to sampling.
-
Approximate is Good Enough: Probabilistic Variants of Dimensional and Margin Complexity [pdf]
Pritish Kamath, Omar Montasser, Nathan Srebro
Conference on Learning Theory (COLT), 2020
Abstract. We present and study approximate notions of dimensional and margin complexity, which correspond to the minimal dimension or norm of an embedding required to approximate, rather than exactly represent, a given hypothesis class. We show that such notions are not only sufficient for learning using linear predictors or a kernel, but unlike the exact variants, are also necessary. Thus they are better suited for discussing limitations of linear or kernel methods.
-
Optimality of Correlated Sampling Strategies [pdf] [ToC version]
Mohammad Bavarian, Badih Ghazi, Elad Haramaty, Pritish Kamath, Ronald L. Rivest, Madhu Sudan
Theory of Computing (ToC), 2020
Abstract.
In the correlated sampling problem, two players are given probability distributions $P$ and $Q$, respectively, over the same finite set, with access to shared randomness. Without any communication, the two players are each required to output an element sampled according to their respective distributions, while trying to minimize the probability that their outputs disagree. A well-known strategy due to Kleinberg-Tardos and Holenstein, with a close variant (for a similar problem) due to Broder, solves this task with disagreement probability at most $2\delta/(1+\delta)$, where $\delta$ is the total variation distance between $P$ and $Q$. This strategy has been used in several different contexts, including sketching algorithms, approximation algorithms based on rounding linear programming relaxations, the study of parallel repetition, and cryptography.
In this paper, we give a surprisingly simple proof that this strategy is essentially optimal. Specifically, for every $\delta \in (0,1)$, we show that any correlated sampling strategy incurs a disagreement probability of essentially $2\delta/(1+\delta)$ on some inputs $P$ and $Q$ with total variation distance at most $\delta$. This partially answers a recent question of Rivest.
Our proof is based on studying a new problem that we call constrained agreement. Here, the two players are given subsets $A \subseteq [n]$ and $B \subseteq [n]$, respectively, and their goal is to output an element $i \in A$ and $j \in B$, respectively, while minimizing the probability that $i \ne j$. We prove tight bounds for this question, which in turn imply tight bounds for correlated sampling. Though we settle basic questions about the two problems, our formulation leads to more fine-grained questions that remain open.
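The Kleinberg-Tardos/Holenstein strategy whose optimality is established here can be simulated directly: both players scan a shared stream of uniform points $(s, h)$ and each outputs the first point lying under the graph of their own pmf. The toy distributions below are illustrative choices, not from the paper:

```python
import numpy as np

def correlated_sample(pmf, shared_points):
    """One player's side of the Kleinberg--Tardos / Holenstein strategy:
    scan a shared stream of uniform points (s, h) in [n] x [0,1] and output
    the first s whose height h falls under the player's pmf at s."""
    for s, h in shared_points:
        if h < pmf[s]:
            return s
    raise RuntimeError("shared stream exhausted; draw more points")

rng = np.random.default_rng(1)
P = np.array([0.5, 0.3, 0.1, 0.1])
Q = np.array([0.4, 0.3, 0.2, 0.1])
delta = 0.5 * np.abs(P - Q).sum()   # total variation distance = 0.1
bound = 2 * delta / (1 + delta)     # guaranteed bound ~ 0.1818

trials, disagree = 5000, 0
for _ in range(trials):
    pts = list(zip(rng.integers(0, 4, 200), rng.random(200)))
    disagree += correlated_sample(P, pts) != correlated_sample(Q, pts)
rate = disagree / trials  # empirically stays below the 2*delta/(1+delta) bound
```

The players agree whenever the first point accepted by either of them lies under both pmfs, which happens with probability at least $1/(1+\delta)$ of the relevant mass; the paper shows no strategy can beat this bound in general.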
-
On the Complexity of Modulo-$q$ Arguments and the Chevalley-Warning Theorem [pdf]
Mika Göös, Pritish Kamath, Katerina Sotiraki, Manolis Zampetakis
Computational Complexity Conference (CCC), 2020
Abstract. We study the search problem class $\mathsf{PPA}_q$ defined as a modulo-$q$ analog of the well-known polynomial parity argument class $\mathsf{PPA}$ introduced by Papadimitriou '94. Our first result shows that this class can be characterized in terms of $\mathsf{PPA}_p$ for prime $p$.
Our main result is to establish that an explicit version of a search problem associated to the Chevalley--Warning theorem is complete for $\mathsf{PPA}_p$ for prime $p$. This problem is natural in that it does not explicitly involve circuits as part of the input. It is the first such complete problem for $\mathsf{PPA}_p$ when $p \ge 3$.
Finally, we discuss connections between the Chevalley--Warning theorem and the well-studied short integer solution problem, and survey the structural properties of $\mathsf{PPA}_q$.
-
Limits on the Efficiency of (Ring) LWE based Non-Interactive Key Exchange [pdf]
Siyao Guo, Pritish Kamath, Alon Rosen, Katerina Sotiraki
Public Key Cryptography (PKC), 2020
Abstract.
$\mathsf{LWE}$-based key-exchange protocols lie at the heart of post-quantum public-key cryptography. However, all existing protocols either lack the non-interactive nature of Diffie-Hellman key-exchange or a polynomial $\mathsf{LWE}$-modulus, resulting in unwanted efficiency overhead.
We study the possibility of designing non-interactive $\mathsf{LWE}$-based protocols with polynomial $\mathsf{LWE}$-modulus. To this end,
- We identify and formalize simple non-interactive and polynomial $\mathsf{LWE}$-modulus variants of existing protocols, where Alice and Bob simultaneously exchange one or more (ring) $\mathsf{LWE}$ samples with polynomial $\mathsf{LWE}$-modulus and then run individual key reconciliation functions to obtain the shared key.
- We point out central barriers and show that such non-interactive key-exchange protocols are impossible if:
- the reconciliation functions first compute the inner product of the received $\mathsf{LWE}$ sample with their private $\mathsf{LWE}$ secret. This impossibility is information-theoretic.
- one of the reconciliation functions does not depend on the error of the transmitted $\mathsf{LWE}$ sample. This impossibility assumes the hardness of $\mathsf{LWE}$.
- We give further evidence that progress in either direction, whether giving an $\mathsf{LWE}$-based non-interactive key-exchange protocol or proving the impossibility of one, will lead to progress on other well-studied questions in cryptography.
Overall, our results show possibilities and challenges in designing simple (ring) $\mathsf{LWE}$-based non-interactive key exchange protocols.
-
Adventures in Monotone Complexity and $\mathsf{TFNP}$ [pdf]
Mika Göös, Pritish Kamath, Robert Robere, Dmitry Sokolov
Innovations in Theoretical Computer Science (ITCS), 2019
(invited talk at FOCS 2018 workshop on TFNP)
Abstract.
Separations: We introduce a monotone variant of $\mathrm{Xor}\text{-}\mathrm{SAT}$ and show it has exponential monotone circuit complexity. Since $\mathrm{Xor}\text{-}\mathrm{SAT}$ is in $\mathsf{NC}^2$, this improves qualitatively on the monotone vs. non-monotone separation of Tardos (1988). We also show that monotone span programs over $\mathbb{R}$ can be exponentially more powerful than over finite fields. These results can be interpreted as separating subclasses of $\mathsf{TFNP}$ in communication complexity.
Characterizations: We show that the communication (resp. query) analogue of $\mathsf{PPA}$ (subclass of $\mathsf{TFNP}$) captures span programs over $\mathbb{F}_2$ (resp. Nullstellensatz degree over $\mathbb{F}_2$). Previously, it was known that communication $\mathsf{FP}$ captures formulas (Karchmer-Wigderson, 1988) and that communication $\mathsf{PLS}$ captures circuits (Razborov, 1995).
-
Bayesian Inference of Temporal Task Specifications from Demonstrations [pdf]
† Ankit Shah, Pritish Kamath, Shen Li, Julie Shah
Neural Information Processing Systems (NeurIPS), 2018
Abstract. When observing task demonstrations, human apprentices are able to identify whether a given task is executed correctly long before they gain expertise in actually performing that task. Prior research into learning from demonstrations (LfD) has failed to capture this notion of the acceptability of an execution; meanwhile, temporal logics provide a flexible language for expressing task specifications. Inspired by this, we present Bayesian specification inference, a probabilistic model for inferring task specification as a temporal logic formula. We incorporate methods from probabilistic programming to define our priors, along with a domain-independent likelihood function to enable sampling-based inference. We demonstrate the efficacy of our model for inferring specifications with over 90% similarity between the inferred specification and the ground truth, both within a synthetic domain and a real-world table setting task.
-
Monotone Circuit Lower Bounds from Resolution [pdf]
Ankit Garg, Mika Göös, Pritish Kamath, Dmitry Sokolov
Symposium on Theory Of Computing (STOC), 2018
Abstract. For any unsatisfiable CNF formula $F$ that is hard to refute in the Resolution proof system, we show that a gadget-composed version of $F$ is hard to refute in any proof system whose lines are computed by efficient communication protocols---or, equivalently, that a monotone function associated with $F$ has large monotone circuit complexity. Our result extends to monotone real circuits, which yields new lower bounds for the Cutting Planes proof system.
-
Dimension Reduction for Polynomials over Gaussian Space and Applications [shorter][longer]
Badih Ghazi, Pritish Kamath, Prasad Raghavendra
Computational Complexity Conference (CCC), 2018
Abstract. We introduce a new technique for reducing the dimension of the ambient space of low-degree polynomials in the Gaussian space while preserving their relative correlation structure. As an application, we obtain an explicit upper bound on the dimension of an $\varepsilon$-optimal noise-stable Gaussian partition. In fact, we address the more general problem of upper bounding the number of samples needed to $\varepsilon$-approximate any joint distribution that can be non-interactively simulated from a correlated Gaussian source. Our results significantly improve (from Ackermann-like to "merely" exponential) the upper bounds recently proved on the above problems by De, Mossel & Neeman (CCC 2017, SODA 2018 resp.), and imply decidability of the larger alphabet case of the gap non-interactive simulation problem posed by Ghazi, Kamath & Sudan (FOCS 2016).
Our technique of dimension reduction for low-degree polynomials is simple and can be seen as a generalization of the Johnson-Lindenstrauss lemma and could be of independent interest.
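Since the technique is described as a generalization of the Johnson-Lindenstrauss lemma, the degree-one (linear) case is a useful mental model. The standard random-projection check below illustrates the classical JL lemma only; it is unrelated to the paper's actual construction for higher-degree polynomials:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, d_red = 50, 2000, 300
X = rng.normal(size=(n, d))                       # n points in dimension d
G = rng.normal(size=(d, d_red)) / np.sqrt(d_red)  # scaled Gaussian map
Y = X @ G

# pairwise distances survive the projection up to small relative distortion
orig = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
proj = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
off = ~np.eye(n, dtype=bool)                      # ignore zero self-distances
distortion = float(np.max(np.abs(proj[off] / orig[off] - 1.0)))
```

The paper's harder task is preserving the joint correlation structure of low-degree polynomials under such a dimension reduction, not just pairwise distances.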
-
Query-to-Communication Lifting for $\mathrm{P}^{\mathrm{NP}}$ [pdf]
Mika Göös, Pritish Kamath, Toniann Pitassi, Thomas Watson
Computational Complexity Conference (CCC), 2017
Abstract. We prove that the $\mathrm{P}^{\mathrm{NP}}$-type query complexity (alternatively, decision list width) of any boolean function $f$ is quadratically related to the $\mathrm{P}^{\mathrm{NP}}$-type communication complexity of a lifted version of $f$. As an application, we show that a certain "product" lower bound method of Impagliazzo and Williams (CCC 2010) fails to capture $\mathrm{P}^{\mathrm{NP}}$ communication complexity up to polynomial factors, which answers a question of Papakonstantinou, Scheder, and Song (CCC 2014).
-
Improved Bounds for Universal One-Bit Compressive Sensing [pdf]
Jayadev Acharya, Arnab Bhattacharyya, Pritish Kamath
International Symposium on Information Theory (ISIT), 2017
Abstract. Unlike compressive sensing where the measurement outputs are assumed to be real-valued and have infinite precision, in one-bit compressive sensing, measurements are quantized to one bit, their signs. In this work, we show how to recover the support of sparse high-dimensional vectors in the one-bit compressive sensing framework with an asymptotically near-optimal number of measurements. We also improve the bounds on the number of measurements for approximately recovering vectors from one-bit compressive sensing measurements. Our results are universal, namely the same measurement scheme works simultaneously for all sparse vectors.
Our proof of optimality for support recovery is obtained by showing an equivalence between the task of support recovery using 1-bit compressive sensing and a well-studied combinatorial object known as Union Free Families.
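To make the setting concrete, here is a toy support-recovery experiment with sign measurements. The simple correlation estimator used below is a textbook baseline for Gaussian measurement matrices, not the measurement scheme from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 1000, 5, 1000

# a k-sparse signal with random support and +-1 nonzeros
x = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x[support] = rng.choice([-1.0, 1.0], size=k)

# one-bit measurements: only the signs of <a_i, x> are retained
A = rng.normal(size=(m, n))
y = np.sign(A @ x)

# baseline estimator: for Gaussian A, E[y_i * A_ij] is proportional to x_j,
# so correlating the sign pattern with each column exposes the support
scores = np.abs(A.T @ y)
est_support = set(np.argsort(scores)[-k:].tolist())
recovered = est_support == set(support.tolist())
```

The paper's contribution is universality (one measurement scheme working for all sparse vectors simultaneously) with near-optimal $m$, which this per-instance Gaussian experiment does not capture.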
-
Compression in a Distributed Setting [pdf]
Badih Ghazi, Elad Haramaty, Pritish Kamath, Madhu Sudan
Innovations in Theoretical Computer Science (ITCS), 2017
Abstract. Motivated by an attempt to understand the formation and development of (human) language, we introduce a "distributed compression" problem. In our problem, a sequence of pairs of players from a set of $K$ players is chosen and tasked to communicate messages drawn from an unknown distribution $Q$. Arguably, languages are created and evolve to compress frequently occurring messages, and we focus on this aspect. The only knowledge that players have about the distribution $Q$ is from previously drawn samples, but these samples differ from player to player. The only common knowledge between the players is restricted to a common prior distribution $P$ and some constant number of bits of information (such as a learning algorithm). Letting $T_\varepsilon$ denote the number of iterations it would take for a typical player to obtain an $\varepsilon$-approximation to $Q$ in total variation distance, we ask whether $T_\varepsilon$ iterations suffice to compress the messages down roughly to their entropy, and give a partial positive answer.
We show that a natural uniform algorithm can compress the communication down to an average cost per message of $O(H(Q) + \log(D(P || Q) + O(1)))$ in $\widetilde{O}(T_\varepsilon)$ iterations, while allowing for $O(\varepsilon)$-error, where $D(\cdot || \cdot)$ denotes the KL-divergence between distributions. For large divergences this compares favorably with the static algorithm that ignores all samples and compresses down to $H(Q) + D(P || Q)$ bits, while not requiring the $T_\varepsilon\cdot K$ iterations it would take players to develop optimal but separate compressions for each pair of players. Along the way we introduce a "data-structural" view of the task of communicating with a natural language and show that our natural algorithm can also be implemented by an efficient data structure, whose storage is comparable to the storage requirements of $Q$ and whose query complexity is comparable to the length of the messages being compressed. Our results give a plausible mathematical analogy to the mechanisms by which human languages get created and evolve, and in particular highlight the possibility of coordination towards a joint task (agreeing on a language) while engaging in distributed learning.
-
Decidability of Non-Interactive Simulation of Joint Distributions [pdf]
Badih Ghazi, Pritish Kamath, Madhu Sudan
Foundations of Computer Science (FOCS), 2016
Abstract. We present decidability results for a sub-class of "non-interactive" simulation problems, a well-studied class of problems in information theory. A non-interactive simulation problem is specified by two distributions $P(x,y)$ and $Q(u,v)$: The goal is to determine if two players, Alice and Bob, that observe sequences $X^n$ and $Y^n$ respectively, where $\{(X_i, Y_i)\}_{i=1}^n$ are drawn i.i.d. from $P(x,y)$, can generate pairs $U$ and $V$ respectively (without communicating with each other) with a joint distribution that is arbitrarily close in total variation to $Q(u,v)$. Even when $P$ and $Q$ are extremely simple: e.g., $P$ is uniform on the triples $\{(0,0), (0,1), (1,0)\}$ and $Q$ is a "doubly symmetric binary source", i.e., $U$ and $V$ are uniform $\pm 1$ variables with correlation, say, $0.49$, it is open whether $P$ can simulate $Q$.
In this work, we show that whenever $P$ is a distribution on a finite domain and $Q$ is a $2 \times 2$ distribution, then the non-interactive simulation problem is decidable: specifically, given $\delta > 0$ the algorithm runs in time bounded by some function of $P$ and $\delta$ and either gives a non-interactive simulation protocol that is $\delta$-close to $Q$ or asserts that no protocol gets $O(\delta)$-close to $Q$. The main challenge to such a result is determining explicit (computable) convergence bounds on the number $n$ of samples that need to be drawn from $P(x,y)$ to get $\delta$-close to $Q$. We invoke contemporary results from the analysis of Boolean functions such as the invariance principle and a regularity lemma to obtain such explicit bounds.
-
Communication complexity of permutation-invariant functions [pdf]
Badih Ghazi, Pritish Kamath, Madhu Sudan
Symposium on Discrete Algorithms (SODA), 2016
Abstract. Motivated by the quest for a broader understanding of upper bounds in communication complexity, at least for simple functions, we introduce the class of "permutation-invariant" functions. A partial function $f:\{0,1\}^n \times \{0,1\}^n\to \{0,1,?\}$ is permutation-invariant if for every bijection $\pi:\{1,\ldots,n\} \to \{1,\ldots,n\}$ and every $\mathbf{x}, \mathbf{y} \in \{0,1\}^n$, it is the case that $f(\mathbf{x}, \mathbf{y}) = f(\mathbf{x}^{\pi}, \mathbf{y}^{\pi})$. Most of the commonly studied functions in communication complexity are permutation-invariant. For such functions, we present a simple complexity measure (computable in time polynomial in $n$ given an implicit description of $f$) that describes their communication complexity up to polynomial factors and up to an additive error that is logarithmic in the input size. This gives a coarse taxonomy of the communication complexity of simple functions.
Our work highlights the role of the well-known lower bounds of functions such as 'Set-Disjointness' and 'Indexing', while complementing them with the relatively lesser-known upper bounds for 'Gap-Inner-Product' (from the sketching literature) and 'Sparse-Gap-Inner-Product' (from the recent work of Canonne et al. [ITCS 2015]). We also present consequences to the study of communication complexity with imperfectly shared randomness where we show that for total permutation-invariant functions, imperfectly shared randomness results in only a polynomial blow-up in communication complexity after an additive $O(\log \log n)$ overhead.
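For illustration, the defining property $f(\mathbf{x}, \mathbf{y}) = f(\mathbf{x}^{\pi}, \mathbf{y}^{\pi})$ is easy to check numerically for a function like Set-Disjointness; this is a toy sanity check of the definition, not related to the paper's complexity measure:

```python
import numpy as np

def disjointness(x, y):
    """Set-Disjointness: 1 iff the supports of x and y are disjoint."""
    return int(not np.any(x & y))

rng = np.random.default_rng(0)
n, ok = 16, True
for _ in range(200):
    x, y = rng.integers(0, 2, n), rng.integers(0, 2, n)
    pi = rng.permutation(n)  # apply the SAME permutation to both inputs
    ok &= disjointness(x, y) == disjointness(x[pi], y[pi])
```

Permuting both inputs by the same $\pi$ relabels coordinates without changing whether the supports intersect, so `ok` stays `True` across all random trials.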
-
Communication with partial noiseless feedback [pdf]
Bernhard Haeupler, Pritish Kamath, Ameya Velingker
International Workshop on Randomization and Computation (RANDOM), 2015
Abstract. We introduce the notion of one-way communication schemes with partial noiseless feedback. In this setting, Alice wishes to communicate a message to Bob by using a communication scheme that involves sending a sequence of bits over a channel while receiving feedback bits from Bob for $\delta$ fraction of the transmissions. An adversary is allowed to corrupt up to a constant fraction of Alice's transmissions, while the feedback is always uncorrupted. Motivated by questions related to coding for interactive communication, we seek to determine the maximum error rate, as a function of $0 \le \delta \le 1$, such that Alice can send a message to Bob via some protocol with $\delta$ fraction of noiseless feedback. The case $\delta = 1$ corresponds to full feedback, in which the result of Berlekamp ['64] implies that the maximum tolerable error rate is $1/3$, while the case $\delta = 0$ corresponds to no feedback, in which the maximum tolerable error rate is $1/4$, achievable by use of a binary error-correcting code.
In this work, we show that for any $\delta \in (0,1]$ and $\gamma \in [0, 1/3)$, there exists a randomized communication scheme with noiseless $\delta$-feedback, such that the probability of miscommunication is low, as long as no more than a $\gamma$ fraction of the rounds are corrupted. Moreover, we show that for any $\delta \in (0, 1]$ and $\gamma < f(\delta)$, there exists a deterministic communication scheme with noiseless $\delta$-feedback that always decodes correctly as long as no more than a $\gamma$ fraction of rounds are corrupted. Here $f$ is a monotonically increasing, piecewise linear, continuous function with $f(0) = 1/4$ and $f(1) = 1/3$. Also, the rate of communication in both cases is constant (dependent on $\delta$ and $\gamma$ but independent of the input length).
-
Arithmetic Circuits: A chasm at depth three [pdf] [FOCS slides] [FOCS video] [TCS+ video]
Ankit Gupta, Pritish Kamath, Neeraj Kayal, Ramprasad Saptharishi
Foundations of Computer Science (FOCS), 2013 (also in SICOMP and CACM)
Abstract. We show that, over $\mathbb{Q}$, if an $n$-variate polynomial of degree $d = n^{O(1)}$ is computable by an arithmetic circuit of size $s$ (respectively by an algebraic branching program of size $s$) then it can also be computed by a depth three circuit (i.e. a $\Sigma \Pi \Sigma$-circuit) of size $\exp(O(\sqrt{d \log d \log n \log s}))$ (respectively of size $\exp(O(\sqrt{d \log n \log s}))$). In particular this yields a $\Sigma \Pi \Sigma$ circuit of size $\exp(O(\sqrt{d} \cdot \log d))$ computing the $d \times d$ determinant $\mathrm{Det}_d$. It also means that if we can prove a lower bound of $\exp(\omega(\sqrt{d} \cdot \log^{3/2} d))$ on the size of any $\Sigma \Pi \Sigma$-circuit computing the $d \times d$ permanent ${\rm Perm}_d$ then we get superpolynomial lower bounds for the size of any arithmetic circuit computing ${\rm Perm}_d$. We then give some further results pertaining to derandomizing polynomial identity testing and circuit lower bounds.
The $\Sigma \Pi \Sigma$ circuits that we construct have the property that (some of) the intermediate polynomials have degree much higher than $d$. Indeed, such a counterintuitive construction is unavoidable: it is known that in any $\Sigma \Pi \Sigma$ circuit $C$ computing either ${\rm Det}_d$ or ${\rm Perm}_d$, if every multiplication gate has fanin at most $d$ (or any constant multiple thereof) then $C$ must have size at least $\exp(\Omega(d))$.
-
Approaching the chasm at depth four [pdf]
Ankit Gupta, Pritish Kamath, Neeraj Kayal, Ramprasad Saptharishi
Conference on Computational Complexity (CCC), 2013 (also in J.ACM)
(co-winner of the Best Paper Award)
Abstract. Agrawal-Vinay (FOCS 2008), Koiran (TCS 2012) and Tavenas (MFCS 2012) have recently shown that an $\exp(\omega(\sqrt{n}\log n))$ lower bound for depth four homogeneous circuits computing the permanent with bottom layer of $\times$ gates having fanin bounded by $\sqrt{n}$ translates to super-polynomial lower bound for general arithmetic circuits computing the permanent. Motivated by this, we examine the complexity of computing the permanent and determinant via such homogeneous depth four circuits with bounded bottom fanin.
We show here that any homogeneous depth four arithmetic circuit with bottom fanin bounded by $\sqrt{n}$ computing the permanent (or the determinant) must be of size $\exp(\Omega(\sqrt{n}))$.
Before 2013 (Undergraduate Research)
-
Preservation under substructures modulo bounded cores [pdf]
† Abhisekh Sankaran, Bharat Adsul, Vivek Madan, Pritish Kamath, Supratik Chakraborty
International Workshop on Logic, Language, Information and Computation (WoLLIC), 2012
Abstract. We investigate a model-theoretic property that generalizes the classical notion of preservation under substructures. We call this property preservation under substructures modulo bounded cores, and present a syntactic characterization via $\Sigma^0_2$ sentences for properties of arbitrary structures definable by FO sentences. Towards a sharper characterization, we conjecture that the count of existential quantifiers in the $\Sigma^0_2$ sentence equals the size of the smallest bounded core. We show that this conjecture holds for special fragments of FO and also over special classes of structures. We present a (not FO-definable) class of finite structures for which the conjecture fails, but for which the classical Łoś-Tarski preservation theorem holds. As a fallout of our studies, we obtain combinatorial proofs of the Łoś-Tarski theorem for some of the aforementioned cases.
-
Faster algorithms for alternating refinement relations [pdf]
Krishnendu Chatterjee, Siddhesh Chaubal, Pritish Kamath
Computer Science and Logic (CSL), 2012
Abstract. One central issue in the formal design and analysis of reactive systems is the notion of refinement, which asks whether all behaviors of the implementation are allowed by the specification. The local interpretation of behavior leads to the notion of simulation. Alternating transition systems (ATSs) provide a general model for composite reactive systems, and the simulation relation for ATSs is known as alternating simulation. The simulation relation for fair transition systems is called fair simulation. In this work, our main contributions are as follows:
- We present an improved algorithm for fair simulation with Büchi fairness constraints; our algorithm requires $O(n^3 \cdot m)$ time as compared to the previously known $O(n^6)$-time algorithm, where $n$ is the number of states and $m$ is the number of transitions.
- We present a game-based algorithm for alternating simulation that requires $O(m^2)$ time as compared to the previously known $O((n \cdot m)^2)$-time algorithm, where $n$ is the number of states and $m$ is the size of the transition relation.
- We present an iterative algorithm for alternating simulation that matches the time complexity of the game-based algorithm, but is more space-efficient than the game-based algorithm.
-
Using dominances for solving the protein family identification problem [pdf]
† Noël Malod-Dognin, Mathilde Le Boudic-Jamin, Pritish Kamath, Rumen Andonov
Workshop on Algorithms in Bioinformatics (WABI), 2011
Abstract. Identification of protein families is a computational biology challenge that needs efficient and reliable methods. Here we introduce the concept of dominance and propose a novel combined approach based on the Distance Alignment Search Tool (DAST), which contains an exact algorithm with bounds. Our experiments show that this method successfully finds the most similar proteins in a set without solving all instances.
|