Papers

List of papers published/submitted to peer reviewed conferences/journals. Please click on the paper to read the abstract and link to full paper. Asterix (*) denotes joint first author, i.e., equal contribution.

  • [1] Manzil Zaheer*, Sashank J Reddi*, Suvrit Sra, Barnabas Poczos, Francis Bach, Ruslan Salakhutdinov, and Alexander J Smola, “A Generic Approach for Escaping Saddle Points” [Under submission]

    Abstract: A central challenge to using first-order methods for optimizing nonconvex problems is the presence of saddle points. First-order methods often get stuck at saddle points, greatly deteriorating their performance. Typically, to escape from saddles one has to use second-order methods. However, most works on second-order methods rely extensively on expensive Hessian-based computations, making them impractical in large-scale settings. To tackle this challenge, we introduce a generic framework that minimizes Hessian based computations while at the same time provably converging to second-order critical points. Our framework carefully alternates between a first order and a second-order subroutine, using the latter only close to saddle points, and yields convergence results competitive to the state-of-the-art. Empirical results suggest that our strategy also enjoys good practical performance.

    PDF CODE SLIDES

  • [2] Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Ruslan Salakhutdinov, and Alexander J Smola, “Deep Sets” [Accepted as oral presentation at NIPS 2017]

    Abstract: We study the problem of designing models for machine learning tasks defined on sets. In contrast to traditional approach of operating on fixed dimensional vectors, we consider objective functions defined on sets and are invariant to permutations. Such problems are widespread, ranging from estimation of population statistics [1], to anomaly detection in piezometer data of embankment dams [2], to cosmology [3,4]. Our main theorem characterizes the permutation invariant functions and provides a family of functions to which any permutation invariant objective function must belong. This family of functions has a special structure which enables us to design a deep network architecture that can operate on sets and which can be deployed on a variety of scenarios including both unsupervised and supervised learning tasks. We also derive the necessary and sufficient conditions for permutation equivariance in deep models. We demonstrate the applicability of our method on population statistic estimation, point cloud classification, set expansion, and outlier detection.

    PDF CODE SLIDES

  • [3] Manzil Zaheer*, Satwik Kottur*, Amr Ahmed, and Alexander J Smola, “Canopy — Fast Sampling with Cover Trees” [Accepted at ICML 2017]

    Abstract: Hierarchical Bayesian models often capture distributions over a very large number of distinct atoms. The need for these models arises when organizing huge amount of unsupervised data, for instance, features extracted using deep convnets that can be exploited to organize abundant unlabeled images. Inference for hierarchical Bayesian models in such cases can be rather nontrivial, leading to approximate approaches. In this work, we propose Canopy, a sampler based on Cover Trees that is exact, has guaranteed runtime logarithmic in the number of atoms, and is provably polynomial in the inherent dimensionality of the underlying parameter space. In other words, the algorithm is as fast as search over a hierarchical data structure. We provide theory for Canopy and demonstrate its effectiveness on both synthetic and real datasets, consisting of over 100 million images.

    PDF CODE SLIDES

  • [4] Manzil Zaheer, Amr Ahmed, and Alexander J Smola, “Latent LSTM Allocation: Joint Clustering and Non-Linear Dynamic Modeling of Sequence Data” [Accepted at ICML 2017]

    Abstract: Recurrent neural networks, such as long-short term memory (LSTM), are powerful tools for modeling sequential data like user browsing history (Tan et al., 2016; Korpusik et al., 2016) or natural language text (Mikolov et al., 2010). However, to generalize well across multiple types of users, LSTMs require a large number of parameters, even though the underlying dynamics maybe simple. The increase in complexity and parameters arises due to a large action space in which many of the actions have similar intent or topic. Also, such huge deep networks are uninterpretable, which is highly undesirable in user modeling. In this paper, we introduce Latent LSTM Allocation (LLA) with hierarchical Bayesian models on LSTMs, In LLA, each user is modeled as a sequence of actions, and the model jointly groups actions into topics and learns the temporal dynamics over the topic sequence, instead of action space directly. This leads to a model that is highly interpretable, concise, and can capture intricate dynamics. We present an efficient Stochastic-EM inference algorithm for our model that scales to millions of users/documents. Our experimental evaluations show that the proposed model compares favorably with several state-of-the-art baselines.

    PDF CODE SLIDES

  • [5] Manzil Zaheer*, Rajarshi Das*, Siva Reddy, and Andrew McCallum, “Question Answering with Universal Schema and Memory Networks” [accepted at ACL 2017]

    Abstract: Existing question answering methods infer answers either from a knowledge base or from raw text. While knowledge base (KB) methods are good at answering compositional questions, their performance is often affected by the incompleteness of the KB. Au contraire, web text contains millions of facts that are absent in the KB, however in an unstructured form. Universal schema can support reasoning on the union of both structured KBs and unstructured text by aligning them in a common embedded space. In this paper we extend universal schema to natural language question answering, employing memory networks to attend to the large body of facts in the combination of text and KB. Our models can be trained in an end-to-end fashion on question-answer pairs. Evaluation results on SPADES fill-in-the-blank question answering dataset show that exploiting universal schema for question answering is better than using either a KB or text alone. This model also outperforms the current state-of-the-art by 8.5 F1 points.

    PDF CODE SLIDES

  • [6] Manzil Zaheer*, Jean-Baptiste Tristan, Michael L. Wick, and Guy L. Steele Jr., “Learning a Static Analyzer: A Case Study on a Toy Language” [rejected at ICLR 2017]

    Abstract: Static analyzers are meta-programs that analyze programs to detect potential errors or collect information. For example, they are used as security tools to detect potential buffer overflows. Also, they are used by compilers to verify that a program is well-formed and collect information to generate better code. In this paper, we address the following question: can a static analyzer be learned from data? More specifically, can we use deep learning to learn a static analyzer without the need for complicated feature engineering? We show that long short-term memory networks are able to learn a basic static analyzer for a simple toy language. However, pre-existing approaches based on feature engineering, hidden Markov models, or basic recurrent neural networks fail on such a simple problem. Finally, we show how to make such a tool usable by employing a language model to help the programmer detect where the reported errors are located..

    PDF CODE SLIDES

  • [7] Manzil Zaheer, Michael Wick, Jean-Baptiste Tristan, Alexander J Smola, and Guy L Steele Jr, “Exponential Stochastic Cellular Automata for Massively Parallel Inference,” In Proceedings of the Nineteenth International Conference on Artificial Intelligence and Statistics, pp. 966-975. 2016.

    Abstract: We propose an embarrassingly parallel, memory efficient inference algorithm for latent variable models in which the complete data likelihood is in the exponential family. The algorithm is a stochastic cellular automaton and converges to a valid maximum a posteriori fixed point. Applied to latent Dirichlet allocation we find that our algorithm is over an order or magnitude faster than the fastest current approaches. A simple C++/MPI implementation on a 8-node cluster samples 570 million tokens per second. We process 3 billion documents and achieve predictive power competitive with collapsed Gibbs sampling and variational inference.

    PDF CODE SLIDES

  • [8] Hsiao-Yu Fish Tung, Chao-Yuan Wu, Manzil Zaheer, and Alexander J Smola, “Spectral Methods for Nonparametric Models” [submitted to JMLR]

    Abstract: Nonparametric models are versatile, albeit computationally expensive, tool for modeling mixture models. In this paper, we introduce spectral methods for the two most popular nonparametric models: the Indian Bu et Process (IBP) and the Hierarchical Dirichlet Process (HDP). We show that using spectral methods for the inference of nonparametric models are computationally and statistically ecient. In particular, we derive the lowerorder moments of the IBP and the HDP, propose spectral algorithms for both models, and provide reconstruction guarantees for the algorithms. For the HDP, we further show that applying hierarchical models on dataset with hierarchical structure, which can be solved with the generalized spectral HDP, produces better solutions to that of at models regarding likelihood performance.

    PDF CODE SLIDES

  • [9] Manzil Zaheer, Amr Ahmed, Sujith Ravi, and Alexander J Smola, “Fast Sampling Algorithms for Sparse Latent Variable Models,” [submitted to JMLR]

    Abstract: Latent variable models have become a staple of statistical modelling. Due to their size, inference in these models presents a formidable challenge. In particular, it is difficult to exploit object sparsity, whenever the overall generative model is large and dense. We survey a broad range of MCMC sampling algorithms for topic model inference and investigate their performance both theoretically and experimentally. In particular, we describe the Metropolis-Hastings-Walker method, a novel decomposition based on Fenwick trees and their extension to the Hierarchical Dirichlet Process. Our analysis includes detailed measurements in terms of mixing properties, multithreading, their suitability for modern microprocessors. The associated code is both fast and scalable, and available as open source. A single server beats established distributed solutions.

    PDF CODE SLIDES

  • [10] Manzil Zaheer*, Rajarshi Das*, and Christopher J Dyer, “Gaussian LDA for Topic Models with Word Embeddings,” In Proceedings of the 53th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1, pp. 795-804. Association for Computational Linguistics, 2015.

    Abstract: Continuous space word embeddings learned from large, unstructured corpora have been shown to be effective at capturing semantic regularities in language. In this paper we replace LDA's parameterization of "topics" as categorical distributions over opaque word types with multivariate Gaussian distributions on the embedding space. This encourages the model to group words that are a-priori known to be semantically related into topics. To perform inference, we introduce a fast collapsed Gibbs sampling algorithm based on Cholesky decompositions of covariance matrices of the posterior predictive distributions. We further derive a scalable algorithm that draws samples from stale posterior predictive distributions and corrects them with a Metropolis--Hastings step. Using vectors learned from a domain-general corpus (English Wikipedia), we report results on two document collections (20-newsgroups and NIPS). Qualitatively, Gaussian LDA infers different (but still very sensible) topics relative to standard LDA. Quantitatively, our technique outperforms existing models at dealing with OOV words in held-out documents.

    PDF CODE SLIDES

  • [11] Manzil Zaheer*, Jay-Yoon Lee*, Stephan Günnemann, and Alexander J Smola, “Preferential Attachments in Graphs with Affinities,” In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, pp. 571-580. 2015.

    Abstract: Preferential attachment models for random graphs are successful in capturing many characteristics of real networks such as power law behavior. At the same time they lack flexibility to take vertex to vertex affinities into account, a feature that is commonly used in many link recommendation algorithms. We propose a random graph model based on both node attributes and preferential attachment. This approach overcomes the limitation of existing models on expressing vertex affinity and on reflecting properties of different subgraphs. We analytically prove that our model preserves the power law behavior in the degree distribution as expressed by natural graphs and we show that it satisfies the small world property. Experiments show that our model provides an excellent fit of many natural graph statistics and we provide an algorithm to infer the associated affinity function efficiently.

    PDF CODE SLIDES

  • [12] Fa Wang, Manzil Zaheer, Xin Li, Jean-Olivier Plouchart and Alberto Valdes-Garcia, “Co-Learning Bayesian Model Fusion: Efficient Performance Modeling of Analog and Mixed-Signal Circuits Using Side Information,” In Proceedings of the 2015 IEEE/ACM International Conference on Computer-Aided Design, pp. x-x. IEEE Press, 2015.

    Abstract: Efficient performance modeling of today's analog and mixed-signal (AMS) circuits is an important yet challenging task. In this paper, we propose a novel performance modeling algorithm that is referred to as Co-Learning Bayesian Model Fusion (CL-BMF). The key idea of CL-BMF is to take advantage of the additional information collected from simulation and/or measurement to reduce the performance modeling cost. Different from the traditional performance modeling approaches which focus on the prior information of model coefficients (i.e. the coefficient side information) only, CL-BMF takes advantage of another new form of prior knowledge: the performance side information. In particular, CL-BMF combines the coefficient side information, the performance side information and a small number of training samples through Bayesian inference based on a graphical model. Two circuit examples designed in a commercial 32nm SOI CMOS process demonstrate that CL-BMF achieves up to 5X speed-up over other state-of-the-art performance modeling techniques without surrendering any accuracy.

    PDF CODE SLIDES

  • [13] Manzil Zaheer, Chenjie Gu, Xin Li, “mTunes: efficient post-silicon tuning of mixed-signal/RF integrated circuits based on Markov decision process.” In Proceedings of the 52nd Annual Design Automation Conference, p. 170. ACM, 2015.

    Abstract: Uncertainty prevails in IC manufacturing and circuit operation. In particular, process variability has a huge impact on circuit performance, especially for mixed-signal/RF circuits, leading to unacceptable yields. Additionally, environmental uncertainties, such as temperature fluctuation and channel variation, further deteriorate performances in field. To combat variability, circuits are often made reconfigurable by adding tunable knobs to recover circuit performance in the post-manufacturing stage. However, as the number of knobs increases, knob tuning becomes challenging due to the huge search space. In fact, knob-tuning policies can have an observable impact on final performance and power consumption. In this paper, we propose mTunes, a method based on the Markov decision process for dynamically choosing the “right” knob tuning sub-routine from a pre-defined set achieving a balance between performance and power constraints. The proposed method has been applied to a reconfigurable RF front-end design, showing 60% improvement in yield compared to static tuning policies.

    PDF CODE SLIDES

  • [14] Hsiao-Yu Fish Tung, Chao-Yuan Wu, Manzil Zaheer, Alexander J Smola, “Spectral Methods for Hierarchical Dirichlet Process,” NIPS Workshop on Nonparametric Methods for Large Scale Representation Learning, 2015.

    Abstract: The Hierarchical Dirichlet Process (HDP) is a versatile, albeit computationally expensive tool for statistical modeling of mixture models. In this paper, we introduce a spectral algorithm. We show that it is both computationally and statistically efficient. In particular, we derive the lower-order moments of the HDP and give reconstruction guarantees. Moreover, we show that hierarchical spectral method is able to generate a better results regarding likelihood performance.

    PDF CODE SLIDES

  • [15] Manzil Zaheer, Michael Wick, Jean-Baptiste Tristan, Alexander J Smola, Guy L Steele Jr, “Exponential Stochastic Cellular Automata for Massively Parallel Inference,” NIPS Workshop on Machine Learning Systems, 2015.

    Abstract: We propose an embarrassingly parallel, memory efficient inference algorithm for latent variable models in which the complete data likelihood is in the exponential family. The algorithm is a stochastic cellular automaton and converges to a valid maximum a posteriori fixed point. Applied to latent Dirichlet allocation we find that our algorithm is over an order of magnitude faster than the fastest current approaches. A simple C++/MPI implementation on a 8-node cluster samples 503 million tokens per second. We process 3 billion documents and achieve predictive power competitive with collapsed Gibbs sampling and variational inference.

    PDF CODE SLIDES

  • [16] Manzil Zaheer, Michael Wick, Satwik Kottur, Jean-Baptiste Tristan “Comparing Gibbs, EM and SEM for MAP Inference in Mixture Models,” NIPS Workshop on Optimization for Machine Learning, 2015.

    Abstract: Classic inference algorithms such as Gibbs sampling and EM often perform well in practice, but selecting between them when scaling a model to large datasets is difficult. In particular, Gibbs sampling is easy to distribute, but difficult to parallelize, while EM is easy parallelize, but difficult to distribute. Remarkably, the relatively obscure stochastic EM (SEM) combines the computational strengths of the two methods, without inheriting any of their weaknesses. Indeed, we highlight these strengths by deriving the three algorithms on a simple, yet representative (of the general class) mixture model. We also demonstrate empirically that the MAP solutions found by the three algorithms are comparable across a wide variety of parameter settings; further, SEM appears more robust to poor initialization.

    PDF CODE SLIDES

  • [17] Manzil Zaheer, Xin Li and Chenjie Gu, “MPME-DP: Bayesian Model Fusion on Dirichlet Process for Multi-Population Moment Estimation of Analogue/Mixed-Signal Circuits,” In Proceedings of the 2014 IEEE/ACM International Conference on Computer-Aided Design, pp. 316-323. IEEE Press, 2014.

    Abstract: Moment estimation is one of the most important tasks to appropriately characterize the performance variability of today’s nanoscale integrated circuits. In this paper, we propose an efficient algorithm of multi-population moment estimation via Dirichlet Process (MPME-DP) for validation of analog and mixed-signal circuits with extremely small sample size. The key idea is to partition all populations (e.g., different environmental conditions, setup configurations, etc.) into groups. The populations within the same group are similar and their common knowledge can be extracted to improve the accuracy of moment estimation. As will be demonstrated by the silicon measurement data of a high-speed I/O link, MPME-DP reduces the moment estimation error by up to 65% compared to other conventional estimators.

    PDF CODE SLIDES

  • [18] Chenjie Gu, Manzil Zaheer and Xin Li, “Multiple-Population Moment Estimation: Exploiting Inter-Population Correlation for Efficient Moment Estimation in Analog/Mixed-Signal Validation,” Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on 33, no. 7 (2014): 961-974.

    Abstract: Moment estimation is an important problem during circuit validation, in both presilicon and postsilicon stages. From the estimated moments, the probability of failure and parametric yield can be estimated at each circuit configuration and corner, and these metrics are used for design optimization and making product qualification decisions. The problem is especially difficult if only a very small sample size is allowed for measurement or simulation, as is the case for complex analog/mixed-signal circuits. In this paper, we propose an efficient moment estimation method, called multiple-population moment estimation (MPME), that significantly improves estimation accuracy under small sample size. The key idea is to leverage the data collected under different corners/configurations to improve the accuracy of moment estimation at each individual corner/configuration. Mathematically, we employ the hierarchical Bayesian framework to exploit the underlying correlation in the data. We apply the proposed method to several datasets including postsilicon measurements of a commercial high-speed I/O link, and demonstrate an average error reduction of up to 2×, which can be equivalently translated to significant reduction of validation time and cost.

    PDF CODE SLIDES

  • [19] Manzil Zaheer and Anup K Gogoi, “Mellin-Fourier Transform based Communication Scheme for Wideband Linear Time Varying Channels,” Advanced Communication Technology (ICACT), 2013 15th International Conference on, 27-30 January 2013 [Accepted, but not presented]

    Abstract: This paper proposes a novel communication scheme designed for wideband linear time varying channels. It is based upon a combination of Mellin and Fourier transforms. The proposed scheme is able to equalize simultaneous Doppler and delay spread introduced by wideband linear time varying channel. The proposed method eliminates need of large memory requirements for buffering or intensive computational requirements of contemporary systems. Analytical development of the proposed system is presented along with computer simulations demonstrating the effectiveness of the same.

    PDF CODE SLIDES

  • [20] Andreas Reinhardt, Dominic Burkhardt, Manzil Zaheer and Ralf Steinmetz, “Electric Appliance Classification Based on Distributed High Resolution Current Sensing,” Local Computer Networks (LCN), 2012 IEEE 37th Conference on, pp. 1003-1009, 22-25 October 2012.

    Abstract: Today's solutions to inform residents about their electricity consumption are mostly confined to displaying aggregate readings collected at meter level. A reliable identification of appliances that require disproportionate amounts of energy for their operation is generally unsupported by these systems, or at least requires significant manual configuration efforts. We address this challenge by placing low-cost measurement and actuation units into the mains connection of appliances. The distributed sensors capture the current flow of individual appliances at a sampling rate of 1.6kHz and apply local signal processing to the readings in order to extract characteristic fingerprints. These fingerprints are communicated wirelessly to the evaluation server, thus keeping the required airtime and energy demand of the transmission low. The evaluation server employs machine learning techniques and caters for the actual classification of attached electric appliances based on their fingerprints, enabling the correlation of consumption data and the appliance identity. Our evaluation is based on more than 3,000 current consumption fingerprints, which we have captured for a range of household appliances. The results indicate that a high accuracy is achieved when locally extracted current consumption fingerprints are used to classify appliances.

    PDF CODE SLIDES

  • [21] Manzil Zaheer, Jaspreet Singh Suri and Harshal B Nemade, “Primary Side Control based Inductively Coupled Powering Scheme for Biomedical Implants,” EMBS Biomedical Health Informatics, 2012 IEEE International Conference on, pp. 174-179, 2-7 January 2012.

    Abstract: This paper proposes a novel and easy technique to implement primary control for inductively coupled power transfer systems. With the proposed design, the control law boils down simply to a multiplication of voltages across two capacitors in the primary side. The advantages include easy implementation and the topology is favourable for biomedical application, as any chance of heating due to presence of power flow controller in secondary side is eliminated. The effectiveness of the proposed power pickup and its applicability to general wireless power transfer applications has been demonstrated by both simulation and experimental results.

    PDF CODE SLIDES

  • [22] Andreas Reinhardt, Dominic Burkhardt, Parag S Mogre, Manzil Zaheer and Ralf Steinmetz, “SmartMeter.KOM: A low-cost wireless sensor for distributed power metering,” Local Computer Networks (LCN), 2011 IEEE 36th Conference on, pp. 1032-1039, 4-7 October 2011

    Abstract: Most current smart metering solutions aim at increasing user awareness for their household’s electrical energy consumption. Although some smart meters make use of wireless data transfers between their sensor and display units, their integration into existing wireless sensor networks is hampered by proprietary communication interfaces and their lack of reprogrammability. Furthermore, the sole availability of aggregate consumption values renders current meters insufficient for novel application scenarios like smart home automation, for which information at device-level granularity and high resolution is vital. We address these shortcomings of existing solutions by presenting SmartMeter.KOM, our wireless sensor node capable of determining the current consumption of individual electrical appliances at high resolution. The platform is based on low-power hardware and incorporates a reprogrammable microcontroller which allows developers to easily deploy new algorithms. Its IEEE 802.15.4-compliant radio transceiver makes its integration with existing sensor networks possible, and thus enables their integration in smart buildings. We demonstrate the versatility of SmartMeter.KOM by presenting prototypical implementations of smart applications and identifying further research directions.

    PDF CODE SLIDES

  • [23] Manzil Zaheer, Nitish Patel and Aiguo Patrick Hu, “Parallel tuned contactless power pickup using saturable core reactor,” Sustainable Energy Technologies (ICSET), 2010 IEEE International Conference on, pp. 1-6, 6-9 December 2010

    Abstract: This paper proposes a novel contactless secondary power pickup designed for inductively coupled contactless power transfer systems. It is based upon parallel LC tuning with the equivalent capacitance varied by a saturable core reactor (SCR). The proposed technique allows the power pickup to achieve full-range operation for power flow regulation and maintain constant output voltage at a high quality factor Q. The method eliminates the tedious fine-tuning process associated with traditional fixed power pickup tuning methods and eases the component selection. Moreover, it can overcome the online circuit parameter variations and automatically achieve the maximum power transfer capacity when required. In order to meet dynamic load demands, the SCR is controlled to be a variable inductor. A simple algorithm is developed to change the tuning condition of the power pickup. The effectiveness of the proposed power pickup and its applicability to general wireless power transfer applications has been demonstrated by both simulation and experimental results.

    PDF CODE SLIDES

Project Reports

Here are some of the course project reports I could get hold of.

  • [1] Manzil Zaheer, Carlos Ramirez, Soumya Batra, Mladen Kolar, “Large Scale Structure Learning of Conditional Gaussian Graphical Models,” Probabilistic Graphical Models Course taught by Prof Eric Xing.

    PDF CODE SLIDES

  • [2] Manzil Zaheer, Abhinav Mourya, “Mining Interesting Local Structure from Tensors,” Data mining and multimedia database Course taught by Prof Christos Falutsos.

    PDF CODE SLIDES

  • [3] Sahil Single, Manzil Zaheer, “Local List-Decoding of Reed-Muller Codes over Binary Field,” Coding Theory Course taught by Prof Venkat Guruswami.

    PDF CODE SLIDES

  • [4] Manzil Zaheer, Jay-Yoon Lee, “Gaussian-Stick Breaking Process,” Statistical Machine Learning Course taught by Prof Larry Wasserman.

    PDF CODE SLIDES

  • [5] Manzil Zaheer,“Does Information always represent quantity meant?”

    PDF CODE SLIDES

Homework

Here I will put some of the course homework with nonconventional solutions I came up with.