Tal Wagner
Academic Homepage
I am a senior applied scientist in Amazon AWS. Before that I was a postdoc in the Machine Learning Foundations group at Microsoft Research Redmond (2020-2022), and obtained my PhD in Computer Science at CSAIL, MIT in 2020, advised by Piotr Indyk.
My research interests are in designing algorithms for massive datasets and large-scale machine learning, especially in the following contexts:
-
High-dimensional metric data and nearest neighbor search
Efficient linear algebra
Learning on data streams
Data-driven and learning-augmented algorithm design
Email: firstname (dot) lastname (at) gmail (dot) com
Updates
[ recent | all ]
(Apr'23) | Our work on fast differentially private KDE is accepted to ICML'23 as an oral presentation. | |
(Sep'22) | Our work on exponentially improved WL simulation with GNNs is accepted to NeurIPS'22. | |
(Aug'22) | I joined Amazon AWS as a Senior Applied Scientist. | |
(Jul'22) | I spoke at the workshop on Quantitative Geometry of Transportation Metrics in the AMS-EMS-SMF 2022 meeting. | |
(May'22) | Our work on support-aware histograms is accepted to ICML'22. |
(May'22) | Our work on generalization bounds for numerical linear algebra is accepted to COLT'22. | |
(May'22) | Attending the Workshop on Algorithms with Predictions (ALPS 2022) in EPFL. | |
(Jan'22) | Our work on learning-augmented motif counting on graph streams is accepted to ICLR'22. | |
(Nov'21) | Our work on optimal metric compression is accepted to the SIAM Journal on Computing. | |
(Nov'21) | Honored to have been selected for the Rising Stars in Data Science workshop at the University of Chicago. | |
(Sep'21) | Our work on data-driven algorithms for low rank approximation is accepted to NeurIPS'21. | |
(May'21) | Our work on faster kernel matrix algebra is accepted to ICML'21. | |
(Jan'21) | Our work on learning-based support size estimation is accepted to ICLR'21 as a spotlight. | |
(Sep'20) | I joined the Machine Learning Foundations group in MSR Redmond as a postdoc. | |
(Aug'20) | I completed my PhD in MIT. | |
(Jul'20) | I spoke at the "Are we ready for semi-supervised learning for big data? From theory to practice" minisymposium at SIAM Imaging Science 2020. | |
(Jul'20) | We presented our work on scalable nearest neighbor search for optimal transport in ICML'20. Here is the video. |
Publications
[ selected | all ]
Fast Private Kernel Density Estimation via Locality Sensitive Quantization
with Yonatan Naamad and Nina Mishra
in ICML 2023, Oral presentation
[PDF] [poster (PDF)] [video (5min)] [code]
with Yonatan Naamad and Nina Mishra
in ICML 2023, Oral presentation
[PDF] [poster (PDF)] [video (5min)] [code]
Budget-Constrained Bounds for Mini-Batch Estimation of Optimal Transport
with David Alvarez-Melis, Nicolo Fusi and Lester Mackey
Manuscript 2023
[arXiv]
with David Alvarez-Melis, Nicolo Fusi and Lester Mackey
Manuscript 2023
[arXiv]
Learned Interpolation for Better Streaming Quantile Approximation with Worst Case Guarantees
with Nicholas Schiefer, Anders Aamand, Justin Chen, Piotr Indyk, Shyam Narayanan and Sandeep Silwal
in ACDA 2023
[arXiv]
with Nicholas Schiefer, Anders Aamand, Justin Chen, Piotr Indyk, Shyam Narayanan and Sandeep Silwal
in ACDA 2023
[arXiv]
Exponentially Improving the Complexity of Simulating the Weisfeiler-Lehman Test with Graph Neural Networks
with Anders Aamand, Justin Chen, Piotr Indyk, Shyam Narayanan, Ronitt Rubinfeld, Nicholas Schiefer and Sandeep Silwal
in NeurIPS 2022
[arXiv]
with Anders Aamand, Justin Chen, Piotr Indyk, Shyam Narayanan, Ronitt Rubinfeld, Nicholas Schiefer and Sandeep Silwal
in NeurIPS 2022
[arXiv]
Unveiling Transformers with LEGO: A Synthetic Reasoning Task
with Yi Zhang, Arturs Backurs, Sebastien Bubeck, Ronen Eldan and Suriya Gunasekar
Manuscript 2022
[arXiv]
with Yi Zhang, Arturs Backurs, Sebastien Bubeck, Ronen Eldan and Suriya Gunasekar
Manuscript 2022
[arXiv]
Streaming Algorithms for Support-Aware Histograms
with Justin Chen and Piotr Indyk
in ICML 2022
[arXiv] [proceedings] [video by Justin (5min)]
with Justin Chen and Piotr Indyk
in ICML 2022
[arXiv] [proceedings] [video by Justin (5min)]
Generalization Bounds for Data-Driven Numerical Linear Algebra
with Peter Bartlett and Piotr Indyk
in COLT 2022
[arXiv] [slides (PDF)] [video (20min)]
with Peter Bartlett and Piotr Indyk
in COLT 2022
[arXiv] [slides (PDF)] [video (20min)]
Triangle and Four Cycle Counting with Predictions in Graph Streams
with Justin Chen, Talya Eden, Piotr Indyk, Honghao Lin, Shyam Narayanan, Ronitt Rubinfeld, Sandeep Silwal, David Woodruff and Michael Zhang
in ICLR 2022
[arXiv]
with Justin Chen, Talya Eden, Piotr Indyk, Honghao Lin, Shyam Narayanan, Ronitt Rubinfeld, Sandeep Silwal, David Woodruff and Michael Zhang
in ICLR 2022
[arXiv]
Optimal (Euclidean) Metric Compression
with Piotr Indyk
in SIAM Journal on Computing (SICOMP), 2022
[arXiv] [doi] [video (30min)]
with Piotr Indyk
in SIAM Journal on Computing (SICOMP), 2022
[arXiv] [doi] [video (30min)]
Few-Shot Data-Driven Algorithms for Low Rank Approximation
with Piotr Indyk and David Woodruff
in NeurIPS 2021
[PDF] [proceedings]
with Piotr Indyk and David Woodruff
in NeurIPS 2021
[PDF] [proceedings]
Faster Kernel Matrix Algebra via Density Estimation
with Arturs Backurs, Piotr Indyk and Cameron Musco
in ICML 2021
[arXiv] [video (5min)]
with Arturs Backurs, Piotr Indyk and Cameron Musco
in ICML 2021
[arXiv] [video (5min)]
Learning-based Support Estimation in Sublinear Time
with Talya Eden, Piotr Indyk, Shyam Narayanan, Ronitt Rubinfeld and Sandeep Silwal
in ICLR 2021, Spotlight presentation
[arXiv]
with Talya Eden, Piotr Indyk, Shyam Narayanan, Ronitt Rubinfeld and Sandeep Silwal
in ICLR 2021, Spotlight presentation
[arXiv]
Scalable Nearest Neighbor Search for Optimal Transport
with Arturs Backurs, Yihe Dong, Piotr Indyk and Ilya Razenshteyn
in ICML 2020
Earlier version in OTML Workshop at NeurIPS 2019
[arXiv] [slides (PDF)] [video (12min)] [code]
with Arturs Backurs, Yihe Dong, Piotr Indyk and Ilya Razenshteyn
in ICML 2020
Earlier version in OTML Workshop at NeurIPS 2019
[arXiv] [slides (PDF)] [video (12min)] [code]
Learning Space Partitions for Nearest Neighbor Search
with Yihe Dong, Piotr Indyk and Ilya Razenshteyn
in ICLR 2020
[arXiv] [slides (PDF)] [video (5min)] [code]
with Yihe Dong, Piotr Indyk and Ilya Razenshteyn
in ICLR 2020
[arXiv] [slides (PDF)] [video (5min)] [code]
Eccentricity Heuristics through Sublinear Analysis Lenses
in APoCS 2020
Earlier version in SPAA 2019 (brief announcement)
[PDF] [brief announcement (PDF)] [slides (PDF)]
in APoCS 2020
Earlier version in SPAA 2019 (brief announcement)
[PDF] [brief announcement (PDF)] [slides (PDF)]
Multitasking Capacity: Hardness Results and Improved Constructions
with Noga Alon, Jonathan D. Cohen, Tom Griffiths, Pasin Manurangsi, Daniel Reichman, Igor Shinkar and Alexander Yu
in SIAM Journal on Discrete Mathematics (SIDMA), 2020
[arXiv] [doi]
with Noga Alon, Jonathan D. Cohen, Tom Griffiths, Pasin Manurangsi, Daniel Reichman, Igor Shinkar and Alexander Yu
in SIAM Journal on Discrete Mathematics (SIDMA), 2020
[arXiv] [doi]
Space and Time Efficient Kernel Density Estimation in High Dimensions
with Arturs Backurs and Piotr Indyk
in NeurIPS 2019
[PDF] [slides (PDF)] [poster (PDF)] [code]
with Arturs Backurs and Piotr Indyk
in NeurIPS 2019
[PDF] [slides (PDF)] [poster (PDF)] [code]
Scalable Fair Clustering
with Arturs Backurs, Piotr Indyk, Krzysztof Onak, Baruch Schieber and Ali Vakilian
in ICML 2019
[arXiv] [code]
with Arturs Backurs, Piotr Indyk, Krzysztof Onak, Baruch Schieber and Ali Vakilian
in ICML 2019
[arXiv] [code]
Sample-Optimal Low-Rank Approximation of Distance Matrices
with Piotr Indyk, Ali Vakilian and David Woodruff
in COLT 2019
[arXiv] [slides (PDF)] [poster (PDF)] [video (10min)]
with Piotr Indyk, Ali Vakilian and David Woodruff
in COLT 2019
[arXiv] [slides (PDF)] [poster (PDF)] [video (10min)]
Semi-Supervised Learning on Data Streams via Temporal Label Propagation
with Sudipto Guha, Shiva Kasiviswanathan and Nina Mishra
in ICML 2018
[PDF] [proceedings] [slides (PDF)] [poster (PDF)] [video (10min)]
with Sudipto Guha, Shiva Kasiviswanathan and Nina Mishra
in ICML 2018
[PDF] [proceedings] [slides (PDF)] [poster (PDF)] [video (10min)]
Approximate Nearest Neighbors in Limited Space
with Piotr Indyk
in COLT 2018
[arXiv] [slides (PDF)] [poster (PDF)] [video (10min)]
with Piotr Indyk
in COLT 2018
[arXiv] [slides (PDF)] [poster (PDF)] [video (10min)]
Practical Data-Dependent Metric Compression with Provable Guarantees
with Piotr Indyk and Ilya Razenshteyn
in NeurIPS 2017
[arXiv] [code] [poster (PDF)]
with Piotr Indyk and Ilya Razenshteyn
in NeurIPS 2017
[arXiv] [code] [poster (PDF)]
A Graph-Theoretic Approach to Multitasking
with Noga Alon, Jonathan D. Cohen, Biswadip Dey, Tom Griffiths, Sebastian Musslick, Kayhan Ozcimder, Daniel Reichman and Igor Shinkar
in NeurIPS 2017, Oral presentation
[arXiv] [poster (PDF)]
with Noga Alon, Jonathan D. Cohen, Biswadip Dey, Tom Griffiths, Sebastian Musslick, Kayhan Ozcimder, Daniel Reichman and Igor Shinkar
in NeurIPS 2017, Oral presentation
[arXiv] [poster (PDF)]
A Sampling-based Approach to Accelerating Queries in Log Management Systems
with Eric Schkufza and Udi Wieder
in SPLASH 2016
[doi]
with Eric Schkufza and Udi Wieder
in SPLASH 2016
[doi]
Towards Resistance Sparsifiers
with Michael Dinitz and Robert Krauthgamer
in RANDOM 2015
[arXiv] [doi] [slides (PDF)]
with Michael Dinitz and Robert Krauthgamer
in RANDOM 2015
[arXiv] [doi] [slides (PDF)]
Cheeger-Type Approximation for Sparsest st-Cut
with Robert Krauthgamer
in ACM Transactions on Algorithms (TALG), 2016
[arXiv] [doi]
with Robert Krauthgamer
in ACM Transactions on Algorithms (TALG), 2016
[arXiv] [doi]
Volume Regularization for Binary Classification
Koby Crammer and Tal Wagner
in NeurIPS 2012, Spotlight presentation
[PDF]
Koby Crammer and Tal Wagner
in NeurIPS 2012, Spotlight presentation
[PDF]
Ph.D. Thesis: | Efficient Metric Representations for Big Data |
MIT, September 2020 |