Ge Liu (刘謌)

Incoming Assistant Professor in Computer Science, UIUC
Ph.D. in Computer Science, MIT
Email: geliu@illinois.edu, geliu@csail.mit.edu
[Google Scholar] [GitHub] [CV]

I will be joining the Department of Computer Science at the University of Illinois at Urbana-Champaign (UIUC) as an Assistant Professor in 2024. I am looking for PhD students starting in Fall 2024 (applications due Dec 2023). I received my PhD from the MIT EECS department, advised by Professor David Gifford of the Computer Science and Artificial Intelligence Laboratory (CSAIL). My research develops uncertainty-aware, reliable, efficient, and interpretable machine learning and optimization techniques, as well as novel experiment frameworks and computational tools, for solving important problems in synthetic biology, immunology, and molecular biology that go beyond predictive modeling. I am especially interested in computational molecule design for therapeutic and prophylactic medicines, including but not limited to antibody design and vaccine design. On the machine learning side, I work on deep generative models, deep sequential models, optimization, active learning, model uncertainty for deep neural networks, and reinforcement learning. I am also interested in applications to real-world recommender systems, personalization, and time series problems. My PhD thesis won the MIT EECS George M. Sprowls Ph.D. Thesis Award in AI and Decision-Making in 2021.

I received my bachelor's degree from the EE department at Tsinghua University, where I worked as a research assistant in the Machine Learning and Computational Biology Group (IIIS), advised by Prof. Jianyang Zeng. I was a visiting scholar at CMU in summer 2014 and worked in the Murphy Lab, Lane Center for Computational Biology, advised by Professor Robert F. Murphy.

For Prospective Students and Interns: I am actively looking for highly motivated Ph.D. students and interns to work on problems at the exciting frontiers of ML and AI4Science. If you are interested in joining my group, please send me an email at geliu[at]illinois[dot]edu with your CV and research summary. Students with strong motivation to do good science and make an impact in biomedicine while innovating in ML/CS are very welcome. My research topics include, but are not limited to:
  • Deep generative models for biological molecule design (antibodies, peptide vaccines, proteins) and drug discovery.
  • Iterative experiment design with uncertainty-aware learning, active learning, Bayesian optimization, and online learning algorithms (e.g. bandits, RL with feedback).
  • Optimization: efficient algorithms for combinatorial, discrete, and black-box optimization in real-world biomedical design problems.

News

  • [2023/5] I will be joining the UIUC CS department as an assistant professor in 2024
  • [2023/4] Delighted to give an invited talk at UT Austin CS
  • [2023/3] Our Covid-19 vaccine design work is covered by MIT News and the Boston Globe!
  • [2023/3] Delighted to give an invited talk at the U Chicago CS Seminar
  • [2023/3] Delighted to give an invited talk at NYU Courant
  • [2023/2] Our paper on pan-variant Covid-19 vaccine design with an animal study is accepted to Frontiers in Immunology
  • [2023/2] Delighted to give an invited talk at the UIUC CS Seminar
  • [2022/7] Our paper on antibody computational counter-panning is accepted by Cell Reports Methods
  • [2022/1] 2 papers accepted to ICLR 2022

Work Experience

  • AWS AI Labs, Senior Applied Scientist, Sep 2020 - now
    I am working with the AWS science team on large-scale deep sequential modeling, personalized recommender systems, and modeling for time series.
  • Google Brain, Research Intern, May 2019 - Aug 2019
    I worked with the Mixel (SIR) team at Google Brain on a novel data-efficient training approach for Reinforcement Learning with Adaptive Behavior Policy Sharing.
  • Google Brain, Research SWE Intern, June 2018 - Aug 2018
    I worked with the medical audio scribe modeling team at Google Brain on active learning for natural language processing tasks in the health-care domain.

Honors and Awards

  • MIT EECS George M. Sprowls Ph.D. Thesis Award in AI and Decision-Making (awarded to 2 PhD students in EECS), 2021
  • The Data Open Datathon Global Championship, Finalist, 2019
  • The Data Open Datathon Boston-regional, top 3, 2018
  • David S. Y. Wong Fellowship at MIT, 2016
  • Friendship of Tsinghua-Samsung Scholarship, 2013
  • Outstanding Social Work Scholarship, 2012
  • 2nd prize in Chinese Physics Olympiad for High School, Beijing, 2011
  • Massachusetts Math Olympiad Level One, Finalist, top 25 of Massachusetts State, 2008

Selected Publications

  1. A pan-variant mRNA-LNP T cell vaccine protects HLA transgenic mice from mortality after infection with SARS-CoV-2 Beta [PDF]
    Brandon Carter, Pinghan Huang, Ge Liu, Yuejin Liang, Paulo JC Lin, Bi-Hung Peng, Lindsay McKay, Alexander Dimitrakakis, Jason Hsu, Vivian Tat, Panatda Saenkham-Huntsinger, Jinjin Chen, Clarety Kaseke, Gaurav D Gaiha, Qiaobing Xu, Anthony Griffiths, Ying K Tam, Chien-Te K Tseng, David K Gifford.
    Frontiers in Immunology, 2023.
  2. Maximum n-times Coverage for Vaccine Design. [PDF]
    Ge Liu, Alexander Dimitrakakis, Brandon Carter, and David K. Gifford
    Proceedings of the 10th International Conference on Learning Representations (ICLR 2022).
  3. Bridging Recommendation and Marketing via Recurrent Intensity Modeling [PDF]
    Yifei Ma, Ge Liu, Anoop Deoras.
    Proceedings of the 10th International Conference on Learning Representations (ICLR 2022).
  4. Sequence-graph duality: Unifying user modeling with self-attention for sequential recommendation [PDF]
    Zeren Shui, Ge Liu, Anoop Deoras, George Karypis
    New Frontiers in Graph Learning Workshop, NeurIPS 2022.
  5. Computational counterselection identifies nonspecific therapeutic biologic candidates [PDF]
    Sachit Dinesh Saksena, Ge Liu, Christine Banholzer, Geraldine Horny, Stefan Ewert, David K. Gifford
    Cell Reports Methods, 2022
  6. Predicted Cellular Immunity Population Coverage Gaps for SARS-CoV-2 Subunit Vaccines and their Augmentation by Compact Peptide Sets. [PDF]
    Ge Liu, Brandon Carter, and David K. Gifford
    Cell Systems, 2021
  7. Computationally Optimized SARS-CoV-2 MHC Class I and II Vaccine Formulations Predicted to Target Human Haplotype Distributions. [PDF]
    Ge Liu, Brandon Carter, Trenton Bricken, Siddhartha Jain, Mathias Viard, Mary Carrington, and David K. Gifford
    Cell Systems, 2020
  8. Maximizing Overall Diversity for Improved Uncertainty Estimates in Deep Ensembles. [PDF]
    Ge Liu*, Siddhartha Jain*, Jonas Mueller, and David K. Gifford
    34th AAAI Conference on Artificial Intelligence (AAAI 2020).
  9. Data Efficient Training for Reinforcement Learning with Adaptive Behavior Policy Sharing (work done during internship at Google Brain, in collaboration with DeepMind). [PDF]
    Ge Liu, Heng-tze Cheng, Rui Wu, Jing Wang, Jayden Ooi, Sibon Li, Ang Li, Lihong Li, Craig Boutilier, Ed Chi
    Deep Reinforcement Learning Workshop, NeurIPS 2019
    Optimization Foundations of RL workshop, NeurIPS 2019.
  10. Antibody Complementarity Determining Region Design Using High-Capacity Machine Learning. [PDF]
    Ge Liu*, Haoyang Zeng*, Jonas Mueller, Brandon Carter, Ziheng Wang, Jonas Schilz, Geraldine Horny, Michael E Birnbaum, Stefan Ewert, David K. Gifford
    Bioinformatics, 2019
  11. Information Condensation Active Learning. [preprint]
    Siddhartha Jain, Ge Liu, David K. Gifford
    Under review.
  12. Visualizing Complex Feature Interactions and Feature Sharing in Genomic Deep Neural Networks. [PDF]
    Ge Liu, Haoyang Zeng, David K. Gifford
    BMC Bioinformatics 20 (1), 1-14, 2019
  13. Visualizing Feature Maps in Deep Neural Networks using DeepResolve-A Genomics Case Study. [PDF]
    Ge Liu, David K. Gifford
    Workshop on Visualization for Deep Learning, ICML 2017
  14. Convolutional Neural Network Architectures for Predicting DNA–protein Binding. [PDF]
    Haoyang Zeng, Matthew D. Edwards, Ge Liu, David K. Gifford
    Bioinformatics, 32 (12), i121-i127, 2016

Research Highlights

Maximum n-times Coverage for Vaccine Design

We introduce the maximum n-times coverage problem that selects k overlays to maximize the summed coverage of weighted elements, where each element must be covered at least n times. We also define the min-cost n-times coverage problem where the objective is to select the minimum set of overlays such that the sum of the weights of elements that are covered at least n times is at least \( \tau \). Maximum n-times coverage is a generalization of the multi-set multi-cover problem, is NP-complete, and is not sub-modular. We introduce two new practical solutions for n-times coverage based on integer linear programming and sequential greedy optimization. We show that maximum n-times coverage is a natural way to frame peptide vaccine design, and find that it produces a pan-strain COVID-19 vaccine design that is superior to 29 other published designs in predicted population coverage and the expected number of peptides displayed by each individual's HLA molecules. [PDF]
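The objective above can be illustrated on a toy instance. The sketch below is not the paper's integer linear program or its sequential greedy procedure; it is a simple heuristic written for illustration. It assumes overlays are given as dictionaries mapping elements to how many times they cover them, and it guides a greedy search with a hypothetical surrogate score (`truncated_score`), because the raw objective gives no credit until an element has been covered n times. A brute-force search is included only to check the toy result.

```python
from itertools import combinations


def cover_counts(selected, overlays):
    """Count how many times each element is covered by the selected overlays."""
    counts = {}
    for name in selected:
        for element, times in overlays[name].items():
            counts[element] = counts.get(element, 0) + times
    return counts


def n_times_coverage(selected, overlays, weights, n):
    """Objective: total weight of elements covered at least n times."""
    counts = cover_counts(selected, overlays)
    return sum(w for e, w in weights.items() if counts.get(e, 0) >= n)


def truncated_score(selected, overlays, weights, n):
    """Illustrative surrogate (an assumption): partial credit w * min(count, n)."""
    counts = cover_counts(selected, overlays)
    return sum(w * min(counts.get(e, 0), n) for e, w in weights.items())


def greedy_select(overlays, weights, n, k):
    """Pick k overlays one at a time, maximizing the surrogate score at each step."""
    selected, remaining = [], sorted(overlays)
    for _ in range(min(k, len(remaining))):
        best = max(
            remaining,
            key=lambda name: truncated_score(selected + [name], overlays, weights, n),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


def brute_force_select(overlays, weights, n, k):
    """Exhaustive search over all size-k subsets; only feasible for tiny instances."""
    return max(
        combinations(sorted(overlays), k),
        key=lambda subset: n_times_coverage(subset, overlays, weights, n),
    )


# Toy instance: three weighted elements and four overlays (all values are made up).
weights = {"a": 5, "b": 4, "c": 2}
overlays = {
    "p1": {"a": 1, "b": 1},
    "p2": {"a": 1},
    "p3": {"b": 1, "c": 1},
    "p4": {"c": 1},
}
chosen = greedy_select(overlays, weights, n=2, k=3)
print(chosen, n_times_coverage(chosen, overlays, weights, n=2))
```

With n = 2, any single overlay in this toy instance scores zero on the true objective, which is why a greedy search on the raw objective would have nothing to distinguish the first pick.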

Computationally optimized COVID-19 peptide vaccine design

We present a combinatorial machine learning method to evaluate and optimize peptide vaccine formulations for SARS-CoV-2. Our approach optimizes the presentation likelihood of a diverse set of vaccine peptides conditioned on a target human-population HLA haplotype distribution and expected epitope drift. Our proposed SARS-CoV-2 MHC class I vaccine formulations provide 93.21% predicted population coverage with at least five vaccine peptide-HLA average hits per person (≥ 1 peptide: 99.91%) with all vaccine peptides perfectly conserved across 4,690 geographically sampled SARS-CoV-2 genomes. Our proposed MHC class II vaccine formulations provide 97.21% predicted coverage with at least five vaccine peptide-HLA average hits per person with all peptides having an observed mutation probability of ≤ 0.001. We provide an open-source implementation of our design methods (OptiVax), vaccine evaluation tool (EvalVax), as well as the data used in our design efforts here: https://github.com/gifford-lab/optivax [PDF]

Maximizing Overall Diversity for Improved Uncertainty Estimates in Deep Ensembles

The inaccuracy of neural network models on inputs that do not stem from the training data distribution is both problematic and at times unrecognized. Model uncertainty estimation can address this issue, where uncertainty estimates are often based on the variation in predictions produced by a diverse ensemble of models applied to the same input. Here we describe Maximize Overall Diversity (MOD), a straightforward approach to improve ensemble-based uncertainty estimates by encouraging larger overall diversity in ensemble predictions across all possible inputs that might be encountered in the future. When applied to various neural network ensembles, MOD significantly improves predictive performance for out-of-distribution test examples without sacrificing in-distribution performance on 38 Protein-DNA binding regression datasets, 9 UCI datasets, and the IMDB-Wiki image dataset. Across many Bayesian optimization tasks, the performance of UCB acquisition is also greatly improved by leveraging MOD uncertainty estimates.
[PDF]
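As a rough illustration of how ensemble-based uncertainty feeds a UCB acquisition rule, the sketch below computes the per-candidate mean and standard deviation of ensemble predictions and ranks candidates by mean plus a multiple of the standard deviation. It does not implement MOD's diversity-encouraging training; the function names and the `beta` parameter are assumptions made for this example.

```python
from statistics import mean, pstdev


def ensemble_mean_std(predictions):
    """predictions[m][i] is ensemble member m's prediction for candidate i."""
    per_candidate = list(zip(*predictions))
    means = [mean(values) for values in per_candidate]
    stds = [pstdev(values) for values in per_candidate]
    return means, stds


def ucb_scores(predictions, beta=1.0):
    """Upper confidence bound: ensemble mean plus beta times ensemble spread."""
    means, stds = ensemble_mean_std(predictions)
    return [m + beta * s for m, s in zip(means, stds)]


def pick_next(predictions, beta=1.0):
    """Index of the candidate with the highest UCB score."""
    scores = ucb_scores(predictions, beta)
    return max(range(len(scores)), key=scores.__getitem__)


# Three ensemble members, two candidates (made-up numbers).
# Candidate 0: members agree. Candidate 1: lower mean, but members disagree.
predictions = [
    [1.2, 0.0],
    [1.2, 2.0],
    [1.2, 1.0],
]
print(pick_next(predictions, beta=0.0))  # pure exploitation
print(pick_next(predictions, beta=1.0))  # exploration bonus from disagreement
```

The quality of this acquisition rule depends entirely on how informative the ensemble disagreement is, which is the quantity MOD is described as improving.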

Antibody Complementarity Determining Region Design Using High-Capacity Machine Learning

The precise targeting of antibodies and other protein therapeutics is required for their proper function and the elimination of deleterious off-target effects. Often the molecular structure of a therapeutic target is unknown and randomized methods are used to design antibodies without a model that relates antibody sequence to desired properties. Here, we present Ens-Grad, a machine learning method that can design complementarity determining regions of human Immunoglobulin G antibodies with target affinities that are superior to candidates derived from phage display panning experiments. We also demonstrate that machine learning can improve target specificity by the modular composition of models from different experimental campaigns, enabling a new integrative approach to improving target specificity. Our results suggest a new path for the discovery of therapeutic molecules by demonstrating that predictive and differentiable models of antibody binding can be learned from high-throughput experimental data without the need for target structural data.
[PDF]

Data Efficient Training for Reinforcement Learning with Adaptive Behavior Policy Sharing

Deep Reinforcement Learning (RL) has proven powerful in simulated environments. However, training a deep RL model is challenging in real-world applications (e.g. production-scale health-care or recommender systems) because of the limited budget at deployment. One aspect of the data inefficiency comes from the expensive hyper-parameter tuning when optimizing deep neural networks. We propose Adaptive Behavior Policy Sharing (ABPS), a data-efficient training algorithm that allows sharing of experience collected by a behavior policy that is adaptively selected from a pool of agents trained with an ensemble of hyper-parameters. We further extend ABPS to evolve hyper-parameters during training by hybridizing ABPS with Population Based Training (PBT). We experiment with multiple Atari games, and ABPS achieves superior overall performance, reduced variance on top agents, and better performance on the best agent compared to conventional hyper-parameter tuning, even though ABPS requires far fewer environment interactions. [PDF]
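The idea of adaptively choosing one agent from a pool to act, while every agent shares the collected experience, can be sketched with a toy simulation. This is not the ABPS implementation: the epsilon-greedy selection rule, the Gaussian stand-in for an environment rollout, and all names below are assumptions chosen to keep the example self-contained, and the agents' learning updates are omitted.

```python
import random


def select_behavior_agent(avg_returns, counts, epsilon, rng):
    """Try every agent once, then choose epsilon-greedily by running average return."""
    untried = [i for i, c in enumerate(counts) if c == 0]
    if untried:
        return untried[0]
    if rng.random() < epsilon:
        return rng.randrange(len(avg_returns))
    return max(range(len(avg_returns)), key=avg_returns.__getitem__)


def collect_shared_experience(true_means, episodes, epsilon=0.1, seed=0):
    """Simulate episodes where one selected agent acts and its data goes to a shared buffer.

    true_means[i] is a hypothetical expected return of agent i (for example, one
    hyper-parameter setting); a real system would run the agent in the environment.
    """
    rng = random.Random(seed)
    n_agents = len(true_means)
    avg_returns = [0.0] * n_agents
    counts = [0] * n_agents
    shared_buffer = []  # in a full system, every agent would train off-policy from this
    for _ in range(episodes):
        agent = select_behavior_agent(avg_returns, counts, epsilon, rng)
        episode_return = rng.gauss(true_means[agent], 0.1)  # stand-in for a rollout
        counts[agent] += 1
        # Incremental update of the selected agent's running average return.
        avg_returns[agent] += (episode_return - avg_returns[agent]) / counts[agent]
        shared_buffer.append((agent, episode_return))
    return shared_buffer, counts


buffer, counts = collect_shared_experience([0.2, 1.0, 0.5], episodes=500)
print(len(buffer), counts)
```

Only one agent interacts with the environment per episode, so the interaction budget is that of a single training run rather than one run per hyper-parameter setting.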

Visualizing Complex Feature Interactions and Feature Sharing in Genomic Deep Neural Networks

Visualization tools for deep learning models typically focus on discovering key input features without considering how such low-level features are combined in intermediate layers to make decisions. Moreover, many of these methods examine a network’s response to specific input examples that may be insufficient to reveal the complexity of model decision making. We present DeepResolve, an analysis framework for deep convolutional models of genome function that visualizes how input features contribute individually and combinatorially to network decisions. Unlike other methods, DeepResolve does not depend upon the analysis of a predefined set of inputs. Rather, it uses gradient ascent to stochastically explore intermediate feature maps to 1) discover important features, 2) visualize their contribution and interaction patterns, and 3) analyze feature sharing across tasks that suggests shared biological mechanisms. DeepResolve is competitive with existing visualization tools in discovering key sequence features, and identifies certain negative features and non-additive feature interactions that are not easily observed with existing tools. It also recovers similarities between poorly correlated classes which are not observed by traditional methods. DeepResolve reveals that DeepSEA’s learned decision structure is shared across genome annotations including histone marks, DNase hypersensitivity, and transcription factor binding. We identify groups of TFs that suggest known shared biological mechanisms, and recover correlation between DNase hypersensitivities and TF/Chromatin marks.
[PDF]

Misc.

  • I am a vocalist in JAM-SOUL, a student rock band at MIT