I am a founding member at Periodic Labs. I am also an Adjunct Professor at McGill University. Briefly, I worked on reinforcement learning and reasoning at Meta. Before that, I was a staff research scientist on the Google DeepMind team. I finished my PhD at Mila under the guidance of Aaron Courville and Marc Bellemare. Previously, I spent a year on Geoffrey Hinton's amazing team at Google Brain, Toronto. Earlier, I graduated in Computer Science and Engineering from IIT Bombay.
My current research revolves around RL and LLMs, and my prior work has received an outstanding paper award at NeurIPS. I was also a core contributor to the Gemma and Gemini models.
Current PhD Students
- Morgane Moss (Co-supervised with Aaron Courville)
Past Interns & Student Researchers
- Max Schwarzer (BBF, Now ChatGPT lead @ OpenAI)
- Devvrit Khatri (Scaling RL Compute, PhD @ UT Austin)
- Yongchao Zhou (DistillSpec, Now @ x.AI)
- Arian Hosseini (V-STaR, Now @ GDM)
- Jesse Farebrother (Stop Regressing, PhD @ McGill)
- Lunjun Zhang (Generative RMs, PhD @ UofT)
- Charline Le Lan (RL Generalization, Now Gemini Flash @ GDM)
- Michael Noukhovitch (Asynchronous RL for LLMs, PhD @ Mila)
- Wenda Xu (Speculative KD, Now @ Google)
- Hritik Bansal (Compute-Optimal STaR / KD / W2S, PhD @ UCLA)
- Josh P Zitovsky (Offline Model Selection, Now @ Amazon)
- Amrith Setlur (Advantage for PRMs, PhD @ UC Berkeley)
- Ghada Sokar (Dormant Neurons, Now @ GDM)
- Siddhant Agarwal (Undergrad Researcher, Now PhD @ UT Austin)
News
- Thinking Machines blog post based on my 2023 paper about On-policy Distillation of LLMs.
- Delta Podcast on research career and lessons learned so far.
- Talk at CVPR 2025 on The Bitter Lesson for RL: Verification as the Key to Reasoning LLMs. [Youtube]
- Tutorial on Post-Training Distillation of LLMs at Google. [Podcast @ Youtube]
- 7 papers accepted at ICLR 2025, including Generative Verifiers, SCoRE, Speculative KD, Async RLHF, and Inference-aware RL for LLMs.
- Panelist on the Inference Time LLM Algorithms Tutorial at NeurIPS 2024.
- Gave a guest lecture at McGill about RL, Reasoning, and Verifiers.