Title: Building AI Systems for the Era of Experience
Abstract: In this talk, I will cover the challenges of building systems that train models to learn from their own experiences. After years of progress in Reinforcement Learning from Human Feedback, why do security concerns continue to bottleneck critical deployments of AI? I offer shallow alignment as a potential cause and examine how the findings from our ICLR 2025 Outstanding Paper have impacted frontier deployments today. I then go over the challenges in building systems with robust guardrails, and how we are building state-of-the-art guardrails using Reinforcement Learning from Verifiable Rewards. Finally, how do we enable models to learn from their own experiences at scale? A key challenge is keeping training and inference servers closely in sync when training trillion-parameter models. I answer this question through the lens of the RL infrastructure we have built at Together AI, which systematically co-designs training algorithms with planet-scale infrastructure.
Bio: Ashwinee is a Staff Research Scientist at Together AI, where he leads the development of large-scale systems for reinforcement learning. He is also a Postdoctoral Fellow at the University of Maryland, College Park, working with Tom Goldstein on efficient training methods for large language models. Ashwinee received his PhD in ECE from Princeton University, where he worked with Prateek Mittal on AI security, and his BS and MS in EECS from the University of California, Berkeley, where he worked with Joey Gonzalez on distributed training.
Host: Micah Goldblum