Lecture in AI: Danqi Chen
December 6, 2024 @ 11:00 am - 12:00 pm
Title: Training Language Models in Academia: Research Questions and Opportunities
Abstract: Large language models have emerged as transformative tools in artificial intelligence, demonstrating unprecedented capabilities in understanding and generating human language. While these models have achieved remarkable performance across a wide range of benchmarks and enabled groundbreaking applications, their development has been predominantly concentrated within large technology companies due to substantial computational and proprietary data requirements. In this talk, I will present a vision for how academic research can play a critical role in advancing the open language model ecosystem, particularly by developing smaller yet highly capable models and advancing our fundamental understanding of training practices. Drawing from our research group’s recent projects, I will examine key research questions and challenges in both the pre-training and post-training stages. Our work spans developing small language models (Sheared LLaMA, 1-3B parameters), the state-of-the-art <10B model on Chatbot Arena (gemma-2-SimPO), and long-context models supporting up to 512K tokens (ProLong). These examples illustrate how academic research can push the boundaries of model efficiency, capability, and scalability. I will conclude by exploring future directions and highlighting opportunities to shape the development of more accessible and powerful language models.