Center for Theoretical Neuroscience Seminar Series with ARNI

Scott Linderman

Location: Kavli Auditorium at Zuckerman Institute (9th Floor)
Date: 1/23/2026
Time: 11:30am
Zoom Link: Upon request @ [email protected]

Title: When and How to Parallelize Seemingly Sequential Models

Abstract: Transformers have become the de facto model for sequential data in large part because they are well adapted to modern hardware: at training time, the loss can be evaluated in parallel over the sequence length on GPUs and TPUs. By contrast, evaluating nonlinear recurrent neural networks (RNNs) appears to be an inherently sequential problem. Recent advances like DEER (arXiv:2309.12252) and DeepPCR (arXiv:2309.16318) have shown that evaluating a nonlinear recursion can be recast as solving a parallelizable optimization problem, and this approach can sometimes yield dramatic speed-ups in wall-clock time. However, the factors that govern the difficulty of these optimization problems remain unclear, limiting broader adoption of the technique. I will present a recent line of work from my lab that further develops these methods in both theory and practice. We establish a precise relationship between the dynamics of a nonlinear system and the conditioning of its corresponding optimization formulation. We show that the predictability of a system, defined as the degree to which small perturbations in state influence future behavior, governs the number of optimization steps required for evaluation. In predictable systems, the state trajectory can be computed in O(log² T) time, where T is the sequence length, a major improvement over the conventional O(T) sequential approach. In contrast, chaotic or unpredictable systems exhibit poor conditioning, with the consequence that parallel evaluation converges too slowly to be useful. We validate our claims through extensive experiments, with a particular emphasis on parallelizing nonlinear RNNs and Markov chain Monte Carlo (MCMC) algorithms for Bayesian statistics. I will provide practical guidance on when nonlinear dynamical systems can be efficiently parallelized and highlight predictability as a key design principle for parallelizable models.
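
For attendees who want a concrete picture of the idea before the talk, below is a minimal JAX sketch of how a nonlinear recursion s_t = f(s_{t-1}, x_t) can be evaluated in parallel in the DEER style: each Newton iteration linearizes f around the current guess for the whole trajectory, and the resulting linear recursion is solved with a parallel associative scan. The function names (deer_style_solve, parallel_linear_recursion), the full-Jacobian linearization, the zero initialization, and the fixed iteration count are illustrative assumptions on our part, not the speaker's implementation.

import jax
import jax.numpy as jnp

def parallel_linear_recursion(As, bs):
    # Solve s_t = A_t @ s_{t-1} + b_t for t = 1..T (with s_0 folded into b_1)
    # via jax.lax.associative_scan, giving O(log T) depth on parallel hardware.
    def combine(left, right):
        A1, b1 = left
        A2, b2 = right
        # Composition of two affine maps, batched over the leading time axis.
        return (jnp.einsum('tij,tjk->tik', A2, A1),
                jnp.einsum('tij,tj->ti', A2, b1) + b2)
    _, ss = jax.lax.associative_scan(combine, (As, bs))
    return ss

def deer_style_solve(f, s0, xs, num_iters=20):
    # Hypothetical DEER-style solver for s_t = f(s_{t-1}, x_t).
    # Each Newton step linearizes f around the current trajectory guess and
    # solves the resulting *linear* recursion in parallel.
    T, D = xs.shape[0], s0.shape[0]
    ss = jnp.zeros((T, D))  # initial guess for s_1..s_T

    jac_f = jax.vmap(jax.jacfwd(f, argnums=0))  # per-step Jacobians df/ds
    f_batch = jax.vmap(f)

    for _ in range(num_iters):
        prev = jnp.concatenate([s0[None], ss[:-1]])  # s_0 .. s_{T-1}
        As = jac_f(prev, xs)                         # (T, D, D)
        bs = f_batch(prev, xs) - jnp.einsum('tij,tj->ti', As, prev)
        # s_0 is fixed, so absorb it into the first step's offset.
        bs = bs.at[0].add(As[0] @ s0)
        As = As.at[0].set(jnp.zeros((D, D)))
        ss = parallel_linear_recursion(As, bs)
    return ss

# Example: a small tanh RNN with weights scaled toward a contractive
# (predictable) regime, where this iteration converges quickly.
W = 0.5 * jax.random.normal(jax.random.PRNGKey(0), (4, 4))
f = lambda s, x: jnp.tanh(W @ s + x)
xs = jax.random.normal(jax.random.PRNGKey(1), (128, 4))
states = deer_style_solve(f, jnp.zeros(4), xs)

This sketch illustrates why predictability matters: when f is contractive, the Jacobians damp perturbations and the Newton iteration converges in few steps, whereas for chaotic dynamics the linearizations are ill-conditioned and convergence stalls. Practical implementations add details this sketch omits, such as convergence checks, damping, and cheaper (e.g., diagonal) approximations to the Jacobians.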