Internal Working Group Speakers
Frontier Models for Neuroscience and Behavior

Date: May 11, 2026
Time: 3:00pm
Virtual Link: Upon request at [email protected]
Speaker: Hubert Banville (Meta), presenting Meta's latest TRIBE fMRI foundation model
Title: A foundation model of vision, audition, and language for in-silico neuroscience
Abstract: Cognitive neuroscience is fragmented into specialized models, each tailored to a specific experimental paradigm, which prevents a unified model of cognition in the human brain. Here, we introduce TRIBE v2, a tri-modal (video, audio, and language) foundation model capable of predicting human brain activity in a variety of naturalistic and experimental conditions. Leveraging a unified dataset of over 1,000 hours of fMRI across 720 subjects, we demonstrate that our model accurately predicts high-resolution brain responses to novel stimuli, tasks, and subjects, surpassing traditional linear encoding models with several-fold improvements in accuracy. Critically, TRIBE v2 enables in-silico experimentation: tested on seminal visual and neurolinguistic paradigms, it recovers a variety of results established by decades of empirical research. Finally, by extracting interpretable latent features, TRIBE v2 reveals the fine-grained topography of multisensory integration. These results establish artificial intelligence as a unifying framework for exploring the functional organization of the human brain.
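For context on the baseline being compared against: "traditional linear encoding models" in this literature are typically regularized linear regressions from stimulus features to voxel responses. A minimal sketch of such a baseline (shapes and variable names are illustrative, not from the paper):

    # Minimal linear encoding baseline: ridge regression from stimulus
    # features to fMRI responses, the model family TRIBE is compared to.
    import numpy as np
    from sklearn.linear_model import RidgeCV

    rng = np.random.default_rng(0)
    X_train = rng.standard_normal((1000, 512))   # (TRs, stimulus features)
    Y_train = rng.standard_normal((1000, 2000))  # (TRs, voxels)
    X_test = rng.standard_normal((200, 512))
    Y_test = rng.standard_normal((200, 2000))

    # One regularized linear map per voxel, alpha chosen by cross-validation.
    model = RidgeCV(alphas=np.logspace(-2, 4, 7)).fit(X_train, Y_train)
    Y_hat = model.predict(X_test)

    # Encoding accuracy is usually reported as per-voxel Pearson r.
    r = [np.corrcoef(Y_test[:, v], Y_hat[:, v])[0, 1] for v in range(Y_test.shape[1])]
    print(f"median voxel r = {np.median(r):.3f}")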

Date: March 2, 2026
Time: 3:00pm
Virtual Link: Upon request at [email protected]
Abstract: Spatiotemporal and multimodal datasets contain structured variability distributed across space, time, and measurement modality, motivating modeling approaches that learn representations directly from large-scale data. Inspired by video foundation models, we study how the masked autoencoder training objective can learn shared structure across heterogeneous observations while preserving modality-specific information, and how training these models at scale requires a range of engineering techniques. Furthermore, we show that self-attention supports the emergence of interpretable structure, which can be recovered by decomposing the learned representations according to their variability across samples. These results suggest that large-scale self-supervised learning provides a unified approach for modeling high-dimensional dynamical systems while enabling interpretation of the learned representations.
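The masked-autoencoder objective at the center of this talk is simple to state: hide a random subset of tokens and train the model to reconstruct them from the visible remainder. A toy sketch of the loss, with a linear map standing in for the transformer:

    # Toy masked-autoencoder objective: reconstruct randomly masked tokens
    # from the visible ones. A linear map stands in for the transformer.
    import numpy as np

    rng = np.random.default_rng(0)
    tokens = rng.standard_normal((64, 16))      # (tokens, feature dim)
    mask = rng.random(64) < 0.75                # mask 75% of tokens

    visible = tokens[~mask]
    W = rng.standard_normal((16, 16)) * 0.1     # stand-in "encoder-decoder"

    # Predict every masked token from a summary of the visible tokens
    # (a real model would use self-attention here).
    context = visible.mean(axis=0) @ W
    recon = np.tile(context, (mask.sum(), 1))

    # The loss is computed on masked positions only.
    loss = np.mean((recon - tokens[mask]) ** 2)
    print(f"masked reconstruction MSE: {loss:.3f}")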

Date: February 2, 2026
Time: 3:00pm
Virtual Link: Upon request at [email protected]
Title: A multimodal sleep foundation model for disease prediction
Abstract:
Sleep is a fundamental biological process with broad implications for physical and mental health, yet its complex relationship with disease remains poorly understood. Polysomnography (PSG)—the gold standard for sleep analysis—captures rich physiological signals but is underutilized due to challenges in standardization, generalizability and multimodal integration. To address these challenges, we developed SleepFM, a multimodal sleep foundation model trained with a new contrastive learning approach that accommodates multiple PSG configurations. Trained on a curated dataset of over 585,000 hours of PSG recordings from approximately 65,000 participants across several cohorts, SleepFM produces latent sleep representations that capture the physiological and temporal structure of sleep and enable accurate prediction of future disease risk. From one night of sleep, SleepFM accurately predicts 130 conditions with a C-Index of at least 0.75 (Bonferroni-corrected P < 0.01), including all-cause mortality (C-Index, 0.84), dementia (0.85), myocardial infarction (0.81), heart failure (0.80), chronic kidney disease (0.79), stroke (0.78) and atrial fibrillation (0.78). Moreover, the model demonstrates strong transfer learning performance on a dataset from the Sleep Heart Health Study—a dataset that was excluded from pretraining—and performs competitively with specialized sleep-staging models such as U-Sleep and YASA on common sleep analysis tasks, achieving mean F1 scores of 0.70–0.78 for sleep staging and accuracies of 0.69 and 0.87 for classifying sleep apnea severity and presence. This work shows that foundation models can learn the language of sleep from multimodal sleep recordings, enabling scalable, label-efficient analysis and disease prediction.
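For readers unfamiliar with the family of objectives involved: contrastive pretraining across modalities typically uses an InfoNCE loss that pulls matched recordings together and pushes mismatched ones apart. The sketch below shows a generic two-modality version; SleepFM's actual objective, which accommodates multiple PSG configurations, differs in its details:

    # Generic two-modality InfoNCE (CLIP-style) contrastive loss, the family
    # of objectives SleepFM's approach belongs to; details are illustrative.
    import numpy as np

    def info_nce(za, zb, temperature=0.1):
        """za, zb: (batch, dim) embeddings of the same epochs in two modalities."""
        za = za / np.linalg.norm(za, axis=1, keepdims=True)
        zb = zb / np.linalg.norm(zb, axis=1, keepdims=True)
        logits = za @ zb.T / temperature          # (batch, batch) similarities
        # Matched pairs (the diagonal) should score highest.
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    rng = np.random.default_rng(0)
    eeg = rng.standard_normal((32, 128))   # e.g. EEG-epoch embeddings
    ecg = rng.standard_normal((32, 128))   # e.g. ECG-epoch embeddings
    print(f"loss: {info_nce(eeg, ecg):.3f}")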

Date: November 5, 2025
Bio
Bryan Li is completing his PhD in NeuroAI at the University of Edinburgh, under the supervision of Arno Onken and Nathalie Rochefort. His main PhD project focuses on building deep learning-based encoding models of the visual cortex that accurately predict neural activity in response to arbitrary visual stimuli. Recently, he joined Dario Farina’s lab at Imperial College London as an Encode Fellow, working on neuromotor interfacing and decoding.
Title:
Movie-trained transformer reveals novel response properties to dynamic stimuli in mouse visual cortex (https://www.biorxiv.org/content/10.1101/2025.09.16.676524v2)
Abstract:
Understanding how the brain encodes complex, dynamic visual stimuli remains a fundamental challenge in neuroscience. Here, we introduce ViV1T, a transformer-based model trained on natural movies to predict neuronal responses in mouse primary visual cortex (V1). ViV1T outperformed state-of-the-art models in predicting responses to both natural and artificial dynamic stimuli, while requiring fewer parameters and less runtime. Although trained exclusively on natural movies, ViV1T accurately captured core V1 properties, including orientation and direction selectivity as well as contextual modulation, despite lacking explicit feedback mechanisms. ViV1T also revealed novel functional features. The model predicted a wider range of contextual responses when using natural and model-generated surround stimuli compared to traditional gratings, with novel model-generated dynamic stimuli eliciting maximal V1 responses. ViV1T also predicted that dynamic surrounds elicit stronger contextual modulation than static surrounds. Finally, the model identified a subpopulation of neurons that exhibit contrast-dependent surround modulation, switching their response to surround stimuli from inhibition to excitation when contrast decreases. These predictions were validated through semi-closed-loop in vivo recordings. Overall, ViV1T establishes a powerful, data-driven framework for understanding how brain sensory areas process dynamic visual information across space and time.
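One of the in-silico analyses above (orientation selectivity) is easy to make concrete: probe the trained model with drifting gratings at several orientations and compute a selectivity index per neuron. A hedged sketch, where `model` and `make_grating` are hypothetical stand-ins rather than the ViV1T API:

    # In-silico orientation-tuning probe of a trained encoding model.
    # `model` and `make_grating` are hypothetical stand-ins.
    import numpy as np

    def orientation_selectivity(model, make_grating, orientations_deg):
        """orientations_deg must evenly tile 0-180 degrees."""
        responses = []
        for theta in orientations_deg:
            video = make_grating(theta)        # (frames, H, W) drifting grating
            responses.append(model(video))     # (neurons,) mean predicted response
        R = np.stack(responses)                # (n_orientations, n_neurons)
        n = len(orientations_deg)
        pref = R.argmax(axis=0)                # preferred orientation per neuron
        orth = (pref + n // 2) % n             # orientation 90 degrees away
        cols = np.arange(R.shape[1])
        r_pref, r_orth = R[pref, cols], R[orth, cols]
        # Standard OSI: (R_pref - R_orth) / (R_pref + R_orth)
        return (r_pref - r_orth) / (r_pref + r_orth + 1e-9)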

Date: October 8, 2025
Time: 2:00pm
Zoom: Upon request @ [email protected]
Title:
OmniMouse: Scaling properties of multi-modal, multi-task brain models on 150B neural tokens
Abstract:
Scaling data and artificial neural networks has transformed AI, driving breakthroughs in language and vision. Whether similar principles apply to modeling brain activity remains unclear. Here we leverage a dataset of 3.3 million neurons from the visual cortex of 78 mice across 323 sessions, totaling more than 150 billion neural tokens recorded during the presentation of natural movies, images, and parametric stimuli, together with behavior. We train multi-modal, multi-task transformer models (1M–300M parameters) that flexibly support three regimes at test time: neural prediction (predicting neuronal responses from sensory input and behavior), behavioral decoding (predicting behavior from neural activity), neural forecasting (predicting future activity from current neural dynamics), or any combination of the three. We find that performance scales reliably with more data, but gains from increasing model size saturate, suggesting that current brain models are limited by data rather than compute. This inverts the standard AI scaling story: in language and computer vision, massive datasets make parameter scaling the primary driver of progress, whereas in brain modeling, even in the mouse visual cortex, a relatively simple and low-resolution system, models remain data-limited despite vast recordings. These findings highlight the need for richer stimuli and tasks and larger-scale recordings to build brain foundation models. The observation of systematic scaling raises the possibility of phase transitions in neural modeling, where larger and richer datasets might unlock qualitatively new capabilities, paralleling the emergent properties seen in large language models.
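The scaling analysis described here is typically summarized by fitting a saturating power law to performance-versus-data points, one curve per model size. A sketch of such a fit on synthetic numbers (not the paper's):

    # Fit a saturating power law, loss(D) = a * D**(-b) + c, to synthetic
    # performance-vs-data points; numbers are illustrative, not the paper's.
    import numpy as np
    from scipy.optimize import curve_fit

    def power_law(d, a, b, c):
        return a * d ** (-b) + c

    tokens = np.array([1e9, 1e10, 5e10, 1e11, 1.5e11])
    loss = power_law(tokens, 5.0, 0.2, 0.8) \
        + np.random.default_rng(0).normal(0, 0.005, 5)

    (a, b, c), _ = curve_fit(power_law, tokens, loss, p0=(1.0, 0.1, 0.5))
    print(f"loss ~ {a:.2f} * D^-{b:.2f} + {c:.2f} (irreducible term c = {c:.2f})")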

Vinam Arora and Ji Xia
Date: August 27, 2025
Location: JLGSC-L3-079
Time: 2:00pm
Zoom: Upon request @ [email protected]
Title and Abstracts:
1st Speaker: Vinam Arora
Title: Know Thyself by Knowing Others: Learning Neuron Identity from Population Context
Abstract: Identifying the functional identity of individual neurons is essential for interpreting circuit dynamics, yet remains a major challenge in large-scale in vivo recordings where anatomical and molecular labels are often unavailable. Here we introduce NuCLR, a self-supervised framework that learns context-aware representations of neuron identity by modeling each neuron's role within the broader population. NuCLR employs a spatiotemporal transformer that captures both within-neuron dynamics and across-neuron interactions, and is trained with a sample-wise contrastive objective that encourages stable, discriminative embeddings across time. Across multiple open-access datasets, NuCLR outperforms prior methods in both cell type and brain region classification. It enables zero-shot generalization to entirely new populations—without retraining or access to stimulus labels—offering a scalable approach for real-time, functional decoding of neuron identity across diverse experimental settings.
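The sample-wise contrastive objective can be sketched as follows: embeddings of the same neuron taken from two different time windows form a positive pair, and all other neurons in the batch serve as negatives. An illustrative toy version (not NuCLR's implementation):

    # Sample-wise contrastive objective for neuron identity: embeddings of
    # the same neuron from two time windows are positives; other neurons in
    # the batch are negatives. Illustrative, not NuCLR's implementation.
    import numpy as np

    def neuron_contrastive_loss(z1, z2, temperature=0.1):
        """z1, z2: (neurons, dim), same neurons embedded from two windows."""
        z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
        z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
        logits = z1 @ z2.T / temperature
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))   # pull same-neuron pairs together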
2nd Speaker: Ji Xia
Title: Inpainting the neural picture: Inferring Unrecorded Brain Area Dynamics from Multi-Animal Datasets
Abstract: Understanding how the brain drives memory-guided movements requires recording neural activity from the motor cortex and interconnected subcortical areas. Neuropixels probes now allow simultaneous recordings from subsets of these areas, but no single session captures all areas of interest, and different neurons are sampled from each area across sessions. This poses a key challenge: how to integrate neural data across sessions to reconstruct the complete multi-area picture. We address this with a transformer-based autoencoder that aligns neural activity into a shared latent space across sessions and animals, separately for each brain area, including those not recorded in a given session. This approach enables single-trial analysis of multi-area neural dynamics from all areas of interest. I am now working on improving this method, and will discuss both its present challenges and promising directions for future work.

Date: July 30th, 2025
Location: JLGSC-L3-079
Time: 2:00pm
Zoom: Upon request @ [email protected]
Title: Meta-dynamical state space modeling for integrative neural data analysis
Abstract:
Uncovering the organizing principles of neural systems requires integrating information across diverse datasets, each of which alone offers a limited view and a limited signal-to-noise ratio but which together reveal coherent dynamical structures. We present a meta-dynamical state-space modeling framework that learns a shared solution space of neural dynamics from heterogeneous recordings across sessions, animals, and tasks. By capturing cross-dataset similarity and variability on a low-dimensional manifold that spans a space of dynamical systems, our approach enables few-shot inference, rapid adaptation to new recordings, and discovery of latent dynamical motifs that underlie behavior. We demonstrate its utility in modeling motor cortex activity, revealing dynamics that generalize across individuals and track changes in dynamics during learning. We argue that our approach is well suited as a foundation model for integrative neuroscience, both for understanding neural computation and for real-time neuroscience applications.
Title: POSSM: Generalizable, real-time neural decoding with hybrid state-space models
Abstract:
Real-time decoding of neural spiking data is a core aspect of neurotechnology applications such as brain-computer interfaces, where models are subject to strict latency constraints. Traditional methods, including simple recurrent neural networks, are fast and lightweight but are less equipped to generalize to unseen data. In contrast, recent Transformer-based approaches leverage large-scale neural datasets to attain strong generalization performance. However, these models typically have much larger computational requirements and are not suitable for settings requiring low latency or limited memory. To address these shortcomings, we present POSSM, a novel architecture that combines individual spike tokenization and an input cross-attention module with a recurrent state-space model (SSM) backbone, thereby enabling (1) fast, causal online prediction from neural activity and (2) efficient generalization to new sessions, individuals, and tasks through multi-dataset pre-training. We evaluate our model’s performance in terms of decoding accuracy and inference speed on monkey reaching datasets, and show that it extends to clinical applications, namely handwriting and speech decoding. Notably, we demonstrate that pre-training on monkey motor-cortical recordings improves decoding performance on the human handwriting task, highlighting the exciting potential for cross-species transfer. In all of these tasks, we find that POSSM achieves decoding accuracy comparable to state-of-the-art Transformers at a fraction of the inference cost. These results suggest that hybrid SSMs may be the key to bridging the gap between accuracy, inference speed, and generalization when training neural decoders for real-time, closed-loop applications.
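The latency argument is the central design point: a recurrent SSM carries a fixed-size state, so each incoming spike bin costs constant time to process, whereas a Transformer must re-attend over a growing or sliding context. A toy sketch of the constant-time online update (random weights standing in for a trained decoder, not POSSM's architecture):

    # Toy O(1)-per-step online decoder: a fixed-size recurrent state is
    # updated once per spike bin, as in SSM/RNN decoders.
    import numpy as np

    rng = np.random.default_rng(0)
    n_units, d_state, d_out = 96, 64, 2          # spike channels, state, cursor dims
    A = rng.standard_normal((d_state, d_state)) * 0.05
    B = rng.standard_normal((d_state, n_units)) * 0.1
    C = rng.standard_normal((d_out, d_state)) * 0.1

    state = np.zeros(d_state)
    for t in range(1000):                        # streaming spike bins
        spikes = rng.poisson(0.1, n_units)       # one 10 ms bin of spike counts
        state = np.tanh(A @ state + B @ spikes)  # constant-time state update
        velocity = C @ state                     # decoded output for this bin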
Frontier Models for Neuroscience and Behavior Working Group (formerly the Animal Behavior Video Analysis Working Group)
Title: Whole-body simulation of realistic fruit fly locomotion with deep reinforcement learning
Abstract:
The body of an animal determines how the nervous system produces behavior. Therefore, detailed modeling of the neural control of sensorimotor behavior requires a detailed model of the body. Here we contribute an anatomically detailed biomechanical whole-body model of the fruit fly Drosophila melanogaster in the MuJoCo physics engine. Our model is general-purpose, enabling the simulation of diverse fly behaviors, both on land and in the air. We demonstrate the generality of our model by simulating realistic locomotion, both flight and walking. To support these behaviors, we have extended MuJoCo with phenomenological models of fluid forces and adhesion forces. Through data-driven end-to-end reinforcement learning, we demonstrate that these advances enable the training of neural network controllers capable of realistic locomotion along complex trajectories based on high-level steering control signals. With a visually guided flight task, we demonstrate a neural controller that can use the vision sensors of the body model to control and steer flight. Our project is an open-source platform for modeling neural control of sensorimotor behavior in an embodied context.
Title: Mapping the landscape of social behavior using high-resolution 3D tracking of freely interacting animals
Abstract:
Social interaction is a fundamental component of animal behavior. However, we lack tools to describe it with quantitative rigor, limiting our understanding of its principles and of the neuropsychiatric disorders, like autism, that perturb it. To address these limitations, my collaborators and I have developed a technique for high-resolution 3D tracking of freely interacting animals and their body-wide social touch patterns, solving the challenging subject-occlusion and part-assignment problems using 3D geometric reasoning, graph neural networks, and semi-supervised learning. Using this technology, I have collected and annotated over 34 million 3D postures in interacting rats, including five new monogenic autism models for which social behavioral phenotypes have not been reported. I will introduce a novel multi-scale approach that I have used to identify a rich landscape of stereotyped interactions, synchrony, and body contact across strains. This deep phenotyping approach revealed a spectrum of changes in rat autism models and in response to amphetamine, and this framework has the potential to facilitate quantitative studies of social behaviors and their neurobiological underpinnings.
Title: Multimodal Learning from Pixels to People
Abstract:
People experience the world through modalities of sight, sound, words, touch, and more. By leveraging their natural relationships and developing multimodal learning methods, my research creates artificial perception systems with diverse skills, including spatial, physical, logical, and cognitive abilities, for flexibly analyzing visual data. This multimodal approach provides versatile representations for tasks like 3D reconstruction, visual question answering, and object recognition, while offering inherent explainability and excellent zero-shot generalization across tasks. By closely integrating diverse modalities, we can overcome key challenges in machine learning and enable new capabilities for computer vision, especially for the many upcoming applications where trust is required.
Multi-resource-cost Optimization of Neural Network Models

Date: April 7, 2026
Location: ZI L3-079
Time: 1:00pm
Title: Metabolic cost of information processing in Poisson variational autoencoders
Abstract: Computation in biological systems is fundamentally energy-constrained, yet standard theories of computation treat energy as freely available. Here, we argue that variational free energy minimization under a Poisson assumption offers a principled path toward an energy-aware theory of computation. Our key observation is that the Kullback-Leibler (KL) divergence term in the Poisson free energy objective becomes proportional to the prior firing rates of model neurons, yielding an emergent metabolic cost term that penalizes high baseline activity. This structure couples an abstract information-theoretic quantity — the coding rate — to a concrete biophysical variable — the firing rate — which enables a trade-off between coding fidelity and energy expenditure. Such a coupling arises naturally in the Poisson variational autoencoder (P-VAE; a brain-inspired generative model that encodes inputs as discrete spike counts and recovers a spiking form of sparse coding as a special case) but is absent from standard Gaussian VAEs. To demonstrate that this metabolic cost structure is unique to the Poisson formulation, we compare the P-VAE against GReLU-VAE, a Gaussian VAE with ReLU rectification applied to latent samples, which controls for the non-negativity constraint. Across a systematic sweep of the KL term weighting coefficient β and latent dimensionality, we find that increasing β monotonically increases sparsity and reduces average spiking activity in the P-VAE. In contrast, GReLU-VAE representations remain unchanged, confirming that the effect is specific to Poisson statistics rather than a byproduct of non-negative representations. These results establish Poisson variational inference as a promising foundation for a resource-constrained theory of computation.
Zoom Link: Upon request @ [email protected]
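The closed form behind the "emergent metabolic cost" claim is worth writing out. For two Poisson distributions with posterior rate \lambda_q and prior rate \lambda_p (notation mine, not necessarily the paper's):

    \mathrm{KL}\big(\mathrm{Pois}(\lambda_q)\,\|\,\mathrm{Pois}(\lambda_p)\big) = \lambda_q \log\frac{\lambda_q}{\lambda_p} - \lambda_q + \lambda_p

For weakly driven units (\lambda_q \to 0) the divergence tends to \lambda_p, so the prior firing rate itself becomes the penalty, which is one way to read the proportionality described in the abstract.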

Date: April 22, 2026
Location: ZI L5-116
Time: 10am to 12pm
Title and Abstract: TBD
Zoom Link: Upon request @ [email protected]

Date: March 24, 2026
Location: ZI L3-079
Title and Abstract: TBD
Zoom Link: Upon request @ [email protected]

Date: March 17, 2026
Location: ZI L5-084
Title: Constraints of efficient neural computation
Abstract: Neural systems adapt to the statistical structure of the environment to support behavior. While it is generally recognized that such adaptation is subject to various biological constraints (such as noise, metabolism, and wiring cost), how these constraints determine the optimal neural computation remains unclear. In the first part of this talk, I will discuss theories of efficient coding based on considerations of metabolic cost and neural noise. In the second part, I will present ongoing work on how the geometry of the stimulus manifold shapes the structure of the neural code. In particular, using the processing of heading direction as an example, I will show that the asymmetry of the stimulus manifold naturally accounts for key properties of heading-direction encoding in macaque MST.
Zoom Link: Upon request @ [email protected]
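One common formalization of the efficient-coding setup in the first part of the talk (a textbook formulation, not necessarily the speaker's) is infomax under a metabolic budget:

    \max_{p(r \mid s)} \; I(s; r) - \beta\, \mathbb{E}[r]

where s is the stimulus, r the (noisy) neural response, and \beta the multiplier that prices firing against information; the noise and metabolic constraints enter through the channel p(r \mid s) and the rate penalty, respectively.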

Date: January 20, 2026
Title: Frugal Inference for Control
Zoom Link: Upon request @ [email protected]

Date: December 10, 2025
Time: 11:00am
Location: Zuckerman Institute L5-116
Title: Economics of temporal evidence integration
Abstract: The temporal integration of sensory information is an important aspect of many human decision tasks. I will present results of ongoing research in my laboratory aimed at understanding the dynamic processes underlying evidence integration. In particular, I will discuss a novel resource-rational model that treats the representation as well as the integration and maintenance of sensory evidence as actively controlled, performance-effort trade-off mechanisms. Validated against data from various behavioral experiments, the model provides not only a normative explanation for observed non-linear dynamics in evidence integration but also a parsimonious explanation for individual tendencies toward recency or primacy behavior. As the work is ongoing and unpublished, I am looking forward to an engaged discussion with the audience.
Zoom Link: Upon request @ [email protected]
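One concrete instance of the recency/primacy point: in a linear accumulator x_{t+1} = gamma * x_t + s_t, the leak gamma determines how evidence samples are weighted, with gamma < 1 discounting early samples (recency) and gamma > 1 amplifying them (primacy). A toy illustration (a textbook sketch, not the speaker's model):

    # Toy evidence accumulator illustrating recency vs. primacy via the
    # leak parameter; a textbook sketch, not the speaker's model.
    import numpy as np

    def weights(gamma, n=10):
        """Effective weight of each of n evidence samples on the final state
        of x_{t+1} = gamma * x_t + s_t."""
        return np.array([gamma ** (n - 1 - t) for t in range(n)])

    print("leaky   (gamma=0.8):", np.round(weights(0.8), 2))  # late samples dominate
    print("unstable(gamma=1.2):", np.round(weights(1.2), 2))  # early samples dominate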

Date: October 21, 2025
Location: JLGSC-L03-079
Time: 1:00pm
Zoom: Upon request @ [email protected]
Title:
Building the brain’s efficient system-level architecture: optimisations across space, time, and multiple regions
Abstract:
The computations a brain can perform are fundamentally constrained by physical realities: energetic resources are limited, and time is precious. To understand why the brain works the way it does, we must understand its function in the context of these constraints. Prior modeling work has successfully demonstrated how spatial energetic constraints drive structure-function co-optimization, giving rise to many of the architectural features we observe across areas of neuroscience. By incorporating such physical constraints, we can build complex systems-level models that are meaningfully constrained by physically measurable factors rather than arbitrary design choices.
In this talk, I will expand on these spatial frameworks by introducing new work on temporal processing and signal precision constraints in neural networks. I will demonstrate how different optimization strategies within individual regions can be combined in heterogeneous multi-region models, revealing how the brain trades off resource use across tasks and situations. Finally, I will show how space and time interact in surprising ways to achieve efficient computation — principles that apply not only to the brain but to any large-scale distributed computing system. Together, these advances bring us closer to understanding the general principles that enable sophisticated intelligence to emerge from physically and energetically constrained computing systems.

Date: August 7, 2025
Location: JLGSC-L05-84
Time: 2:30pm
Zoom: Upon request @ [email protected]
Title: From 2D Chips to 3D Brains
Abstract:
Artificial intelligence (AI) realizes a synaptocentric conception of the learning brain with dot-products and advances by performing twice as many multiplications every two months. But the semiconductor industry tiles twice as many multipliers on a chip only every two years. Moreover, the returns from tiling these multipliers ever more densely now diminish, because signals must travel relatively farther and farther, expending energy and exhausting heat that scales quadratically. As a result, communication is now much more expensive than computation, far more so than in biological brains, where energy use scales linearly rather than quadratically with neuron count. That allows an 86-billion-neuron human brain to use as little power as a single lightbulb (25 W) rather than as much as the entire US (3 TW). Hence, rescaling a chip's energy use from quadratic to linear is critical to scale AI sustainably from a trillion (10^12) parameters (mouse scale) today to a quadrillion (10^15) parameters (human scale) in the next five years. But this would require communication cost to be reduced radically. Towards that end, I will present a recent re-conception of the brain's fundamental unit of computation that sparsifies signals by moving away from synaptocentric learning with dot-products to dendrocentric learning with sequence detectors.
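The quadratic-versus-linear claim can be made concrete with back-of-envelope arithmetic (the constants below are illustrative placeholders, not measured values):

    # Back-of-envelope comparison of quadratic vs. linear communication-energy
    # scaling with network size; constants are illustrative placeholders.
    n_today, n_target = 1e12, 1e15        # parameters: ~mouse scale -> ~human scale
    growth = n_target / n_today           # 1000x more parameters

    print(f"linear energy scaling:    {growth:,.0f}x more energy")
    print(f"quadratic energy scaling: {growth**2:,.0f}x more energy")
    # Under quadratic scaling, a 1000x larger model costs 1,000,000x the
    # energy; linear scaling keeps it at 1000x, the gap argued in the talk.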
Title: Can resource optimization explain neuronal morphology and placement?
Abstract:
Title: Control when confidence is costly
Abstract:
We develop a version of stochastic control that accounts for the computational costs of inference. Past studies identified efficient coding without control, or efficient control that neglects the cost of synthesizing information. Here we combine these concepts into a framework where agents rationally approximate inference for efficient control. Specifically, we study Linear Quadratic Gaussian (LQG) control with an added internal cost on the relative precision of the posterior probability over the world state. This creates a trade-off: an agent can obtain more utility overall by sacrificing some task performance, if doing so saves enough bits during inference. We discover that the rational strategy that solves the joint inference and control problem goes through phase transitions depending on the task demands, switching from a costly but optimal inference to a family of suboptimal inferences related by rotation transformations, each of which misestimates the stability of the world. In all cases, the agent moves more to think less. This work provides a foundation for a new type of rational computation that could be used by both brains and machines for efficient but computationally constrained control.
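In symbols (notation mine, schematic rather than the paper's exact formulation), the setup augments the standard LQG objective with an internal cost on posterior precision:

    \min_{\pi}\; \mathbb{E}\Big[\sum_t x_t^\top Q x_t + u_t^\top R u_t\Big] + \beta \sum_t C(\Sigma_t)

where x_t is the world state, u_t the control, \Sigma_t the posterior covariance maintained by the agent's (possibly approximate) filter, and C penalizes the posterior's relative precision; \beta sets the exchange rate between task performance and bits spent on inference, which is what produces the phase transitions described above.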
Title: Neuronal energy consumption: basic measures and trade-offs, and their effects on efficiency
Title: Bounded optimality: A cognitive perspective on neural computation with resource limitations
Abstract:
Language and Vision

Date: April 27, 2026
Location: Virtual
Time: 3pm
Zoom Link: Upon request @ [email protected]

Date: March 23, 2026
Location: Virtual
Zoom Link: Upon request @ [email protected]
