How much is brain data worth?

PI: Xaq Pitkow
Co-PI: David Schwab, CUNY

Abstract

If we want smarter algorithms, will we benefit from studying the brain? One hypothesis is that since brain computations already solve challenging problems, neural activity must constitute “better” data than conventional data. Here we aim to develop theory to elaborate and test this use-inspired question. Current AI uses massive scale to achieve impressive performance: big data, big models, and big compute. For example, current Large Language Models (LLMs) are trained on data volumes equivalent to all books ever written. Despite good performance, many tasks require cognitive understanding that remains beyond current models. Are we using the right data to achieve human-level generalization? There is a common expectation that even these hard problems will eventually be solvable through scaling, given massive enough data [Sutton 2019]. Even if this is technically true in the limit of infinite data, the scaling may be so unfavorable that brute-force training remains out of reach in any practical sense. But what if a new data source were obtained from the internal representations of an agent that can already perform the task of interest? Could this vastly improve learning and replace torrents of unspecific data? More concretely, can we solve hard machine learning problems by measuring human brains while they solve them? Here we aim to create an abstract theory of how learning scales under different tasks and data streams, ultimately to provide guidance for future learning with better data, with particular application to finding smarter algorithms by studying the brain.
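For orientation, the scaling concern can be stated in standard learning-curve notation (the symbols below are introduced here for exposition and are not taken from the proposal): test error as a function of the number of training examples $N$ is often well described by a power law,

$$\varepsilon(N) \;\approx\; \varepsilon_\infty + a\,N^{-\alpha},$$

so that reaching an excess error $\delta$ above the irreducible floor $\varepsilon_\infty$ requires roughly $N(\delta) \approx (a/\delta)^{1/\alpha}$ examples. When the exponent $\alpha$ is small, this required data volume grows explosively, which is the practical sense in which brute-force scaling can remain out of reach.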

This problem is related to “distillation” from internal representations: training a student neural network on the measured activations of another, larger, well-trained teacher [Aguilar et al. 2020, Heo et al. 2018]. In our target case, the teacher will ultimately be the brain, although we will use artificial networks as proxies to study theoretical properties of this problem. The relative benefits of typical training data versus internal distillation data from the brain of a successful agent likely depend on the structure of the task, especially its complexity. We hypothesize that there is a scaling law that determines the exchange rate between training data and distillation data as a function of this complexity (Figure 1). Indeed, past work shows that a properly curated data ensemble can speed learning enormously, in some cases converting power-law scaling into exponential scaling [Sorscher et al. 2022].
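To make the setup concrete, the following is a minimal sketch of representation distillation in Python (PyTorch), using synthetic data and a frozen artificial teacher as a stand-in for recorded neural activity. The layer sizes, the linear probe that aligns student and teacher features, and the weighting term beta are illustrative assumptions, not specifications from the proposal.

# Minimal sketch of feature distillation: a small "student" network is trained to
# match both task labels and the hidden activations of a larger, frozen "teacher".
# All sizes, losses, and the synthetic data below are placeholders for illustration.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Frozen teacher: stands in for the well-trained agent (ultimately, recorded brain activity).
teacher = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
for p in teacher.parameters():
    p.requires_grad_(False)

# Smaller student, plus a linear probe that maps its hidden layer onto the teacher's.
student_body = nn.Sequential(nn.Linear(20, 16), nn.ReLU())
student_head = nn.Linear(16, 2)
probe = nn.Linear(16, 64)  # aligns student features with teacher features

opt = torch.optim.Adam(
    list(student_body.parameters()) + list(student_head.parameters()) + list(probe.parameters()),
    lr=1e-2,
)
task_loss = nn.CrossEntropyLoss()
distill_loss = nn.MSELoss()
beta = 1.0  # weight on distillation data relative to task data

x = torch.randn(256, 20)          # synthetic inputs
y = torch.randint(0, 2, (256,))   # synthetic task labels

for step in range(200):
    opt.zero_grad()
    with torch.no_grad():
        teacher_feat = teacher[1](teacher[0](x))  # teacher's internal representation
    feat = student_body(x)
    logits = student_head(feat)
    loss = task_loss(logits, y) + beta * distill_loss(probe(feat), teacher_feat)
    loss.backward()
    opt.step()

In the planned theory, the quantity of interest is how many teacher-activation samples of this kind substitute for how many conventional labeled samples at fixed performance, i.e., the exchange rate hypothesized above.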

Publications

In progress

Resources

In progress