
Continual Learning Working Group: Haozhe Shan

September 20 @ 3:30 pm - 5:00 pm

Speaker: Haozhe Shan

Title: A theory of continual learning in deep neural networks: task relations, network architecture and learning procedure
Abstract: Imagine listening to this talk and afterwards forgetting everything else you’ve ever learned. This absurd scenario would be commonplace if the brain could not perform continual learning (CL) – acquiring new skills and knowledge without dramatically forgetting old ones. Ubiquitous and essential in our daily lives, CL has proven a daunting computational challenge for neural networks (NNs) in machine learning. When is CL especially easy or difficult for neural systems, and why?

Towards answering these questions, we developed a statistical mechanics theory of CL dynamics in deep NNs. The theory exactly describes how the network’s input-output mapping evolves as it learns a sequence of tasks, as a function of the training data, NN architecture, and the strength of a penalty applied to between-task weight changes. We first analyzed how task relations affect CL performance, finding that they can be efficiently described by two metrics: similarity between inputs from two tasks in the NN’s feature space (“input overlap”) and consistency of the input-output rules of different tasks (“rule congruency”). Higher input overlap leads to faster forgetting, while lower congruency leads to stronger asymptotic forgetting; we validated these predictions with both synthetic tasks and popular benchmark datasets. Surprisingly, we found that increasing the network depth reshapes the geometry of the network’s feature space to decrease input overlap between tasks and slow forgetting. The reduced cross-task overlap in deeper networks also leads to less anterograde interference during CL, but at the same time hinders their ability to accumulate knowledge across tasks. Finally, our theory closely matches CL dynamics in NNs trained with stochastic gradient descent (SGD): using noisier, faster learning during CL is equivalent to weakening the weight-change penalty. Link to preprint: https://arxiv.org/abs/2407.10315.
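
As a rough, illustrative companion to the abstract (not the paper's formulation; see the preprint above for the actual theory), the sketch below trains a toy linear model on two tasks in sequence with a quadratic penalty on between-task weight changes, and reports a simple input-overlap measure together with the resulting forgetting on the first task. The toy data, the penalty strength lam, and the overlap definition are assumptions made here for illustration.

# Illustrative sketch only: sequential learning of two toy linear regression
# tasks with a quadratic penalty on between-task weight changes, plus a simple
# "input overlap" measure. Data, penalty strength and overlap definition are
# assumptions for illustration, not the paper's exact setup.
import numpy as np

rng = np.random.default_rng(0)
d, n = 20, 200                          # input dimension, samples per task

def make_task(teacher):
    X = rng.standard_normal((n, d))
    return X, X @ teacher

def train(X, y, w_init, w_anchor, lam, lr=1e-2, steps=2000):
    # Gradient descent on MSE + lam * ||w - w_anchor||^2
    w = w_init.copy()
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(X) + lam * (w - w_anchor)
        w -= lr * grad
    return w

def mse(X, y, w):
    return float(np.mean((X @ w - y) ** 2))

# Two teachers; making them more or less aligned changes "rule congruency".
teacher_A = rng.standard_normal(d)
teacher_B = teacher_A + 0.5 * rng.standard_normal(d)
XA, yA = make_task(teacher_A)
XB, yB = make_task(teacher_B)

# "Input overlap": mean squared cosine similarity between the two tasks' inputs
# (the theory measures this in the network's feature space, not input space).
cos = (XA @ XB.T) / (np.linalg.norm(XA, axis=1)[:, None] *
                     np.linalg.norm(XB, axis=1)[None, :])
overlap = float(np.mean(cos ** 2))

w0 = np.zeros(d)
wA = train(XA, yA, w0, w0, lam=0.0)     # learn task A from scratch
wB = train(XB, yB, wA, wA, lam=1.0)     # learn task B, penalizing drift from wA

print(f"input overlap ~ {overlap:.3f}")
print(f"task A error after learning A: {mse(XA, yA, wA):.4f}")
print(f"task A error after learning B (forgetting): {mse(XA, yA, wB):.4f}")

Raising lam in the second call to train trades plasticity for retention, which is the role the weight-change penalty plays in the abstract; setting it to zero recovers plain sequential training and larger forgetting.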

Bio: Haozhe Shan joined Columbia University as an ARNI Postdoctoral Fellow in August 2024. He recently received a Ph.D. in Neuroscience from Harvard, advised by Haim Sompolinsky. His research applies quantitative tools from physics, statistics and other fields to discover computational principles behind neural systems, both biological and artificial. A recent research interest is the ability of neural systems to continually learn and perform multiple tasks in a flexible manner.

Zoom Link: https://columbiauniversity.zoom.us/j/97176853843?pwd=VLZdh6yqHBcOQhdf816lkN5ByIpIsF.1

Details

Date: September 20
Time: 3:30 pm - 5:00 pm

Organizer

Continual Learning Working Group

Venue

CEPSR 620, Schapiro, 530 W. 120th St