
Mathematics of Deep Learning Workshop

The University of Texas at Austin Machine Learning Lab
Gates Dell Complex (GDC 6.302)
Austin, TX 78712
United States

Event Registration

This workshop will be held in the Gates Dell Complex, room 6.302. 

Capacity is limited to 70; please use the linked Eventbrite page to register for this free event.

Agenda: 

Feb 20, 2025

8:30-9:00am Coffee and Breakfast

9:00-9:45am Joan Bruna, Professor of Computer Science, Data Science and Mathematics at the Courant Institute and Center for Data Science, New York University

9:55-10:40am Vardan Papyan, Assistant Professor, Department of Mathematics and Department of Computer Science, University of Toronto

  • Talk: Block Coupling and its Correlation with Generalization in LLMs and ResNets
  • Abstract: In this talk, we dive into the internal workings of both Large Language Models and ResNets by tracing input trajectories through model layers and analyzing Jacobian matrices. We uncover a striking phenomenon—block coupling—where the top singular vectors of these Jacobians synchronize across inputs or depth as training progresses. Interestingly, this coupling correlates with better generalization performance. Our findings shed light on the intricate interactions between input representations and suggest new pathways for understanding training dynamics, model generalization, and Neural Collapse.

10:40-11:00am Coffee Break

11:00-11:45am Thomas Chen, Professor of Mathematics, University of Texas at Austin

  • Talk: Explicit construction of global minimizers and the interpretability problem in Deep Learning
  • Abstract: In this talk, we present some recent results aimed at the rigorous mathematical understanding of how and why supervised learning works. We point out genericity conditions related to reachability of zero loss minimization and underparametrized versus overparametrized Deep Learning (DL) networks. For underparametrized DL networks, we explicitly construct global, zero loss cost minimizers for sufficiently clustered data. In addition, we derive effective equations governing the cumulative biases and weights, and show that gradient descent corresponds to a dynamical process in the input layer, whereby clusters of data are progressively reduced in complexity ("truncated") at an exponential rate that increases with the number of data points that have already been truncated. For overparametrized DL networks, we prove that the gradient descent flow is homotopy equivalent to a geometrically adapted flow that induces a (constrained) Euclidean gradient flow in output space. If a certain rank condition holds, the latter is, upon reparametrization of the time variable, equivalent to simple linear interpolation. This in turn implies zero loss minimization and the phenomenon known as "Neural Collapse". A majority of this work is joint with Patricia Munoz Ewald (UT Austin).
     

12:00-12:45pm Jonathan Siegel, Assistant Professor, Department of Mathematics, Texas A&M University


Break for Individual Lunch

2:30-4:00pm Three graduate student talks, 25 minutes each
 

Feb 21, 2025

8:30-9:00am Coffee and Breakfast

9:00-9:45am Richard Tsai, Professor, Department of Mathematics and Oden Institute for Computational Engineering and Sciences, The University of Texas at Austin

9:55-10:40am Nhat Ho, Assistant Professor, Department of Statistics and Data Sciences, The University of Texas at Austin

10:40-11:00am Coffee Break

11:00am-12:00pm Yoav Wald, Faculty Fellow at the Center for Data Science, New York University

12:15-1:00pm Eli Grigsby, Professor of Mathematics, Boston College

1:00-2:00pm Faculty Lunch (GDC 4.202)

2:30-4:00pm Three graduate student talks, 25 minutes each

 

 
