Long Context Foundational Models

Srinadh Bhojanapalli, Research Scientist at Google Research

12:15 - 1:00 pm

The University of Texas at Austin
Gates Dell Complex (GDC 6.302)
United States

Abstract: Foundational large language models, while successful at shorter contexts, struggle to scale to longer-context inputs. Preventing the performance decay of Transformers when input lengths exceed those used during training has been a significant challenge in extending their capabilities. Though the Transformer architecture itself has no inherent limit on input sequence length, current training methods constrain its performance on longer inputs. In this talk, we will present new results in scaling Transformer input length and discuss some open challenges in making attention approaches efficient for longer contexts.

Speaker Bio: Srinadh Bhojanapalli is a Staff Research Scientist at Google Research, New York. His research focuses on developing a principled understanding of Transformer models and scaling them efficiently. Prior to joining Google, Srinadh served as a Research Assistant Professor at TTI Chicago. He holds a Ph.D. in Electrical and Computer Engineering from The University of Texas at Austin, where he was mentored by Prof. Sujay Sanghavi.
