Generating a Video: Reflecting on a Two-Year Odyssey

Atlas Wang, Associate Professor, The University of Texas at Austin

12:15 - 1 pm

The University of Texas at Austin
Gates Dell Complex (GDC 6.302)
United States

Abstract: In this talk, I will recount the developmental trajectory of video generation models at Picsart AI Research over the past two years, a journey that has taken us from initial baselines to the frontiers of ultra-long video streaming and storytelling. Our inaugural project, Text2Video-Zero, presented at ICCV 2023, marked a milestone as the first training-free video generator to leverage pre-trained Stable Diffusion models, serving as a versatile foundation for subsequent works and earning widespread acclaim. Building on this success, our team ventured into creating the first open-source video generator capable of producing ultra-long sequences. Our new model, StreamingT2V, reliably generates up to 1200 frames, equating to a video duration of 2 minutes, with potential for scaling to even longer durations. To conclude the talk, I will share personal insights and reflections gleaned from this intensive R&D period, while highlighting the untapped possibilities for future video generation models.

Speaker Bio: Atlas Wang is an associate professor at UT Austin, affiliated with ECE (primary), CS (GSC), and the Oden Institute. He leads the VITA research group (https://vita-group.github.io/) and works on various machine learning frontiers. He has won many awards and graduated 16 postdoc or PhD students thus far (including 4 new academics). On the flip side of academia, he has been navigating some pretty exciting waters in industry: first as a visiting scholar at Amazon working on cold-start recommendation systems (2021-2022); then as an AI research director for Picsart overseeing their generative image and video efforts (2022-2024); and, since May 2024, as the research director for XTX Markets focusing on AI for high-frequency trading.
