IFML Seminar: 02/06/26 - Are Diffusion and Autoregression Truly Different? Insights from Masked Diffusion Models
Jiaxin Shi, former research scientist at Google DeepMind
The University of Texas at Austin
Gates Dell Complex (GDC 6.302)
2317 Speedway
Austin, TX 78712
United States
Abstract: Modern generative AI has developed along two distinct paths: autoregressive models for language and diffusion models for image and video. This divide exists because each model class has a unique strength—one excels at compression (e.g., measured by perplexity), while the other sets the standard for perceptual quality. Despite many efforts, a unified paradigm capable of mastering both domains has proven difficult to achieve.
In this talk, I will show how this divide is challenged by Masked Diffusion Models (MDMs), a class of diffusion models that is theoretically equivalent to random-order autoregressive models. While MDMs are not new, recent work by us and a few other groups has finally allowed us to train and scale them properly. On one hand, MDMs deliver strong performance on compression-heavy tasks such as language modeling while enabling diffusion-style parallel sampling. On the other hand, we demonstrate that MDMs with a reweighted loss, trained in raw pixel space without any perceptual tokenizers, can achieve sample quality comparable to strong continuous diffusion models. The key to this performance is our theoretical finding that the reweighted objectives widely used in continuous diffusion are not merely heuristics, but improved variational bounds. With the same technique, MDMs (equivalently, random-order autoregressive models) can achieve state-of-the-art pixel-space generation. These results suggest that the distinction between diffusion and autoregression is far less fundamental than previously thought.
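To make the abstract's central idea concrete, here is a minimal, hypothetical sketch of the masked-diffusion training objective: sample a masking level t, mask each token independently with probability t, and score the model's predictions at the masked positions with an importance weight 1/t. The `predict_logprob` interface and the toy uniform predictor are assumptions for illustration, not the speaker's actual code.

```python
import math
import random

def mdm_loss(tokens, predict_logprob, rng):
    """One Monte Carlo sample of a masked-diffusion training loss.

    tokens: list of token ids.
    predict_logprob(masked, i, tok): hypothetical model interface giving the
        log-probability of `tok` at masked position i, conditioned on the
        partially masked sequence `masked` (None marks a masked slot).

    The loss is a cross-entropy over masked positions, weighted by 1/t for
    masking level t -- a discrete analogue of the reweighted variational
    bounds discussed in the abstract.
    """
    t = rng.uniform(1e-3, 1.0)  # masking level t ~ U(0, 1]
    masked = [tok if rng.random() > t else None for tok in tokens]
    loss = 0.0
    for i, tok in enumerate(tokens):
        if masked[i] is None:  # this position was masked: score the model
            loss -= predict_logprob(masked, i, tok)
    return loss / t  # importance weight 1/t

# Toy "model": a uniform distribution over a vocabulary of size V.
V = 10
uniform = lambda masked, i, tok: -math.log(V)

# For a uniform predictor, the expected loss is len(tokens) * log V,
# which a Monte Carlo average over many masking draws should approach.
rng = random.Random(0)
est = sum(mdm_loss([1, 2, 3, 4], uniform, rng) for _ in range(2000)) / 2000
```

Averaging over masking levels in this way is also what makes the objective readable as a random-order autoregressive likelihood: each draw of the mask corresponds to predicting a random subset of positions given the rest.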
Speaker info: Jiaxin was a research scientist at Google DeepMind.
Zoom link: https://utexas.zoom.us/j/84254847215