IFML Seminar: 02/06/26 - Are Diffusion and Autoregression Truly Different? Insights from Masked Diffusion Models
Jiaxin Shi, former research scientist at Google DeepMind
The University of Texas at Austin
Gates Dell Complex (GDC 6.302)
2317 Speedway
Austin, TX 78712
United States
Abstract: Modern generative AI has developed along two distinct paths: autoregressive models for language and diffusion models for image and video. This divide exists because each model class has a unique strength—one excels at compression (e.g., measured by perplexity), while the other sets the standard for perceptual quality. Despite many efforts, a unified paradigm capable of mastering both domains has proven difficult to achieve.
In this talk, I will show how this divide is challenged by Masked Diffusion Models (MDMs), a class of diffusion models that is theoretically equivalent to random-order autoregressive models. While MDMs are not new, recent work by us and a few other groups has finally allowed us to train and scale them properly. On one hand, MDMs deliver strong performance on compression-heavy tasks such as language modeling while enabling diffusion-style parallel sampling. On the other hand, we demonstrate that MDMs with a reweighted loss, trained in raw pixel space without any perceptual tokenizers, can achieve sample quality comparable to strong continuous diffusion models. The key to this performance is our theoretical finding that the reweighted objectives widely used in continuous diffusion are not merely heuristics, but improved variational bounds. With the same technique, MDMs (equivalently, random-order autoregressive models) can achieve state-of-the-art pixel-space generation. These results suggest that the distinction between diffusion and autoregression is far less fundamental than previously thought.
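To make the abstract's central idea concrete, here is a minimal, hypothetical sketch of the masked-diffusion training objective: sample a masking level t, mask each token independently with probability t, and score the model's predictions at the masked positions with an importance weight 1/t. The `predict_logprob` interface and the toy uniform predictor are assumptions for illustration, not the speaker's actual code.

```python
import math
import random

def mdm_loss(tokens, predict_logprob, rng):
    """One Monte Carlo sample of a masked-diffusion training loss.

    tokens: list of token ids.
    predict_logprob(masked, i, tok): hypothetical model interface giving the
        log-probability of `tok` at masked position i, conditioned on the
        partially masked sequence `masked` (None marks a masked slot).

    The loss is a cross-entropy over masked positions, weighted by 1/t for
    masking level t -- a discrete analogue of the reweighted variational
    bounds discussed in the abstract.
    """
    t = rng.uniform(1e-3, 1.0)  # masking level t ~ U(0, 1]
    masked = [tok if rng.random() > t else None for tok in tokens]
    loss = 0.0
    for i, tok in enumerate(tokens):
        if masked[i] is None:  # this position was masked: score the model
            loss -= predict_logprob(masked, i, tok)
    return loss / t  # importance weight 1/t

# Toy "model": a uniform distribution over a vocabulary of size V.
V = 10
uniform = lambda masked, i, tok: -math.log(V)

# For a uniform predictor, the expected loss is len(tokens) * log V,
# which a Monte Carlo average over many masking draws should approach.
rng = random.Random(0)
est = sum(mdm_loss([1, 2, 3, 4], uniform, rng) for _ in range(2000)) / 2000
```

Averaging over masking levels in this way is also what makes the objective readable as a random-order autoregressive likelihood: each draw of the mask corresponds to predicting a random subset of positions given the rest.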
Speaker info: Jiaxin was a research scientist at Google DeepMind.
Zoom link: https://utexas.zoom.us/j/84254847215