NSF IFML Researchers Win Outstanding Paper Award at ICML 2025

"Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions"

ICML 2025

Researchers at the NSF AI Institute for Foundations of Machine Learning (IFML) took home an outstanding paper award at ICML 2025, the 42nd International Conference on Machine Learning, held July 13–19, 2025 in Vancouver, Canada. 
 

Kulin Shah, a computer science PhD student at UT Austin advised by IFML Director Adam Klivans, and Jaeyeon Kim from Harvard are co-first authors. IFML postdoctoral fellow Vasilis Kontonis is a co-author, along with Sitan Chen and Sham Kakade from Harvard, both frequent IFML collaborators. 
 

"Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions" was one of six papers to receive the award, out of more than 12,000 paper submissions and roughly 3,200 papers accepted to the conference. The award recognizes factors such as clarity, insight, creativity, and potential for lasting impact. 
 

Smart Decoding for Masked Diffusion Models 
 

In the winning paper, the research team takes a close look at the training and inference of masked diffusion models (MDMs) – a newer approach to generating data such as text or images – and compares them with more traditional autoregressive models (ARMs). MDMs are harder to train because they must learn to fill in missing parts of the data in any order, while ARMs generate in a fixed sequence (from left to right). The paper shows that MDMs can compensate at inference time by choosing the order of their guesses adaptively – committing first to the positions the model is most confident about – rather than filling positions in an arbitrary order. To this end, the authors introduce a novel sampling adapter for diffusion models that significantly improves both generative perplexity in language modeling and accuracy on logical puzzles. On Sudoku, these adaptive strategies raised MDMs' accuracy from less than 7% to nearly 90%, suggesting that MDMs could become competitive with ARMs when decoded carefully. 
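
The core idea can be illustrated with a minimal, hypothetical sketch (this is not the authors' code, and the toy_denoiser below is a stand-in for a trained masked diffusion model): at each step the decoder either unmasks a randomly chosen position, or adaptively unmasks the position where the model's prediction is most confident.

```python
import numpy as np

MASK = -1
VOCAB = 10          # toy vocabulary size
rng = np.random.default_rng(0)

def toy_denoiser(seq):
    """Stand-in for a trained MDM: returns a (len(seq), VOCAB) matrix of
    token probabilities for every position, given the partially masked seq."""
    logits = rng.normal(size=(len(seq), VOCAB))
    # Pretend the model grows more certain as more context is revealed.
    certainty = 1.0 + (seq != MASK).sum()
    probs = np.exp(logits * certainty)
    return probs / probs.sum(axis=1, keepdims=True)

def decode(seq, adaptive=True):
    """Unmask one position per step until no masked positions remain."""
    seq = seq.copy()
    while (seq == MASK).any():
        probs = toy_denoiser(seq)
        masked = np.flatnonzero(seq == MASK)
        if adaptive:
            # "Smart" order: commit to the masked position where the model
            # assigns the highest probability to its top prediction.
            pos = masked[np.argmax(probs[masked].max(axis=1))]
        else:
            # Vanilla order: pick a masked position uniformly at random.
            pos = rng.choice(masked)
        seq[pos] = int(np.argmax(probs[pos]))
    return seq

start = np.full(8, MASK)
print("adaptive order:", decode(start, adaptive=True))
print("random order:  ", decode(start, adaptive=False))
```

In this sketch the only difference between the two decoders is the order in which positions are committed; the paper's finding is that, for real MDMs, exploiting this freedom at inference time is what closes much of the gap with autoregressive models on hard, constraint-heavy tasks.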
 

Broader Impact on Machine Learning
 

Reviewers noted that the broader impact of this research is promising, especially in generative modeling and structured reasoning tasks. MDMs, when paired with smart decoding strategies, can rival or even outperform ARMs on tasks like language modeling and logic puzzles. Unlike ARMs, which generate data in a fixed sequence, MDMs can handle flexible token orders. This opens up possibilities for more robust and adaptable generation across domains like text, code, and biological sequences. 
 

The dramatic boost in Sudoku-solving accuracy, from under 7% to nearly 90%, suggests MDMs could be powerful tools for tasks requiring logical inference and constraint satisfaction. Applications like protein and RNA sequence generation are already being explored using MDMs, indicating their utility in bioinformatics and computational biology.

Read the full paper.