Overview
This theme studies the intersection of diffusion- and flow-based generative models, offline black-box / model-based optimization (MBO), and LLM pretraining. A current focus is Design-Bench 2.0, an LLM-oriented benchmark that adapts offline MBO algorithms to LLM-related tasks, alongside diffusion- and flow-based methods for black-box and multi-objective optimization.
Motivation
Three lines of work increasingly reinforce one another: diffusion- and flow-based generative modeling, offline model-based optimization (MBO), and large language model pretraining. Offline MBO methods learn to propose high-performing designs purely from a static dataset of past evaluations, without querying the objective again during optimization. Diffusion and flow models have proven to be powerful tools for representing and editing the resulting design distributions. The theme explores how these optimization ideas transfer to the LLM setting, where the “design space” becomes language- and sequence-structured.
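To make the offline MBO setup concrete, here is a minimal sketch under stated assumptions: a toy quadratic objective, a ridge-regression surrogate over per-coordinate quadratic features, and a few gradient-ascent steps from the best logged designs. None of this is taken from the theme's papers, and all function names are hypothetical. It only illustrates the general pattern of fitting a model on a static dataset and then proposing new designs without further queries.

```python
import numpy as np

def features(x):
    # Per-coordinate linear and quadratic features plus a bias column
    # (an assumed, deliberately simple surrogate class).
    return np.concatenate([x, x ** 2, np.ones((x.shape[0], 1))], axis=1)

def fit_surrogate(X, y, ridge=1e-6):
    # Ridge-regularized least squares, using only the static dataset.
    Phi = features(X)
    A = Phi.T @ Phi + ridge * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ y)

def surrogate_grad(x, w):
    # Gradient of the fitted surrogate with respect to the design x.
    d = x.shape[1]
    return w[:d] + 2.0 * w[d:2 * d] * x

def propose(X, y, w, top_k=4, steps=20, lr=0.05):
    # Start from the best logged designs and take a few ascent steps
    # on the surrogate; the true objective is never queried here.
    designs = X[np.argsort(y)[-top_k:]].copy()
    for _ in range(steps):
        designs += lr * surrogate_grad(designs, w)
    return designs

def true_objective(x):
    # Toy black-box function, used only to build the dataset
    # and to score the final proposals.
    return -np.sum((x - 1.0) ** 2, axis=1)

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 0.5, size=(200, 3))  # logged designs, away from the optimum
y = true_objective(X)                      # logged scores
w = fit_surrogate(X, y)
candidates = propose(X, y, w)
```

In this toy case the surrogate class contains the true objective, so ascent on the surrogate reliably improves on the dataset. In realistic offline MBO the surrogate can be badly wrong outside the data, which is the failure mode that generative approaches such as the diffusion and flow models discussed here are meant to address.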
Project Goals
- Develop diffusion- and flow-based estimators and samplers for offline black-box and multi-objective optimization.
- Bridge offline MBO algorithms with LLM-related tasks, treating language and sequence generation as an optimization problem.
- Build an LLM-oriented benchmark that standardizes evaluation of these methods.
Recent Progress
The team has started work on Design-Bench 2.0, an LLM-oriented benchmark whose goal is to adapt offline MBO algorithms to LLM-related tasks. It extends the classic offline black-box optimization benchmarking setup toward language-model settings, providing common ground for testing diffusion-, flow-, and optimization-based methods on LLM tasks.
Recent results span diffusion estimation for offline black-box optimization (SPADE, ICML 2026), training diffusion language models directly for black-box optimization (ICML 2026 Spotlight), and a preprint on diffusion large language models for black-box optimization. These build on earlier work in design editing (TMLR), guided flows for multi-objective optimization (ICLR 2025), and importance-aware co-teaching (NeurIPS 2023). See Related Publications below for details.
Related Publications
- Support-Proximity Augmented Diffusion Estimation for Offline Black-Box Optimization
  Yonghan Yang, Ye Yuan, Zipeng Sun, Linfeng Du, Bowei He, Haolun Wu, Can Chen, Xue Liu
  ICML 2026 · 2026
- Training Diffusion Language Models for Black-Box Optimization
  Zipeng Sun, Can Chen, Ye Yuan, Haolun Wu, Jiayao Gu, Christopher Pal, Xue Liu
  ICML 2026 (Spotlight) · 2026
- Diffusion Large Language Models for Black-Box Optimization
  Ye Yuan, Can Chen, Zipeng Sun, Dinghuai Zhang, Christopher Pal, Xue Liu
  arXiv preprint · 2026
- Design Editing for Offline Model-based Optimization
  Ye Yuan, Youyuan Zhang, Can Chen, Haolun Wu, Zixuan Li, Jianmo Li, James J. Clark, Xue Liu
  TMLR · 2025
- ParetoFlow: Guided Flows in Multi-Objective Optimization
  Ye Yuan, Can Chen, Christopher Pal, Xue Liu
  ICLR 2025 · 2025
- Importance-aware Co-teaching for Offline Model-based Optimization
  Ye Yuan, Can Chen, Zixuan Liu, Willie Neiswanger, Xue Liu
  NeurIPS 2023 · 2023
Impact Holders
Impact holders and user communities will be added as the project scope becomes clearer.