Overview

SpeCa introduces speculative sampling to diffusion models, drawing inspiration from speculative decoding in large language models. The framework predicts intermediate features for subsequent timesteps based on fully computed reference timesteps, then uses a parameter-free verification mechanism to evaluate prediction reliability. This “forecast-then-verify” approach enables real-time decisions to accept or reject predictions with negligible computational overhead (1.67%-3.5% of full inference cost). SpeCa also implements sample-adaptive computation allocation, dynamically modulating compute according to each sample's generation difficulty.
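
To make the forecast-then-verify loop concrete, here is a minimal, self-contained sketch of the idea on a toy denoising loop. The specifics are assumptions for illustration rather than the paper's exact formulation: a first-order linear extrapolation stands in for the forecaster, a relative-error threshold stands in for the parameter-free verification, the features are verified on a small subsample to keep overhead low, and `expensive_features` plus the update rule are toy stand-ins for the real model and scheduler.

```python
# Illustrative sketch of a SpeCa-style "forecast-then-verify" loop.
# All function names and the forecaster/verifier choices below are
# assumptions for demonstration, not the paper's implementation.
import torch

def expensive_features(x: torch.Tensor, t: float) -> torch.Tensor:
    """Stand-in for the costly full model forward pass at timestep t."""
    return torch.sin(x * t) + 0.1 * t * x

def forecast(f_prev: torch.Tensor, f_prev2: torch.Tensor) -> torch.Tensor:
    """Predict the next intermediate features by linear extrapolation
    from the two most recent cached features (a simple first-order guess)."""
    return f_prev + (f_prev - f_prev2)

def verify(pred: torch.Tensor, ref: torch.Tensor, tol: float) -> bool:
    """Parameter-free acceptance test: relative error below tol."""
    rel_err = (pred - ref).norm() / (ref.norm() + 1e-8)
    return rel_err.item() < tol

@torch.no_grad()
def run(steps: int = 20, tol: float = 0.05):
    x = torch.randn(16)
    cache = []            # recent intermediate features
    full_passes = 0       # expensive forward passes actually executed
    for i in range(steps):
        t = 1.0 - i / steps                    # toy timestep schedule
        if len(cache) >= 2:
            pred = forecast(cache[-1], cache[-2])
            # Verify on a small subsample (4 of 16 dims) so the check
            # stays cheap relative to a full forward pass.
            idx = torch.arange(0, 16, 4)
            ref_sub = expensive_features(x[idx], t)
            if verify(pred[idx], ref_sub, tol):
                feats = pred                   # accept speculative features
            else:
                feats = expensive_features(x, t)   # reject: full pass
                full_passes += 1
        else:
            feats = expensive_features(x, t)       # warm-up: full passes
            full_passes += 1
        cache.append(feats)
        x = x - 0.05 * feats   # toy update standing in for the scheduler step

    return x, full_passes

if __name__ == "__main__":
    _, used = run()
    print(f"full forward passes used: {used} / 20")
```

Because harder samples trip the error threshold more often and fall back to full passes, a per-sample loop like this naturally allocates more computation to difficult generations and less to easy ones, which is the intuition behind the sample-adaptive allocation described above.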

Why it matters

Diffusion models face fundamental computational bottlenecks: strict temporal dependencies that prevent parallelization across timesteps, and a compute-intensive forward pass at every denoising step. Modern video generation models like HunyuanVideo require 595.46 TFLOPs per forward pass, making real-time generation prohibitively expensive. SpeCa’s 6.34× acceleration while maintaining generation quality represents a breakthrough for practical deployment of diffusion models in real-time applications, from interactive content creation to live video synthesis.

Key trade-offs / limitations

  • Acceleration benefits may vary significantly across different model architectures and generation tasks.
  • Verification, while lightweight per step, still adds overhead that accumulates across the many timesteps of a long sampling schedule.
  • Sample-adaptive computation may introduce inconsistent latency patterns in production systems requiring predictable timing.
  • Forecasting accuracy depends on how predictably features evolve across timesteps, potentially limiting gains for highly complex or chaotic generation patterns.
arXiv:2509.11628