DIAMOND is a diffusion-based world model trained on environment frames: it learns to predict the next observation by denoising, conditioned on past observations and actions, and the RL agent is then trained inside this learned model. The approach demonstrates improved visual fidelity and downstream RL performance (e.g., on the Atari 100k benchmark). The paper argues that preserving visual detail via diffusion helps pixel-based RL agents, because the fine details discarded by coarser latent transition models can be decision-relevant.
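
To make the idea concrete, below is a minimal sketch of a conditional next-frame denoiser trained with a simple denoising objective. This is not the paper's architecture (DIAMOND uses an EDM-parameterized U-Net with its own preconditioning and noise schedule); the `NextFrameDenoiser` module, layer sizes, noise range, and shapes here are illustrative assumptions only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NextFrameDenoiser(nn.Module):
    """Toy conditional denoiser: predicts the clean next frame from a noised
    copy, conditioned on stacked past frames, the action, and the noise level.
    (Hypothetical stand-in for DIAMOND's EDM U-Net.)"""
    def __init__(self, channels=3, context_frames=4, num_actions=18, hidden=64):
        super().__init__()
        self.action_emb = nn.Embedding(num_actions, hidden)
        in_ch = channels * (context_frames + 1)  # past frames + noised next frame
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, channels, 3, padding=1),
        )
        self.cond_proj = nn.Linear(hidden + 1, hidden)  # action embedding + noise level

    def forward(self, noised_next, past_frames, action, sigma):
        b = noised_next.shape[0]
        cond = torch.cat([self.action_emb(action), sigma.view(b, 1)], dim=-1)
        cond = self.cond_proj(cond).view(b, -1, 1, 1)
        x = torch.cat([past_frames.flatten(1, 2), noised_next], dim=1)
        h = self.net[0](x) + cond          # inject conditioning after the first conv
        for layer in self.net[1:]:
            h = layer(h)
        return h                            # predicted clean next frame

def train_step(model, optimizer, past_frames, action, next_frame):
    """One simplified denoising step: noise the target frame, regress the clean frame.
    DIAMOND's actual loss uses EDM preconditioning and weighting, omitted here."""
    sigma = torch.rand(next_frame.shape[0], device=next_frame.device) * 1.5
    noise = torch.randn_like(next_frame)
    noised = next_frame + sigma.view(-1, 1, 1, 1) * noise
    pred = model(noised, past_frames, action, sigma)
    loss = F.mse_loss(pred, next_frame)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

if __name__ == "__main__":
    model = NextFrameDenoiser()
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    past = torch.randn(8, 4, 3, 64, 64)    # (batch, context frames, C, H, W)
    act = torch.randint(0, 18, (8,))       # discrete Atari-style actions
    nxt = torch.randn(8, 3, 64, 64)        # target next frame
    print(train_step(model, opt, past, act, nxt))
```

At rollout time, such a model would be applied autoregressively (denoise a next frame, append it to the context, repeat), and the agent would be trained on these imagined trajectories rather than on the real environment.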