Huang et al. present Vid2World, a method that converts pre-trained video diffusion models into interactive world models through architectural adjustments and a training objective that together enable autoregressive generation and action controllability. They evaluate the approach on robotic manipulation and game-simulation benchmarks.
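To make the interaction pattern concrete, here is a minimal sketch of an action-conditioned autoregressive rollout, the usage mode Vid2World enables. Everything here is a hypothetical stand-in: the toy linear "denoiser" (`W_frame`, `W_action`), the dimensions, and the function names are illustrative assumptions, not the paper's actual architecture or training objective.

```python
import numpy as np

rng = np.random.default_rng(0)
FRAME_DIM, ACTION_DIM = 16, 4

# Hypothetical stand-in for an adapted video diffusion model: predicts the
# next frame from past frames plus the current action (not the real model).
W_frame = rng.normal(size=(FRAME_DIM, FRAME_DIM)) * 0.1
W_action = rng.normal(size=(ACTION_DIM, FRAME_DIM)) * 0.1

def predict_next_frame(history, action):
    """Toy next-frame predictor conditioned on past frames and an action."""
    context = history[-1]  # causal conditioning: only past frames are visible
    return np.tanh(context @ W_frame + action @ W_action)

def rollout(initial_frame, actions):
    """Autoregressive rollout: each generated frame is fed back as context."""
    frames = [initial_frame]
    for a in actions:
        frames.append(predict_next_frame(np.stack(frames), a))
    return np.stack(frames)

frames = rollout(rng.normal(size=FRAME_DIM), rng.normal(size=(8, ACTION_DIM)))
print(frames.shape)  # one initial frame plus one generated frame per action
```

The key contrast with a standard video diffusion model is the loop: frames are produced one step at a time, each conditioned on an externally supplied action, rather than the whole clip being generated jointly without control.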
Pretrained video diffusion models excel at generating realistic dynamics but typically lack interactive control. This work provides a pathway to reuse those models in interactive settings, broadening their applicability to robotics, simulation, and related domains.