Pre-Trained Video Generative Models as World Simulators (DWS) (2025)

Overview

He et al. propose DWS, a method for converting pre-trained video generative models into action-conditioned world simulators. It adds a lightweight action-conditioning module and introduces a motion-reinforced loss to improve dynamic consistency. The authors demonstrate applications in games and robotics, reporting improved action controllability.
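
The overview does not specify how the action module or the motion-reinforced loss is built, so the sketch below shows one plausible reading under stated assumptions. It is not the paper's published design. The names `ActionFiLM` and `motion_weighted_loss` are hypothetical. FiLM-style per-channel scale-and-shift conditioning and frame-difference weighting are assumptions chosen for illustration. Frames are flattened to plain lists of floats so the example runs without any deep-learning library.

```python
from dataclasses import dataclass
from typing import List

Frame = List[float]  # a flattened frame or latent; illustration only


@dataclass
class ActionFiLM:
    """Hypothetical lightweight action-conditioning module (FiLM-style assumption).

    Maps a discrete action id to a per-channel scale and shift that modulate
    backbone features. Under this assumption only these small tables are
    trained, and the pre-trained video model is left unchanged.
    """
    scale: List[List[float]]  # [num_actions][dim]
    shift: List[List[float]]  # [num_actions][dim]

    def __call__(self, features: Frame, action: int) -> Frame:
        g, b = self.scale[action], self.shift[action]
        # (1 + g) keeps a zero-initialised table equal to the identity mapping
        return [(1.0 + gi) * f + bi for f, gi, bi in zip(features, g, b)]


def motion_weighted_loss(pred: List[Frame], target: List[Frame], alpha: float = 1.0) -> float:
    """Hypothetical motion-reinforced loss (an assumption, not the paper's formula).

    Mean squared error over frames t >= 1, where each element is weighted by
    1 + alpha * |target[t] - target[t-1]|. Pixels that change between frames
    therefore count more than static background.
    """
    total, count = 0.0, 0
    for t in range(1, len(target)):
        for p, y, y_prev in zip(pred[t], target[t], target[t - 1]):
            w = 1.0 + alpha * abs(y - y_prev)
            total += w * (p - y) ** 2
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    film = ActionFiLM(scale=[[0.0, 0.0], [0.5, -0.5]], shift=[[0.0, 0.0], [1.0, 1.0]])
    print(film([2.0, 4.0], 1))  # action 1 rescales and shifts the features

    target = [[0.0, 0.0], [1.0, 0.0]]  # first pixel moves, second stays static
    pred = [[0.0, 0.0], [0.0, 0.0]]
    print(motion_weighted_loss(pred, target, alpha=1.0))
```

Setting `alpha` to 0 recovers a plain reconstruction loss. Raising it shifts emphasis toward moving regions, which gives a simple knob for the quality-versus-dynamics balance this note discusses.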

Why it matters

Repurposing existing generative models reduces the need to train from scratch and leverages massive internet-scale pretraining, while adding the controllability that real-world tasks such as robotics, planning, and simulation require.

Key trade-offs / limitations

  • Pre-trained models may still struggle with fine detail or suffer from domain mismatch with the target environment.
  • The added action module may have limited influence on complex dynamics.
  • There is a trade-off between visual quality on one side and controllability and dynamic correctness on the other.

arXiv:2502.07825