ProphetDWM: An End-to-End Driving World Model for Joint Video and Action Prediction (2025)

Overview

ProphetDWM is an autonomous driving world model that jointly predicts future video frames and driving actions. It combines a diffusion-based transition module with an action learning module, and the two are trained jointly so that the predicted actions and the generated video stay aligned.
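To make the two-module structure concrete, the following is a minimal sketch of how a joint rollout and a joint training objective could be wired together. It is not the paper's implementation: the function names (`rollout`, `joint_loss`), the toy modules, the alternating predict-action-then-predict-state order, the noise-prediction form of the diffusion loss, and the `action_weight` term are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def mse(pred: Vector, target: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def joint_loss(noise_pred: Vector, noise_true: Vector,
               action_pred: Vector, action_true: Vector,
               action_weight: float = 1.0) -> float:
    """Hypothetical joint objective: a denoising (noise-prediction) loss for
    the transition module plus a weighted regression loss for the action
    module. The weighting scheme is an assumption, not taken from the paper."""
    return mse(noise_pred, noise_true) + action_weight * mse(action_pred, action_true)


def rollout(state: Vector, action: Vector,
            action_module: Callable[[Vector, Vector], Vector],
            transition_module: Callable[[Vector, Vector], Vector],
            horizon: int) -> List[Tuple[Vector, Vector]]:
    """Alternate between predicting the next action and the next state.
    `state` stands in for a video latent; the ordering is an assumption."""
    trajectory = []
    for _ in range(horizon):
        action = action_module(state, action)     # predict the next action
        state = transition_module(state, action)  # predict the next latent frame
        trajectory.append((state, action))
    return trajectory


# Toy stand-ins for the learned modules (purely illustrative).
def toy_action_module(state: Vector, prev_action: Vector) -> Vector:
    return list(prev_action)  # repeat the previous action


def toy_transition_module(state: Vector, action: Vector) -> Vector:
    return [s + action[0] for s in state]  # shift the state by the first action value


traj = rollout([0.0, 1.0], [0.5, 0.0], toy_action_module, toy_transition_module, horizon=3)
loss = joint_loss([0.0, 0.0], [1.0, 1.0], [0.0], [2.0], action_weight=0.5)
```

In a real system the toy functions would be replaced by the learned action and diffusion modules, and both loss terms would be backpropagated together in one training step, which is what "trained jointly" refers to here.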

Why it matters

By combining video imagination with action prediction, ProphetDWM brings world models closer to real-world use in self-driving and planning systems.

Key trade-offs / limitations

  • Specialized for driving; generalization to other domains untested.
  • Training requires large-scale driving datasets such as nuScenes.

arXiv:2505.18650