Overview

HunyuanWorld-Voyager generates world-consistent 3D point-cloud sequences from a single image and a user-defined camera path. It outputs aligned RGB and depth video, supports long-range world exploration, and uses auto-regressive inference to extend scenes.
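
To make the link between RGB + depth output and a point cloud concrete, here is a minimal sketch of standard pinhole back-projection. It is not Voyager's code: the function names (`unproject_depth`, `to_world`), the intrinsics (`fx`, `fy`, `cx`, `cy`), and the pose format are assumptions chosen for illustration only.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def unproject_depth(depth: List[List[float]],
                    fx: float, fy: float,
                    cx: float, cy: float) -> List[Point]:
    """Back-project a depth map into camera-space points (pinhole model).

    Hypothetical helper: intrinsics and depth layout are illustrative
    assumptions, not values taken from Voyager.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):        # v = pixel row
        for u, z in enumerate(row):        # u = pixel column, z = depth
            if z <= 0:                     # skip invalid or missing depth
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


def to_world(points: Sequence[Point],
             rotation: Sequence[Sequence[float]],
             translation: Sequence[float]) -> List[Point]:
    """Apply an assumed camera-to-world pose: p_world = R * p_cam + t."""
    world: List[Point] = []
    for p in points:
        world.append(tuple(
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        ))
    return world


if __name__ == "__main__":
    # Toy 2x2 depth frame; one pixel has no valid depth.
    depth_frame = [[2.0, 2.0],
                   [0.0, 4.0]]
    cam_points = unproject_depth(depth_frame, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    print(to_world(cam_points, identity, [0.0, 0.0, 1.0]))
```

Applying this per frame with each frame's camera pose along the path would accumulate points in a shared world frame, which is one generic way to read a posed RGB + depth video as a point-cloud sequence.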

Why it matters

Enables controllable, traversable 3D scene generation from minimal input: one image plus a camera path. Potential applications include AR/VR, robotics, and content creation.

Key trade-offs / limitations

  • High compute/memory requirements for large scenes.
  • Some artifacts in depth alignment and geometry remain.
  • Does not yet run at fully real-time, interactive speeds.

arXiv:2506.04225