Overview

Genie 3 is DeepMind’s interactive world model. It generates navigable 3D environments in real time from text or image prompts, running at roughly 24 fps at 720p resolution and keeping the scene visually and physically consistent for several minutes.

Why it matters

Genie 3 pushes beyond passive video generation into environments that respond to user input in real time, which makes it useful for robotics, simulation, and gaming. It marks progress toward general world simulators.

Key trade-offs / limitations

  • Public details on architecture are limited.
  • Resolution capped at 720p; artifacts still appear.
  • Likely high compute cost for real-time inference.
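
To give the real-time constraint some scale, the quoted figures can be turned into a rough per-frame budget. This is only back-of-the-envelope arithmetic, not a description of how Genie 3 works: it assumes 720p means 1280x720 and uses a hypothetical 3-minute session to stand in for "several minutes". It says nothing about the model's actual compute cost, which has not been published.

```python
# Rough real-time budget implied by the quoted figures.
FPS = 24                    # approximate frame rate quoted in the overview
WIDTH, HEIGHT = 1280, 720   # assumption: "720p" taken as 1280x720 (16:9)

# Time available to produce each frame if playback is to stay real-time.
frame_budget_ms = 1000 / FPS

# Raw output volume the model has to generate.
pixels_per_frame = WIDTH * HEIGHT
pixels_per_second = pixels_per_frame * FPS


def frames_for(minutes: float, fps: int = FPS) -> int:
    """Number of frames generated over a session of the given length."""
    return round(minutes * 60 * fps)


if __name__ == "__main__":
    print(f"Per-frame budget:  {frame_budget_ms:.1f} ms")   # 41.7 ms
    print(f"Pixels per frame:  {pixels_per_frame:,}")       # 921,600
    print(f"Pixels per second: {pixels_per_second:,}")      # 22,118,400
    # Hypothetical 3-minute session, illustrative only.
    print(f"Frames in 3 min:   {frames_for(3):,}")          # 4,320
```

Under these assumptions each frame must be produced in under about 42 ms, and a few minutes of consistent output means thousands of frames that all have to agree with one another.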

Source: DeepMind Blog, "Genie 3"