Overview

Vid2Sim bridges the sim-to-real gap by converting monocular videos into photorealistic, physically interactive 3D simulation environments. Using neural 3D scene reconstruction and simulation, it enables reinforcement learning (RL) of visual navigation agents in complex urban scenes.
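
To make the workflow concrete, the sketch below shows the observe-act-reward loop an agent would run inside a scene reconstructed from video. This is a hypothetical illustration only: the Vid2SimEnv class, its method names, the scenes/street_corner path, and the two-dimensional velocity action are stand-ins, not the actual Vid2Sim API.

    # Hypothetical sketch of an RL rollout loop in a video-reconstructed
    # simulation environment. Vid2SimEnv is a stub, not the real Vid2Sim API;
    # it only illustrates the observe-act-reward cycle of a navigation agent.
    import numpy as np


    class Vid2SimEnv:
        """Stub environment with a gym-style interface (hypothetical names)."""

        def __init__(self, scene_path: str, image_size=(128, 128)):
            self.scene_path = scene_path      # reconstructed 3D scene assets (placeholder)
            self.image_size = image_size
            self.rng = np.random.default_rng(0)
            self.step_count = 0

        def reset(self):
            self.step_count = 0
            return self._render_observation()

        def step(self, action: np.ndarray):
            # action: e.g. [linear_velocity, angular_velocity] for a wheeled robot
            self.step_count += 1
            obs = self._render_observation()
            reward = -0.01                    # placeholder step penalty
            done = self.step_count >= 200     # placeholder episode horizon
            return obs, reward, done, {}

        def _render_observation(self):
            # A real environment would render the photorealistic reconstruction;
            # random pixels are returned here just so the loop runs.
            h, w = self.image_size
            return self.rng.integers(0, 256, size=(h, w, 3), dtype=np.uint8)


    def collect_episode(env, policy):
        """Roll out one episode and return its total reward."""
        obs, total_reward, done = env.reset(), 0.0, False
        while not done:
            action = policy(obs)
            obs, reward, done, _ = env.step(action)
            total_reward += reward
        return total_reward


    if __name__ == "__main__":
        env = Vid2SimEnv(scene_path="scenes/street_corner")   # hypothetical path
        random_policy = lambda obs: np.random.uniform(-1.0, 1.0, size=2)
        print("episode return:", collect_episode(env, random_policy))

In practice the random policy would be replaced by a learned visual policy, and the reward would encode goal-reaching and collision avoidance for the urban navigation task.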

Why it matters

Addresses the major challenge of sim-to-real transfer in robot learning by creating realistic digital twins from minimal video input, enabling scalable, cost-efficient training for urban navigation applications such as food-delivery robots and assistive vehicles.

Key trade-offs / limitations

  • Time-consuming scene-building process that requires GLOMAP initialization
  • Current dataset is limited to 30 environments and needs expansion for better performance
  • Scene reconstruction requires extensive geometric processing
  • Weather simulation capabilities are still under development
arXiv:2501.06693