Overview

BEVGen is a conditional generative model that synthesizes a set of street-view images, one per surrounding camera, from a BEV (Bird’s-Eye View) segmentation layout. It uses a cross-view transformation with spatial attention that learns the relationship between the camera views and the map view, so the generated images stay consistent with the layout and with one another. Evaluated on nuScenes and Argoverse 2, it renders road and lane lines accurately and can generate scenes under different weather conditions and times of day.
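BEVGen's cross-view transformation is learned inside the model, and its exact formulation is not reproduced here. As a minimal sketch under our own assumptions, the Python below shows how camera geometry can steer attention from street-view positions to the BEV cells they look at. Each camera is reduced to a yaw angle and a horizontal field of view (an ideal pinhole with no distortion), each image column becomes one query, and a hand-set cosine bias stands in for learned camera embeddings. All helper names (`column_ray_yaw`, `geometric_bias`, `cross_view_attention`, `camera_weights`) are hypothetical.

```python
import math

# Hypothetical sketch, NOT BEVGen's implementation: yaw-only cameras,
# ideal pinhole projection, and a hand-set cosine bias instead of learned embeddings.

def column_ray_yaw(cam_yaw, hfov, col, width):
    """Yaw (ego frame, radians, 0 = forward, counter-clockwise positive) of the
    ray through the centre of image column `col`."""
    focal = (width / 2) / math.tan(hfov / 2)   # focal length in pixels
    offset = (col + 0.5) - width / 2           # column centre relative to principal point
    # Image x grows to the right, i.e. clockwise (negative yaw) in the ego frame.
    return cam_yaw - math.atan2(offset, focal)

def geometric_bias(ray_yaws, cells, strength=4.0):
    """Additive bias: high when a BEV cell (x forward, y left, metres) lies along
    a column's viewing ray, negative when it lies behind the camera."""
    return [[strength * math.cos(r - math.atan2(y, x)) for (x, y) in cells]
            for r in ray_yaws]

def softmax(xs):
    m = max(xs)                                # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_view_attention(queries, keys, values, bias):
    """Scaled dot-product attention from image-column queries to BEV-cell keys,
    plus the geometric bias. Returns (outputs, attention weights)."""
    d = len(queries[0])
    outputs, weights = [], []
    for i, q in enumerate(queries):
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) + bias[i][j]
                  for j, k in enumerate(keys)]
        w = softmax(logits)
        weights.append(w)
        outputs.append([sum(w[j] * values[j][c] for j in range(len(values)))
                        for c in range(len(values[0]))])
    return outputs, weights

# Toy layout: four BEV cells (ahead, left, behind, right) of the ego vehicle.
cells = [(10.0, 0.0), (0.0, 10.0), (-10.0, 0.0), (0.0, -10.0)]
values = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # one-hot cell ids
keys = [[0.0, 0.0] for _ in cells]  # zero features, so only the geometry decides

def camera_weights(cam_yaw, hfov=math.pi / 2, width=4):
    """Attention weights of each column of one camera over the BEV cells."""
    rays = [column_ray_yaw(cam_yaw, hfov, c, width) for c in range(width)]
    queries = [[0.0, 0.0] for _ in rays]
    _, w = cross_view_attention(queries, keys, values, geometric_bias(rays, cells))
    return w

front_w = camera_weights(0.0)       # front camera
rear_w = camera_weights(math.pi)    # rear camera
print([max(range(4), key=row.__getitem__) for row in front_w])  # → [0, 0, 0, 0]
print([max(range(4), key=row.__getitem__) for row in rear_w])   # → [2, 2, 2, 2]
```

With zero features only the geometric bias matters: every column of the front camera attends mostly to the cell ahead, every column of the rear camera to the cell behind, and the leftmost front column puts more weight on the left cell than the rightmost one does. In a trained model the queries and keys would carry learned image and layout features, and full camera intrinsics and extrinsics would replace the yaw-only geometry.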

Why it matters

By linking layout maps to realistic street-view images, BEVGen helps simulate driving environments for perception tasks. This makes it useful for data augmentation, visualization, and simulation in autonomous driving, and it provides a baseline for controllable layout-to-image generation.

Key trade-offs / limitations

  • Limited to static scenes; neither dynamic motion nor temporal consistency is a focus.
  • Variation in weather and time of day works, but more extreme out-of-distribution scenarios may degrade quality.
  • Offers less control over specific objects and fine detail than methods that take richer geometric input.

arXiv:2301.04634