BEVGen is a conditional generative model that synthesizes a set of street-view images from a bird's-eye-view (BEV) segmentation layout. It uses a cross-view transformation and spatial attention to keep the generated street views consistent with one another and with the map layout. Evaluated on the nuScenes and Argoverse 2 datasets, it generates varied scenes under different weather and lighting conditions while keeping roads and lanes consistent with the layout.
By linking layout maps to photorealistic street-view images, BEVGen helps simulate realistic driving environments for perception tasks. This makes it useful for data augmentation, visualization, and simulation in autonomous driving. It also provides a baseline for controllable layout-to-image generation.
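To make the layout-to-view relationship concrete, the sketch below projects a ground-plane point from a BEV layout into the image of a forward-facing pinhole camera. It is not BEVGen's implementation. It only illustrates the kind of geometric correspondence between BEV cells and camera pixels that layout-consistent generation has to respect. The function name `project_ground_point`, the camera height, the focal length, the image size, and the axis conventions are all assumptions chosen for illustration.

```python
from typing import Optional, Tuple


def project_ground_point(
    x_fwd: float,
    y_left: float,
    cam_height: float = 1.5,                      # assumed mounting height in metres
    focal_px: float = 800.0,                      # assumed focal length in pixels
    image_size: Tuple[int, int] = (1600, 900),    # assumed (width, height)
) -> Optional[Tuple[float, float]]:
    """Project a ground point (ego frame: x forward, y left, z up) to pixels.

    Returns None if the point is behind the camera or outside the image.
    """
    width, height = image_size
    cx, cy = width / 2.0, height / 2.0  # principal point assumed at image centre

    # Ego frame -> camera frame (x right, y down, z forward).
    cam_x = -y_left
    cam_y = cam_height  # the ground lies cam_height below the camera
    cam_z = x_fwd

    if cam_z <= 0:
        return None  # point is behind the camera

    # Standard pinhole projection.
    u = focal_px * cam_x / cam_z + cx
    v = focal_px * cam_y / cam_z + cy

    if not (0 <= u < width and 0 <= v < height):
        return None  # point falls outside the image
    return (u, v)


if __name__ == "__main__":
    # A lane-centre point 10 m ahead and a point 2 m to its left.
    print(project_ground_point(10.0, 0.0))
    print(project_ground_point(10.0, 2.0))
```

Under these assumptions, points farther ahead in the layout land closer to the horizon row of the image, and points to the left of the ego vehicle land left of the image centre. A generated street view that agrees with its BEV layout should place roads and lanes in line with this kind of mapping.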