BEVControl introduces a two-stage generative method that separates geometric control from appearance by using Bird's-Eye View (BEV) sketch layouts. It aims to produce realistic street-view images whose foreground and background both stay consistent with the layout, and it supports sketch input that humans can edit. The method also proposes a multi-level evaluation protocol to fairly assess scene, object, and background geometry fidelity.
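To make the idea of an editable BEV sketch concrete, here is a minimal sketch of how such a layout could be represented and rasterized. It is an illustration only, not BEVControl's actual input format: the `SketchBox` fields, the `rasterize` helper, the ego-centred metric coordinates, and the grid resolution are all assumptions.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SketchBox:
    """Hypothetical foreground element of a BEV sketch (metres, ego-centred)."""
    label: str
    x: float       # box centre along the forward axis
    y: float       # box centre along the left axis
    length: float  # extent along the box's own heading
    width: float   # extent across the box's own heading
    yaw: float     # heading in radians, counter-clockwise from the x axis


def rasterize(boxes, extent=10.0, cell=1.0):
    """Return a square 0/1 grid; a cell is 1 if its centre lies inside any box."""
    n = int(round(2 * extent / cell))
    grid = [[0] * n for _ in range(n)]
    for b in boxes:
        c, s = math.cos(b.yaw), math.sin(b.yaw)
        for i in range(n):
            for j in range(n):
                # Centre of cell (i, j) in ego coordinates.
                px = -extent + (i + 0.5) * cell
                py = -extent + (j + 0.5) * cell
                # Rotate the offset into the box frame, then test half-extents.
                dx, dy = px - b.x, py - b.y
                u = c * dx + s * dy
                v = -s * dx + c * dy
                if abs(u) <= b.length / 2 and abs(v) <= b.width / 2:
                    grid[i][j] = 1
    return grid


# An "edit" is just a change to the box parameters before re-rasterizing.
car = SketchBox("car", x=0.0, y=0.0, length=4.0, width=2.0, yaw=0.0)
moved = replace(car, x=3.0)  # slide the car 3 m forward
before, after = rasterize([car]), rasterize([moved])
```

Under these assumptions, moving an object changes which cells are occupied but not how many, which is the kind of geometric edit a sketch layout is meant to make easy.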
This work offers fine-grained control over street-view synthesis for autonomous driving and scene understanding. Because generation is driven by BEV layouts, downstream perception models (e.g., for detection or segmentation) can be trained on data that is both controllable and consistent. It reports improved performance over BEVGen, especially in foreground object consistency.
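One common way to quantify foreground consistency is intersection-over-union (IoU) between the mask implied by the layout and the mask a segmentation model predicts on the generated image. The sketch below assumes that setup; the `mask_iou` helper and the toy masks are hypothetical, and this is not a reproduction of the paper's evaluation protocol.

```python
def mask_iou(expected, predicted):
    """IoU of two equally sized binary masks given as nested lists of 0/1."""
    inter = 0
    union = 0
    for row_e, row_p in zip(expected, predicted):
        for e, p in zip(row_e, row_p):
            inter += 1 if (e and p) else 0
            union += 1 if (e or p) else 0
    # Two empty masks agree perfectly, so define their IoU as 1.0.
    return inter / union if union else 1.0


# Toy example: the layout expects an object in columns 0-1,
# while the (hypothetical) segmenter finds it in columns 1-2.
layout_mask = [[1, 1, 0],
               [0, 0, 0]]
predicted_mask = [[0, 1, 1],
                  [0, 0, 0]]
score = mask_iou(layout_mask, predicted_mask)
```

A higher score means generated objects land where the layout placed them. Averaging such scores over object classes gives a mean IoU that can be compared across generators.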