MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes

1 The Chinese University of Hong Kong, 2 Hong Kong University of Science and Technology, 3 Huawei Noah's Ark Lab
(Corresponding authors.)

MagicDrive3D generates highly realistic 3D street scenes with diverse controls.

Abstract

While controllable generative models for images and videos have achieved remarkable success, high-quality models for 3D scenes, particularly in unbounded scenarios like autonomous driving, remain underdeveloped due to high data acquisition costs. In this paper, we introduce MagicDrive3D, a novel pipeline for controllable 3D street scene generation that supports multi-condition control, including BEV maps, 3D objects, and text descriptions. Unlike previous methods that reconstruct before training the generative models, MagicDrive3D first trains a video generation model and then reconstructs from the generated data. This innovative approach enables easily controllable generation and static scene acquisition, resulting in high-quality scene reconstruction. To address the minor errors in generated content, we propose deformable Gaussian splatting with monocular depth initialization and appearance modeling to manage exposure discrepancies across viewpoints. Validated on the nuScenes dataset, MagicDrive3D generates diverse, high-quality 3D driving scenes that support any-view rendering and enhance downstream tasks like BEV segmentation. Our results demonstrate the framework's superior performance, showcasing its transformative potential for autonomous driving simulation and beyond.

left: 3DGS. right: MagicDrive3D.

Method

Algorithm description of MagicDrive3D

For controllable street scene generation, MagicDrive3D decomposes the task into two steps: ① conditional multi-view video generation, which handles the control signals and provides a detailed prior of the scene; and ② scene reconstruction with deformable Gaussian splatting, which guarantees view consistency for any-view rendering.
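The abstract mentions monocular depth initialization for the reconstruction step. As a rough illustration only, the sketch below lifts a depth map to 3D points with a standard pinhole back-projection; such points could then seed the Gaussian centers. The function name `backproject_depth`, the intrinsics matrix `K`, and the `cam_to_world` pose convention are assumptions made for this example and are not taken from the MagicDrive3D implementation.

```python
import numpy as np

def backproject_depth(depth, K, cam_to_world):
    """Lift an (H, W) depth map to world-space points (hypothetical helper).

    depth:        (H, W) z-depth per pixel, e.g. from a monocular depth estimator
    K:            (3, 3) pinhole intrinsics (assumed known)
    cam_to_world: (4, 4) camera-to-world transform (assumed convention)
    """
    h, w = depth.shape
    # Pixel grid: u indexes columns (x), v indexes rows (y).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    # Camera-frame rays with unit z, then scaled by the depth along z.
    rays = pix @ np.linalg.inv(K).T
    pts_cam = rays * depth.reshape(-1, 1)
    # Apply the homogeneous transform into the world frame.
    pts_h = np.concatenate([pts_cam, np.ones((pts_cam.shape[0], 1))], axis=1)
    return (pts_h @ cam_to_world.T)[:, :3]

# Toy usage: a flat wall 2 units in front of an identity-pose camera.
K = np.array([[100.0, 0.0, 2.0],
              [0.0, 100.0, 1.0],
              [0.0, 0.0, 1.0]])
points = backproject_depth(np.full((3, 4), 2.0), K, np.eye(4))
```

In practice a monocular depth map is only defined up to an unknown scale and shift, so it would need to be aligned to metric scale before being used this way; that alignment is left out of this sketch.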

Bullet Time!

The video generation model in MagicDrive3D makes it possible to generate static scenes, which facilitates scene reconstruction. It is like creating novel bullet-time scenes from a driving dataset.

left: multi-view video from MagicDrive3D. right: final generated scene from MagicDrive3D (compared with 3DGS).

Controllability

MagicDrive3D offers precise control over objects and some road semantics. Text control is also supported!

Editing 1
Editing 2
Editing 3

Data Engine

Its controllable street scene generation makes MagicDrive3D a powerful data engine. We show how generated scenes help improve the viewpoint robustness of CVT.

Downstream performance
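To make the data-engine idea concrete, here is a minimal sketch of how perturbed camera poses might be sampled for rendering extra training views from a generated scene. It is an illustration under stated assumptions, not the procedure used for the results above: the helper names (`yaw_rotation`, `perturb_pose`, `sample_poses`), the choice of z as the up axis, and the perturbation ranges (10 degrees, 0.5 units) are all hypothetical.

```python
import numpy as np

def yaw_rotation(deg):
    """Rotation matrix about the z axis (assumed to be 'up' in this sketch)."""
    t = np.deg2rad(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0],
                     [s, c, 0.0],
                     [0.0, 0.0, 1.0]])

def perturb_pose(cam_to_world, yaw_deg=0.0, shift=(0.0, 0.0, 0.0)):
    """Rotate the camera orientation about world z, then translate its position."""
    out = cam_to_world.copy()
    out[:3, :3] = yaw_rotation(yaw_deg) @ cam_to_world[:3, :3]
    out[:3, 3] = cam_to_world[:3, 3] + np.asarray(shift, dtype=np.float64)
    return out

def sample_poses(cam_to_world, n, max_yaw_deg=10.0, max_shift=0.5, seed=0):
    """Draw n perturbed poses; the ranges are illustrative, not tuned values."""
    rng = np.random.default_rng(seed)
    poses = []
    for _ in range(n):
        yaw = rng.uniform(-max_yaw_deg, max_yaw_deg)
        shift = rng.uniform(-max_shift, max_shift, size=3)
        poses.append(perturb_pose(cam_to_world, yaw, shift))
    return poses

# Toy usage: five perturbed views around an identity-pose camera.
poses = sample_poses(np.eye(4), n=5)
```

Each sampled pose would then be handed to the scene renderer to produce an image from a viewpoint that the original camera rig never recorded.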