Without realistic paired day-night images, synthesizing dark driving images that contain vehicle lights is difficult, which limits research in this field.
This work introduces LightDiff, a domain-tailored framework designed to enhance low-light image quality for autonomous driving, mitigating the challenges faced by vision-centric perception systems at night. By combining a dynamic data degradation process, a multi-condition adapter for diverse input modalities, and perception-specific, score-guided reward modeling trained with reinforcement learning, LightDiff significantly improves both image quality and nighttime 3D vehicle detection on the nuScenes dataset. The approach eliminates the need for extensive nighttime data collection while preserving semantic integrity during image transformation, demonstrating its potential to enhance safety and reliability in autonomous driving scenarios.
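The paper's exact degradation pipeline is not reproduced here, but a minimal sketch of what a dynamic day-to-night degradation could look like is shown below. The function name `degrade_to_night` and the gamma, brightness, and noise ranges are illustrative assumptions, not the authors' settings.

```python
import torch

def degrade_to_night(day_img: torch.Tensor,
                     gamma_range=(1.5, 3.0),
                     brightness_range=(0.1, 0.4),
                     noise_std=0.02) -> torch.Tensor:
    """Darken a daytime image batch (B, 3, H, W) in [0, 1] into a pseudo-nighttime image."""
    b = day_img.shape[0]
    # Per-sample random gamma and brightness factors: the "dynamic" part of the degradation.
    gamma = torch.empty(b, 1, 1, 1).uniform_(*gamma_range)
    brightness = torch.empty(b, 1, 1, 1).uniform_(*brightness_range)
    dark = day_img.clamp(min=1e-6) ** gamma * brightness
    # Additive Gaussian noise approximating low-light sensor noise.
    dark = dark + noise_std * torch.randn_like(dark)
    return dark.clamp(0.0, 1.0)
```

Calling `degrade_to_night(day_batch)` yields a (daytime, pseudo-nighttime) pair from daytime data alone, which is the sense in which no human-collected paired data is required.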
The architecture of LightDiff. During the training stage, a Training Data Generation pipeline enables the acquisition of triple-modality data without any human-collected paired data. Our LightDiff employs a Multi-Condition Adapter to dynamically weight multiple conditions, coupled with LiDAR and Distribution Reward Modeling (LDRM), allowing for perception-oriented control.
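As a concrete illustration of what "dynamically weight multiple conditions" could mean, the following is a minimal PyTorch sketch, not the paper's actual module; the class name `MultiConditionAdapter` and its internals (a softmax weight head over per-condition feature maps) are assumptions.

```python
import torch
import torch.nn as nn

class MultiConditionAdapter(nn.Module):
    """Fuse several condition feature maps with dynamically predicted weights.

    Assumes each condition (e.g. the dark image and a LiDAR depth map) has
    already been encoded to the same shape (B, C, H, W); the adapter predicts
    one softmax weight per condition and returns the weighted sum as the
    control signal for the diffusion UNet.
    """

    def __init__(self, num_conditions: int, channels: int):
        super().__init__()
        self.weight_head = nn.Sequential(
            nn.Conv2d(num_conditions * channels, channels, kernel_size=1),
            nn.SiLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, num_conditions, kernel_size=1),
        )

    def forward(self, cond_feats: list) -> torch.Tensor:
        stacked = torch.stack(cond_feats, dim=1)          # (B, N, C, H, W)
        flat = stacked.flatten(1, 2)                      # (B, N*C, H, W)
        weights = self.weight_head(flat).softmax(dim=1)   # (B, N, 1, 1)
        weights = weights.unsqueeze(2)                    # (B, N, 1, 1, 1)
        return (weights * stacked).sum(dim=1)             # (B, C, H, W)
```

The learned per-condition weights are what allow the model to lean on LiDAR geometry when the camera condition is too dark, and vice versa.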
Comment: Builds an additional encoder to add spatial conditioning controls to T2I diffusion models.
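This comment refers to the ControlNet-style design of attaching a trainable encoder copy to a frozen text-to-image diffusion UNet. A minimal sketch of that idea follows, assuming the encoder preserves the (B, C, H, W) feature shape; `SpatialControlBranch` and `zero_conv` are illustrative names, not an existing library API.

```python
import copy
import torch
import torch.nn as nn

def zero_conv(channels: int) -> nn.Conv2d:
    """1x1 convolution initialized to zero so the control branch starts as a no-op."""
    conv = nn.Conv2d(channels, channels, kernel_size=1)
    nn.init.zeros_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv

class SpatialControlBranch(nn.Module):
    """Trainable copy of a (frozen) UNet encoder that injects spatial conditioning.

    Simplified sketch: the conditioning map enters through a zero-initialized conv,
    passes through the copied encoder, and is added back to the base features
    through another zero conv, so training starts from the unchanged pretrained
    behavior and gradually learns the control signal.
    """

    def __init__(self, base_encoder: nn.Module, channels: int):
        super().__init__()
        self.control_encoder = copy.deepcopy(base_encoder)  # trainable copy of the frozen encoder
        self.zero_in = zero_conv(channels)
        self.zero_out = zero_conv(channels)

    def forward(self, base_feat: torch.Tensor, cond_map: torch.Tensor) -> torch.Tensor:
        # Assumes control_encoder maps (B, C, H, W) -> (B, C, H, W).
        ctrl = self.control_encoder(base_feat + self.zero_in(cond_map))
        return base_feat + self.zero_out(ctrl)
```

The zero-initialized convolutions are the key design choice: at initialization the branch contributes nothing, so the pretrained T2I model's outputs are preserved while the spatial control is learned.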