S2R-ViT for Multi-Agent Cooperative Perception: Bridging the Gap from Simulation to Reality
Jinlong Li1, Runsheng Xu2, Xinyu Liu1, Baolu Li1, Qin Zou3, Jiaqi Ma2, Hongkai Yu1*
1 Cleveland State University, 2 UCLA, 3 Wuhan University
Our research focuses on using labeled simulated data and unlabeled real-world data in a transfer learning setting to reduce the domain gap in multi-agent cooperative perception. Figure: illustration of the domain gap (Deployment Gap, Feature Gap) in multi-agent cooperative perception from simulation to reality. We use Vehicle-to-Vehicle (V2V) cooperative perception in autonomous driving as the example; CAV denotes a Connected Autonomous Vehicle.
Overview
This paper is the first work to investigate the domain gap in multi-agent cooperative perception from simulation to reality, focusing specifically on the deployment gap and the feature gap in point cloud-based 3D object detection. Based on this analysis, we present the first simulation-to-reality transfer learning framework for this task, built on a novel Vision Transformer and named S2R-ViT. It mitigates the two domain gaps with two main components: an Uncertainty-aware Vision Transformer and an Agent-based Feature Adaptation module. Experiments demonstrate the effectiveness of S2R-ViT, marking a significant step forward for multi-agent cooperative perception from simulation to reality.
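To make the transfer learning setup concrete, below is a minimal numpy sketch of one common way to realize agent-based feature adaptation: a domain discriminator (here a hypothetical linear probe) scores each agent's pooled feature as simulated or real, and its binary cross-entropy loss drives the feature extractor toward domain-invariant features (typically via a gradient reversal layer). The function name, the linear probe, and the pooling scheme are illustrative assumptions, not the paper's exact module.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def domain_adversarial_loss(feat_sim, feat_real, w, b):
    """Sketch of an agent-wise domain-adversarial loss (illustrative, not the
    paper's exact formulation).

    feat_sim, feat_real: (num_agents, num_tokens, dim) per-agent feature maps
    from the simulated and real domains (hypothetical layout).
    w, b: parameters of a linear domain discriminator (assumption).
    """
    def bce(feat, label):
        # Pool each agent's tokens, score with the linear probe, then
        # compute binary cross-entropy against the domain label.
        p = sigmoid(feat.mean(axis=1) @ w + b)
        p = np.clip(p, 1e-7, 1.0 - 1e-7)  # numerical safety
        return -np.mean(label * np.log(p) + (1.0 - label) * np.log(1.0 - p))

    # Simulated agents labeled 0, real agents labeled 1; during training the
    # discriminator minimizes this loss while the feature extractor maximizes
    # it (gradient reversal), pushing the two feature distributions together.
    return bce(feat_sim, 0.0) + bce(feat_real, 1.0)
```

With an untrained discriminator (zero weights) every agent scores 0.5, so the loss starts at 2·log 2; training then separates, and gradient reversal re-mixes, the two domains.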
S2R-UViT: Simulation-to-Reality Uncertainty-aware Vision Transformer
The deployment gap from simulation to reality introduces different uncertainties for both the ego and neighboring agents, e.g., spatial bias from GPS errors and spatial misalignment in the coordinate projection caused by communication latency. How to effectively reduce the degradation caused by these uncertainties is an essential and open question for S2R multi-agent perception research. In this paper, we answer this question from two perspectives: the effects of uncertainty can be relieved by (1) enhancing feature interactions across all agents' spatial positions more comprehensively and (2) enhancing the ego-agent features by accounting for the different uncertainty levels of the shared other-agent features. These two perspectives motivate us to develop the novel Local-and-Global Multi-head Self-Attention (LG-MSA) module and the Uncertainty-Aware Module (UAM), respectively.
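The two perspectives above can be sketched in a few lines of numpy. The local branch attends within non-overlapping token windows, the global branch attends across all tokens, and a separate uncertainty-aware step down-weights other-agent features by an estimated uncertainty level. The additive fusion, the window size, and the per-agent scalar `sigma` are simplifying assumptions for illustration; they are not the paper's exact design.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product self-attention (single head, for brevity).
    d = q.shape[-1]
    return softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d)) @ v

def lg_msa(x, window=4):
    """Sketch of local-and-global self-attention over agent feature tokens.

    x: (num_agents, num_tokens, dim) tokens flattened from the agents'
    shared BEV feature maps (hypothetical layout).
    """
    n, l, d = x.shape
    # Local branch: attention restricted to non-overlapping windows,
    # capturing fine-grained interactions between nearby positions.
    local = np.zeros_like(x)
    for s in range(0, l, window):
        seg = x[:, s:s + window]
        local[:, s:s + window] = attention(seg, seg, seg)
    # Global branch: attention across all spatial positions of all tokens.
    glob = attention(x, x, x)
    # Additive fusion of the two branches (a simplification).
    return local + glob

def uam_reweight(ego, others, sigma):
    """Sketch of uncertainty-aware fusion: other-agent features are
    down-weighted by an estimated uncertainty level sigma in [0, 1]
    (per-agent scalar here; an assumption for illustration)."""
    weights = 1.0 - np.asarray(sigma)  # higher uncertainty -> lower weight
    return ego + sum(w * f for w, f in zip(weights, others))
```

The intent is that the local windows preserve precise geometry while the global pass lets every position see misaligned evidence from anywhere in the scene, and the UAM step keeps unreliable shared features from corrupting the ego representation.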
Qualitative Results
Robustness in the Deployment-Gap Scenario.
BibTeX
  @inproceedings{li2024s2r,
    title={S2r-vit for multi-agent cooperative perception: Bridging the gap from simulation to reality},
    author={Li, Jinlong and Xu, Runsheng and Liu, Xinyu and Li, Baolu and Zou, Qin and Ma, Jiaqi and Yu, Hongkai},
    booktitle={2024 IEEE International Conference on Robotics and Automation (ICRA)},
    pages={16374--16380},
    year={2024},
    organization={IEEE}
  }