This paper is the first work to investigate the simulation-to-reality domain gap in multi-agent cooperative perception, focusing specifically on the deployment gap and the feature gap in point cloud-based 3D object detection. Based on this analysis, we present
the first simulation-to-reality transfer learning framework built on a novel Vision Transformer, named S2R-ViT, which mainly consists of an Uncertainty-aware Vision Transformer and an Agent-based Feature Adaptation module to mitigate these two types of domain gap. Experiments demonstrate the effectiveness of S2R-ViT. This research marks a significant step toward multi-agent cooperative perception from simulation to reality.