 
 
          Framework. For a scene captured by N images, we use COLMAP and Mask-RCNN to obtain sparse 3D points and coarse object masks as co-inputs, and predict a dense, geometrically consistent object map as well as a textured, completed background for each image. To tackle this challenging task, we leverage the geometric consistency of the one-to-one dense mapping in 3D space and decouple the scene into two complementary neural scene representation modules: a Foreground Consistent Representation (FoCoR) module and a Background Completion (BaCo) module. We build both modules upon an SDF-based neural surface representation and incorporate multi-resolution hash encodings to accelerate training.
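
          To give a feel for the multi-resolution hash encoding mentioned above, here is a minimal NumPy sketch of the generic technique: each 3D point is looked up in several hashed feature grids of increasing resolution, and the trilinearly interpolated features are concatenated. This is not the authors' implementation. All names and hyperparameters (NUM_LEVELS, TABLE_SIZE, BASE_RES, GROWTH, make_tables, hash_coords, encode) are assumptions chosen for illustration, and in a real system the tables would be trainable parameters feeding a small MLP that predicts the SDF.

```python
import numpy as np

# Hypothetical hyperparameters, chosen only for illustration.
NUM_LEVELS = 4           # number of grid resolutions
FEATURES_PER_LEVEL = 2   # feature channels stored per table entry
TABLE_SIZE = 2 ** 14     # entries per hash table
BASE_RES = 16            # coarsest grid resolution
GROWTH = 2.0             # resolution multiplier between levels

# Large primes commonly used for spatial hashing of integer grid coordinates.
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)


def make_tables(seed=0):
    """Create one small randomly initialised feature table per level."""
    rng = np.random.default_rng(seed)
    shape = (NUM_LEVELS, TABLE_SIZE, FEATURES_PER_LEVEL)
    return rng.uniform(-1e-4, 1e-4, size=shape)


def hash_coords(ijk):
    """Map non-negative integer grid coordinates (..., 3) to table indices (...)."""
    c = ijk.astype(np.uint64) * PRIMES
    h = c[..., 0] ^ c[..., 1] ^ c[..., 2]
    return (h % np.uint64(TABLE_SIZE)).astype(np.int64)


def encode(points, tables):
    """Encode points in [0, 1]^3, shape (N, 3), into (N, NUM_LEVELS * FEATURES_PER_LEVEL)."""
    points = np.asarray(points, dtype=np.float64)
    feats = []
    for level in range(NUM_LEVELS):
        res = int(BASE_RES * GROWTH ** level)
        x = points * res
        x0 = np.floor(x).astype(np.int64)   # lower corner of the enclosing cell
        w = x - x0                          # fractional position inside the cell
        level_feat = np.zeros((points.shape[0], FEATURES_PER_LEVEL))
        for corner in range(8):
            # Binary offset (0 or 1 per axis) selecting one of the 8 cell corners.
            offset = np.array([(corner >> d) & 1 for d in range(3)], dtype=np.int64)
            idx = hash_coords(x0 + offset)
            # Trilinear weight: w along axes where offset is 1, (1 - w) otherwise.
            cw = np.prod(np.where(offset == 1, w, 1.0 - w), axis=1)
            level_feat += cw[:, None] * tables[level][idx]
        feats.append(level_feat)
    return np.concatenate(feats, axis=1)
```

          Calling encode(points, make_tables()) on an (N, 3) array returns one feature vector per point. Because the tables have a fixed size regardless of grid resolution, fine levels share entries through hash collisions, which is what keeps the representation compact.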
 
 
          Visualization details of the mask and RGB rendering results at a high resolution of 3840×2160.
      @ARTICLE{zxyun@surfacesos,
          author={Zheng, Xiaoyun and Liao, Liwei and Jiao, Jianbo and Gao, Feng and Wang, Ronggang},
          journal={IEEE Transactions on Image Processing}, 
          title={Surface-SOS: Self-Supervised Object Segmentation via Neural Surface Representation}, 
          year={2024},
          volume={33},
          pages={2018--2031}}