Referring Camouflaged Object Detection
Computing Research Repository (CoRR), 2025
This paper's authors include Zhang Xuying, Yin Bowen, Zheng Lin, Hou Qibin, Fan Dengping, and Cheng Mingming, all from the School of Computer Science and Technology at Nankai University. Their research interests cover deep learning, visual attention, multimedia content analysis and retrieval, 3D scenes, self-supervised learning, feature points, diffusion models, computer vision and computer graphics, object detection, image classification, semantic segmentation, salient object detection, attention mechanisms, image segmentation, video object segmentation, camouflaged object detection, cross-modal learning, image-based sketch synthesis, human-centered attention datasets, and medical image processing.
1. Introduction
- The importance of Camouflaged Object Detection (COD) and its application scenarios
- The proposal of Ref-COD: Utilizing reference images to guide the segmentation of specific camouflaged targets
- The advantages of Ref-COD: Reference information is easy to obtain, aligning with human visual perception
- Contributions of the paper: Proposing the Ref-COD benchmark, constructing the R2C7K dataset, and designing the R2CNet framework
2. Related Work
- Progress and existing issues in Camouflaged Object Detection (COD) research
- Progress in Salient Object Detection (SOD) research
- Research on reference-based object segmentation: Image reference and text reference
3. Proposed Dataset
- The construction process and statistical information of the R2C7K dataset
- The dataset includes Camo-subset and Ref-subset
- The category and quantity distribution of the dataset
- The resolution distribution of the dataset
4. Proposed Framework
- The input and output of the Ref-COD system
- The overall architecture of the R2CNet framework: Reference branch and segmentation branch
- Reference branch: An SOD network and the MAP function extract a common representation of the target object
- Segmentation branch: Built on an encoder-decoder structure and containing the RMG and RFE modules
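The reference-branch step can be illustrated with a small sketch. It assumes that "MAP" denotes masked average pooling, which the outline does not spell out, and that the SOD network has already produced a foreground mask for the reference image. The function name `masked_average_pooling` and the toy arrays are hypothetical and are not taken from the paper's code.

```python
import numpy as np

def masked_average_pooling(features: np.ndarray, mask: np.ndarray,
                           eps: float = 1e-6) -> np.ndarray:
    """Average a (C, H, W) feature map over the pixels weighted by an (H, W) mask.

    Assumption: this is one common reading of a "MAP function"; the mask would
    come from an SOD network run on the reference image.
    """
    if features.shape[1:] != mask.shape:
        raise ValueError("mask must match the spatial size of features")
    weights = mask.astype(features.dtype)
    # Sum the masked features per channel, then normalise by the mask area.
    total = (features * weights[None, :, :]).sum(axis=(1, 2))
    return total / (weights.sum() + eps)

# Toy example: 2 channels on a 2x2 grid, with two foreground pixels.
features = np.arange(8, dtype=np.float64).reshape(2, 2, 2)
mask = np.array([[1.0, 0.0],
                 [0.0, 1.0]])
prototype = masked_average_pooling(features, mask)  # one value per channel
```

The result is a single vector with one entry per channel, which is the kind of compact target representation a second branch could consume.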
5. Experiments
- Experimental setup: Training and testing protocols, hyperparameter details, evaluation metrics
- Quantitative evaluation:
  - Comparison with baseline models: R2CNet outperforms baseline models on all metrics
  - Applied to existing COD methods: The Ref-COD variants outperform their COD counterparts on all metrics
- Ablation study:
  - The impact of the number of reference images on performance
  - The effectiveness of model components
  - Comparison of effects with different reference forms
- Qualitative evaluation:
  - Prediction result visualization: R2CNet is superior to baseline models in segmenting specific camouflaged targets and identifying target subjects
  - Feature visualization: Reference information helps the model focus on the target object
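The outline mentions evaluation metrics without naming them. As an illustration only, mean absolute error (MAE) is a metric commonly reported for camouflaged and salient object detection; whether it is among the metrics used here is an assumption. The function name `mean_absolute_error` and the toy maps below are hypothetical.

```python
import numpy as np

def mean_absolute_error(pred, gt) -> float:
    """Mean absolute difference between a predicted map in [0, 1] and a binary mask."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if pred.shape != gt.shape:
        raise ValueError("prediction and ground truth must have the same shape")
    return float(np.abs(pred - gt).mean())

# Toy example: a 2x2 prediction against a 2x2 ground-truth mask.
pred = np.array([[0.9, 0.2],
                 [0.1, 0.8]])
gt = np.array([[1, 0],
               [0, 1]])
mae = mean_absolute_error(pred, gt)
```

Lower values indicate a prediction closer to the ground-truth mask.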
6. Future Work
- Exploring other forms of reference information: Text, voice, etc.
- Extending Ref-COD to handle scenes without targets
- Expanding Ref-COD to related tasks: Question answering, etc.
7. Conclusion
- The effectiveness of the Ref-COD benchmark and the R2CNet framework
- The application prospects and future research directions of Ref-COD
Q: What specific research methods were used in the paper?
- Dataset Construction: Constructed a large-scale dataset named R2C7K, containing 7K images across 64 object categories, divided into Camo-subset and Ref-subset with images of camouflaged objects and salient objects, respectively.
- Network Architecture Design: Proposed a dual-branch network architecture named R2CNet, including:
  - Reference Branch: Utilizes salient object images to extract common representations of the target objects.
  - Segmentation Branch: Identifies and segments camouflaged objects under the guidance of the common representations from the reference branch.
  - Referring Mask Generation (RMG) Module: Generates pixel-level reference masks to guide the segmentation branch.
  - Referring Feature Enrichment (RFE) Module: Enhances the segmentation branch's ability to recognize specific camouflaged objects.
- Experimental Evaluation: Compared R2CNet with the baseline COD model and 7 state-of-the-art COD methods on the R2C7K dataset, conducted ablation studies, and performed visualization analysis.
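How a common representation from the reference branch might guide the segmentation branch can be sketched roughly as follows. This is a simplified stand-in and not the RMG or RFE modules themselves: it assumes the reference representation is a single feature vector, builds a pixel-level mask from cosine similarity, and uses that mask to reweight the segmentation features. The names `similarity_mask` and `enrich_features` are hypothetical.

```python
import numpy as np

def similarity_mask(features: np.ndarray, prototype: np.ndarray,
                    eps: float = 1e-6) -> np.ndarray:
    """Cosine similarity between a (C,) prototype and every pixel of (C, H, W) features.

    Returns an (H, W) map rescaled from [-1, 1] to [0, 1].
    """
    c, h, w = features.shape
    flat = features.reshape(c, -1)                      # (C, H*W)
    dots = prototype @ flat                             # (H*W,)
    norms = np.linalg.norm(prototype) * np.linalg.norm(flat, axis=0) + eps
    cosine = dots / norms
    return ((cosine + 1.0) / 2.0).reshape(h, w)

def enrich_features(features: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Residual reweighting: emphasise pixels that resemble the reference."""
    return features * (1.0 + mask[None, :, :])

# Toy example: two pixels, one aligned with the prototype and one orthogonal to it.
features = np.array([[[1.0, 0.0]],
                     [[0.0, 1.0]]])                     # shape (2, 1, 2)
prototype = np.array([1.0, 0.0])
mask = similarity_mask(features, prototype)
enriched = enrich_features(features, mask)
```

In this toy case the pixel whose features point in the same direction as the prototype receives the higher mask value, so it is amplified more strongly in the enriched features.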
Q: What are the main research findings and achievements?
- Ref-COD Benchmark: Proposed the Ref-COD benchmark, which combines salient object detection (SOD) and camouflaged object detection (COD) to segment specific camouflaged objects using salient object images as references.
- R2C7K Dataset: Constructed the R2C7K dataset, providing a data foundation and insights for Ref-COD research.
- R2CNet Framework: Designed the R2CNet framework, which significantly outperformed its COD counterparts in segmenting specific camouflaged objects and recognizing the main body of target objects.
- Ref-COD Effectiveness: Experimental results show that Ref-COD has advantages in segmenting specific camouflaged objects and recognizing the main body of target objects, and that it applies generally across different COD methods.
Q: What are the current limitations of this research?
- Reference Form: Only salient object images are currently considered as references; other forms of reference, such as text and speech, could be explored in the future.
- Reference Scenario: The method assumes that the image to be segmented contains the target object; Ref-COD could be extended to handle cases where the target object is not present.
- Related Tasks: Ref-COD could be extended to related tasks such as question answering.
