Referring Camouflaged Object Detection

Xuying Zhang,Bowen Yin,Zheng Lin,Qibin Hou,Deng-Ping Fan,Ming-Ming Cheng

Computing Research Repository (CoRR), 2025

Abstract

We consider the problem of referring camouflaged object detection (Ref-COD), a new task that aims to segment specified camouflaged objects based on a small set of referring images with salient target objects. We first assemble a large-scale dataset, called R2C7K, which consists of 7K images covering 64 object categories in real-world scenarios. Then, we develop a simple but strong dual-branch framework, dubbed R2CNet, with a reference branch embedding the common representations of target objects from referring images and a segmentation branch identifying and segmenting camouflaged objects under the guidance of the common representations. In particular, we design a Referring Mask Generation module to generate pixel-level prior masks and a Referring Feature Enrichment module to enhance the capability of identifying specified camouflaged objects. Extensive experiments show the superiority of our Ref-COD methods over their COD counterparts in segmenting specified camouflaged objects and identifying the main body of target objects. Our code and dataset are publicly available at https://github.com/zhangxuying1004/RefCOD.

Key words

Referring Camouflaged Object Detection,Common Representations,R2C7K Dataset,R2CNet Framework

About the Authors

The authors of this paper are Xuying Zhang, Bowen Yin, Zheng Lin, Qibin Hou, Deng-Ping Fan, and Ming-Ming Cheng, all from the School of Computer Science and Technology at Nankai University. Their research interests cover deep learning, visual attention, multimedia content analysis and retrieval, 3D scenes, self-supervised learning, feature points, diffusion models, computer vision and computer graphics, object detection, image classification, semantic segmentation, salient object detection, attention mechanisms, image segmentation, video object segmentation, camouflaged object detection, cross-modal learning, image-based sketch synthesis, human-centered attention datasets, and medical image processing.

Paper Outline

1. Introduction
- The importance of Camouflaged Object Detection (COD) and its application scenarios
- The proposal of Ref-COD: Utilizing reference images to guide the segmentation of specific camouflaged targets
- The advantages of Ref-COD: Reference information is easy to obtain, aligning with human visual perception
- Contributions of the paper: Proposing the Ref-COD benchmark, constructing the R2C7K dataset, and designing the R2CNet framework
2. Related Work
- Progress and existing issues in Camouflaged Object Detection (COD) research
- Progress in Salient Object Detection (SOD) research
- Research on reference-based object segmentation: Image reference and text reference
3. Proposed Dataset
- The construction process and statistical information of the R2C7K dataset
- The dataset includes Camo-subset and Ref-subset
- The category and quantity distribution of the dataset
- The resolution distribution of the dataset
4. Proposed Framework
- The input and output of the Ref-COD system
- The overall architecture of the R2CNet framework: Reference branch and segmentation branch
- Reference branch: SOD network and MAP function extract the common representation of the target object
- Segmentation branch: Based on the encoder-decoder structure, containing RMG and RFE modules
5. Experiments
- Experimental setup: Training and testing protocols, hyperparameter details, evaluation metrics
- Quantitative evaluation:
  - Comparison with baseline models: R2CNet outperforms baseline models on all metrics
  - Applied to existing COD methods: The Ref-COD method outperforms its COD counterparts on all metrics
- Ablation study:
  - The impact of the number of reference images on performance
  - The effectiveness of model components
  - Comparison of effects with different reference forms
- Qualitative evaluation:
  - Prediction result visualization: R2CNet is superior to baseline models in segmenting specific camouflaged targets and identifying target subjects
  - Feature visualization: Reference information helps the model focus on the target object
6. Future Work
- Exploring other forms of reference information: Text, voice, etc.
- Extending Ref-COD to handle scenes without targets
- Expanding Ref-COD to related tasks: Question answering, etc.
7. Conclusion
- The effectiveness of the Ref-COD benchmark and the R2CNet framework
- The application prospects and future research directions of Ref-COD

Key Questions

Q: What specific research methods were used in the paper?
- Dataset Construction: Constructed a large-scale dataset named R2C7K, containing 7K images across 64 object categories, divided into Camo-subset and Ref-subset with images of camouflaged objects and salient objects, respectively.
- Network Architecture Design: Proposed a dual-branch network architecture named R2CNet, including:
  - Reference Branch: Utilizes salient object images to extract common representations of the target objects.
  - Segmentation Branch: Identifies and segments camouflaged objects under the guidance of the common representations from the reference branch.
  - Referring Mask Generation (RMG) Module: Generates pixel-level reference masks to guide the segmentation branch.
  - Referring Feature Enrichment (RFE) Module: Enhances the segmentation branch's ability to recognize specific camouflaged objects.
- Experimental Evaluation: Compared R2CNet with the baseline COD model and 7 state-of-the-art COD methods on the R2C7K dataset, conducted ablation studies, and performed visualization analysis.
Q: What are the main research findings and achievements?
- Ref-COD Benchmark: Proposed the Ref-COD benchmark, combining salient object detection (SOD) and camouflaged object detection (COD) to achieve the goal of segmenting specific camouflaged objects using salient object images.
- R2C7K Dataset: Constructed the R2C7K dataset, providing a data foundation and insights for Ref-COD research.
- R2CNet Framework: Designed the R2CNet framework, which outperformed its COD counterparts in segmenting specific camouflaged objects and recognizing the main body of target objects.
- Ref-COD Effectiveness: Experimental results show that the Ref-COD method has advantages in segmenting specific camouflaged objects and recognizing the main body of target objects, and has general applicability to different COD methods.
Q: What are the current limitations of this research?
- Reference Form: Only salient object images are currently considered as references; other forms of reference, such as text and speech, could be explored in the future.
- Reference Scenario: The method assumes that the image to be segmented contains the target object; Ref-COD could be extended to handle cases where the target object is absent.
- Related Tasks: Ref-COD could be extended to related tasks such as question answering.
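
The reference branch summarized above extracts a common representation of the target from several salient-object images using an SOD network and a MAP function. The sketch below illustrates one plausible reading of that step, assuming MAP denotes masked average pooling and that the per-image vectors are combined by a simple mean. The function names `masked_average_pooling` and `common_representation`, the nested-list feature layout, and the `eps` stabilizer are hypothetical choices for illustration and are not taken from the R2CNet code.

```python
from typing import List, Sequence

FeatureMap = Sequence[Sequence[Sequence[float]]]  # channels x height x width
Mask = Sequence[Sequence[float]]                  # height x width, values in [0, 1]


def masked_average_pooling(
    features: FeatureMap, mask: Mask, eps: float = 1e-6
) -> List[float]:
    """Average each channel over the pixels selected by the foreground mask.

    Assumption: the mask would come from an SOD network; here it is just an input.
    """
    mask_sum = sum(sum(row) for row in mask)
    pooled = []
    for channel in features:
        # Weight every pixel by its mask value, then normalize by the mask area.
        weighted = sum(
            value * weight
            for feat_row, mask_row in zip(channel, mask)
            for value, weight in zip(feat_row, mask_row)
        )
        pooled.append(weighted / (mask_sum + eps))
    return pooled


def common_representation(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of the per-image vectors (assumed aggregation rule)."""
    count = len(vectors)
    return [sum(values) / count for values in zip(*vectors)]


# Toy example: two reference "images", each with 2 channels of size 2 x 2.
features_a = [[[1.0, 2.0], [3.0, 4.0]], [[10.0, 20.0], [30.0, 40.0]]]
features_b = [[[5.0, 5.0], [5.0, 5.0]], [[1.0, 1.0], [1.0, 1.0]]]
mask_a = [[1.0, 0.0], [0.0, 1.0]]  # foreground on the diagonal
mask_b = [[1.0, 1.0], [0.0, 0.0]]  # foreground on the top row

vec_a = masked_average_pooling(features_a, mask_a)
vec_b = masked_average_pooling(features_b, mask_b)
target_vector = common_representation([vec_a, vec_b])
```

In this reading, `target_vector` plays the role of the common representation that guides the segmentation branch. How the RMG and RFE modules consume it is not specified in this summary, so it is left out of the sketch.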
