27 December 2022
Transformer-based difference fusion network for RGB-D salient object detection
Zhi-Qiang Cui, Feng Wang, Zheng-Yong Feng
Abstract

RGB-D salient object detection (SOD) can generally be divided into three stages: feature extraction, feature fusion, and feature prediction. Most approaches treat all features extracted by the backbone network identically in the latter two stages, neglecting the fact that different modalities and different hierarchical levels play distinct roles in SOD, which degrades detection performance. To solve this problem, we propose a transformer-based difference fusion network (TDF-Net) for RGB-D SOD that treats modal features differently in the feature fusion stage and hierarchical features differently in the feature prediction stage. First, we adopt the pyramid vision transformer as a feature extractor to obtain hierarchical features from the input RGB and depth images. Second, we propose a differential interactive fusion module, in which the RGB and depth modalities learn modality-specific features independently and guide each other during feature fusion. Finally, we divide the cross-modally fused hierarchical features into high-level and low-level features and propose three types of cross-layer fusion modules that integrate features from different layers in a differentiated manner to predict the saliency maps. Extensive experiments on five benchmark datasets confirm that the proposed TDF-Net outperforms state-of-the-art methods.
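
The abstract gives no equations or layer definitions, so the three-stage data flow it describes can only be illustrated loosely. The NumPy sketch below is a toy under stated assumptions, not the TDF-Net modules: random arrays stand in for backbone features, a four-level pyramid with equal channel counts is assumed, and the gating, the two-low/two-high split, and every function name (`mutual_guidance_fusion`, `top_down_merge`, `predict_saliency`) are hypothetical choices made only to show where cross-modal fusion and cross-layer fusion sit in the pipeline.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def mutual_guidance_fusion(rgb, depth):
    """Hypothetical cross-modal fusion for one pyramid level.

    Each modality is re-weighted by a spatial gate computed from the other
    modality, and the two guided features are summed. This is a stand-in,
    not the paper's differential interactive fusion module.
    """
    rgb_guided = rgb * sigmoid(depth.mean(axis=0, keepdims=True))    # depth guides RGB
    depth_guided = depth * sigmoid(rgb.mean(axis=0, keepdims=True))  # RGB guides depth
    return rgb_guided + depth_guided


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) array."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def top_down_merge(features):
    """Merge features ordered fine to coarse by repeated upsample-and-add."""
    merged = features[-1]
    for finer in reversed(features[:-1]):
        merged = finer + upsample2x(merged)
    return merged


def predict_saliency(fused, num_low=2):
    """Treat low-level and high-level features separately (assumed split)."""
    low, high = fused[:num_low], fused[num_low:]
    high_merged = top_down_merge(high)                # coarse semantic summary
    full_merged = top_down_merge(low + [high_merged])  # inject it into fine levels
    return sigmoid(full_merged.mean(axis=0))           # (H, W) map in (0, 1)


# Stand-in "backbone" outputs: four levels, 8 channels each, halving resolution.
rng = np.random.default_rng(0)
sizes = [16, 8, 4, 2]
rgb_feats = [rng.standard_normal((8, s, s)) for s in sizes]
depth_feats = [rng.standard_normal((8, s, s)) for s in sizes]

fused = [mutual_guidance_fusion(r, d) for r, d in zip(rgb_feats, depth_feats)]
saliency = predict_saliency(fused)
print(saliency.shape)
```

In a real model the gates and merges would be learned layers and the levels would come from the pyramid vision transformer; the sketch only fixes the order of operations: per-level cross-modal fusion first, then separate handling of high-level and low-level features before the final map.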

© 2022 SPIE and IS&T
Zhi-Qiang Cui, Feng Wang, and Zheng-Yong Feng "Transformer-based difference fusion network for RGB-D salient object detection," Journal of Electronic Imaging 31(6), 063058 (27 December 2022). https://doi.org/10.1117/1.JEI.31.6.063058
Received: 25 July 2022; Accepted: 15 December 2022; Published: 27 December 2022
KEYWORDS
Feature fusion, RGB color model, Feature extraction, Object detection, Transformers, Image fusion, Semantics
