Transformer-based difference fusion network for RGB-D salient object detection

Zhi-Qiang Cui; Feng Wang; Zheng-Yong Feng

doi:10.1117/1.JEI.31.6.063058

27 December 2022

Journal of Electronic Imaging, Vol. 31, Issue 6, 063058 (December 2022). https://doi.org/10.1117/1.JEI.31.6.063058

Abstract

RGB-D salient object detection (SOD) can usually be divided into three stages: feature extraction, feature fusion, and feature prediction. Most approaches treat the features extracted by the backbone network identically in the last two stages. This neglects the fact that different modalities and different hierarchical features play distinct roles in SOD, which degrades detection results. To solve this problem, we propose a transformer-based difference fusion network (TDF-Net) for RGB-D SOD that treats modal features and hierarchical features differently in the feature fusion and feature prediction stages, respectively. First, we adopt the pyramid vision transformer as a feature extractor to obtain hierarchical features from the input RGB and depth images. Second, we propose a differential interactive fusion module, in which the RGB and depth modalities learn modality-specific features independently and then guide each other during fusion. Finally, we divide the hierarchical features obtained after cross-modal fusion into high-level and low-level features and propose three types of cross-layer fusion modules that integrate features from different layers discriminatively to predict the saliency maps. Extensive experiments on five benchmark datasets show that TDF-Net outperforms state-of-the-art methods.
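The abstract does not give the internals of the differential interactive fusion module or of the three cross-layer fusion modules. The following toy sketch in plain Python only illustrates the general data flow the abstract describes, under loudly stated assumptions. Feature maps are modeled as 1-D lists. The four pyramid levels are split into two low-level and two high-level groups. Every function name (`modality_specific`, `interactive_fusion`, `cross_layer_predict`), the sigmoid-gating scheme, and the fixed parameters are hypothetical. The three cross-layer module types are collapsed into a single simplified pass. This is not TDF-Net's actual implementation.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def modality_specific(feat: Vector, scale: float, bias: float) -> Vector:
    # HYPOTHETICAL stand-in for a learned, modality-specific branch:
    # an element-wise affine map followed by ReLU.
    return [max(0.0, scale * v + bias) for v in feat]


def interactive_fusion(rgb: Vector, depth: Vector) -> Vector:
    # Step 1: each modality is refined independently with its own
    # (made-up, fixed) parameters.
    r = modality_specific(rgb, scale=1.0, bias=0.0)
    d = modality_specific(depth, scale=0.5, bias=0.1)
    # Step 2 (assumed mutual guidance): each modality is re-weighted by a
    # sigmoid gate computed from the other modality, plus a residual term.
    r_guided = [rv + rv * sigmoid(dv) for rv, dv in zip(r, d)]
    d_guided = [dv + dv * sigmoid(rv) for rv, dv in zip(r, d)]
    # Step 3: merge the two guided streams by element-wise addition.
    return [a + b for a, b in zip(r_guided, d_guided)]


def upsample(feat: Vector, factor: int) -> Vector:
    # Nearest-neighbour upsampling of a 1-D "feature map".
    return [v for v in feat for _ in range(factor)]


def cross_layer_predict(levels: List[Vector]) -> Vector:
    # `levels` is ordered fine-to-coarse, each half the length of the previous
    # one, mimicking a four-stage pyramid backbone (ASSUMED 2 low + 2 high).
    low, high = levels[:2], levels[2:]
    # High-level group: add the upsampled coarsest level to the next one.
    hi_feat = [a + b for a, b in zip(high[0], upsample(high[1], 2))]
    # Low-level group: the same top-down addition for the two fine levels.
    lo_feat = [a + b for a, b in zip(low[0], upsample(low[1], 2))]
    # Treat the groups differently: high-level semantics gate low-level
    # details, then a sigmoid yields a per-position saliency score in (0, 1).
    hi_up = upsample(hi_feat, len(lo_feat) // len(hi_feat))
    return [sigmoid(lv * sigmoid(hv) + hv) for lv, hv in zip(lo_feat, hi_up)]


if __name__ == "__main__":
    sizes = (16, 8, 4, 2)
    rgb_levels = [[0.1 * i for i in range(n)] for n in sizes]
    depth_levels = [[1.0 - 0.05 * i for i in range(n)] for n in sizes]
    fused = [interactive_fusion(r, d) for r, d in zip(rgb_levels, depth_levels)]
    saliency = cross_layer_predict(fused)
    print(len(saliency), [round(s, 3) for s in saliency[:4]])
```

The design choice being illustrated is the abstract's central idea: modalities are first processed separately and only then exchange information, and fused levels are not merged uniformly but grouped into low-level and high-level features that play different roles in the final prediction. A real model would use learned convolutional or attention layers on 2-D tensors in place of these fixed element-wise operations.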

Citation

Zhi-Qiang Cui, Feng Wang, and Zheng-Yong Feng "Transformer-based difference fusion network for RGB-D salient object detection," Journal of Electronic Imaging 31(6), 063058 (27 December 2022). https://doi.org/10.1117/1.JEI.31.6.063058

Received: 25 July 2022; Accepted: 15 December 2022; Published: 27 December 2022

Journal article, 17 pages

KEYWORDS

Feature fusion

RGB color model

Feature extraction

Object detection

Transformers

Image fusion

Semantics
