Presentation + Paper
3 April 2024
W-MAFormer: W-shaped multi-attention assisted transformer for polyp segmentation
Abstract
Colorectal cancer, the third most common cancer worldwide, can be effectively prevented through the timely detection and removal of colorectal polyps. Precise diagnosis requires accurate segmentation of these polyps, a task where existing deep learning solutions exhibit limitations. Many CNN-based models emphasize local information because their receptive field is constrained by the convolutional kernel size, impairing their effectiveness on larger polyps. Conversely, vision-transformer-based models replace the CNN encoder with transformers to obtain stronger global contextual representations; however, segmentation still hinges on a CNN-centric decoder. Bridging this gap, we introduce the W-shaped Multi-Attention Assisted Transformer (W-MAFormer) for polyp segmentation, which employs transformer modules in place of conventional convolutional blocks within the decoder. Structurally, our encoder builds on the pyramid vision transformer, while our decoder combines three pivotal modules: the Reference Feature Extractor (RFE), Semantic Feature Enhancement (SFE), and Reverse Attention Decoder (RAD). Notably, the SFE module employs mutual and dual attention mechanisms to augment information shared across feature maps of varying scales and channels. This enhancement requires a robust reference map, which the RFE supplies. The refined feature map is then passed to the RAD, which applies reverse attention operations to yield the final prediction. Throughout this architecture, attention mechanisms remain central, preserving global information. Our comprehensive evaluation on five prominent datasets demonstrates the model's effectiveness, with both quantitative and qualitative results that outperform several contemporary state-of-the-art semantic segmentation methods.
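The abstract does not detail how the Reverse Attention Decoder combines a coarse prediction with encoder features. The sketch below illustrates one common formulation of reverse attention (as popularized by PraNet-style decoders), in which the inverted sigmoid of a coarse map gates a higher-resolution feature before a residual refinement; the class name `ReverseAttentionDecoder`, the channel widths, and the refinement head are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a reverse-attention refinement step (assumption: PraNet-style
# formulation; W-MAFormer's RAD may differ in detail).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReverseAttentionDecoder(nn.Module):
    """Refines a coarse segmentation map using a higher-resolution feature map."""
    def __init__(self, in_channels: int):
        super().__init__()
        # Small refinement head; channel widths are illustrative.
        self.refine = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, kernel_size=3, padding=1),
        )

    def forward(self, feat: torch.Tensor, coarse_map: torch.Tensor) -> torch.Tensor:
        # Upsample the coarse, single-channel prediction to the feature resolution.
        coarse = F.interpolate(coarse_map, size=feat.shape[2:],
                               mode="bilinear", align_corners=False)
        # Reverse attention: weight regions the coarse prediction did NOT cover,
        # steering the decoder toward missed polyp regions and boundaries.
        rev = 1.0 - torch.sigmoid(coarse)
        attended = feat * rev
        # Residual refinement of the coarse prediction.
        return coarse + self.refine(attended)

# Example: refine an 11x11 coarse map with a 44x44 encoder-stage feature.
feat = torch.randn(1, 320, 44, 44)   # e.g. a PVT stage output (channel count assumed)
coarse = torch.randn(1, 1, 11, 11)   # coarse logits from a deeper stage
out = ReverseAttentionDecoder(320)(feat, coarse)
print(out.shape)                     # torch.Size([1, 1, 44, 44])
```

In this reading, the inverted attention map suppresses regions already captured by the coarse prediction so that refinement focuses on residual errors, which is why reverse attention is commonly used to sharpen polyp boundaries.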
Conference Presentation
© (2024) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
M. Yi, Y. Su, Y. Shen, and W. Wang "W-MAFormer: W-shaped multi-attention assisted transformer for polyp segmentation", Proc. SPIE 12927, Medical Imaging 2024: Computer-Aided Diagnosis, 129270V (3 April 2024); https://doi.org/10.1117/12.3008772
KEYWORDS: Polyps, Feature extraction, Transformers, Image segmentation, Machine learning, Convolution