Most prior work on visual adversarial attacks focuses on attack performance, while few studies consider how the examples look once the generated adversarial perturbation is applied. Leaving the perturbation unrestricted often produces conspicuous, attention-grabbing patterns in the adversarial examples that humans can easily spot. To address this issue, we propose a method that shapes the perturbation generated by visual adversarial attacks on object recognition models by leveraging post-hoc visual explanation methods for DNNs to produce saliency maps, which indicate the regions of the input image that contribute most to the model's prediction. By pointing the attack at the region where it has the greatest impact and confining the adversarial perturbation to that region, our method generates natural-looking adversarial examples while maintaining high attack performance. In extensive experiments comparing our method against state-of-the-art adversarial attack techniques on widely used deep neural networks and standard datasets, our method produces significantly more realistic and natural-looking adversarial examples than several state-of-the-art baselines while achieving competitive attack performance.
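A minimal sketch of the core idea in PyTorch. The abstract does not fix a specific explanation method, so a simple gradient-magnitude saliency map stands in here; the threshold `tau`, step size `eps`, and function name `saliency_masked_attack` are hypothetical choices for illustration, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def saliency_masked_attack(model, x, y, eps=0.03, tau=0.5):
    """Perturb only the pixels a saliency map flags as most influential."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)

    # Saliency stand-in: per-pixel gradient magnitude, max over channels,
    # rescaled to [0, 1].
    sal = grad.abs().amax(dim=1, keepdim=True)
    sal = (sal - sal.amin()) / (sal.amax() - sal.amin() + 1e-8)

    # Confine the FGSM-style step to the high-saliency region only, so the
    # rest of the image stays untouched and the example looks natural.
    mask = (sal >= tau).float()
    x_adv = x + eps * grad.sign() * mask
    return x_adv.clamp(0, 1).detach()
```

The masking step is what distinguishes this from an unconstrained attack: pixels below the saliency threshold receive zero perturbation, trading a small amount of attack strength for visual plausibility.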
With the rapid development of computer vision, fine-grained visual classification (FGVC) has become an active research field with broad application prospects. Fine-grained classification poses two main challenges: localizing discriminative regions and extracting fine-grained features. Attention mechanisms are a common choice in current state-of-the-art (SOTA) FGVC methods and can significantly improve the ability to distinguish among fine-grained categories: attention modules of various designs capture the discriminative region, and region-based feature representations encode subtle inter-class differences. However, without proper supervision the attention mechanism may fail to provide informative guidance toward the discriminative region and can thus be ineffective in FGVC tasks that lack part annotations. We propose a weakly supervised attention mechanism that integrates visual explanation methods to resolve the ambiguity in discriminative-region localization caused by the absence of supervision, while also avoiding labor-intensive bounding-box/part annotations. We employ Score-CAM, a post-hoc visual explanation method based on class activation mapping, to supervise and constrain the attention module. Extensive experiments show that the proposed method outperforms current SOTA methods on three fine-grained classification tasks: CUB Birds, FGVC Aircraft, and Stanford Cars.
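A minimal PyTorch sketch of how such supervision could look. `score_cam` below is a simplified Score-CAM: each channel of a feature map is upsampled, normalized, used to softly mask the input, and weighted by the resulting class score. The supervision term `attention_loss` and the names `feats`/`attn` are hypothetical illustrations of constraining an attention map with the CAM, not the paper's exact training objective.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()  # Score-CAM is gradient-free
def score_cam(model, x, feats, target_class):
    """x: (1, 3, H, W) input; feats: (1, C, h, w) activations from a chosen layer."""
    C = feats.shape[1]
    maps = F.interpolate(feats, size=x.shape[-2:], mode="bilinear",
                         align_corners=False)
    # Normalize each channel map to [0, 1] so it can act as a soft input mask.
    flat = maps.view(C, -1)
    mn, mx = flat.min(dim=1)[0], flat.max(dim=1)[0]
    maps = (maps[0] - mn[:, None, None]) / (mx - mn + 1e-8)[:, None, None]

    # Score each masked input on the target class; softmax weights the channels.
    scores = torch.stack([model(x * m)[0, target_class] for m in maps])
    weights = F.softmax(scores, dim=0)
    cam = F.relu((weights[:, None, None] * maps).sum(dim=0))
    return cam / (cam.max() + 1e-8)  # (H, W), values in [0, 1]

def attention_loss(attn, cam):
    """Pull the learned attention map toward the Score-CAM map."""
    return F.mse_loss(attn, cam.expand_as(attn))
```

During training, `attention_loss` would be added to the classification loss so the attention module is steered toward the same discriminative regions the explanation method highlights, without requiring part annotations.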