Presentation + Paper
Analyzing a human-in-the-loop's decisions for the detection of data poisoning
12 April 2021
Samantha S. Carley, Stanton R. Price
Abstract
Human-in-the-loop (HITL) is the process of combining the power of a machine or computer system with human intelligence to develop human-aware machine learning (ML) models. The HITL process creates a continuous feedback loop between human and machine, enabling the trained model to improve continuously as edge cases present themselves, without the need to retrain the model from scratch. HITL ML systems offer several advantages: avoiding bias, ensuring consistency and accuracy, improving efficiency, and providing transparency. However, adding human involvement also invites human mistakes. Occasionally, a HITL system may actually degrade the algorithm rather than improve it: mislabeling an object in an object detection algorithm, incorrectly scoring an algorithm’s output, misclicks, typos, and other human errors cause the system to learn from faulty information. These user errors, intentional or not, can be considered a form of data poisoning. To understand the effects of a HITL’s choices on an ML model, several pieces of information can be observed during the HITL process, e.g., the time taken by the user to provide input on an output or on a specific object class, as well as an evaluation of the consistency of submitted valid inputs, among other factors. Information extracted from the HITL’s decision-making process can provide insights into whether poor choices are being made by the user (i.e., data poisoning) and identify where, when, and why these choices are being made. Many state-of-the-art models could be utilized for this work, such as ResNet-50, DarkNet-53, and Xception, among others. However, for this work, we are less focused on the model being used and more focused on the procedure for tracking HITL performance to maximize model improvement. Nevertheless, this work will consider a pretrained model, though the approach will be model agnostic.
The dataset used in this research is the publicly available “Flowers Recognition” dataset available on Kaggle.1
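The signals described in the abstract (per-annotation response time and label consistency) could be tracked with a small monitor alongside the annotation tool. The sketch below is illustrative only and is not from the paper; the class name `HITLMonitor`, the z-score threshold, and the repeat-pass consistency check are all assumptions about how such tracking might be implemented.

```python
import statistics

class HITLMonitor:
    """Hypothetical tracker for a human annotator's behavior, flagging
    inputs that may indicate data poisoning (accidental or deliberate)."""

    def __init__(self, z_threshold=2.0):
        self.z_threshold = z_threshold  # assumed cutoff for timing outliers
        self.times = []                 # seconds spent on each annotation
        self.labels = {}                # item_id -> labels given across passes

    def record(self, item_id, label, seconds):
        """Log one annotation event."""
        self.times.append(seconds)
        self.labels.setdefault(item_id, []).append(label)

    def is_time_outlier(self, seconds):
        """Flag annotations made unusually fast or slow relative to the
        annotator's own history (simple z-score test)."""
        if len(self.times) < 3:
            return False  # not enough history to judge
        mu = statistics.mean(self.times)
        sd = statistics.stdev(self.times)
        return sd > 0 and abs(seconds - mu) / sd > self.z_threshold

    def inconsistent_items(self):
        """Items the annotator labeled differently on repeated passes."""
        return [i for i, ls in self.labels.items() if len(set(ls)) > 1]
```

A half-second label submitted by an annotator who normally takes several seconds, or an image labeled "daisy" on one pass and "tulip" on the next, would both surface here as candidates for review rather than being fed directly back into training.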
Conference Presentation
© (2021) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Samantha S. Carley and Stanton R. Price "Analyzing a human-in-the-loop's decisions for the detection of data poisoning", Proc. SPIE 11746, Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications III, 1174616 (12 April 2021); https://doi.org/10.1117/12.2586260
KEYWORDS
Systems modeling
Computing systems
Detection and tracking algorithms
Intelligence systems
Performance modeling
Feedback loops
Machine learning