Integrating image-based LLMs on edge-devices for underwater robotics

Prabha Sundaravadivel; Preetha J. Roselyn; Vedachalam Narayanaswamy; Vincent I. Jeyaraj; Aishree Ramesh; Aaditya Khanal

doi:10.1117/12.3014446

7 June 2024

Proceedings Volume 13034, Real-Time Image Processing and Deep Learning 2024; 130340E (2024) https://doi.org/10.1117/12.3014446
Event: SPIE Defense + Commercial Sensing, 2024, National Harbor, Maryland, United States

Abstract

Image-based Large Language Models (LLMs) are AI models that interpret captured images and generate text from their analysis of visual data. Applying these LLMs to assessments of water quality, pressure, and environmental conditions can help analyze historical data and predict potential risks and threats in underwater environments. This can improve the intervention of autonomous underwater vehicles (AUVs) and remotely operated vehicles (ROVs) during emergencies, when visual data must be interpreted to make informed decisions. Although LLMs are primarily associated with processing and generating text, they can be integrated with images through multimodal learning, in which text and images are combined for tasks that involve both modalities. Such frameworks are challenging to deploy on the low-power microcontrollers commonly used in monitoring systems. This research proposes evaluating multimodal tokens to enable edge computing in bio-inspired robots that monitor the underwater environment. The approach breaks large real-time videos down into tokens of text-based instructions associated with image descriptions. The mini-robots transmit the collected “tokens” to the nearest AUV or ROV, where the image-based LLM is deployed. We propose to evaluate this image-based LLM on our NVIDIA Jetson Nano-based AUV. In the proposed architecture, the mini-robots move along the length of the water column to capture images of the underwater environment. The proposed model is evaluated on generating text for boat and fish images. The proposed framework, with its integrated image-based tokens, can significantly reduce response time and data traffic in real-time underwater monitoring systems.
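To illustrate why sending text "tokens" instead of raw video could reduce data traffic, the following is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify a token or wire format, so the `CaptionToken` fields, the JSON encoding, the helper names, and the 640x480 frame size are all assumptions made purely for illustration.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class CaptionToken:
    """Hypothetical message a mini-robot sends in place of a raw frame."""
    robot_id: int
    timestamp_s: float
    depth_m: float
    caption: str  # short text description of the captured image


def encode_token(token: CaptionToken) -> bytes:
    """Serialize the token to compact UTF-8 JSON (an assumed wire format)."""
    return json.dumps(asdict(token), separators=(",", ":")).encode("utf-8")


def decode_token(payload: bytes) -> CaptionToken:
    """Rebuild the token on the receiving AUV or ROV."""
    return CaptionToken(**json.loads(payload.decode("utf-8")))


def raw_frame_bytes(width: int, height: int, channels: int = 3) -> int:
    """Size in bytes of one uncompressed frame with 8 bits per channel."""
    return width * height * channels


# Illustrative values only; none of these come from the paper.
sample = CaptionToken(robot_id=3, timestamp_s=12.5, depth_m=4.0,
                      caption="small fish near a boat hull")
payload = encode_token(sample)
frame_size = raw_frame_bytes(640, 480)
print(f"token: {len(payload)} bytes, raw frame: {frame_size} bytes")
```

Under these assumptions a token message is tens of bytes, while a single uncompressed 640x480 RGB frame is roughly 0.9 MB. Actual savings would depend on the image compression, caption length, and link protocol used in a real system.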

(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.

Citation

Prabha Sundaravadivel, Preetha J. Roselyn, Vedachalam Narayanaswamy, Vincent I. Jeyaraj, Aishree Ramesh, and Aaditya Khanal "Integrating image-based LLMs on edge-devices for underwater robotics", Proc. SPIE 13034, Real-Time Image Processing and Deep Learning 2024, 130340E (7 June 2024); https://doi.org/10.1117/12.3014446
