Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 1323001 (2024) https://doi.org/10.1117/12.3043879
This PDF file contains the front matter associated with SPIE Proceedings Volume 13230, including the Title Page, Copyright information, Table of Contents, and Conference Committee information.
Computer Information Classification and Machine Vision Application
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 1323002 (2024) https://doi.org/10.1117/12.3035613
Due to the complex working environment of pipelines, the collected pipeline signals contain substantial noise, which seriously hinders signal extraction. This paper proposes a denoising method based on improved INGO-VMD combined with energy entropy. First, the VMD parameters are optimized by the improved Northern Goshawk Optimization (INGO) algorithm, and variational mode decomposition is then applied to the noisy signal. Finally, the boundary between noise and signal modes is determined according to energy entropy, and the retained modes are reconstructed to obtain the denoised signal. The results show that the proposed method achieves the highest signal-to-noise ratio (SNR), 24.22 dB, and the lowest RMSE, 0.079, demonstrating that the method not only filters out noise but also offers better denoising performance and stability.
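The energy-entropy boundary step described above can be sketched in NumPy; here the VMD output is simulated with synthetic modes, and the mean-energy boundary rule and the `energy_entropy` helper are illustrative assumptions, not the paper's exact criterion.

```python
import numpy as np

def energy_entropy(modes):
    """Shannon entropy of the energy distribution across modes."""
    energies = np.array([np.sum(m ** 2) for m in modes])
    p = energies / energies.sum()
    return -np.sum(p * np.log(p + 1e-12))

# Simulated VMD modes: two signal-dominated, two noise-dominated
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 1000)
modes = [
    np.sin(2 * np.pi * 5 * t),           # low-frequency signal mode
    0.8 * np.sin(2 * np.pi * 20 * t),    # higher-frequency signal mode
    0.1 * rng.standard_normal(1000),     # noise-dominated mode
    0.1 * rng.standard_normal(1000),     # noise-dominated mode
]
print("energy entropy of all modes:", round(energy_entropy(modes), 3))

# Simple illustrative boundary rule: keep modes whose energy exceeds the mean
energies = np.array([np.sum(m ** 2) for m in modes])
keep = energies > energies.mean()
denoised = np.sum([m for m, k in zip(modes, keep) if k], axis=0)
```

With this rule the two sinusoidal modes are retained and the two noise modes are discarded before reconstruction.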
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 1323003 (2024) https://doi.org/10.1117/12.3035454
Cerebral aneurysms are highly dangerous diseases, with their rupture leading to severe disabilities, thus necessitating accurate prediction of rupture risk. In this study, we constructed a custom dataset containing 178 aneurysms and extracted four types of features for fusion, including deep learning, clinical, morphological, and hemodynamic features. We then built five distinct machine learning models based on these selected features to analyze and predict aneurysm rupture. Specifically, the Random Forest model demonstrated the best performance in estimating the likelihood of aneurysm rupture, achieving a ROC value of 0.885. Our findings highlight that leveraging multimodal data enhances the accuracy of predicting aneurysm rupture risk, which could improve early detection and inform better clinical decision-making.
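On the evaluation side, the reported ROC value of 0.885 is an area under the ROC curve, which can be computed from predicted rupture probabilities via the Mann-Whitney formulation. The labels and scores below are invented for illustration.

```python
import numpy as np

def roc_auc(y_true, scores):
    """Mann-Whitney formulation of ROC AUC: the probability that a
    randomly chosen positive is scored above a randomly chosen negative."""
    y_true = np.asarray(y_true)
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Hypothetical rupture labels and model probabilities
y = [0, 0, 1, 0, 1, 1, 0, 1]
p = [0.1, 0.4, 0.35, 0.2, 0.8, 0.7, 0.3, 0.9]
print(round(roc_auc(y, p), 3))  # prints 0.938
```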
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 1323004 (2024) https://doi.org/10.1117/12.3035779
Aspect-based sentiment analysis (ABSA) is a crucial granular task within sentiment analysis, focusing on the precise identification of sentiment orientations for specific aspects within text. Recognizing that identical context words can express opposing sentiment polarities in different situations, it's essential to delve into the nuanced interactions between target and context words. This study introduces an RCG-based Hybrid Attention Network, a novel architecture that adeptly utilizes lexical attention mechanisms to extract lexical features and fortify the relationship between aspects and their corresponding target words. To assess the efficacy of our proposed approach, we conducted experiments on a well-known public dataset. The results show a significant 3.37% enhancement in accuracy and a 1.38% improvement in Macro-F1 scores compared to related methods, affirming the superiority of our technique in enhancing the performance of aspect-level sentiment analysis.
Lirong Huang, Zhongjie Zhou, Sijie Xu, Junting Ou, Hongbo Sun
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 1323005 (2024) https://doi.org/10.1117/12.3035492
In recent years, with the improvement of living standards and economic development, urban car ownership has increased year by year. This has put pressure on the carrying capacity of roads, leading to traffic congestion. Traditional means of vehicle detection, however, suffer from a small detection range, poor flexibility, and low efficiency, so they cannot provide real-time, convenient traffic flow data. In this paper, YOLOv5-OBB is taken as the research object, and three experiments are conducted on the DroneVehicle dataset: the BiFormer mechanism is added to strengthen the model's ability to recognise small targets, and NAM Attention is added to improve the model's convergence speed. We observed that the HBBmAP of YOLOv5-OBB increased to 74.1% and HBBmAP@0.5:0.95 increased to 49.7% after adding the BiFormer mechanism, while model stability was outstanding after adding the NAM Attention mechanism. This study provides a practical reference for the development of intelligent transport.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 1323006 (2024) https://doi.org/10.1117/12.3035585
Image-to-point cloud registration is the task of fusing 2D images with 3D point clouds, typically combining the rich texture information from images with the spatial information from point clouds for a more comprehensive scene representation. However, the feature discrepancy between modalities often makes accurate matching challenging. To address this issue, this paper proposes an image-matching-based algorithm for image-to-point cloud registration. Initially, to bridge the cross-modal gap, we use 3D visualization tools to capture screenshots of the 3D models and read the camera information of each screenshot, establishing a correspondence between the 3D models and 2D images. A pretrained image-matching model then extracts and matches keypoints between the screenshots and the query images, establishing image-to-image keypoint correspondences. Finally, using the camera projection model, the algorithm maps 3D coordinates to image keypoints, obtains image-to-point cloud keypoint correspondences, and computes the transformation matrix between the image and the 3D point cloud with PnP-RANSAC. Compared with manual annotation, the proposed method significantly improves efficiency and stability in real-world scenarios.
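The camera projection model used to map 3D coordinates onto image keypoints can be sketched with a standard pinhole model; the intrinsics below are hypothetical, and the actual pose estimation would use a PnP-RANSAC solver such as OpenCV's `solvePnPRansac`.

```python
import numpy as np

# Pinhole camera intrinsics (hypothetical values)
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

def project(points_3d, K, R=np.eye(3), t=np.zeros(3)):
    """Project 3D points into pixel coordinates with a pinhole model."""
    cam = points_3d @ R.T + t      # world frame -> camera frame
    uv = cam @ K.T                 # apply intrinsics
    return uv[:, :2] / uv[:, 2:3]  # perspective divide

pts = np.array([[0.0, 0.0, 2.0],      # point on the optical axis
                [0.5, -0.25, 2.0]])
print(project(pts, K))  # pixel coords (320, 240) and (520, 140)
```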
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 1323007 (2024) https://doi.org/10.1117/12.3035463
Unmanned Aerial Vehicle (UAV) technology plays a pivotal role in military reconnaissance and battlefield surveillance. With the rapid advancement of deep learning, the integration of deep learning with UAV technology has become increasingly significant. Addressing the current shortfall in autonomous recognition capabilities of UAV technology in complex environments, this paper proposes an intelligent recognition and tracking algorithm for UAVs based on battlefield conditions. Building upon a specially curated dataset for friend-or-foe target recognition under various environments, this study combines the YOLO algorithm with PID control methods to develop a UAV control system capable of classifying target behaviors and achieving real-time tracking and ranging during dynamic flight. Notably, the friend-or-foe target recognition dataset is enhanced with multi-dimensional labels for friend-or-foe identification, enemy situation analysis, gesture control, and formation behaviors, providing a rich set of training data. Experimental results demonstrate that the target recognition classification achieved an accuracy of 91.2%, with a response time within 0.2 seconds, thereby confirming the algorithm's effectiveness and robustness under diverse battlefield conditions.
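The PID half of the control loop can be sketched as follows; the gains and the toy one-dimensional plant are illustrative assumptions, not the tuned values from the paper.

```python
class PID:
    """Discrete PID controller (illustrative gains, not from the paper)."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_err = 0.0

    def step(self, error):
        self.integral += error * self.dt
        deriv = (error - self.prev_err) / self.dt
        self.prev_err = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv

# Drive the horizontal pixel offset of a tracked target toward zero
pid = PID(kp=0.6, ki=0.1, kd=0.05, dt=0.1)
offset = 50.0  # target starts 50 px right of image center
for _ in range(40):
    offset -= pid.step(offset) * 0.1  # toy plant: yaw command shrinks offset
print(round(offset, 2))
```

In the UAV setting the error would be the pixel offset between the detected bounding-box center and the image center, with one such controller per controlled axis.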
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 1323008 (2024) https://doi.org/10.1117/12.3035729
As the use of drones in emergency response, traffic monitoring, and logistics continues to expand, public concerns over privacy, security, and environmental intrusion have become significant barriers to societal acceptance of drones. This study, through empirical analysis, elucidates the interplay between drone color and light design with environmental backgrounds in enhancing detection capabilities and reducing perceived risks. Twenty participants took part in a visual search experiment to assess reaction times to drone stimuli across urban, greenery, clear sky, and nighttime scenarios. Results indicate a marked preference for brightly colored drones and specific light patterns in improving search efficiency and visibility. These findings underscore the importance of design in enhancing the perceptibility of drones and advancing privacy and security protections.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 1323009 (2024) https://doi.org/10.1117/12.3035430
In the ever-evolving landscape of computer technology and artificial intelligence, profound transformations have reverberated across the realm of art and design. This scholarly work endeavors to delve into the harmonious fusion of Wudang Mountain’s venerable cultural heritage with contemporary digital art illustrations and cultural creative product designs. By meticulously scrutinizing the traditional culture of Wudang Mountain, we distill its unique cultural idiosyncrasies and artistic nuances. Employing sophisticated style transfer algorithms, we seamlessly transpose the distinctive cultural symbols and graphic elements of Wudang Mountain onto reimagined decorative motifs and visual compositions. In this symbiotic union, the essence of traditional culture is meticulously preserved, while innovative facets of modern design seamlessly interlace. This confluence not only enriches the semantic depth of digital art illustrations but also engenders a plethora of novel prospects and inspirations for cultural and creative design within scenic locales. Ultimately, this harmonious marriage of cultural innovation and technological prowess contributes to the perpetuation and elevation of Wudang Mountain’s time-honored cultural legacy, concurrently propelling the evolution of artistic expression.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132300A (2024) https://doi.org/10.1117/12.3035904
This paper focuses on the detection and identification of defects on the end faces of small motor bearings. Bearing defects significantly impact the performance and lifespan of motors, and traditional manual inspection methods suffer from low efficiency and high miss rates. This paper proposes the use of machine vision and image processing techniques to enhance the speed, efficiency, and reliability of detection. The study includes the analysis of common defect types on bearing end faces, the selection of industrial cameras and vision lighting, as well as image preprocessing and edge detection. By improving the YOLOv5 algorithm, the detection performance for small and medium-sized targets on bearing end faces is optimized. Moreover, a supervisory software for bearing defect detection has been designed to achieve more effective processing and statistical analysis of detection results. The primary goal of the research is to enhance the accuracy and automation level of detection, reduce the risk of omissions in manual inspections, and improve detection efficiency through the advancement of deep learning-based target detection algorithms. This research holds significant importance for the technological progress and production efficiency improvement in the small motor bearing industry.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132300B (2024) https://doi.org/10.1117/12.3035730
In the realms of visual arts and psychology, it is acknowledged that lines and their properties significantly impact human emotional responses. Historically, various characteristics of lines have been proposed for emotional expression research; yet the pivotal computer vision technique, Scale-Invariant Feature Transform (SIFT), has frequently been disregarded. Computing SIFT keypoints and their orientations on lines unveils the morphological traits of lines and deciphers their connection with specific emotional states. This method is not only computationally concise but also offers intuitive visualizations and explanations. Our investigation delves into the crucial role of SIFT features in conveying emotions from a computational vision standpoint. By scrutinizing SIFT features extracted from hand-drawn line images, we examined the trends in the quantity and orientation of feature points, and vividly illustrated how these points influence line morphology. Logistic regression analysis further exposed the linkage between feature points and emotional dimensions such as pleasure and arousal. The results underscore the significance of SIFT features in emotional expression, facilitating the advancement of affective computing applications, as well as emotion recognition and analysis grounded in line features.
Honghua Zhao, Kedun Zhao, Bo Chen, Zhaoguo Sun, Xuan Sun, Changsheng Ai
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132300C (2024) https://doi.org/10.1117/12.3035627
Aiming at the transport of biological samples in low-temperature environments, a sample transport robot based on visual SLAM navigation was designed to transport sample tubes in low-temperature cold-storage environments. The SLAM module adopts a three-thread RGB-D SLAM scheme for real-time localization and mapping. Considering that the actual environment is static, stable, and sufficiently illuminated, while the front-end odometry is prone to mismatches and long data-association times, the algorithm is improved: an improved "FAST-BRIEF-COUPLING" loosely coupled data-association strategy is proposed to provide more response time for the back-end optimization thread and global pose-graph optimization. Simulation experiments show that the real-time performance and robustness of the algorithm meet the needs of practical applications.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132300D (2024) https://doi.org/10.1117/12.3035827
With the increasing demands for real-time accuracy in visual SLAM (Simultaneous Localization and Mapping), the critical front-end component of SLAM systems has become the extraction and matching of image features to estimate camera motion pose. Existing feature extraction methods are susceptible to local feature point density and even overlap due to uncertainties in lighting intensity and surrounding environments, leading to uneven distribution of overall feature points. This, in turn, affects the accuracy of feature matching, thereby impacting the performance of the SLAM process. To address these challenges, this paper proposes a novel ORB image feature extraction algorithm based on fuzzy control. Initially, a Gaussian pyramid of the image is constructed, and the Intensity Centroid is calculated to ensure scale and rotation invariance. Subsequently, candidate feature points are extracted using ORB at each pyramid level, then the candidate feature points are selectively filtered through fuzzy control based on the number of candidate feature points and the magnitude of their response values. Finally, a BRIEF descriptor is established for feature matching. The proposed Fuzzy-Controlled ORB Feature Extraction (OFFC) algorithm was experimentally validated on open-source TUM and Mikolajczyk datasets. Results indicate that OFFC successfully avoids excessive density and overlap of extracted feature points, reduces mismatches in image feature points, and enhances the accuracy of feature matching. Notably, OFFC demonstrates adaptability and maintains robust performance in three challenging scenarios: image blurring, varying brightness, and dense similarity of image blocks, highlighting its significant practical value for visual SLAM.
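The fuzzy-control filtering of candidate feature points can be sketched with a triangular membership function over corner response values; the membership shape and the 0.5 cutoff are assumptions for illustration, not the paper's rule base.

```python
import numpy as np

def triangular(x, a, b, c):
    """Triangular fuzzy membership function on [a, c], peaking at b."""
    x = np.asarray(x, dtype=float)
    left = np.clip((x - a) / (b - a), 0.0, 1.0)
    right = np.clip((c - x) / (c - b), 0.0, 1.0)
    return np.minimum(left, right)

# Hypothetical corner responses at one pyramid level
rng = np.random.default_rng(1)
responses = rng.uniform(0.0, 100.0, size=200)

# Fuzzy rule: prefer mid-to-high responses, suppress the weakest candidates
membership = triangular(responses, 20.0, 80.0, 120.0)
keep = membership > 0.5
print(keep.sum(), "of", len(responses), "candidates kept")
```

The filtering keeps strong, well-separated responses while discarding the low-response candidates that tend to cluster and mismatch.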
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132300E (2024) https://doi.org/10.1117/12.3035626
Based on an open-source image set, this study calculated the relative positional relationships between the eyes and the bregma and lambda points, using the nose tip as a reference point. We found that the correlation coefficients were as high as 0.942 and 0.935, respectively. Leveraging these findings, we fitted linear regression equations that estimate the locations of the bregma and lambda points from eye positions, with theoretical errors of 0.35 mm and 0.57 mm, respectively. Additionally, an automatic recognition algorithm for the nose tip and eyes was designed using OpenCV and YOLOv5, enabling automatic localization of the bregma and lambda points. Upon validation, the actual recognition errors of this method were 0.41 mm and 0.66 mm, respectively, demonstrating high precision. This noninvasive and straightforward automatic recognition approach not only effectively reduces surgical trauma to animals but also avoids the precision errors associated with manual positioning, offering new possibilities for the clinical application of brain-computer interface technology.
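The linear-regression step can be sketched as follows; the eye-spacing and bregma-distance values below are invented for illustration and are not the study's measurements.

```python
import numpy as np

# Hypothetical (eye-spacing, nose-to-bregma distance) pairs, in mm
eye_spacing = np.array([6.0, 6.5, 7.0, 7.5, 8.0])
bregma_dist = np.array([9.1, 9.8, 10.4, 11.1, 11.7])

# Least-squares fit of bregma distance as a linear function of eye spacing
A = np.column_stack([eye_spacing, np.ones_like(eye_spacing)])
slope, intercept = np.linalg.lstsq(A, bregma_dist, rcond=None)[0]

# Estimate the bregma distance for a new animal with 7.2 mm eye spacing
predicted = slope * 7.2 + intercept
print(round(slope, 3), round(intercept, 3), round(predicted, 2))
```

The same fit, applied to both bregma and lambda, turns detected eye positions into estimated skull-landmark coordinates.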
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132300F (2024) https://doi.org/10.1117/12.3035665
Attention mechanisms have shown impressive abilities in solving downstream multi-modal tasks. However, there exists a natural semantic gap between vision and language modalities that hinders conventional attention-based models in achieving effective cross-modal semantic alignment. In this paper, we present JoAt, a Joint Attention net, through which we investigate how to utilize the visual background information more directly in a query-adaptive manner to enrich querying semantics for each visual token, and how to more fully bridge the semantic gap to achieve cross-modal alignment between visual-grid and textual features. Specifically, our JoAt utilizes each query’s neighboring pixels, aggregates the visual query tokens from different receptive fields, and allows the model to dynamically select the most relevant neighboring tokens for each query, then obtains representations that are more semantically matched with the textual features to realize better interaction between visual and linguistic modalities. The experimental results show that our JoAt net can fully utilize different semantic-level signals from visual features at different receptive fields and effectively narrow the natural semantic differences between visual and language modalities. Our JoAt achieved an accuracy of 72.15% and 98.90% on the VQAv2.0 test-std and CLEVR benchmarks, respectively.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132300G (2024) https://doi.org/10.1117/12.3035520
Vision Transformer (ViT) fully demonstrates the potential of the transformer architecture in the field of computer vision. However, its computational complexity grows with the length of the input sequence, limiting the application of transformers to high-resolution images. To improve the overall performance of Vision Transformer, this paper proposes an efficient Vision Transformer (MLVT) with dynamic embedding of multi-scale features. It adopts a pyramid architecture and replaces self-attention with linear self-attention; to address the problem that dispersed linear self-attention scores ignore local correlation, a local attention enhancement module is proposed, in which a convolution operation resembling self-attention supplements the local attention. To handle the growth of feature dimension in the pyramid architecture, the bottleneck of linear self-attention computation is shifted from sequence length to feature dimension, and a linear self-attention with compressed feature dimension is proposed. In addition, since multi-scale inputs are crucial for processing image information, this paper proposes a flexible, learnable dynamic multi-scale feature embedding module that dynamically adjusts the fusion weights of features at different scales according to the input image. Extensive experiments on image classification and object detection tasks show that competitive results are achieved while reducing computational cost.
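The shift of the computational bottleneck from sequence length to feature dimension can be seen in a minimal linear-attention sketch; the positive feature map `phi` is a generic choice, not the paper's exact formulation.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: O(N^2 d) — the N x N score matrix is explicit."""
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ V

def linear_attention(Q, K, V, eps=1e-6):
    """Kernelized linear attention: O(N d^2) — K^T V is computed first."""
    phi = lambda x: np.maximum(x, 0) + 1.0  # simple positive feature map
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.T @ V                  # d x d summary; no N x N matrix is formed
    z = Qp @ Kp.sum(axis=0)        # per-query normalizer
    return (Qp @ kv) / (z[:, None] + eps)

N, d = 64, 8
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
print(linear_attention(Q, K, V).shape)  # (64, 8)
```

Because only the d x d summary `kv` is materialized, cost scales linearly in N, which is what makes compressing the feature dimension the next worthwhile optimization in a pyramid architecture.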
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132300H (2024) https://doi.org/10.1117/12.3035731
Generating Three-Dimensional Animation of Virtual Characters Based on Monocular RGB Video represents a cutting-edge technological application in the field of computer graphics. Initially, monocular video serves as input data, from which the three-dimensional coordinate information of characters is extracted. Subsequently, this data is mapped onto any three-dimensional character model, enabling the generation of three-dimensional animation corresponding to any character action portrayed in the video. This paper aims to generate three-dimensional animation of virtual characters, leveraging frame interpolation and deep learning methods for implementation. The innovation of this paper primarily lies in the extraction of key frames from the video using frame interpolation. Furthermore, interstitial frame character action data between each key frame is predicted and generated through a neural network model. The resulting three-dimensional animation of characters demonstrates high quality and smoothness.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132300I (2024) https://doi.org/10.1117/12.3035755
Acquiring high-quality 3D information from transparent materials, which exhibit weak light scattering and absorption characteristics, presents a formidable challenge. In this paper, we introduce a novel three-dimensional reconstruction method called PMP-LIF (Phase Measurement Profilometry with Laser-Induced Fluorescence). This approach uses machine vision and incorporates structured light technology with laser-induced fluorescence to enhance the reliability and robustness of surface profile data for transparent objects by improving their reflectivity. In our study, we accomplished a three-dimensional reconstruction of surface profiles for transparent materials by employing a structured light pattern with a 544 nm wavelength, projected using a DLP (Digital Light Processing) projector. The image data was captured using a binocular vision system. Any distorted information within the image data was processed using image processing techniques, such as a phase unwrapping algorithm. The adoption of the PMP-LIF method for image data acquisition streamlines the complexity of the image acquisition system while preserving the accuracy of surface profile information. We conducted three-dimensional reconstruction tests on various objects, including human face shapes, on the surfaces of static liquids and transparent aspheric mirror blanks. The results reveal root-mean-square errors of 0.34 mm and 1.67 mm for the registration of the reconstructed point cloud information with fitting planes, respectively.
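The phase-unwrapping step mentioned above can be illustrated in one dimension with NumPy's `unwrap`; the phase ramp here is synthetic.

```python
import numpy as np

# A smooth phase ramp, as produced by phase-shifted fringe analysis
true_phase = np.linspace(0, 6 * np.pi, 200)

# The measured phase is wrapped into (-pi, pi]
wrapped = np.angle(np.exp(1j * true_phase))

# np.unwrap restores continuity by removing the 2*pi jumps
unwrapped = np.unwrap(wrapped)
print(np.allclose(unwrapped, true_phase))  # → True
```

In PMP the same operation is applied per pixel to the phase maps recovered from the projected fringe patterns before triangulating the surface.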
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132300J (2024) https://doi.org/10.1117/12.3035725
Medical image segmentation is a challenging and important task that aims to identify and separate different anatomical structures or pathological regions from complex and noisy image data. However, most existing deep learning models for medical image segmentation are based on convolutional neural networks (CNNs), which have high memory consumption and limited spatial reasoning capabilities. In this paper, we propose a novel deep learning model for medical image segmentation based on Swin UNET, which combines the self-attention mechanism of Swin Transformer and the encoder-decoder architecture of U-Net. We also propose a memory management strategy that optimizes the number of heads of the multi-head self-attention mechanism using probabilistic mirror flipping and grid search. We conduct extensive experiments on a challenging medical image segmentation dataset and demonstrate that our model and strategy achieve comparable or better accuracy than the state-of-the-art models while significantly reducing the memory usage. Our model and strategy are robust and generalizable, as they can handle arbitrary input resolutions, scales, and modalities, and achieve state-of-the-art performance on a challenging medical image segmentation dataset. Our study contributes to the advancement of the research field of medical image segmentation, and provides a practical and scalable solution for real-world application scenarios with limited resources.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132300K (2024) https://doi.org/10.1117/12.3035612
This paper proposes a visual recognition and localization method based on color classification using K-means and OTSU. Exploiting the suitability of the HSV color space for color segmentation, the image is first converted from RGB to HSV, and the K-means algorithm is combined with a difference method to segment targets sharing the same color characteristics. For such targets, the OTSU algorithm is then applied to segment the top surface of the target in the S-channel of the image, and the center of the top surface, computed by image processing, serves as the suction point for robotic handling. Experimental results, including color-classification-based target recognition, show that the method offers high accuracy, robustness, and real-time performance.
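The two classical steps the abstract names — Otsu thresholding (as applied to the S-channel) and K-means color clustering — can be sketched in a few lines of NumPy. This is a hypothetical, simplified illustration; the function names are ours, and a production pipeline would operate on real HSV images, e.g. via OpenCV.

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's threshold for a uint8 image: maximizes between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    omega = np.cumsum(hist) / total                 # class-0 probability per cut
    mu = np.cumsum(hist * np.arange(256)) / total   # cumulative mean per cut
    mu_t = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    return int(np.argmax(np.nan_to_num(sigma_b)))

def kmeans_1d(values, k, iters=20):
    """Minimal 1-D k-means, e.g. for clustering hue values into color groups."""
    centers = np.quantile(values, np.linspace(0, 1, k))  # deterministic init
    labels = np.zeros(len(values), dtype=int)
    for _ in range(iters):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = values[labels == j].mean()
    return centers, labels
```

In practice the hue channel would feed `kmeans_1d` for coarse color grouping and the saturation channel would feed `otsu_threshold` to isolate the top surface.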
Jianxun Kuang, Peiqi Shen, Hao Zeng, Zhoulong Yuan, Jun Chen
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132300L (2024) https://doi.org/10.1117/12.3035572
Ultra-low-frequency mixed shock-wave breakdown tests are used to identify damaged areas of a cable, but the results are susceptible to changes in the surface temperature field, causing significant deviation between the detected and actual damage positions. It is therefore necessary to design a new method for detecting surface damage during submarine cable laying based on computer vision and pixel enhancement. The principle of pixel enhancement is used to extract and identify damage detection features of the cable-laying surface, completing the damage detection. Experimental results show that the detected damage area and stress amplitude are consistent with the actual values, demonstrating good detection performance, reliability, and practical value, and contributing to subsequent submarine cable maintenance work.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132300M (2024) https://doi.org/10.1117/12.3035809
Under interference such as strong arc light, splashing, smoke and dust, and strong noise, the accuracy and stability of weld trajectory tracking control are important guarantees for high-quality welding. After preprocessing the weld seam images obtained by laser vision, curve fitting is used to connect the extracted points into lines and obtain the center feature points of the light stripe during welding. The feature point coordinates are converted, and the fitted feature-point line serves as the reference trajectory of the weld seam. To solve the planning problems encountered during seam tracking, the optimal transverse and longitudinal movement speeds of the welding gun head are obtained based on the predictive control model. The obtained parameters are used as inputs to the welding system and transferred to the actuator to achieve more accurate tracking of the weld seam trajectory.
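The step of fitting extracted stripe-center points into a reference trajectory can be illustrated with a simple polynomial fit. This is a hedged sketch: the abstract does not specify the curve model, so a quadratic via numpy.polyfit is assumed, and the function name is ours.

```python
import numpy as np

def fit_seam_trajectory(xs, ys, degree=2):
    """Fit a polynomial through extracted stripe-center points and return
    a callable reference trajectory y = f(x) for the tracking controller."""
    coeffs = np.polyfit(xs, ys, degree)
    return lambda x: np.polyval(coeffs, x)
```

The controller would then query the returned trajectory at the current gun position to compute the transverse correction.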
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132300N (2024) https://doi.org/10.1117/12.3035544
Acoustic imaging is one of the cutting-edge technologies in the field of acoustic testing. Applying it to gas leakage detection has a very broad market, but insufficient detection distance and accuracy make it difficult to use in scenarios with stringent detection requirements. To improve the applicability of this technology, this paper describes several technical improvements, focusing on the design of the microphone array; progress has been made in sampling rate and detection frequency, combined with miniaturized equipment design and comprehensive improvement of the software algorithms. Measurement data show that these upgrades greatly improve the detection range and sensitivity of acoustic imaging, making it particularly suitable for routine leak detection in large chemical plants.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132300O (2024) https://doi.org/10.1117/12.3036522
This paper analyzes Web comment data collected in August 2015. First, the LDA model is used to extract topics, which rank from high to low as 'Technology', 'Internet', 'Entertainment', 'Daily life', and 'Daily study', showing that people pay great attention to science and technology, consistent with the view that science and technology are the primary productive forces of the present era. Second, a word frequency map and text sentiment analysis serve as supplementary analyses. The word frequency map identifies the high-frequency words in the data; the top three are 'share', 'diamond', and 'China', reflecting the rapid development of the Internet, which lets people share daily, follow dramas, and discuss China's development online. Text sentiment analysis then identifies words associated with positive sentiment, including 'diamond', 'China', and 'data', and words associated with negative sentiment, including 'share', 'president', and 'phone'. In summary, when dividing topics with the LDA model, the text data can usefully be supplemented by word frequency maps and text sentiment analysis.
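The supplementary word-frequency analysis described above amounts to tokenizing the comments and counting terms. A minimal sketch follows; the function name and whitespace tokenization are our own simplification (real Chinese comment text would need a proper word segmenter such as jieba).

```python
from collections import Counter

def top_words(comments, n=3, stopwords=frozenset()):
    """Tokenize a list of comment strings and return the n most frequent words,
    the raw material of a word-frequency map."""
    counts = Counter(
        w for text in comments for w in text.lower().split() if w not in stopwords
    )
    return counts.most_common(n)
```

The resulting (word, count) pairs can be fed directly into a word-cloud or bar-chart renderer.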
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132300P (2024) https://doi.org/10.1117/12.3036564
With the rapid development of technology, remote sensing image acquisition has been widely applied in many fields. However, accurately extracting building information from massive remote sensing data in complex scenes remains a major challenge. This article proposes a Transformer-based remote sensing building extraction method for complex scenes. Using transfer learning, the weights of Google's pre-trained ViT model serve as the pre-training model for the automatic building extraction algorithm, and a remote sensing dataset is trained on this basis. With a model combining Transformer and U-Net, the prepared dataset and network structure are adjusted and improved to suit the characteristics of remote sensing image samples. Experimental results show that TransUNet better segments and preserves detailed shape information, enjoying both the benefits of high-dimensional global contextual information and of low-dimensional details. This paper uses TransUNet to achieve the first application of the Transformer to building extraction from remote sensing images: it not only encodes the global and contextual information of the image as a sequence, but also effectively exploits the low-dimensional features of CNNs through a U-shaped hybrid structure design.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132300Q (2024) https://doi.org/10.1117/12.3036317
Multimodal sentiment analysis is a method that combines multiple modal information such as text, image, and audio to analyze and understand people's emotional states and emotional expressions. This method can capture sentiment more comprehensively and improve the accuracy and effectiveness of sentiment analysis. However, most previous studies focus on the fusion mode between modalities, ignoring emotional features beyond the text information and failing to fully mine the semantic emotional information they contain. To solve these problems, an Attention Fusion Network with Crossmodal Emotion Enhancement (AFNCEE) is proposed. Firstly, Long Short-Term Memory (LSTM) is used to obtain contextual semantic information from each single modality, and a stacked cross-modal Transformer structure fuses text, audio, and visual modal features to deepen the fusion hierarchy. Then, the SenticNet knowledge base is used to construct a text sentiment knowledge graph to enhance its additional representation. Finally, a feature-based attention fusion module is designed to dynamically adjust the weights of the additional representation and of each modal representation, realizing multimodal fusion.
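The feature-based attention fusion module — dynamically weighting each modal representation — can be illustrated by a scaled-dot-product weighting over per-modality vectors. This is a hypothetical NumPy sketch with names of our choosing; the actual AFNCEE module learns these weights end-to-end.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_fuse(modality_feats, query):
    """Scaled-dot-product attention over per-modality feature vectors.
    modality_feats: (m, d) array, one row per modality; query: (d,) vector.
    Returns the fused (d,) representation and the (m,) attention weights."""
    scores = modality_feats @ query / np.sqrt(len(query))
    w = softmax(scores)
    return w @ modality_feats, w
```

The weights sum to one, so the fused vector is a convex combination of the modality representations, with modalities aligned to the query emphasized.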
Weilin Li, Yangming Jiang, Chunjie Wu, Xiaoge Wang, Min Xu
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132300R (2024) https://doi.org/10.1117/12.3036316
Nondestructive evaluation of metal components using ultrasound is a widely adopted technique. Nevertheless, in practical applications, various factors may influence the testing outcomes. This study thoroughly examines the influencing factors of the ultrasonic testing process and employs deep learning techniques to explore how these factors affect the ultrasonic signals and the final testing results. A numerical simulation case was then developed in COMSOL 6.2 to investigate how workpiece surface roughness, coupling materials, and probe pressure affect the maximum amplitude of the ultrasonic signal, as well as their influence on the final testing results, thereby validating the effectiveness of our approach.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132300S (2024) https://doi.org/10.1117/12.3036336
This study aims to enhance the performance and applicability of soccer robots in semantic segmentation tasks. Initially, we prepared the data and selected three models, DenseNet121, ResNet50, and MobileNetV2, for training. During training, we employed data augmentation methods such as image rotation, horizontal flipping, and cropping, and experimented with various loss functions and optimizers. The experimental results showed that the combination of Mean Squared Error (MSE) as the loss function and Adam as the optimizer performed best. Additionally, we explored adding attention mechanisms and freezing layers, and introduced a dynamic learning rate strategy. In terms of model selection, MobileNetV2 was considered the optimal model due to its high validation accuracy, reasonable training time, and resource usage. In the final stage, we trained the final model based on the best parameters and models identified earlier, achieving a validation accuracy of 94.31% in the last epoch. Overall, our research provides an effective strategy for optimizing the semantic segmentation tasks of soccer robots.
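The dynamic learning-rate strategy mentioned above is often realized as a step decay. A minimal sketch follows; the drop factor and interval are illustrative assumptions, not the paper's values.

```python
def step_decay_lr(base_lr, epoch, drop=0.5, epochs_per_drop=10):
    """Dynamic learning-rate schedule: multiply the base rate by `drop`
    once every `epochs_per_drop` epochs."""
    return base_lr * drop ** (epoch // epochs_per_drop)
```

Such a function plugs directly into most training loops (e.g. as a per-epoch callback that overwrites the optimizer's learning rate).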
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132300T (2024) https://doi.org/10.1117/12.3036486
A UAV recognition algorithm based on an improved YOLOv8 (SLD-YOLOv8n) is proposed in this paper. To enhance multi-scale feature extraction, the algorithm uses the LSKA attention mechanism to improve the SPPF structure and replaces the conv layers in the backbone with SPD-Conv, which increases the number of channels to retain more image feature information. To tackle the disparity between easy and hard samples, the algorithm adopts the SlideLoss classification loss function; in the detection head, Detect_DyHead is used to improve detection of targets of different sizes and to improve robustness. SLD-YOLOv8n performs better on the DIOR dataset, improving mAP50 accuracy by 1.9 percentage points over the original YOLOv8n algorithm.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132300U (2024) https://doi.org/10.1117/12.3036479
This paper addresses acoustic domain adaptation of speech recognition models in smart home scenarios, where home noise interferes with recognition. A symmetric Teacher-Student training paradigm is proposed: the Student model, trained on quiet audio mixed with a background noise dataset, is pushed to match the output of the Teacher model trained on the quiet audio dataset, thereby achieving domain adaptation for home scenarios. The Teacher's output implicitly makes the recognition results in noisy scenes converge to those of the quiet scene, so that the model achieves higher accuracy in the home acoustic environment. A Mandarin speech dataset mixed with a noise dataset is used to construct the background-noise data of the home scene, and comparison experiments in noisy environments show that the Conformer model trained with the proposed Teacher-Student scheme reduces the word error rate by 6.21% compared with the Conformer baseline model.
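The Teacher-Student objective — pushing the noisy-input Student's output distribution towards the clean-input Teacher's — is commonly implemented as a temperature-softened KL divergence. A hedged NumPy sketch follows; the paper's exact loss formulation is not specified, and the function names are ours.

```python
import numpy as np

def log_softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def distillation_kl(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions: the
    Student (fed noisy audio) is pulled towards the Teacher (fed clean audio).
    Scaled by T^2 as is conventional in distillation."""
    log_p = log_softmax(teacher_logits / T)
    log_q = log_softmax(student_logits / T)
    p = np.exp(log_p)
    return float((p * (log_p - log_q)).sum(axis=-1).mean() * T * T)
```

The loss is zero when the two output distributions coincide and positive otherwise, which is exactly the convergence behavior the paradigm relies on.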
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132300V (2024) https://doi.org/10.1117/12.3036526
Flame 3D reconstruction is an important research direction in combustion diagnosis, with significant value for studying and understanding combustion laws and promoting practical applications of combustion phenomena. In this field, traditional algebraic solving methods can gradually accumulate errors during the solution process, while the universality of current deep learning reconstruction methods still needs improvement. This article designs the FMAR network, based on a multi-angle channel architecture, for three-dimensional reconstruction of flame temperature and soot volume fraction. Experimental results show that the FMAR network releases the input angle constraint and enhances universality while maintaining the high-accuracy advantage of deep learning methods.
Automatic Identification and Intelligent Detection Technology
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132300W (2024) https://doi.org/10.1117/12.3035586
Traditional methods for named entity recognition in the construction of a drilling risk knowledge graph face challenges in feature extraction accuracy and recognition efficiency. To overcome these issues, a method based on the RoBERTa-BiLSTM-CRF model is proposed. It uses RoBERTa for word embeddings and a BiLSTM network for contextual feature extraction, with a Conditional Random Field (CRF) for sequence labeling, yielding a named entity recognition framework for the drilling risk domain. Comparative experiments on a self-built dataset compared RoBERTa-BiLSTM-CRF with RoBERTa-BiLSTM and BERT-BiLSTM-CRF. The results demonstrate that the RoBERTa-BiLSTM-CRF model achieves superior precision, recall, and F1-score of 89.7%, 89.3%, and 89.1%, respectively, outperforming the other models in entity recognition performance.
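The CRF sequence-labeling step decodes the most likely tag sequence with the Viterbi algorithm. A self-contained NumPy sketch follows; in the paper's pipeline the emission scores would come from the BiLSTM and the transition matrix from the learned CRF.

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Most likely tag sequence under a linear-chain CRF.
    emissions: (T, K) per-token tag scores; transitions: (K, K) score of
    moving from tag i to tag j. Returns a list of T tag indices."""
    T, K = emissions.shape
    score = emissions[0].copy()          # best score ending in each tag
    back = np.zeros((T, K), dtype=int)   # backpointers
    for t in range(1, T):
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    tags = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        tags.append(int(back[t][tags[-1]]))
    return tags[::-1]
```

Because the max and argmax are taken jointly over all previous tags at each step, the decoded path respects both token-level scores and tag-transition constraints (e.g. that I-tags follow B-tags in BIO labeling).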
Yongdong Zhang, Ziwei Wang, Hu Ye, Taiqin Huang, Xiaolong Wu, Jiaqi Zhai
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132300X (2024) https://doi.org/10.1117/12.3035516
Volume is a key representation of object geometry and shape. In engineering practice, it is frequently necessary to measure the volume of large objects, often several hundred to tens of thousands of cubic meters. Traditional manual measurement and estimation by shape segmentation not only involves an enormous workload but is also inefficient. In recent years, measurement technology based on LiDAR and binocular vision cameras has matured, and using LiDAR to measure the volume of large objects has become one of the feasible methods. Targeting the large ellipsoidal structures often encountered in engineering practice, this paper presents a LiDAR-based object volume measurement method that can measure objects of over 10,000 cubic meters. The feasibility of the method is demonstrated by applying it in engineering practice.
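One simple way to turn a LiDAR point cloud into a volume estimate is voxel counting. The following is a coarse, illustrative sketch with names of our choosing; the paper's actual method for ellipsoidal structures is more sophisticated.

```python
import numpy as np

def voxel_volume(points, voxel=0.5):
    """Estimate the volume occupied by a point cloud by counting distinct
    occupied voxels (a fast, coarse approximation).
    points: (N, 3) array of x, y, z coordinates; voxel: edge length in meters."""
    idx = np.floor(points / voxel).astype(int)
    occupied = len({tuple(v) for v in idx})
    return occupied * voxel ** 3
```

The voxel size trades accuracy against robustness to point density: too small and sparse scans leave holes, too large and the boundary is over-counted.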
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132300Y (2024) https://doi.org/10.1117/12.3035573
This paper explores the application of unmanned aerial vehicle (UAV) remote sensing technology in the monitoring of downhill debris flow during the construction of pumped storage power stations. The composition of UAV remote sensing systems, data acquisition and processing procedures are introduced, and their advantages and limitations are analyzed. In response to the shortcomings of traditional monitoring methods, a debris extraction method based on UAV remote sensing imagery is proposed. Experimental validation demonstrates the excellent performance of this method in terms of debris area similarity and overlap, confirming the feasibility and effectiveness of UAV remote sensing technology in downhill debris flow monitoring. Although there is room for improvement in accuracy, with further optimization, UAV remote sensing is expected to become an important technical tool for downhill debris flow monitoring, providing support for the safety management of pumped storage power station construction.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132300Z (2024) https://doi.org/10.1117/12.3035437
Aluminum alloy sheets inevitably deform during processing such as rolling and welding. Deformation can alter a sheet's mechanical properties, reduce product quality, pose safety hazards, and significantly affect subsequent application, processing, and manufacturing. To avoid the impact of sheet deformation on production and daily life, the deformation of the sheet must be tested. Tensile tests were conducted on a tensile testing machine within the elastic deformation range of 6061 aluminum alloy plate to induce deformation. A deformation measurement system consisting of an industrial camera, a high-performance computer, and an auxiliary light source was built, and the strain distribution of the 6061 aluminum alloy sheet was measured by the digital image correlation method. The test results indicate that the maximum tensile and compressive strains in the horizontal and vertical directions occur near the edge of the calculation subregion. The maximum tensile strain in the horizontal direction is 0.038872 and the maximum compressive strain is 0.026501; in the vertical direction they are 0.012119 and 0.0098305, respectively. The design applies the digital image correlation method to sheet deformation testing, which can accurately, quickly, and non-destructively measure the magnitude and distribution of strain in the sheet, overcoming the limitations of traditional measurement methods and improving measurement efficiency.
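Digital image correlation ultimately reduces measured displacement fields to strain components via spatial derivatives. A minimal NumPy sketch of that final step follows; the grid spacing and field names are illustrative assumptions.

```python
import numpy as np

def strain_fields(u, v, spacing=1.0):
    """Small-strain components from displacement fields sampled on a regular grid.
    u: x-displacement, v: y-displacement, both (rows, cols) arrays indexed [y, x].
    Uses central finite differences via np.gradient."""
    du_dy, du_dx = np.gradient(u, spacing)   # axis 0 = y, axis 1 = x
    dv_dy, dv_dx = np.gradient(v, spacing)
    exx = du_dx                              # horizontal normal strain
    eyy = dv_dy                              # vertical normal strain
    exy = 0.5 * (du_dy + dv_dx)              # shear strain
    return exx, eyy, exy
```

In a full DIC pipeline, u and v would come from subset matching between reference and deformed images; the derivative step above is what converts them to the strain maps reported in the abstract.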
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 1323010 (2024) https://doi.org/10.1117/12.3035448
As one of the main components of a belt conveyor, the conveyor belt often works in a relatively harsh environment and is prone to longitudinal tearing failures, so adequate and timely detection of tearing is crucial. To improve the accuracy and efficiency of traditional conveyor belt tear detection methods, a real-time tear detection algorithm based on an improved YOLOv5s network is proposed. A multi-head channel attention (MCA) mechanism is introduced into the original model to strengthen the modeling of channel relationships between feature maps; this article also introduces new loss functions, namely Alpha-IoU Loss and Scale-Invariant Overlap Union Loss (SIoU Loss), which better handle sample imbalance and long-tail distribution in target detection, further improving the performance of the model.
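The Alpha-IoU loss referenced above generalizes the IoU loss by a power parameter, 1 − IoU^α. A small sketch for a single box pair follows; the box format and α = 3 follow the common formulation of Alpha-IoU, not necessarily this paper's exact configuration.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes in (x1, y1, x2, y2) form."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def alpha_iou_loss(box_a, box_b, alpha=3.0):
    """Alpha-IoU regression loss: 1 - IoU**alpha.
    alpha > 1 up-weights the gradient on high-IoU (nearly correct) pairs."""
    return 1.0 - iou(box_a, box_b) ** alpha
```

The power term is what distinguishes it from plain IoU loss: for alpha = 1 the two coincide.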
Chang'an Hu, Chengyi Zhou, Xingxing Li, Li Li, Fei Lv, Baoquan Hu
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 1323011 (2024) https://doi.org/10.1117/12.3035491
The indoor long-length standard device is an important large-size measuring standard, with acceleration and deceleration phases during the starting and stopping of its slide. Studying the acceleration of the device is of great physical significance for understanding its operating rules. In this article, a laser tracker was used to study the acceleration of the indoor long-length standard. From data collected at equal sampling frequencies of 50 Hz and 100 Hz, the following conclusions were drawn: the measured acceleration value increases significantly when the sampling frequency of the laser tracker is increased, with a maximum acceleration of 579.75 mm/s² at 50 Hz and 920 mm/s² at 100 Hz. After the acceleration phase, the sliding table enters a constant-speed phase in which the data oscillate in a zigzag pattern around zero. After the deceleration phase, the acceleration of the sliding table steadily approaches zero.
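Acceleration from equally sampled laser-tracker positions is typically computed by a second finite difference. A minimal sketch follows; the instrument's own processing may differ, and the function name is ours.

```python
import numpy as np

def acceleration(positions, fs):
    """Acceleration from equally sampled positions (e.g. laser-tracker data
    at fs = 50 or 100 Hz) via the central second difference
    a[i] = (p[i+1] - 2 p[i] + p[i-1]) / dt^2."""
    dt = 1.0 / fs
    return (positions[2:] - 2 * positions[1:-1] + positions[:-2]) / dt ** 2
```

Note that the second difference amplifies measurement noise by a factor of fs², which is consistent with the observation above that the apparent acceleration extremes grow when the sampling frequency rises from 50 Hz to 100 Hz.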
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 1323012 (2024) https://doi.org/10.1117/12.3035442
In view of the low detection accuracy and large model parameters of current general object detection algorithms for surface defects in the annular components of wind turbine rotor housings, this paper proposes an improved surface defect detection algorithm for rotor housings based on EfficientDet. The main improvements include replacing the Swish activation function and BN layer in the MBConv module with Mish activation function and GN layer; adding dilated convolution modules to enhance the feature extraction capabilities of the backbone network; redesigning the BiFPN network by increasing cross-level and top-down information connections to enhance the feature fusion capabilities of the BiFPN network, and improve the ability to identify small defects on the surface of rotor housing components. Experimental data shows that the improved EfficientDet network model achieves a mean average precision (mAP) value of 77.69% on the NEU-DET dataset, an improvement of 4.83% compared to the original network model, with the smallest model size, meeting real-time detection requirements of steel surface defects in practical applications.
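The Mish activation that replaces Swish in the MBConv module is x·tanh(softplus(x)); a one-line NumPy version (using logaddexp for a numerically stable softplus):

```python
import numpy as np

def mish(x):
    """Mish activation: x * tanh(softplus(x)), a smooth alternative to Swish."""
    return x * np.tanh(np.logaddexp(0.0, x))  # softplus(x) = log(1 + e^x)
```

Like Swish, it is smooth and non-monotonic near zero while approaching the identity for large positive inputs.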
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 1323013 (2024) https://doi.org/10.1117/12.3035443
In detecting defects on photovoltaic floats, missed detections and false alarms are prone to occur due to the complex texture background and varying defect scales. In this study, a surface defect detection method for photovoltaic floats based on an improved RT-DETR is proposed. Firstly, ResNet50 replaces the original backbone network, making the overall model more lightweight. Secondly, the SimAM attention mechanism is introduced to mitigate the influence of complex backgrounds on the detection results and to enhance the model's perception of target defect features. Finally, to address RT-DETR's weak perception of small targets, the NWD loss function is incorporated. Experimental results show that the proposed algorithm achieves an mAP@0.5 of 95.7%, an F1 score of 94, a parameter count of 40.9 M, and 25.8 FPS. Compared with the original RT-DETR algorithm, this represents improvements of 1.6% in mAP@0.5, 2.1% in F1 score, and 7.8 in FPS, while reducing the parameter count by 26.1 M. The results demonstrate that the proposed method meets the requirements for defect detection on photovoltaic floats.
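The NWD loss builds on a normalized Gaussian Wasserstein distance between boxes. One published formulation models each box as a 2-D Gaussian and sets NWD = exp(−W2/c); the sketch below follows that formulation (hedged: the normalizing constant c is dataset-dependent, and this paper's exact variant is not specified).

```python
import numpy as np

def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein distance between boxes given as (cx, cy, w, h).
    Each box is modeled as a Gaussian N([cx, cy], diag(w^2/4, h^2/4)); the
    2-Wasserstein distance between such Gaussians reduces to a Euclidean
    distance on (cx, cy, w/2, h/2). c is an illustrative constant."""
    a = np.array([box_a[0], box_a[1], box_a[2] / 2.0, box_a[3] / 2.0])
    b = np.array([box_b[0], box_b[1], box_b[2] / 2.0, box_b[3] / 2.0])
    w2 = np.sqrt(np.sum((a - b) ** 2))
    return float(np.exp(-w2 / c))
```

Unlike IoU, this similarity stays smooth and non-zero for small or non-overlapping boxes, which is why it helps with tiny-target regression.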
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 1323014 (2024) https://doi.org/10.1117/12.3035761
This paper focuses on the crack contour extraction function of a bridge surface crack detection system, constructing a crack semantic segmentation model with a deep learning algorithm to achieve pixel-level binary segmentation of cracks. Based on the U-Net model, we add an attention mechanism to the skip connections and construct a joint loss function. In the configured experimental environment, we compare the model against commonly used semantic segmentation models. The experimental results demonstrate a significant enhancement in segmentation performance: overall accuracy reaches 0.9015, recall 0.7199, and F1 score 0.9225, all higher than those of the other models, and the earlier shortcomings are greatly alleviated. This model can therefore serve as the image segmentation model of the crack detection system.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 1323015 (2024) https://doi.org/10.1117/12.3035628
In nature, plants are an essential resource, crucial to preserving the ecosystem's equilibrium, and plant diseases significantly impact crop safety. This paper proposes an automated machine learning technique for plant disease diagnosis and classification. The processing pipeline has four main stages. First, we use spatially kernelized fuzzy C-means (SKFCM) to segment the images and find the predominantly green pixels. After an initial masking of green pixels using threshold values calculated with Otsu's method, a significant portion of these pixels are further masked. In a subsequent stage, all pixels with zero red, green, and blue values, as well as those on the edges of the infected cluster, are eliminated. The final classification step applies support vector machines (SVM). The experimental findings show that the proposed method reliably identifies diseases in plant leaves, accurately detecting and classifying the diseases under investigation with a precision of 83% to 94%.
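Otsu's method, used above to pick the masking threshold, chooses the intensity level that maximizes between-class variance of the histogram. A minimal histogram-based sketch (not the paper's code):

```python
def otsu_threshold(hist):
    # hist[i] = number of pixels with intensity i.
    # Returns the threshold t maximizing between-class variance
    # (equivalently, minimizing intra-class variance).
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg = sum_bg = 0
    for t, h in enumerate(hist):
        w_bg += h                      # background pixel count
        if w_bg == 0:
            continue
        w_fg = total - w_bg            # foreground pixel count
        if w_fg == 0:
            break
        sum_bg += t * h
        mu_bg = sum_bg / w_bg
        mu_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mu_bg - mu_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

On a cleanly bimodal histogram the returned threshold falls in the valley between the two modes, which is what makes it a good automatic cutoff for green-pixel masking.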
Songjie Huang, Yunlong Wei, Longji Zhang, Yingjian Yu, Weijie Shi
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 1323016 (2024) https://doi.org/10.1117/12.3035620
Detecting duck egg shell cracks by tapping the eggs is an important part of the egg processing pipeline, but existing approaches suffer from low detection accuracy and slow detection speed. This paper proposes a device for collecting the acoustic signals of tapped duck eggs and an improved LightGBM classification model. The method collects the acoustic signals produced by tapping duck eggs and optimizes the parameters of the LightGBM classifier with the IGWO algorithm. On the collected duck egg crack dataset, IGWO-LightGBM reaches a detection accuracy of 96.64%, 15.02% higher than a traditional support vector machine. The method achieves significant improvements in both detection accuracy and speed, making it suitable for industrial assembly lines.
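The IGWO optimizer above is a variant of the Grey Wolf Optimizer, in which candidate solutions move toward the three best wolves (alpha, beta, delta) under a decaying exploration coefficient. A minimal 1-D GWO sketch, here minimizing a toy objective rather than tuning LightGBM hyperparameters (the paper's "improved" modifications are not specified in the abstract and are omitted):

```python
import random

def gwo_minimize(f, lo, hi, wolves=12, iters=60, seed=0):
    # Minimal 1-D Grey Wolf Optimizer: each wolf moves toward the
    # three current best positions; coefficient `a` decays 2 -> 0,
    # shifting the swarm from exploration to exploitation.
    rng = random.Random(seed)
    pos = [rng.uniform(lo, hi) for _ in range(wolves)]
    for it in range(iters):
        pos.sort(key=f)                       # best wolves first
        alpha, beta, delta = pos[0], pos[1], pos[2]
        a = 2 * (1 - it / iters)
        new_pos = []
        for x in pos:
            cand = []
            for leader in (alpha, beta, delta):
                r1, r2 = rng.random(), rng.random()
                A, C = a * (2 * r1 - 1), 2 * r2
                cand.append(leader - A * abs(C * leader - x))
            x_new = sum(cand) / 3             # average of the three pulls
            new_pos.append(min(max(x_new, lo), hi))
        pos = new_pos
    return min(pos, key=f)
```

In the paper's setting, `f` would score a LightGBM model (e.g. cross-validated accuracy) as a function of its hyperparameters, and the search would run per-dimension over each tuned parameter.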
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 1323017 (2024) https://doi.org/10.1117/12.3035610
This paper proposes an object detection algorithm, ASF-YOLOv8, based on an improved YOLOv8. First, on the YOLOv8 architecture, Attentional Scale Sequence Fusion (ASF) is added to integrate feature maps at different scales and capture multi-scale image features, extracting richer and more accurate feature information. Second, the channel and position attention mechanism (CPAM) is added to improve the detection of targets at different scales. Finally, the loss function is improved by introducing Inner-IoU, which computes the loss through auxiliary bounding boxes and further improves detection accuracy. Experimental results show that mAP50 improves by 1.5% on the VisDrone dataset, so the proposed algorithm offers sufficient detection accuracy in complex traffic environments.
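Inner-IoU, as described above, evaluates IoU not on the original boxes but on auxiliary boxes scaled about the same centers, which reshapes the loss gradient for high- or low-IoU samples. A sketch of that idea (the `ratio` default is illustrative):

```python
def inner_iou(box_a, box_b, ratio: float = 0.75) -> float:
    # Boxes as (cx, cy, w, h). Inner-IoU computes IoU on auxiliary
    # boxes shrunk (ratio < 1) or enlarged (ratio > 1) about each
    # box's center, which can accelerate bbox regression convergence.
    def corners(b):
        cx, cy, w, h = b
        w, h = w * ratio, h * ratio        # auxiliary (scaled) box
        return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2

    ax1, ay1, ax2, ay2 = corners(box_a)
    bx1, by1, bx2, by2 = corners(box_b)
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0
```

The training loss would then be `1 - inner_iou(pred, target)`, optionally combined with a plain IoU term.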
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 1323018 (2024) https://doi.org/10.1117/12.3035619
Work pressure on offshore platforms in the western oil fields of the South China Sea is high, and the crews urgently need humanistic care. To address the difficulty ordinary convolution operations have in extracting micro-features for facial expression recognition, and the question of how to effectively fuse multi-scale features, the YOLOv8 algorithm is applied to the facial expression recognition task. YOLOv8 introduces a new backbone with the C2f module, a spatial pyramid pooling structure, and multi-scale feature fusion to enhance image feature extraction, and its optimized loss function more effectively guides the model to learn useful features during training. In addition, the SPD-Conv convolution operation is introduced to improve the extraction of data features. Experimental results on the FER2013 dataset show that the improved YOLOv8 algorithm is effective, with Accuracy_top1 and Accuracy_top5 recognition rates of 69.72% and 99.31%, respectively.
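The SPD-Conv operation introduced above starts with a space-to-depth rearrangement: each block×block spatial patch is split into separate channels, so resolution drops without discarding fine detail (a stride-1 convolution then follows). A sketch of the rearrangement on nested lists:

```python
def space_to_depth(img, block: int = 2):
    # img: H x W nested lists (single channel); H and W divisible by block.
    # Rearranges each block x block patch into block*block "channels",
    # preserving every pixel while halving spatial resolution (block=2).
    h, w = len(img), len(img[0])
    out = []  # list of block*block downsampled channel maps
    for dy in range(block):
        for dx in range(block):
            out.append([[img[y][x] for x in range(dx, w, block)]
                        for y in range(dy, h, block)])
    return out
```

This is why SPD-Conv is preferred over strided convolution or pooling for micro-features: nothing is thrown away before the convolution sees it.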
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 1323019 (2024) https://doi.org/10.1117/12.3035746
Owing to the limitations of monitoring equipment in non-smoking venues and the complexity of backgrounds in public places, the captured images often suffer from very small target scales, low resolution, and heavy background interference, so mainstream detectors are prone to missed and false detections in practice. To solve these problems, this paper proposes SMYOLOv5s, an improvement on YOLOv5s. In the feature extraction stage, the convolutional block attention module (CBAM) is integrated to enhance the representation of small-object features. In the feature fusion stage, the Swin Transformer block and the BiFPN structure are introduced to effectively reduce missed detections. On top of this, an additional small-size detection head is added to further strengthen the fusion of shallow small-object information and improve detection accuracy. Experimental results show that the improved algorithm reaches a detection accuracy of 90.9% for smoking behavior, with mAP increased by 3.8% over the pre-improvement algorithm. The detection performance is significantly improved and has high application value.
Ying Wang, Ya Fang, Hao Peng, LiJun Song, PengCheng Huang
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132301A (2024) https://doi.org/10.1117/12.3036405
Steel is an important metal material, but various surface defects, such as cracks, bubbles, inclusions, and scratches, inevitably appear during the production of steel products, so detecting surface defects is essential. Defects with blurred edges and divergent shapes, such as cracking and oxidation, are usually difficult to locate precisely, and traditional defect detection often has low precision, or fails outright, on such fuzzy, divergent, and tiny defects. This paper therefore combines the Global Attention Mechanism (GAM) with the linear structural unit MV2Block to construct an intelligent steel surface defect detection model, YOLOv5s-MG, aiming to improve the generally low detection accuracy on small defects and the difficulty of locating defects accurately. Taking YOLOv5s as the benchmark model, the paper improves it in three respects: first, to address the model's neglect of global information, the GAM global attention mechanism is introduced into the defect detection model; second, to reduce the loss of nonlinear information, the MV2Block module is constructed; finally, GAM and MV2Block are combined, and experiments are conducted on the NEU-DET dataset. The proposed YOLOv5s-MG model yields the best improvement for intelligent defect inspection, with mAP increased by 4.7% over the original model.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132301B (2024) https://doi.org/10.1117/12.3035740
This paper mainly realizes the crack recognition function of a bridge surface crack detection system, constructing a crack identification and localization model with a deep learning algorithm to achieve fast, high-precision identification and localization of cracks. Given the complex scenes of bridge deck inspection and the diversity of crack types, an improved recognition model based on YOLOv8 is proposed, which can quickly and accurately frame crack locations and facilitate subsequent work. The Bottleneck is made lightweight, and the GAM attention mechanism module is introduced to improve the recognition effect. Experiments show that, compared with the original YOLOv8 model, the YOLOv8-FG model improves on every indicator: accuracy up 4.9%, recall up 4.7%, mAP up 5.5%, detection frame rate up 25.56 FPS, and model size down 19.5 MB.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132301C (2024) https://doi.org/10.1117/12.3036011
During the production of continuously cast steel, under the constraints of factors such as material quality and processing technique, various defects, such as internal cracks and shrinkage holes, may occur within the steel and degrade its quality. Continuous casting billet defect detection differs from general object detection tasks: it involves small targets and targets with very high aspect ratios. To address these target characteristics, this paper improves the YOLOv5 algorithm to raise the accuracy of continuous casting billet defect detection. First, the self-constructed dataset is analysed, and a clustering algorithm based on differential evolution is applied to re-cluster the anchor boxes for small targets and extreme-aspect-ratio targets. For small targets, an attention mechanism and the BiFPN structure are used to improve feature extraction. Finally, varifocal loss is introduced to mitigate the effect of the imbalanced positive-to-negative sample ratio on training. Experimental results show that the mAP of the improved algorithm is 5.1% higher than that of the original YOLOv5.
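The varifocal loss introduced above treats positives and negatives asymmetrically: positives are weighted by their IoU-aware target score q, while negatives are down-weighted by the predicted score raised to a focusing power. A per-sample sketch following the standard VFL definition:

```python
import math

def varifocal_loss(p: float, q: float,
                   alpha: float = 0.75, gamma: float = 2.0) -> float:
    # p: predicted classification score in (0, 1).
    # q: target IoU-aware score (q > 0 for positives, q = 0 for negatives).
    if q > 0:
        # Positives: weighted binary cross-entropy, scaled by q so
        # high-quality (high-IoU) samples dominate training.
        return -q * (q * math.log(p) + (1 - q) * math.log(1 - p))
    # Negatives: focal down-weighting by p**gamma suppresses the
    # flood of easy negatives that causes the class imbalance.
    return -alpha * (p ** gamma) * math.log(1 - p)
```

A confident wrong negative (high p, q = 0) is penalized far more than an easy one, which is precisely how the loss counteracts the positive/negative imbalance noted in the abstract.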
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132301D (2024) https://doi.org/10.1117/12.3035739
This paper explores the feasibility and challenges of applying FPGA technology, specifically the Anlogic Infotech EG4D20 FPGA, to wafer and chip defect detection. A preliminary FPGA-based chip defect observation system is constructed, enabling real-time monitoring of surface features on silicon wafers. The system rapidly displays defects present on the silicon surface, meets the real-time image acquisition requirements of chip observation systems, and exhibits high reliability. Building on this direction, a design scheme is proposed for an FPGA chip defect detection system equipped with the YOLOv5 algorithm model. This scheme offers a new research approach for chip defect detection technology and highlights the immense potential of FPGAs in this field.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132301E (2024) https://doi.org/10.1117/12.3035822
To ensure the safe and reliable operation of transmission lines, it is particularly important to monitor the status of key equipment such as insulators online and maintain it in a timely manner. Pin insulators are a key component of line insulation, and their quality problems may lead to serious accidents. This article proposes a dual-level detection algorithm to address the difficulty existing algorithms have in distinguishing ceramic pillar porcelain insulators, composite insulators, and ceramic pin porcelain insulators. The algorithm first detects these insulators with an improved YOLOv8 object detection network, whose backbone introduces efficient modules such as Focus, CBM, and CSP, and whose detection head adopts a PANet structure to fuse multi-level features. It then judges whether the top region of a ceramic pin porcelain insulator is present in the region of interest; only when both are detected simultaneously is the candidate considered a true pin insulator. On a self-built dataset of 110,000 images, the improved algorithm raises the accuracy of pin insulator recognition from 88.9% to 96.3% and the recall from 90.1% to 97.2% compared with the original YOLOv8. The dual-level identification strategy effectively eliminates interference from other insulator types and significantly reduces the false detection rate. The algorithm runs at 126 FPS, meeting real-time detection requirements. In addition, the paper covers the details of algorithm training, model deployment, and Docker containerization. Experimental results show that this method detects line pin insulators accurately and efficiently and has high practical value for application scenarios such as drone inspection.
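The dual-level rule above ("only when both are detected simultaneously") can be sketched as a simple containment check between the two detections. This is a hypothetical simplification; the abstract does not give the paper's exact verification rule:

```python
def dual_level_verify(body_box, top_box) -> bool:
    # body_box / top_box as (x1, y1, x2, y2).
    # A candidate counts as a true pin insulator only when the
    # detected top region's center falls inside the insulator body
    # box, i.e. both detections agree spatially.
    bx1, by1, bx2, by2 = body_box
    tx1, ty1, tx2, ty2 = top_box
    cx, cy = (tx1 + tx2) / 2, (ty1 + ty2) / 2
    return bx1 <= cx <= bx2 and by1 <= cy <= by2
```

A body detection with no agreeing top-region detection is rejected, which is how the strategy filters out pillar and composite insulators that superficially resemble pin insulators.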
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132301F (2024) https://doi.org/10.1117/12.3035697
Deep-learning lane detection algorithms are categorized by how they represent lanes. Among them, row-classification lane detection exploits the prior distribution of lanes to achieve fast, accurate detection, yet its recall and detection ability in complex urban scenes remain somewhat insufficient. This paper reconstructs the row-classification lane detection method: it detects the exact position of lane lines by computing their positive-sample grid, replacing the expectation-based location computation. Second, combining the prior characteristics of lane lines, a column guidance matrix is designed to resolve the insufficient constraints caused by the sparse distribution of lane lines across anchors at the image edges; at the same time, the global information in the guidance matrix improves the model's lane detection in complex urban environments. Verified on the CULane and TuSimple benchmark datasets, the method reaches F1 scores of 77.9% and 97.2%, respectively, achieving fast and accurate lane detection in complex environments.
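The expectation-based localization that the paper replaces works as follows in standard row-classification detectors: for each row anchor, the lane's x-position is the softmax-probability-weighted average of candidate column centers. A sketch of that baseline (values are illustrative):

```python
import math

def softmax(logits):
    # Standard numerically stable softmax over a list of scores.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def expected_column(logits, col_centers) -> float:
    # Expectation-based lane localization for one row anchor:
    # x = sum_i p_i * center_i over the candidate grid columns.
    probs = softmax(logits)
    return sum(p * c for p, c in zip(probs, col_centers))
```

The drawback motivating the paper's change is visible here: when the probability mass is spread over distant columns, the expectation lands between them rather than on any actual lane pixel, which is why a positive-sample-grid computation can localize more precisely.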
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132301G (2024) https://doi.org/10.1117/12.3035866
Using drones for insulator defect inspection of transmission lines has become mainstream in the industry. To address the low detection speed, insufficient accuracy, and high network complexity of existing insulator defect detectors, and the difficulty of deploying them on mobile devices such as drones, this paper proposes a lightweight improved YOLOv8 algorithm for power transmission line insulator defects. First, the YOLOv8 backbone is replaced with the lightweight MobileNetv3 network to reduce the parameter count, and Efficient Multi-Scale Attention (EMA) is used within MobileNetv3 to locate and identify objects more accurately. Second, Ghost Shuffle convolution (GSConv) is introduced to redesign the feature fusion network, preserving detection accuracy while reducing computation. Finally, the MPDIoU loss function is used to improve training convergence speed and make prediction boxes more accurate. Experimental results show that the lightweight improved model achieves 97.1% precision and 97.3% recall with a 70% reduction in model parameters, making it better suited to deployment on drone platforms and meeting the real-time and accuracy requirements of insulator defect detection on the edge side of transmission lines.
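MPDIoU, the loss used above, augments IoU with penalties on the squared distances between the two boxes' top-left and bottom-right corners, normalized by the image diagonal. A sketch following that published definition:

```python
def mpdiou(box_a, box_b, img_w: float, img_h: float) -> float:
    # Boxes as (x1, y1, x2, y2). MPDIoU = IoU - d1^2/D - d2^2/D,
    # where d1, d2 are the top-left and bottom-right corner distances
    # and D = img_w^2 + img_h^2 normalizes by the image diagonal.
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union else 0.0
    d1 = (ax1 - bx1) ** 2 + (ay1 - by1) ** 2
    d2 = (ax2 - bx2) ** 2 + (ay2 - by2) ** 2
    norm = img_w ** 2 + img_h ** 2
    return iou - d1 / norm - d2 / norm
```

The training loss is `1 - mpdiou(...)`; because the corner terms are nonzero even for non-overlapping boxes, gradients remain informative early in training, which is where the faster convergence comes from.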
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132301H (2024) https://doi.org/10.1117/12.3035584
In PCB defect detection the targets are small, and to avoid large detector parameter counts and low detection efficiency, a lightweight PCB defect detection model based on an improved YOLOv7-tiny is proposed. First, because the backbone's ELAN involves complex fragmentation and insufficient cross-scale down-sampling, we propose a fast backbone built from a zero-parameter cross-scale max-pooling layer and a partially channel re-parameterized module, using cross-scale interaction and multi-branch training to improve feature extraction while simplifying inference. Meanwhile, to address the complex connectivity of the PANet and the difficulty deep networks have in capturing small-target features, a single-scale prediction head based on a global-local hybrid feature condensing module is proposed. It models the backbone's multi-scale information both globally and locally and shares the information interactively, better capturing the global context correlations and dependencies of images, enhancing the location information of small targets, and further reducing the computational cost and complexity of the model. Experiments on the DeepPCB dataset show that the improved model's parameters and FLOPs drop to 29% and 51% of the original model's, and inference speed rises by 62%, outperforming other mainstream lightweight target detectors.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132301I (2024) https://doi.org/10.1117/12.3035575
To address the low recall and poor classification caused by the complexity of chest X-ray lesion sites, this paper proposes a chest X-ray disease detection and classification method based on the NIH Chest X-rays dataset. The main work is: 1) to raise the base model's low recall, the generative network of GANomaly is replaced with a multi-scale encoder-decoder U-Net, and a memory enhancement module is added between the encoding and decoding networks; 2) a cascaded HRNet classification network is designed on the premise of achieving high recall, with feature fusion performed through global and local feature extraction; compared with the mainstream classification methods DenseNet, ResNet, and VGG, the cascaded HRNet improves average AUC by 7.9% and achieves the highest scores on most diseases; 3) joint GANomaly-HRNet experiments on the NIH Chest X-rays dataset and a validation set of real hospital samples demonstrate the validity and accuracy of the designed model.
Xiuyou Wang, Guangren Huang, Huaming Liu, Haoyu Sun
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132301J (2024) https://doi.org/10.1117/12.3035634
Printed circuit board (PCB) defects directly affect the performance of electronic equipment, and effectively detecting them has long troubled the industry. To solve this problem, an improved algorithm based on DiffusionDet is proposed that detects PCB defects directly, without relying on empirical or learnable object queries. The processing is as follows: first, Omni-Dimensional Dynamic Convolution (ODConv) replaces part of the static convolution kernels of the feature extraction network, broadening the dimensions along which the convolution operation can learn and yielding feature maps more conducive to defect detection. Second, the GIoU loss function replaces the original IoU loss to improve the model's ability to optimize the positions of candidate boxes. The experimental results show that the improved DiffusionDet raises mean average precision (mAP) from 98.12% to 98.83%, performing better in the detection of printed circuit board defects.
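The GIoU loss swapped in above extends IoU with a penalty based on the smallest box enclosing both boxes, so it still produces a gradient when the boxes do not overlap at all. A sketch following the standard definition:

```python
def giou(box_a, box_b) -> float:
    # Boxes as (x1, y1, x2, y2).
    # GIoU = IoU - |C \ (A u B)| / |C|, where C is the smallest
    # axis-aligned box enclosing both A and B. Ranges over (-1, 1].
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union else 0.0
    cw = max(ax2, bx2) - min(ax1, bx1)   # enclosing box width
    ch = max(ay2, by2) - min(ay1, by1)   # enclosing box height
    c_area = cw * ch
    return iou - (c_area - union) / c_area if c_area else iou
```

Disjoint boxes get a negative GIoU that grows more negative as they separate, which is exactly the signal plain IoU (stuck at 0) cannot provide when optimizing candidate box positions.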
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132301K (2024) https://doi.org/10.1117/12.3035749
Quickly detecting tiny sea-surface targets in a complex marine environment with strong clutter is a challenging task. Because the statistical properties of clutter vary greatly, traditional target detection methods suffer from feature extraction difficulties. This paper proposes a convolutional attention-based method for detecting small targets on the sea surface. Radar echo datasets are first preprocessed into time-frequency distribution spectrograms, which are transformed into feature images for network input. A multi-scale convolutional attention fusion (MCAF) module is then designed to enhance feature representation while reducing clutter interference in a lightweight manner, without bells and whistles. On the IPIX dataset, the method reaches a detection accuracy of 88.9% under lower signal-to-clutter ratio (SCR) sea states at a false alarm rate of 10⁻³. Compared with other detection models, the proposed network has stronger detection performance.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132301L (2024) https://doi.org/10.1117/12.3035587
Wind power has developed rapidly in recent years, and regular inspection of wind turbines guarantees their normal operation; inspection robots can improve inspection efficiency and reduce risk. The traditional approach relies mainly on manually operated UAVs to search for wind turbines, which remains inefficient. For the object search problem in aerial images, this paper proposes a wind turbine object detection algorithm based on the Inception network, using it to extract wind turbine objects from the images. To reduce the baseline algorithm's excessive parameters, the original backbone network is replaced; to counter overfitting, Batch Normalization is adopted to improve generalization. Tested on the NAIP dataset, the algorithm achieves 93.10% accuracy on the wind turbine object detection task. The results show that the Inception-based object detection algorithm performs well and exhibits robustness and generalization for objects in different environments.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132301M (2024) https://doi.org/10.1117/12.3035499
Road subgrade filled with compacted collapsible loess tends to become loose and porous, leading to uneven deformation and pavement distress. To address this issue, this study used High Pressure Jet Grouting Piles (HPP) and Dry Mix Gravel Piles (DMP) to reinforce the loess subgrade. Nondestructive detection methods, namely elevation observation, Rayleigh-surface-wave testing, and ground penetrating radar (GPR), were applied to an on-site embankment to evaluate the accumulated deformation rate (ra) and compactness (cd) of the treated subgrade. The results demonstrate that HPP and DMP effectively reduce collapsible deformation and enhance the strength of the loess subgrade. Specifically, the observations show that ra decreased to less than 5 mm per month, compared with 20 mm per month before treatment. Moreover, the wave velocity of the treated embankment increased significantly, indicating that the HPP and DMP piles improved the density of the subgrade and hence its compactness (cd). Subgrade deformation no longer accumulated rapidly when ra was below 5 mm per month or when the Rayleigh-surface-wave velocity was less than 150 m/s. In addition, on-site elevation observation and Rayleigh-surface-wave testing proved quick and reliable for evaluating the deformation and compactness of the loess subgrade.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132301N (2024) https://doi.org/10.1117/12.3036496
To explore methods for extracting late-rice planting areas in the Dongting Lake Plain, represented by Junshan District, multispectral and synthetic aperture radar (SAR) satellites were combined to establish continuous remote sensing phenology curves for rice. Multi-dimensional spatial features were built on an unsupervised spatial clustering algorithm, and two years of continuous multispectral and SAR images were coupled to construct a complete continuous phenology curve of rice, from which accurate rice field plot information was screened. The late-rice phenological characteristic plots derived from satellite images thus enable high-precision extraction of complex mountainous rice planting areas. The results show that the method overcomes the adverse effects on rice remote sensing classification caused by fragmented underlying surfaces, scattered farmland distribution, and the scarcity of usable images in the study area. The overall classification accuracy is 86.49%, better than traditional supervised classification methods.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132301O (2024) https://doi.org/10.1117/12.3036541
Smart contracts are automated agreements encoded as code, enabling functionalities such as automated asset transfers and digital asset issuance. They eliminate the need for third-party trust verification, thereby reducing transactional intermediaries and costs. Because they are irrevocable and non-breachable, smart contracts execute automatically once preset conditions are met, without the possibility of revocation or modification. However, these characteristics can be exploited by malicious actors seeking illegitimate gains through vulnerabilities in the contracts. Existing smart contract vulnerability detection tools primarily rely on expert-defined rules, leading to high false alarm rates and a tedious detection process. Moreover, smart contract owners often decline to open-source their contracts due to privacy concerns, which leaves deep learning models with insufficient data and incomplete training samples during the training phase, so the trained models suffer from limited detection scope and inadequate precision. To address these challenges, this paper introduces a federated learning-based approach for detecting vulnerabilities in smart contracts, combining federated learning with the BERT model. Significantly, enterprises participating in this approach are not required to provide source code; instead, they contribute gradient data to the federated learning process, and the BERT model is trained on this gradient data. We compare our approach with an existing deep learning method for smart contract vulnerability detection, BiLSTM+Attention. Experimental results demonstrate that the proposed method achieves 91.36% accuracy in vulnerability detection.
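The gradient-sharing aggregation described above can be sketched as a FedAvg-style weighted average. The function below is an illustrative stand-in, not the paper's implementation; the function name, weighting scheme, and data are assumptions.

```python
# Minimal sketch of federated gradient averaging (FedAvg-style), assuming
# each participating enterprise submits a gradient vector rather than
# source code. All names and numbers are illustrative.

def federated_average(client_grads, client_weights=None):
    """Aggregate per-client gradient vectors into one update.

    client_grads   : list of equal-length gradient lists, one per client
    client_weights : optional per-client weights (e.g. sample counts)
    """
    n = len(client_grads)
    if client_weights is None:
        client_weights = [1.0] * n
    total = sum(client_weights)
    dim = len(client_grads[0])
    # Weighted coordinate-wise mean of the client gradients.
    return [
        sum(w * g[i] for g, w in zip(client_grads, client_weights)) / total
        for i in range(dim)
    ]

# Example: three clients, two-parameter model.
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(federated_average(grads))             # unweighted mean
print(federated_average(grads, [1, 1, 2]))  # client 3 holds more data
```

The server would apply the averaged gradient to the shared BERT model each round; no client ever exposes its contract source.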
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132301P (2024) https://doi.org/10.1117/12.3036574
Social media, while flourishing, has also become a vital tool exploited by various terrorist forces. To analyze and study information related to terrorism on social networks, we propose a model framework that combines complex networks with natural language processing. This framework aims to explore influential key users, community structures, and communication topics within terrorism-related communities. Using a dataset of ISIS jihadist-related tweets from Twitter between 2014 and 2016, we construct an extremist network and employ centrality measures in the framework to identify important influential members. Additionally, the framework employs the ASOCCA community detection algorithm to reveal community structures and utilizes NMF topic modeling to extract topics from tweets and determine the focus of each important member. This approach enhances the capability for monitoring, identifying, disrupting, and tracking online terrorist activities.
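As an illustration of the centrality step, the sketch below computes degree centrality, one of the simplest influence measures, on a toy undirected edge list. The data and the choice of measure are illustrative; the framework above may employ other centrality measures.

```python
# Toy sketch of identifying influential members: degree centrality on an
# undirected interaction edge list. Edge data is invented for illustration.

from collections import defaultdict

def degree_centrality(edges):
    """Return {node: degree / (n - 1)} for an undirected edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}

edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
scores = degree_centrality(edges)
top = max(scores, key=scores.get)
print(top, scores[top])  # "a" touches every other node
```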
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132301Q (2024) https://doi.org/10.1117/12.3036550
In recent years, as time series forecasting has been increasingly integrated into daily operations, scholarly attention has pivoted towards its security and fidelity. Traditional deep learning models for time series prediction often exhibit a lack of adversarial robustness, rendering them vulnerable to strategically crafted perturbations that may skew predictive outcomes significantly. Presently, adversarial interventions in time series data predominantly leverage global perturbations; however, such manipulations are highly conspicuous and thus easily detectable given the intrinsic sensitivity of time series data to disturbances. In response, we introduce a novel gradient-free, sparse black-box attack methodology, denoted as kMGAT, which harnesses mutual information and genetic algorithms to adeptly balance the stealth and efficacy of adversarial sample generation. This approach is capable of orchestrating potent attacks without materially distorting the original data corpus. Rigorous comparative evaluations across diverse deep learning architectures have demonstrated that kMGAT can fabricate adversarial samples that are both challenging to detect and capable of significantly altering the predictive patterns of the models. This methodology has yielded commendable results in several demanding applications, including forecasting the occupancy rates of hospital parking, electrical consumption metrics, and subway traffic volumes.
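As a rough illustration of the gradient-free, sparse idea, the toy genetic algorithm below evolves which k time steps to perturb, scoring candidates only by querying a stand-in model. The paper's mutual-information guidance and other kMGAT details are not reproduced; the model, fitness, and parameters are all invented.

```python
# Very small sketch of a sparse black-box attack: a genetic algorithm
# evolves WHICH k time steps to perturb, scored only by model queries
# (no gradients). Everything here is a toy stand-in.

import random

def toy_model(series):                       # stand-in black-box forecaster
    return sum(i * x for i, x in enumerate(series))

def attack_fitness(series, idx, eps=0.5):
    perturbed = list(series)
    for i in set(idx):                       # sparse: only k points move
        perturbed[i] += eps
    return abs(toy_model(perturbed) - toy_model(series))

def sparse_ga(series, k=2, pop=20, gens=30, seed=0):
    rng = random.Random(seed)
    n = len(series)
    population = [rng.sample(range(n), k) for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=lambda idx: attack_fitness(series, idx),
                        reverse=True)
        survivors = population[: pop // 2]   # elitism: keep the best half
        children = []
        for parent in survivors:
            child = list(parent)
            child[rng.randrange(k)] = rng.randrange(n)  # point mutation
            children.append(child)
        population = survivors + children
    return max(population, key=lambda idx: attack_fitness(series, idx))

best = sparse_ga([0.0] * 8, k=2)
print(sorted(set(best)))   # the indices chosen for perturbation
```

With the toy model weighting later time steps more heavily, the search concentrates perturbations where they move the forecast most while touching only k points.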
Jiaqi Geng, Rujia Qiu, Long Zhao, Jianlin Li, Hao Zheng, Teng Tian, Dongbo Song, Lu Xing
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132301R (2024) https://doi.org/10.1117/12.3036533
Early detection of cable fires can accurately predict fire characteristics and enable targeted fire prevention and firefighting measures that reduce the probability of fire. This paper therefore discusses early detection technology for PVC cable fires based on gas sensors, and analyzes the performance of quantum dot gas sensors through simulation experiments. The results show a response range of 1-1900 ppm, a working temperature range of 0-120°C, a hydrogen resolution of 1.37×10⁻⁴, and a resistance error of less than 4%, which meets the research and development needs of semiconductor quantum dot gas-sensitive materials and sensors. These studies provide a reference for the development of gas-sensor-based early detection technology for PVC cable fires.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132301S (2024) https://doi.org/10.1117/12.3036321
In response to the challenges of difficult feature extraction and insufficient accuracy in large-scale analysis and prediction of sewage indicators, this study establishes an improved LSTM model for predicting the pH value of sewage. A BiLSTM bidirectional long short-term memory network is first employed; its parameters are then optimized with t-CAOA, an arithmetic optimization algorithm improved with adaptive t-distribution mutation and a dynamic boundary strategy, and a single-head self-attention mechanism is incorporated. The experiments demonstrate that the improved model outperforms traditional water quality prediction models in terms of MSE, RMSE, MAE, MAPE, and R2, effectively enhancing the accuracy of water quality prediction, and it has clear application value. Finally, the robustness and generalization ability of the model were verified by testing in a variety of environments in the watershed.
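The single-head self-attention component mentioned above can be sketched in plain Python as scaled dot-product attention over a sequence of hidden states. The shapes, toy inputs, and the absence of learned Q/K/V projections are simplifications, not the paper's configuration.

```python
# Minimal sketch of single-head self-attention over a sequence of
# (toy) BiLSTM hidden states: Q = K = V = the states themselves.

import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(seq):
    """seq: list of T feature vectors; returns T attention-mixed vectors."""
    d = len(seq[0])
    out = []
    for q in seq:
        # Scaled dot-product scores of this query against every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in seq]
        weights = softmax(scores)
        # Output is a convex combination of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, seq))
                    for j in range(d)])
    return out

states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy hidden states
mixed = self_attention(states)
print(len(mixed), len(mixed[0]))   # T vectors of the same width
```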
Wenhan Zhong, Ligang Luo, Xixi Chen, Ronghui Yuan, Lin Song, Huihui He
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132301T (2024) https://doi.org/10.1117/12.3036451
This study aims to evaluate the effectiveness of attenuated total reflection infrared (ATR-IR) measurements for the fiber content of cotton bi-component materials in both direct and reagent-free modes. The characteristic peak areas of cotton, polyester, polyacrylonitrile, chinlon, and spandex are represented by A1060, A730, A2250, A1720, and A890, respectively. At mid-infrared wavenumbers, linear relationships between fiber content and peak-area eigenvalues were observed, confirming the feasibility of quantitative correction models for content testing. Comparison between fiber content measurements of actual samples using the ATR-IR and chemical dissolution methods revealed a difference of ≤ 7.7%, with spandex/cotton exhibiting a higher discrepancy than other bi-component fibers. The optimized IR technique is expected to offer better universality in modern textile testing due to its nondestructive nature and time-saving benefits.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132301U (2024) https://doi.org/10.1117/12.3036480
In fields such as flight safety, agricultural monitoring, and medical diagnostics, detecting foreign objects in images with image processing technology is important. While recent deep learning-based detection methods outperform traditional image techniques overall, they often struggle to identify small or novel foreign objects due to data biases. To overcome this limitation, a new approach termed foreign object detection based on contrastive analysis of key regions (CR-FOD) is introduced. CR-FOD comprises two phases: the Foreign Object Promoter (FOP) and Contrastive Region Object Detection (CROD). In the FOP phase, potential foreign object locations are marked through key-point comparison. In the CROD phase, objects are extracted from the inference image and the corresponding background image based on the key-point prompt information from FOP, and the similarity of the objects is compared to determine whether a foreign object is present at the location. The efficacy of CR-FOD is confirmed through experimental testing, demonstrating superior performance in detecting a wide array of foreign objects, particularly those that are minute or absent from the training datasets.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132301V (2024) https://doi.org/10.1117/12.3036540
This paper addresses segmentation errors in video extensometers caused by deformation, dispersion, and cracks in markers during the stretching process. A method combining frame-to-frame matching with deep learning is proposed to address the high-robustness segmentation problem for black-and-white markers on materials undergoing large strains. The selection of the template position during image matching and the updating of the template throughout the stretching process are discussed. Experiments and analysis were conducted on various types of rubber and plastic specimens that exhibit significant strain and irregular deformation. The results demonstrate that this method can be applied to line-marker video extensometers, enhancing the overall robustness of the measurement algorithm.
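The frame-to-frame template step can be illustrated with an exhaustive search for the position in the new frame that best matches the marker template, scored by sum of absolute differences. The images below are tiny invented nested lists; real use would add the template updating discussed above as the marker deforms.

```python
# Toy sketch of template matching between frames: find where the marker
# template best fits in the new frame under a sum-of-absolute-differences
# (SAD) score. All pixel values are invented for illustration.

def match_template(image, template):
    """Return (row, col) of the best SAD match of template in image."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best, best_pos = float("inf"), (0, 0)
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            sad = sum(abs(image[r + i][c + j] - template[i][j])
                      for i in range(th) for j in range(tw))
            if sad < best:
                best, best_pos = sad, (r, c)
    return best_pos

frame = [[0, 0, 0, 0],
         [0, 9, 8, 0],
         [0, 7, 9, 0],
         [0, 0, 0, 0]]
marker = [[9, 8],
          [7, 9]]
print(match_template(frame, marker))  # (1, 1)
```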
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132301W (2024) https://doi.org/10.1117/12.3036528
Critical node detection (CND) asks which set of nodes of a given size should be removed from a network so that the residual network has the least connectivity under a predefined measure. CND has attracted great attention for its extensive practical applications, yet it remains computationally challenging. In this paper, we present an improved moth-flame optimization algorithm for CND, in which opposition-based learning and a fast population evolution strategy are implemented to generate diverse solutions and accelerate convergence, while a hybrid Gaussian evolution method helps the algorithm escape local optima and enhances its exploration ability. In our experiments, the performance of the proposed algorithm is evaluated on synthetic and real-world networks and compared with other algorithms. The experimental results illustrate that the proposed algorithm outperforms other current algorithms.
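The opposition-based learning ingredient can be illustrated in a few lines: for each candidate x in [lb, ub], its opposite lb + ub − x is also evaluated, and the better of the two is kept. The objective below is a toy stand-in, not a CND fitness.

```python
# Sketch of opposition-based learning for diversifying a population:
# evaluate each candidate AND its opposite, keep the fitter one.
# The sphere objective and all numbers are illustrative.

def opposite(x, lb, ub):
    return [l + u - xi for xi, l, u in zip(x, lb, ub)]

def obl_init(population, lb, ub, fitness):
    """Return the better of each candidate and its opposite (minimization)."""
    out = []
    for x in population:
        x_op = opposite(x, lb, ub)
        out.append(min(x, x_op, key=fitness))
    return out

sphere = lambda x: sum(v * v for v in x)   # toy objective to minimize
pop = [[4.0, 4.0], [3.0, 1.0]]
lb, ub = [0.0, 0.0], [5.0, 5.0]
print(obl_init(pop, lb, ub, sphere))
```

The first candidate's opposite [1.0, 1.0] scores better and replaces it; the second candidate beats its own opposite and survives.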
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132301X (2024) https://doi.org/10.1117/12.3036422
Surface defects in aluminum profiles have a significant impact on product performance, safety, and reliability. Traditional physical inspection methods suffer from high cost, inefficiency, and a lack of visualization of the inspection process, while machine learning-based methods that rely on hand-crafted features are limited in detection versatility and vulnerable to interference from the external environment. To address these challenges, this paper proposes an improved surface defect detection method based on YOLOv5. To enhance the model's ability to detect targets of various sizes, we incorporate the SKA module into the network's backbone, enabling the model to dynamically determine the receptive field size based on the input data. We also add up-sampling and tensor stitching operations to the neck network to expand the amount of data in the feature dimension, and establish a direct path between the backbone network and the PANet architecture to enhance the model's feature fusion capability. In addition, we introduce SAC near the detection head to further improve detection performance. The experimental results demonstrate that the proposed model can effectively identify various types of surface defects on aluminum profiles: compared to the original model, recall increases by 2.0%, mAP@0.5 increases by 2.2%, and inference speed reaches 58.7 FPS.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132301Y (2024) https://doi.org/10.1117/12.3036283
Face detection is the most basic step in the face recognition pipeline; if the detection method is inappropriate, the whole recognition process can fail, so face detection is critical to face recognition. Facial patterns are complex and susceptible to external interference, and common face detection algorithms generally suffer from high computational complexity, slow speed, and high false alarm rates. In this paper, a skin color model detection method based on the YCbCr color space is used for face detection, which reduces the computational complexity, and morphological theory is used for subsequent image processing.
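The skin-color rule can be sketched as an RGB→YCbCr conversion followed by thresholding on the Cb/Cr chrominance channels. The ranges below (Cb in [77, 127], Cr in [133, 173]) are commonly cited defaults for this family of methods, not necessarily the thresholds used in the paper.

```python
# Sketch of YCbCr skin-colour classification: convert RGB to YCbCr
# (BT.601 full-range form) and threshold the chrominance channels.
# Threshold values are common defaults, assumed here for illustration.

def rgb_to_ycbcr(r, g, b):
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def is_skin(r, g, b):
    _, cb, cr = rgb_to_ycbcr(r, g, b)
    return 77 <= cb <= 127 and 133 <= cr <= 173

print(is_skin(224, 172, 138))   # a light skin tone
print(is_skin(30, 120, 200))    # a blue background pixel
```

A real detector would apply this per pixel, then use morphological opening/closing (as the paper does) to clean the resulting binary mask.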
Intelligent Monitoring System and Equipment Simulation
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132301Z (2024) https://doi.org/10.1117/12.3035456
This paper proposes a Predictive and Priority-based Dynamic Resource Allocation strategy (PPRA) designed to improve resource management in cloud computing platforms. By analyzing forecasted loads and the priorities of different tasks, PPRA dynamically adjusts resource allocations to optimize the overall performance of the system. Experiments show that compared to traditional random allocation and static threshold strategies, PPRA excels in resource utilization, Service Level Agreement (SLA) compliance rate, average response time, and system throughput. These results verify the potential and advantages of the PPRA strategy in applications with rapidly changing cloud computing demands.
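The priority half of PPRA can be illustrated with a toy allocator that serves tasks from a priority queue up to a capacity limit. The load-forecasting component is not modeled, and all task names and numbers are invented.

```python
# Toy sketch of priority-based resource allocation: tasks are served in
# priority order, each granted the smaller of its request and remaining
# capacity. The prediction step of PPRA is not modelled here.

import heapq

def allocate(tasks, capacity):
    """tasks: (priority, name, request); lower number = higher priority."""
    heap = list(tasks)
    heapq.heapify(heap)
    grants = {}
    while heap and capacity > 0:
        prio, name, request = heapq.heappop(heap)
        grant = min(request, capacity)
        grants[name] = grant
        capacity -= grant
    return grants

tasks = [(2, "batch", 40), (1, "web", 30), (3, "backup", 50)]
print(allocate(tasks, 60))  # {'web': 30, 'batch': 30}
```

The high-priority "web" task is fully served, "batch" gets the remainder, and "backup" is deferred; a dynamic strategy would recompute this as forecasts change.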
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 1323020 (2024) https://doi.org/10.1117/12.3035833
This study addresses the challenge of automating the automotive instrument testing process, traditionally reliant on manual inspection and subjective evaluation. We propose an algorithm based on template matching, enhanced with feature-point-based image matching techniques, to improve the accuracy and reliability of automated testing for automotive instruments. By integrating classical template matching with advanced feature detection algorithms, our approach aims to overcome the limitations of traditional methods, especially in handling variations in lighting, orientation, and scale. This paper presents a comprehensive analysis of the algorithm's performance, demonstrating its potential in automating and refining the instrument testing process.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 1323021 (2024) https://doi.org/10.1117/12.3035670
Researchers have proposed several effective word embeddings and applied them to NER (Named Entity Recognition) tasks, but these embeddings are often under-utilized, resulting in poorly characterized encoded word vectors. Motivated by this lack of entity information, we apply lexical enhancement techniques at the input layer of a Chinese NER model. Roughly speaking, we introduce an attention mechanism into the Softlexicon approach, which makes the embedded entity information more complete. In our experiments on four Chinese NER datasets, the improved Softlexicon outperforms the original. In addition to using different datasets, we also combined the improved Softlexicon with different pre-trained models, and the attention mechanism still delivered good performance.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 1323022 (2024) https://doi.org/10.1117/12.3035675
The conventional autonomous navigation mode for quadruped robot cable tunnel inspection is preset as vertex processing, whose high navigation complexity prolongs the average navigation time. This paper therefore proposes and verifies an autonomous navigation method for quadruped robot cable tunnel inspection based on an improved convolutional neural network. Starting from the current navigation mode, autonomous patrol navigation nodes are first deployed to collect patrol navigation data, a cyclic recognition mode is adopted to reduce navigation complexity, and an automatic cyclic recognition program is added. On this basis, an improved convolutional neural network navigation model for the cable tunnel inspection task is constructed, and a displacement correction adjustment method is used to achieve autonomous navigation. Tests on six selected cable tunnel patrol sections show that the average autonomous navigation time at the three selected navigation points stays below 0.25 s, indicating that the proposed method is more flexible and targeted and responds faster in patrol navigation, giving it practical application value.
Zhenxue Bian, Yongkang Deng, Bingshuo Lu, Chang Liu, Chuanhao Hu
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 1323023 (2024) https://doi.org/10.1117/12.3035455
Aiming at problems such as the small number of probes, limited detection area, and lack of a nuclear radiation detection module in existing aquaculture monitoring devices, a new in-situ aquaculture monitoring device is designed. Building on an innovative six-probe retractable mechanical structure, a cerium bromide scintillator detector is added relative to traditional devices, and a six-parameter aquaculture monitoring sensor is integrated to realize three-dimensional monitoring of multi-parameter water quality information in the aquaculture area, ensuring the safety and reliability of the water quality.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 1323024 (2024) https://doi.org/10.1117/12.3035439
The semi-global matching (SGM) algorithm is widely adopted in stereo matching due to its optimal trade-off between accuracy and efficiency. However, SGM has limitations in accurately matching weak-texture regions and entails high computational complexity. This paper proposes an enhanced SGM algorithm that integrates the CT (Census transform) cost and BT cost to address this issue. Initially, the anti-interference capability of the Census transform is improved by setting a standard deviation threshold, and the fused weights of the Census cost and window-based BT cost compensate for insufficient image information. Subsequently, an 8-path dynamic programming algorithm aggregates the costs, followed by a winner-take-all step to compute disparity values, and a weighted least squares filter optimizes the disparity map. Finally, the proposed algorithm's anti-occlusion performance and matching accuracy are evaluated on the Middlebury dataset. Experimental results demonstrate that the proposed fused cost significantly outperforms the individual CT and BT costs in anti-interference performance, and that the algorithm compares favorably with both the SGM and AD-Census algorithms.
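The Census-cost building block can be sketched as follows: each pixel is encoded by a bit string comparing it with its 3×3 neighbours, and the matching cost between two pixels is the Hamming distance between their codes. The paper's standard-deviation gating and BT-cost fusion are omitted; the images are tiny invented arrays.

```python
# Sketch of the Census transform matching cost used as one ingredient of
# the fused cost. Borders and the paper's refinements are not handled.

def census_3x3(img, r, c):
    """Bit string: '1' where a 3x3 neighbour is darker than the centre."""
    centre = img[r][c]
    bits = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            bits.append("1" if img[r + dr][c + dc] < centre else "0")
    return "".join(bits)

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

left  = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
right = [[11, 21, 31], [41, 51, 61], [71, 81, 91]]  # same scene, +1 gain
# Census codes depend only on relative order, so a constant radiometric
# offset between views produces zero matching cost:
print(hamming(census_3x3(left, 1, 1), census_3x3(right, 1, 1)))  # 0
```

This order-based encoding is exactly why Census costs resist the radiometric differences that defeat plain intensity differencing.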
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 1323025 (2024) https://doi.org/10.1117/12.3035445
This paper introduces a cutting-edge solution for enhancing the supervision of power grid operations, integrating remote sensing, AI, and advanced algorithms to ensure grid safety and efficiency. Leveraging deep learning, the system accurately interprets remote sensing images to detect unauthorized activities and assess safety compliance with over 95% accuracy. A comprehensive risk assessment model, considering environmental, load, and weather factors, enables precise risk level determination and early warnings. Additionally, an innovative decision optimization engine, utilizing evolutionary algorithms and reinforcement learning, facilitates the balanced optimization of operational costs and efficiency within risk boundaries. Systematic evaluation demonstrates the solution's capability to significantly boost operational control efficiency and reduce risks, highlighting its promising application potential in power grid management.
Junchen Zou, Xiaoquan Sun, Yanzhao Wang, Xinyuan Cui, Yongqiang Chen, Ling Yang, Zhifeng He, Dan Yuan, Yuan Mu
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 1323026 (2024) https://doi.org/10.1117/12.3035487
Odor pollution has become a global environmental issue of increasing concern in recent years, and odor gas monitors are widely used to measure odor concentration; however, whether their readings remain accurate and reliable after long-term use is unknown. In this work, an electrochemical odor monitor was chosen and its performance and uncertainty were estimated. The results showed that the maximum absolute indication error across 8 standard gases was 9.17%, while other performance indicators, namely zero shift, scale shift, response time, and repeatability, were no more than 1.30%, 1.25%, 91 s, and 0.70%, respectively. Moreover, at a CH4S concentration of 16.00 μmol/mol, the uncertainty was 0.34 μmol/mol. In summary, the performance indicators and uncertainty of the monitor were excellent; periodic calibration is needed to keep it in good status, so that it can play an important role in preventing odor pollution.
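The indication-error figure quoted above is a simple relative deviation of the monitor reading from the standard-gas value; a minimal sketch with invented readings:

```python
# Sketch of the indication-error calculation: relative deviation of the
# monitor reading from the standard-gas concentration, in percent.
# The reading/standard pairs below are invented for illustration.

def indication_error(measured, standard):
    """Relative indication error, in percent."""
    return (measured - standard) / standard * 100.0

readings = [(15.2, 16.0), (8.4, 8.0), (31.0, 32.0)]
errors = [indication_error(m, s) for m, s in readings]
print(max(abs(e) for e in errors))  # worst absolute error across gases, %
```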
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 1323027 (2024) https://doi.org/10.1117/12.3035458
In today's society, target detection technology, as a basic technology, has been widely used in many fields such as municipal administration, public security, medicine, transportation and exploration. However, embedded target detection systems currently on the market have many problems, such as cumbersome model training and retraining processes, simplistic human-machine interfaces and difficult operation. In view of the above problems, this paper proposes an autonomous target detection system based on an embedded platform and cloud computing. It uses a Raspberry Pi and a USB camera to collect image data, which is transmitted to a cloud server through the network. Data annotation, model training and autonomous iterative training of the model are completed in the cloud on a Web management platform. Through the cooperation of software and hardware, the target detection results can be viewed on the platform in real time, which realizes the visualization, automation and simplification of the target detection process and reduces the operating difficulty of the system.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 1323028 (2024) https://doi.org/10.1117/12.3035607
The introduction of pioneering technologies such as programmable gradient information and the Generalized Efficient Layer Aggregation Network in YOLOv9 has significantly improved its efficiency, accuracy, and adaptability compared to YOLOv8. The proposal of SPPELAN has notably enhanced the network's feature extraction capability. However, SPPELAN's use of multi-level maximum pooling layers may lead to the loss of some detail information, especially with larger pooling kernel sizes, potentially ignoring smaller targets or details. To address this, we propose the Spatial Multi-Fusion Net, which segments the channels and then blends image features extracted from the different channels using maximum pooling blocks with different kernel sizes and convolutional blocks of varying depths. This allows the model to capture features at different abstraction levels, thereby collecting features of objects of different sizes. Integrating the Spatial Multi-Fusion Net into YOLOv9 further improves its performance on the COCO object detection task, with all metrics showing enhancement while adding only a few parameters.
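The channel-splitting idea can be sketched directly: split the feature map's channels into groups, pool each group with a different kernel size, and concatenate. A simplified NumPy illustration using stride-1 "same" max pooling (the real module also mixes in convolutional blocks of varying depth, which are omitted here):

```python
import numpy as np

def maxpool2d_same(x, k):
    """Stride-1 max pooling with 'same' (edge) padding on an (H, W) map."""
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    h, w = x.shape
    out = np.empty_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j] = xp[i:i + k, j:j + k].max()
    return out

def spatial_multi_fusion(feat, kernels=(3, 5, 7)):
    """Split channels of a (C, H, W) feature map into len(kernels) groups,
    pool each group with a different kernel size, then concatenate."""
    groups = np.array_split(feat, len(kernels), axis=0)
    pooled = [np.stack([maxpool2d_same(ch, k) for ch in g])
              for g, k in zip(groups, kernels)]
    return np.concatenate(pooled, axis=0)
```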
Songyang Wu, Zhiqi Li, Wen Wang, Lixiang Shen, Ning Li
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 1323029 (2024) https://doi.org/10.1117/12.3035867
Mobile monitoring methods are easily affected by changes in the coverage cycle of automated monitoring, resulting in poor security monitoring effectiveness. Therefore, it is necessary to design a new edge-computing-based security monitoring method for the cross-domain circulation of mobile monitoring data. A security monitoring process for cross-domain circulation of mobile monitoring data is generated, and edge computing is used to build a cross-domain circulation security monitoring model, thereby achieving cross-domain circulation security monitoring of mobile monitoring data. The experimental results show that the designed method has good monitoring effectiveness, reliability and application value, contributing to reducing the cross-domain circulation risk of mobile monitoring data.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132302A (2024) https://doi.org/10.1117/12.3035438
This study aims to enhance the prediction of power transformer oil temperature time series by proposing a model that incorporates CNN, BiLSTM, and attention network structures. To address the limitation of unidirectional information transfer in the LSTM model, we introduce a CNN for feature extraction, use the attention mechanism to enhance the learning effect, and employ the TTAO optimization algorithm to optimize the model parameters. The experimental results indicate that the TTAO-CNN-BiLSTM-Attention model achieves optimal performance on the MAE, R-squared, and Time metrics, slightly lagging behind the BiLSTM-Attention model on MSE and RMSE. Overall, however, the established TTAO-CNN-BiLSTM-Attention model demonstrates good effectiveness. Compared to the LSTM model, there are reductions of 34.5%, 54.6%, 32.6%, and 12.1% in MAE, MSE, RMSE, and Time respectively, with a 14.3% increase in R-squared. Additionally, its R-squared value is higher than that of all other models, suggesting a wide range of potential applications and practical value.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132302B (2024) https://doi.org/10.1117/12.3035574
License plate detection plays an important role in highway traffic safety management and community parking lot management. This paper studies an algorithm for license plate detection. Firstly, the license plate image is preprocessed using grayscale conversion, binarization and edge detection. Then, the image is corrected through mathematical morphology and the Radon transform. Finally, license plate detection is achieved by segmenting characters and template matching. Experimental results show that the detection accuracy of the algorithm on upright license plate images and tilted license plate images is 97.5% and 87.5%, respectively, with an average detection rate of 91.17%. The conclusion indicates that the algorithm in this paper can provide an effective tool for license plate detection systems in various fields.
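Two of the pipeline stages can be sketched concretely: binarization and character template matching. A minimal NumPy illustration, where Otsu's method stands in for the binarization step and normalized correlation stands in for the template matching (the paper's exact thresholding and matching rules may differ):

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: pick the threshold maximising between-class variance."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    total = gray.size
    sum_all = float(np.dot(np.arange(256), hist))
    w0, sum0 = 0, 0.0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0 or w0 == total:
            continue
        w1 = total - w0
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def match_char(patch, templates):
    """Return the template label with the highest normalised correlation."""
    p = (patch - patch.mean()).ravel()
    p = p / (np.linalg.norm(p) + 1e-9)
    def score(tpl):
        t = (tpl - tpl.mean()).ravel()
        return float(p @ (t / (np.linalg.norm(t) + 1e-9)))
    return max(templates, key=lambda label: score(templates[label]))
```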
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132302C (2024) https://doi.org/10.1117/12.3035582
The accuracy of image edge detection determines the measurement accuracy of chip positioning. This article uses subpixel edge detection technology to solve the problem of detecting and extracting chip positioning marks in the automatic alignment system of SMT machines. The spatial moment sub-pixel edge detection algorithm was derived and applied to the edge extraction and analysis of chip positioning markers in SMT machines. The experiment shows that the spatial moment method has good accuracy and robustness, with relatively low computational complexity and insensitivity to noise, which helps to improve positioning accuracy.
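The moment-based idea can be illustrated in one dimension: model the samples in a window as a two-level step and choose the step position so the first three sample moments are preserved (the classic moment-preserving formulation; this 1D sketch omits the 2D spatial-moment masks used for real images):

```python
def subpixel_edge_1d(samples):
    """Moment-preserving subpixel edge locator for a 1D window.

    Models the window as p1*n pixels at level h1 followed by (1-p1)*n at h2,
    choosing p1 so the first three moments are preserved. Returns the edge
    position in pixels from the window start, or None for a flat window."""
    n = len(samples)
    m1 = sum(samples) / n
    m2 = sum(v * v for v in samples) / n
    m3 = sum(v ** 3 for v in samples) / n
    sigma2 = m2 - m1 * m1
    if sigma2 <= 0:               # flat window: no edge present
        return None
    s = (m3 + 2 * m1 ** 3 - 3 * m1 * m2) / sigma2 ** 1.5   # skewness
    p1 = 0.5 * (1 + s / (4 + s * s) ** 0.5)
    return p1 * n
```

For an ideal step the recovered position is exact; a mixed boundary pixel shifts it by a subpixel amount, which is the effect the paper exploits for chip positioning marks.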
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132302D (2024) https://doi.org/10.1117/12.3035699
In recent years, with the rapid development of drone photography technology, drones have emerged as a new method for photography and observation. We applied object detection technology to drone photography and propose a YOLO-style 2D object detection network, YOLOvis, based on the YOLOv8 architecture. We introduce the PatchFPN structure to improve the network's robustness in detecting small objects and propose the TranConcat structure to scale features for enhanced usability. YOLOvis is trained and evaluated on the challenging VisDrone dataset. With a slight increase in computational cost, the model achieves an mAP50 of 33.7%, a 3.7% improvement over YOLOv8 of the same size and a 17.0% improvement over YOLOv5 of the same size. Additionally, to better tailor the model to specific needs, we provide YOLOvis models of different sizes. The code and pretrained models are available at https://github.com/BarryGUN/bgyolo_v8_application.git.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132302E (2024) https://doi.org/10.1117/12.3035679
Reinforced concrete (RC) is favored by the construction industry because of its low cost and good stability; however, the internal reinforcement may be damaged prematurely under certain conditions, so it is of great importance to use pulsed eddy current (PEC) non-destructive testing to inspect RC periodically. In this paper, a voltage source is used as the excitation power source, the multilayer structure model is divided into several regions, the magnetic vector potential expression of each region is solved via the Green function, and the time-domain analytical solution of the coil current is obtained, based on decoupling the frequency-domain expression of the magnetic vector potential. The pulsed voltage excitation signal is expanded in a Fourier series to obtain the analytical solution of the transient voltage generated in the axial pick-up coil. A Comsol simulation model is constructed to compare the simulation data with the derived time-domain analytical solution, and the results show that the analytical solution of the axial detection coil constructed in this article is highly accurate, laying a theoretical foundation for the design of RC detection methods and PEC sensors.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132302F (2024) https://doi.org/10.1117/12.3035668
At present, manual removal of hooks from open wagons poses a significant operational safety risk. The hook parts and mechanical structures differ among wagon types owing to the large number of open wagon models. Moreover, coal unloading must be scheduled according to the railroad timetable, and the hook removal action is performed outdoors, so realizing automatic hook removal requires accurately identifying and locating the hook removal object under complex lighting conditions. In this paper, exploiting the large Gaussian statistical difference between the image of the hook part and the background pixels of the railroad track, a UWB wireless positioning module follows the traction position of the tractor, a laser probe recognizes the position of the wagon edge, and a visual camera captures an image of the object to be picked; the contour of the object is then obtained through an adaptive Gaussian background removal algorithm, from which its location and type information are further derived.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132302G (2024) https://doi.org/10.1117/12.3035618
Recently, the Transformer based on grid features has achieved great success in image captioning, but it still has some problems: the flattening operation of the Transformer destroys the positional information among visual objects, and only the output of the last encoder layer is sent to the decoder, which loses low-level semantic information. To solve the above problems, we first introduce Distance-aware self-attention (DA), which considers the original geometric distance between visual objects in a two-dimensional image during self-attention modeling, and integrates distance information into the attention calculation through a mapping function, better capturing the relational information among visual objects. Second, we propose the Multilayer Aggregation (MA) module, which aggregates the outputs of the encoder layers and establishes a weighted residual connection as the final output sent to the decoder. It aggregates information from different encoder layers to achieve cross-layer semantic complementarity, so that features with rich semantics can be explored simultaneously from both low-level and high-level coding layers. To verify the validity of the two proposed designs, we applied them to a standard Transformer and conducted extensive experiments on MS-COCO, a benchmark dataset for image captioning. The experimental results demonstrate the effectiveness of our proposed Distance-aware Multilayer Aggregation Transformer (DMAT) model.
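The distance-aware idea can be sketched as an additive bias on the attention logits: pairwise Euclidean distances between grid-cell centers enter the score matrix before the softmax. A simplified NumPy sketch, where the linear penalty with weight `alpha` stands in for the paper's learned mapping function:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def distance_aware_attention(q, k, v, centers, alpha=1.0):
    """Self-attention over N grid features (N, d) with an additive bias
    from the Euclidean distance between their 2D centers (N, 2);
    nearer pairs receive a smaller penalty."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    diff = centers[:, None, :] - centers[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)   # (N, N) pairwise distances
    scores = scores - alpha * dist         # distance mapped into the logits
    return softmax(scores, axis=-1) @ v
```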
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132302H (2024) https://doi.org/10.1117/12.3036209
To achieve wide-range, nanometer-accurate displacement measurement with low manufacturing cost, strong anti-interference capability and stable performance, a nanometer time-grating sensor based on AC electric-field coupling was studied. According to the working principle of the sensor, the effect of various geometric positions of the moving ruler and fixed ruler on the accuracy of the sensor was investigated. First, a theoretical model was used to analyze the influence of geometric position on sensor precision. Then, an experimental platform was built using an ultra-precision air-bearing linear guide and a high-precision six-dimensional experimental mechanism for precise quantitative experiments on the nanometer-grade sensor. The sensor geometry-position parameters and errors were obtained by combining the theoretical model analysis with experiments. The experiments show that the sensor range reaches 200 mm, and the accuracy reaches ±400 nm under geometric-position deviation.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132302I (2024) https://doi.org/10.1117/12.3035528
This manuscript tackles the challenge posed by gaze tracking configurations utilizing a single camera and a single light source, which rely on two-dimensional mappings and fail to accommodate natural movements of the user's head. In our method, we initially identify the face within the image to isolate the area surrounding the eye. Following this, the Hough transform is applied to ascertain the locations of both the pupil's center and the reflective glint. Subsequently, a mapping equation is utilized to define the correlation between the direction of the gaze and the screen's coordinates. To correct inaccuracies in gaze prediction stemming from movements of the head, we employ a series of linear formulas. These incorporate coefficients that are adjusted according to the shifts in the user's head position. Gaze monitoring is then realized through the amalgamation of data on the direction of the gaze and the movements of the head. The testing of our technique demonstrates it attains an average precision of 0.88 degrees along the x-axis and 1.08 degrees along the y-axis without the user moving their head. With head movements, the precision averages at 1.40 degrees on the x-axis and 1.59 degrees on the y-axis.
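The mapping stage can be sketched as a least-squares fit: a low-order polynomial in the pupil-glint vector is fit to the known screen coordinates of the calibration points. A minimal NumPy sketch (the second-order terms here are a common choice for single-camera gaze mapping, not necessarily the paper's exact mapping equation, and the head-movement correction coefficients are omitted):

```python
import numpy as np

def design_matrix(pg):
    """Second-order polynomial terms of the pupil-glint vectors (N, 2)."""
    x, y = pg[:, 0], pg[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * y, x * x, y * y])

def fit_gaze_mapping(pg_vectors, screen_points):
    """Least-squares fit of the mapping from pupil-glint vectors to
    screen coordinates over the calibration points."""
    A = design_matrix(pg_vectors)
    coef, *_ = np.linalg.lstsq(A, screen_points, rcond=None)
    return coef

def predict_gaze(coef, pg_vectors):
    return design_matrix(pg_vectors) @ coef
```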
Zhiying Chen, Quanrui Chen, Qiufu Wang, Lin Chen, Xiaoliang Sun
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132302J (2024) https://doi.org/10.1117/12.3035605
Real-time and accurate relative pose measurement is one of the key technologies for autonomous interaction of dual-motion platforms. Monocular pose measurement has drawn wide attention from researchers because of its advantages of simple configuration, low cost, etc. In this paper, an infrared-light-based monocular pose measurement method for autonomous interaction of dual-motion platforms is proposed. First, a specific number of infrared LEDs are deployed on the target platform. The targets are detected via Pixels-priori adaptive threshold segmentation on the infrared image, and sub-pixel coordinates are then acquired through a moment-based method. Second, a Random Perspective-3-Point (P3P) method is exploited to identify the 2D-3D correspondences. Finally, under the assumption of a constant-velocity model, the Gauss-Newton optimization method is adopted to achieve precise and efficient pose tracking. UAV autonomous landing experiments demonstrate that the proposed method realizes real-time and accurate pose estimation, successfully guiding the UAV to land on the roof of a vehicle.
Quanrui Chen, Zhiying Chen, Zhuo Zhang, Liangchao Guo, Xiaoliang Sun
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132302K (2024) https://doi.org/10.1117/12.3035597
Traditional vision-guided aircraft autonomous landing technology relies on artificially designed features such as cooperative markers. In complex environments, these systems are unreliable. To address this, this paper proposes a vision-guided autonomous landing technology for non-cooperative targets, which utilizes the geometric features of the target without relying on cooperative markers. The system first extracts 2D key points using deep-learning-based object and key-point detection, and then solves the Perspective-n-Point (PnP) problem to obtain the pose. The aircraft can land by steadily capturing images and computing the pose. A visual guidance system composed of hardware such as a UAV and a camera is used for experiments. Experiments show that the system successfully achieves vision-guided landing: the pose update frequency is above 10 Hz, the average key-point prediction error is less than 2.5 pixels, and the reprojection error is less than 4 pixels.
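The reported reprojection error has a direct definition: project the 3D key points through the estimated pose and camera intrinsics, and average the pixel distance to the detected key points. A minimal NumPy sketch (the intrinsics below are hypothetical, and the pose (R, t) itself would come from a PnP solver such as OpenCV's solvePnP):

```python
import numpy as np

def project(points_3d, R, t, K):
    """Pinhole projection of (N, 3) object points with pose (R, t) and intrinsics K."""
    cam = points_3d @ R.T + t          # world frame -> camera frame
    uv = cam @ K.T                     # apply intrinsics (homogeneous pixels)
    return uv[:, :2] / uv[:, 2:3]      # perspective divide

def reprojection_error(points_3d, points_2d, R, t, K):
    """Mean Euclidean distance (pixels) between detected and reprojected key points."""
    proj = project(points_3d, R, t, K)
    return float(np.mean(np.linalg.norm(proj - points_2d, axis=1)))
```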
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132302L (2024) https://doi.org/10.1117/12.3035734
Distance measurement based on binocular stereo vision is suitable for online, non-contact target detection because of its advantages, including high efficiency, simple system structure and low cost. However, measurement error is introduced if the target is on the water (e.g., a river or the sea), since the optical path changes with the refractive index of the air above the water. This is because the refractive index varies with temperature, humidity, and atmospheric pressure. Herein, we analyze the effect of these factors on the measurement error and propose a model based on a backpropagation neural network to predict it. The error model established in this work provides guidance for the design of accurate binocular ranging systems.
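The error-prediction model can be sketched as a small backpropagation network: environmental inputs (e.g. temperature, humidity, pressure) in, ranging error out. A minimal NumPy sketch with one tanh hidden layer trained by full-batch gradient descent (the architecture and hyperparameters are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

def train_error_model(X, y, hidden=8, lr=0.05, epochs=2000):
    """One-hidden-layer backpropagation network regressing the ranging
    error y (n, 1) from environmental inputs X (n, d)."""
    n, d = X.shape
    W1 = rng.normal(0, 0.5, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, 1)); b2 = np.zeros(1)
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)           # forward pass
        err = (h @ W2 + b2) - y            # prediction residual
        gW2 = h.T @ err / n; gb2 = err.mean(0)
        dh = (err @ W2.T) * (1 - h * h)    # backpropagate through tanh
        gW1 = X.T @ dh / n; gb1 = dh.mean(0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return lambda Xq: np.tanh(Xq @ W1 + b1) @ W2 + b2
```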
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132302M (2024) https://doi.org/10.1117/12.3035705
In this paper, the measurement performance of a 10 kPa gas piston manometer is analyzed. First, we introduce the working principle and structure of the gas piston manometer. Then, we discuss in detail its accuracy, repeatability, sensitivity, and stability. We found that the 10 kPa gas piston manometer has high accuracy. It shows good repeatability: results obtained from repeated measurements under the same conditions are very close. It is very sensitive to pressure changes over a small range and can provide accurate measurement results. It also shows good stability, maintaining consistent measurement results during long-term use. The 10 kPa gas piston manometer is therefore suitable for various pressure measurement applications.
Guangxun E, Mengtian Li, Chuanyi Ma, Hanpeng Wang, Shengtao Zhang, Ruijie Zheng, Ning Zhang
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132302N (2024) https://doi.org/10.1117/12.3035694
Metering pumps are conventional loading analysis instruments, widely used in metrology, petroleum and the chemical industry, with the functions of injection and loading, fluid displacement, metering control, etc. At present, the conventional metering pumps used in China have problems in practical application such as slow loading and pressure boosting, drastic fluctuations in holding pressure, large metering errors and complicated program control operations, which cannot meet the technical function and parameter requirements of cutting-edge scientific research instruments in various fields, such as cyclic high-frequency loading and unloading for rock mechanics testing, and seepage displacement of rock microscopic pores. To address these problems, this paper develops a new double-cylinder metering pump with high precision and high sensitivity. The device innovatively uses the top-pressure loading technology of a halberd gear rotating nut, offers multiple loading functions such as constant and steady pressure, constant-speed loading, and controllable pulse and vibration, and can monitor the porosity evolution of rock specimens under cyclic loading and unloading in real time. The device lays a theoretical foundation for studying the safety evolution mechanisms of rock engineering such as water conservancy, mining and oil exploitation, and provides a scientific basis for it.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132302O (2024) https://doi.org/10.1117/12.3035449
Home gas detection systems are a key technology designed to improve indoor air quality and protect the health of family members. Through a series of sensors, data acquisition and user interfaces, the intelligent system is able to monitor the concentration of harmful gases such as carbon monoxide (CO), formaldehyde (HCHO), volatile organic compounds (VOCs) in the household indoor air in real time to ensure the health and safety of residents. The working principle of a home gas detection system is quite simple and efficient. The sensors continuously measure the gas concentration of the indoor air and transmit the data to a central control unit. If any hazardous gas concentration is detected above a preset safety threshold, the system will immediately trigger an alarm and take steps to notify family members and respond to potential hazards in a timely manner. In addition, some systems are networked and can be monitored remotely via a mobile app or cloud platform, allowing users to check indoor air quality at any time. This technology plays an important role in a variety of situations, especially after a new renovation, the use of gas equipment, or any scenario that may trigger the release of harmful gases. The home gas detection system is the guardian of the home environment, providing us with a safe and healthy living space and helping to reduce health problems caused by indoor air pollution. It is part of the future of home intelligence, creating a more comfortable and sustainable living environment for us, and deserves our high attention and adoption.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132302P (2024) https://doi.org/10.1117/12.3035441
We evaluate, against L-band radiosonde data, the applicability of the temperature and relative humidity profile data below 2 km height from a new type of detection equipment, a microwave radiometer, installed at Qingdao National Basic Meteorological Observing Station, for April to August of 2021 and 2022. The results show that the temperature and relative humidity detected by the microwave radiometer and the L-band radiosonde have relatively consistent changes at each altitude level. The mean error of relative humidity is within ±5%, and both the root mean square error and mean absolute error show an increasing trend with height until they become stable above 500 m. The mean error of temperature is within 2.5°C, and the temperature from the microwave radiometer is higher than that from the L-band radiosonde. The mean error, root mean square error, and mean absolute error all first increase and then decrease with height, with the mean error peaking at about 600 m. In general, the relative humidity and temperature profile data of microwave radiometers possess a certain reliability.
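The three statistics used above have one-line definitions over paired profiles, with the difference taken as radiometer minus radiosonde (the sign convention implied above). A minimal sketch:

```python
import numpy as np

def profile_errors(radiometer, radiosonde):
    """Mean error, RMSE and MAE between paired readings at one height level."""
    d = np.asarray(radiometer, dtype=float) - np.asarray(radiosonde, dtype=float)
    me = d.mean()                    # signed mean error
    rmse = np.sqrt((d * d).mean())   # root mean square error
    mae = np.abs(d).mean()           # mean absolute error
    return me, rmse, mae
```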
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132302Q (2024) https://doi.org/10.1117/12.3036417
Optimizing scene management and reducing rendering data are key to improving the efficiency of virtual-real interaction in the metaverse. We use Morton codes to map three-dimensional spatial data into a linear sequence, achieving a reasonable partitioning of large-scale grids, with improvements tailored to the needs of the metaverse. Using a hierarchical construction approach, a Morton-code-based BVH performs coarse segmentation and a KD-tree refines it, achieving efficient scene partitioning. On this basis, local adaptive culling focused on key areas improves the running frame rate of digital twin scenes. Together, these methods reduce unnecessary visibility analysis and deep query operations while limiting per-frame scene state changes, thereby improving computer rendering efficiency and providing strong support for virtual-real interaction in the metaverse.
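The Morton-code mapping this scheme relies on interleaves the bits of the three coordinates so that cells close in 3D space tend to be close in the resulting linear order. An illustrative sketch (not code from the paper), limited to 10-bit coordinates:

```python
def part1by2(n: int) -> int:
    """Spread the low 10 bits of n so two zero bits separate each bit."""
    n &= 0x3FF
    n = (n | (n << 16)) & 0x030000FF
    n = (n | (n << 8)) & 0x0300F00F
    n = (n | (n << 4)) & 0x030C30C3
    n = (n | (n << 2)) & 0x09249249
    return n

def morton3d(x: int, y: int, z: int) -> int:
    """Interleave bits of (x, y, z) into one Morton key: ...z1 y1 x1 z0 y0 x0."""
    return (part1by2(z) << 2) | (part1by2(y) << 1) | part1by2(x)
```

Sorting grid cells by this key gives the linear sequence used for BVH construction: contiguous runs of keys correspond to spatially coherent blocks.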
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132302R (2024) https://doi.org/10.1117/12.3036414
In the traditional logistics and distribution process, a large number of tasks cannot be completed accurately and in a timely manner due to errors in manual operations, low efficiency, and high requirements for the working environment. The automated sorting system, by introducing advanced technologies such as image recognition, radio frequency identification (RFID), deep learning, and computer vision, can achieve rapid, accurate, and intelligent item classification and sorting. This ultimately improves delivery efficiency, reduces service costs, and minimizes human errors. This article introduces an automated sorting system based on image recognition and robotic arm grasping. In terms of model selection, we experimented with various deep learning models and ultimately chose InceptionV2 as the final model. To improve the model's accuracy, we adjusted relevant parameters and introduced transfer learning to optimize the model. Experimental results demonstrate that the introduction of transfer learning significantly enhances the model's accuracy, performing well on both the training and testing datasets. Finally, we utilized the CoppeliaSim robot simulation platform to realize the robotic arm's grasping and classification of items on a conveyor belt, completing the closed loop of the automated sorting system.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132302S (2024) https://doi.org/10.1117/12.3036276
Knowledge distillation is a technique that transfers knowledge from a teacher network to a student network, enabling the student to use fewer parameters and computations while maintaining accuracy comparable to the teacher's. In this paper, we propose a novel distillation method that incorporates two key modules: Hard Mining Distillation and Multi-scale Feature Distillation. The Hard Mining Distillation module builds upon the vanilla knowledge distillation approach, taking inspiration from Venn diagrams. By introducing this module, we aim to enhance the student's learning by focusing on challenging areas. Moreover, we integrate the Hard Mining Distillation module into the multi-scale feature distillation process to enable more detailed learning. The method is straightforward to implement and lightweight, eliminating the complexities of previous studies while ensuring efficient distillation. To evaluate the effectiveness of the proposed method, we conducted quantitative experiments on the BTCV dataset. The experimental results demonstrate that our method outperforms other approaches, achieving the best accuracy.
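The vanilla knowledge distillation loss that the Hard Mining Distillation module builds upon mixes a temperature-softened teacher target with the ordinary hard-label loss. A minimal framework-free sketch; the temperature and weight values are illustrative, not the paper's settings:

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(l / T) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(student_logits, teacher_logits, label, T=4.0, alpha=0.7):
    """Vanilla KD loss: alpha * T^2 * KL(teacher || student) + (1 - alpha) * CE."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    # KL divergence between softened distributions; T^2 keeps gradient scale stable.
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s))
    # Standard cross-entropy against the hard label.
    ce = -math.log(softmax(student_logits)[label])
    return alpha * (T ** 2) * kl + (1 - alpha) * ce
```

Hard mining, as described in the abstract, would reweight this loss toward difficult regions; that weighting is the paper's contribution and is not reproduced here.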
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132302T (2024) https://doi.org/10.1117/12.3036289
In recent years, with the development of artificial intelligence technology, intelligent robots have become increasingly widely used in many fields. This paper designs and implements an intelligent patrol wheeled robot based on image recognition. The robot uses a Raspberry Pi as its core control unit and integrates sensing equipment such as cameras and ultrasonic sensors. Autonomous navigation, intelligent obstacle avoidance, and path tracking are achieved through computer vision and machine learning techniques. The main technologies include OpenCV for image processing, HOG feature extraction with an SVM classifier for traffic sign recognition, and a color-based traffic light detection system. In addition, the paper discusses the use of a positional PID control algorithm to realize a vision-based lane-keeping scheme. Through detailed system design and experimental verification, the paper demonstrates the performance benefits of the robot in practical applications and provides a new direction for future research on intelligent robots. The robot is expected to have wide application potential in intelligent transportation, public safety, and other fields.
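A positional PID controller of the kind mentioned for lane keeping computes its output from the accumulated error history rather than from output increments. A minimal sketch with illustrative gains (the error would be the lane-center offset extracted from the camera image; none of these values come from the paper):

```python
class PositionalPID:
    """Positional (absolute-output) PID controller."""

    def __init__(self, kp, ki, kd, dt=0.05):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, error):
        """One control step: returns the absolute actuator command."""
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

In a lane-keeping loop, the returned value would be mapped to a steering angle each frame; the integral term removes steady-state drift, while the derivative term damps oscillation around the lane center.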
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132302U (2024) https://doi.org/10.1117/12.3036525
To enhance the precision and speed of path planning and autonomous navigation for mobile robots in unstructured environments, this paper proposes a method involving multi-sensor data acquisition, fusion, and local terrain map construction. Using the robot's onboard sensing system, the method achieves real-time elevation map construction with immediate updates centered on the robot's current position during movement. A sliding window manages the map's memory, ensuring real-time performance and robustness. The paper analyzes the influence of error propagation in robot state estimation on elevation map construction and compensates for errors during incremental map updates, addressing the map-consistency issue arising from odometry drift. A probabilistic terrain estimation method based on discrete Bayesian filters effectively accumulates measurements, yielding estimates of occupancy probability in the local environment. The method analyzes and models ground roughness, obstacle distribution, and passable areas in the robot's surroundings, providing valuable prior information for local path planning and obstacle avoidance. Experimental results on real-world data validate the effectiveness and reliability of the proposed method.
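A single discrete-Bayes update of one terrain cell's occupancy probability, of the kind the probabilistic terrain estimation step accumulates over time, can be sketched as follows. The sensor-model probabilities are illustrative placeholders, not the paper's values:

```python
def bayes_update(prior, p_hit, measured_occupied, p_false=0.3):
    """One discrete Bayes-filter step for a cell's occupancy probability.

    prior             -- current P(occupied) for this cell
    p_hit             -- P(sensor reports occupied | cell occupied)
    p_false           -- P(sensor reports occupied | cell free)
    measured_occupied -- this frame's sensor reading for the cell
    """
    if measured_occupied:
        lik_occ, lik_free = p_hit, p_false
    else:
        lik_occ, lik_free = 1.0 - p_hit, 1.0 - p_false
    num = lik_occ * prior
    return num / (num + lik_free * (1.0 - prior))
```

Repeated consistent hits drive the probability toward 1 and misses drive it back toward 0, which is how measurement accumulation smooths out individual sensor noise.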
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132302V (2024) https://doi.org/10.1117/12.3036521
The conventional method of integrating operation and maintenance data for power information systems is often time-consuming and suffers from limited data coverage. Consequently, this paper proposes a new method for integrating such data using artificial intelligence technologies. Initially, we set a unified standard and classification guideline for the integration process of operation and maintenance data in power information systems, identifying data types based on their interrelationships. We then organize and process the operation and maintenance data, removing any inconsistencies and redundancies to minimize the impact of irrelevant data and enhance integration efficiency. Furthermore, we implement a standardized coding system to verify and encode the organized data, ensuring it corresponds accurately with the actual datasets. By leveraging deep learning algorithms on offline data, this approach not only minimizes system resource utilization but also increases the extent of effective data coverage. Experimental outcomes indicate that this integration approach not only accelerates the process but also achieves greater data inclusivity compared to traditional methods, thus offering significant practical value.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132302W (2024) https://doi.org/10.1117/12.3036554
This article introduces an automatic data acquisition system for an inclinometer calibration device that uses a circular grating encoder as the reference standard. The complete host computer application is developed in LabVIEW and employs vision modules to process, segment, and recognize captured images of the inclinometer's indication. Through this system, data acquisition for inclinometers and angular encoders is achieved while the calibration process is automated.
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132302X (2024) https://doi.org/10.1117/12.3036569
Hypergraph neural networks have demonstrated outstanding performance in various fields. However, compared to graph neural networks, there is still a relative lack of research on the security of hypergraph neural networks, particularly on adversarial attack methods. This research gap poses a potential threat to the security of deep learning applications that use hypergraph neural networks. To enhance their reliability and security, we propose HyperMGA, an adversarial attack method specifically designed for hypergraphs. HyperMGA uses gradient momentum to effectively update the adversarial network and produce robust attack results. In comparative evaluations, HyperMGA outperforms state-of-the-art methods, including the integrated gradient attack (IGA), hypergraph attack, fast gradient attack (FGA), and random attack, by a factor of 2 to 2.8 in attack success rate. Additionally, HyperMGA's runtime is comparable to that of FGA and only 36% of that of IGA. The objective of this research is to bridge the existing knowledge gap and thereby enhance the security of hypergraph neural networks against attack methods in various applications.
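Gradient-momentum attack updates of the kind HyperMGA relies on resemble momentum-based iterative attacks: a running velocity accumulates normalized gradients so individual noisy steps do not dominate the perturbation direction. A minimal, framework-free sketch with illustrative parameters; this is the generic momentum step, not the authors' implementation:

```python
def momentum_attack_step(x, grad, velocity, mu=0.9, eps_step=0.01):
    """One momentum-based attack step on a flat feature vector.

    x        -- current (adversarial) input values
    grad     -- gradient of the attack loss w.r.t. x
    velocity -- accumulated momentum term from previous steps
    mu       -- momentum decay factor
    eps_step -- per-step perturbation size
    """
    # Normalize the gradient by its L1 norm so step scale stays consistent.
    l1 = sum(abs(g) for g in grad) or 1.0
    velocity = [mu * v + g / l1 for v, g in zip(velocity, grad)]
    # Move each coordinate by eps_step in the sign of the accumulated velocity.
    x_new = [xi + eps_step * ((v > 0) - (v < 0)) for xi, v in zip(x, velocity)]
    return x_new, velocity
```

In a structure attack on a hypergraph, the analogous update would act on incidence entries rather than continuous features, with the largest-magnitude velocities selecting which entries to flip.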
Proceedings Volume Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 132302Y (2024) https://doi.org/10.1117/12.3036625
Detection technology for floating debris plays a vital role in environmental protection tasks. This paper addresses challenges including surface reflection interference, shore background interference, and insufficient accuracy in recognizing small objects. To address these issues, an enhanced GST-YOLOv8 object detection method is proposed to improve detection accuracy. A global attention layer is introduced into the backbone network to concentrate critical image information and enhance processing efficiency. Additionally, a small-object perception layer is incorporated into the neck network to strengthen the perception of shallow-level information during detection, improving accuracy on small objects. Comparative and ablation experiments on a dedicated public dataset illustrate the efficacy of the enhanced method.