This PDF file contains the front matter associated with SPIE Proceedings Volume 12749, including the Title Page, Copyright information, Table of Contents and Conference Committee list.
Sixteenth International Conference on Quality Control by Artificial Vision
In recent years, various methods have been proposed for reconstructing the 3D shape of an object from a single-view image. While methods that reconstruct the object as a single model show promising results, they often lack part-level detail. Part-level reconstruction methods, on the other hand, provide recognition of parts but struggle to represent detailed shapes because they rely on a single primitive type. To address this issue, this paper proposes a Compositionally Generalizable 3D Structure Prediction Network using Multiple Types of Primitives (CompNet-MTP). CompNet-MTP first estimates the parameters of each type of primitive for every part and then selects the appropriate primitive type to construct the 3D shape of the object. In the experiments, we used cylinders in addition to cuboids, which are commonly used as primitive shapes. Experimental results confirm the effectiveness of the proposed network in handling multiple types of primitives.
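The primitive-type selection step can be illustrated with a toy sketch: fit each candidate primitive to a part's points and keep the type with the lower residual. The fitting rules below (a z-axis-aligned cylinder and an axis-aligned bounding-box cuboid) are simplifications for illustration, not the network's learned parameter estimation.

```python
import numpy as np

def cylinder_error(pts):
    # Residual of a cylinder fit with its axis assumed along z:
    # spread of each point's xy-projection around the mean radius.
    c = pts[:, :2].mean(axis=0)
    r = np.linalg.norm(pts[:, :2] - c, axis=1)
    return np.abs(r - r.mean()).mean()

def cuboid_error(pts):
    # Residual of an axis-aligned bounding-box fit:
    # mean distance of each point to the nearest box face.
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    d = np.minimum(pts - lo, hi - pts)
    return d.min(axis=1).mean()

def select_primitive(pts):
    # Keep whichever primitive type explains the part's points best.
    errs = {"cylinder": cylinder_error(pts), "cuboid": cuboid_error(pts)}
    return min(errs, key=errs.get)
```

Points sampled on a cylindrical part then select "cylinder", while points on a box surface select "cuboid".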
In order to maintain a healthy ecosystem and fish stocks, it is necessary to monitor the abundance and frequency of fish species. In this article, we propose a fish detection and classification system. In the first step, images were extracted from a public Ocqueoc River DIDSON high-resolution imaging sonar dataset and annotated. Two end-to-end object detection models, a Detection Transformer with a ResNet-50 backbone (DETR-ResNet-50) and YOLOv7, were used to detect and classify fish species. With a mean average precision of 0.79, YOLOv7 outperformed DETR-ResNet-50. The results demonstrate that the proposed system can indeed be used to detect and classify fish species using high-resolution imaging sonar data.
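For context, detection metrics such as the reported mean average precision of 0.79 are built on box matching via intersection-over-union; a minimal IoU helper (boxes as [x1, y1, x2, y2]) looks like:

```python
def iou(a, b):
    # Intersection-over-union of two axis-aligned boxes [x1, y1, x2, y2]:
    # the box-matching criterion underlying mAP scores such as the 0.79 reported.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0
```

A prediction is typically counted as a true positive when its IoU with a ground-truth box exceeds a threshold such as 0.5.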
Ranking the crates of grapes using a robust quality index is a major tool for operators during the Champagne grape harvest. We propose building such an index by processing RGB images of crates of grapes. Each image is segmented into six classes: healthy grape, crate, diseases (grey rot, powdery mildew, conidia), green elements (stalk, leaf, unripe healthy grape), shadow, and dry elements (dry leaf, dry grape, wood); the quality index then reflects the proportion of healthy grape inside the crate. As the main pre-treatment, the segmentation must be carefully performed; a random forest-based solution, trained on hand-tagged pixels for each grape variety, is proposed here.
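A quality index of this kind, the proportion of healthy pixels in the segmented image, can be sketched as follows; the class ordering and the choice to exclude crate and shadow pixels from the denominator are assumptions for illustration, not the paper's exact definition:

```python
import numpy as np

# Class indices for the six segmentation classes of the abstract (assumed order).
CLASSES = ["healthy", "crate", "disease", "green", "shadow", "dry"]

def quality_index(label_map):
    # Proportion of healthy-grape pixels among grape-related pixels;
    # crate and shadow pixels are excluded from the denominator (assumption).
    labels = np.asarray(label_map)
    relevant = ~np.isin(labels, [CLASSES.index("crate"), CLASSES.index("shadow")])
    healthy = labels == CLASSES.index("healthy")
    return healthy.sum() / max(relevant.sum(), 1)
```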
Producing Near Net Shape parts with complex geometries using Wire-Laser Additive Manufacturing often requires a mastered and optimized process. Differences between the as-built and nominal geometries of the manufactured entities call for in-situ defect measurement so that production of the entire part can be completed successfully. A contactless measuring system is needed to evaluate geometrical deviations without requiring complex post-processing operations. To overcome this challenge and validate a measuring tool that serves the manufacturing purpose, a global stereocorrelation approach is used to measure defects in wire-laser additively manufactured parts. This method relies on a self-calibration phase for the cameras that uses the nominal model of the part substrate. A modal basis is then defined to model and evaluate the dimensional and shape defects of the surface. An analysis of the texture obtained in additive manufacturing is conducted to assess whether it is sufficient for image correlation and defect measurement. Finally, natural and patterned textures are compared to highlight their influence on the measurement results.
Anomaly detection in an unsupervised manner has become the go-to approach in applications where data labeling proves problematic. However, these approaches are not completely unsupervised, since they rely on weak knowledge of the split of the dataset into anomalous and anomaly-free subsets, and they typically require post-training threshold calibration in order to perform anomaly detection. Yet, they do not take advantage of available positive samples during training. In contrast, fully supervised approaches have proven to be more accurate and more efficient; however, they require a sufficient number of anomalous images to be labeled at the pixel level, which is a labour-intensive task. In this paper, we propose a new hybrid approach that takes the best of both worlds: an unsupervised approach builds a model for generating pseudo-labels, followed by a supervised approach that robustifies anomaly detection. Moreover, we extend this approach with an active-learning scheme, which results in learning with mixed supervision. We achieve several improvements, i.e., the utilization of available positive image samples, improved anomaly detection performance, and the retention of real-time performance. The proposed approach yields results comparable to the fully supervised approach and, at the very least, reduces the number of labeled anomalous samples required.
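The pseudo-labeling step can be sketched as thresholding the unsupervised model's anomaly score map; the quantile-based calibration and the "ignore" band around the threshold below are illustrative choices, not the paper's exact rule:

```python
import numpy as np

def pseudo_labels(score_map, q=0.98, margin=0.1):
    # Calibrate a threshold from the score distribution (quantile rule),
    # label confidently anomalous pixels 1, normal pixels 0, and mark
    # pixels near the threshold as ignore (-1) for the supervised stage.
    t = np.quantile(score_map, q)
    lab = (score_map > t * (1 + margin)).astype(int)
    near = (score_map > t * (1 - margin)) & (score_map <= t * (1 + margin))
    lab[near] = -1
    return lab
```

The supervised model then trains only on the 0/1 pixels, leaving the ambiguous band for the active-learning loop to resolve.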
Two typical instruments can be employed for linear polarization imaging: a rotating polarizer in front of a classical monochrome camera (division of time), or a dedicated sensor with a polarization filter array (division of focal plane). The latter enables snapshot acquisition of the linear polarization properties of light with a compact and affordable instrument. The rotating-polarizer method has until now been preferred when good polarimetric precision is required, yet it is still unclear how these two techniques compare in terms of polarimetric accuracy. This paper provides a practical comparison between the two methods and evaluates the effect of pre-processing applied to the raw images to counterbalance their differences.
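Both instruments ultimately estimate the same linear polarization quantities; given intensities behind polarizers oriented at 0, 45, 90 and 135 degrees, the standard Stokes-based computation is:

```python
import numpy as np

def linear_stokes(i0, i45, i90, i135):
    # Linear Stokes parameters from intensities behind polarizers
    # oriented at 0, 45, 90 and 135 degrees.
    s0 = 0.5 * (i0 + i45 + i90 + i135)
    s1 = i0 - i90
    s2 = i45 - i135
    dolp = np.sqrt(s1 ** 2 + s2 ** 2) / np.maximum(s0, 1e-12)  # degree of linear polarization
    aolp = 0.5 * np.arctan2(s2, s1)                            # angle of linear polarization
    return s0, s1, s2, dolp, aolp
```

With a rotating polarizer the four intensities come from successive exposures; with a polarization filter array they come from interpolated neighbouring pixels, which is one source of the accuracy differences the paper studies.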
In order to develop a new device for automatic quality control of grapes stored in crates just before pressing, many parameters must be specified. Among these, lighting is particularly important, both for the recognition methods and for the physical design and cost of the control system. This study introduces a database of images of grapes in crates, created specifically for the study, and investigates the possibility of distinguishing healthy grapes from other visible elements (diseases, leaves, etc.) under four different lighting conditions and with two classifiers (SVM and CNN). The experimental results show the feasibility of the system and provide objective, quantified elements to guide its design.
Electrical cables consist of numerous wires, and the 3D shape of the individual wires significantly affects cable characteristics. With the increasing diversity of wire structures, understanding the 3D shape of each wire in an electrical cable is crucial for analyzing characteristics such as bending stiffness. To accurately estimate the bending stiffness of actual electrical cables, a detailed 3D representation of each wire is required; it is therefore important to obtain the 3D shape of actual cable wires using non-destructive inspection. In this study, we propose a new method to associate wires using particle-tracking techniques. The proposed method performs wire tracking in two steps: linking detected wire positions between adjacent frames, and connecting segmented wires across frames. The effectiveness of the proposed method was evaluated quantitatively and qualitatively. In the quantitative evaluation, tracking correctness was assessed using detection results from real data with artificial noise added. The proposed method achieves highly accurate wire tracking when there is little undetected noise, and shows robustness to over-detection.
In the competitive world of the metal industry, where companies have to offer quality products, quality control is crucial. However, it takes a considerable amount of time, especially when performed manually. An Automatic Fault Detection (AFD) system reduces a lot of work for companies, saves time and money, and improves the use of available resources. Deep learning can be used efficiently to develop such an AFD system. In this article, we present the development of deep learning (DL) algorithms for quality control. We trained state-of-the-art DL models (YOLO v8n, YOLO v8s, YOLO v8m, YOLO v8l and YOLO v8x) for a quality control task using a manually annotated dataset of 3 classes (neck scratch, scratch and bent) for 2 objects (screw and metal nut). The results show very interesting scores for YOLO v8s, with an mAP@0.50 of 90.60%, a precision of 100% and a recall of 94.0% on average over the 3 classes. We also compared the performance of these models with a popular DL detector, Faster R-CNN X101, to confirm the performance of the developed models. The qualitative results show good detection of defects of different sizes (small, medium and large). Our proposal gives very promising results for deploying an AFD system in the metal industry.
Anomaly detection is an essential task in the industrial domain, and sophisticated approaches have been proposed. PaDiM is a promising direction, utilizing ImageNet-pretrained convolutional neural networks without expensive training costs. However, the cues and biases PaDiM relies on in the anomaly detection process, i.e., shape-versus-texture bias, are unclear. To reveal this bias, we propose applying frequency analysis to PaDiM. For the frequency analysis, we use a Fourier heat map, which investigates the sensitivity of the anomaly detection model to input noise in the frequency domain. As a result, we found that PaDiM utilizes texture information as a cue for anomaly detection, similar to classification models. Based on this preliminary experiment, we propose a shape-aware Stylized PaDiM: a PaDiM that uses pre-trained weights learned on Stylized ImageNet instead of ImageNet. In our experiments, we confirmed that Stylized PaDiM improves robustness to high-frequency perturbations. Stylized PaDiM also achieved higher anomaly detection performance than PaDiM on clean images of MVTecAD.
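PaDiM's scoring itself, independent of which pretrained weights produce the features, is a per-position Mahalanobis distance to a Gaussian fitted on anomaly-free images; a minimal numpy version of that core step:

```python
import numpy as np

def mahalanobis_map(feats, mean, cov_inv):
    # Per-position Mahalanobis distance between test features (H*W, C)
    # and the Gaussian fitted on anomaly-free images; cov_inv holds one
    # inverse covariance matrix (C, C) per spatial position.
    d = feats - mean
    m = np.einsum("nc,ncd,nd->n", d, cov_inv, d)
    return np.sqrt(np.maximum(m, 0.0))
```

Swapping the backbone weights (ImageNet vs. Stylized ImageNet) only changes the features entering this distance, which is what makes the shape-vs-texture comparison clean.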
In this contribution, we explore a machine learning approach to concrete structure inspection using both surface and sub-surface imaging. For this purpose, we first propose and evaluate a deep learning based approach for the segmentation of rebar instances from ground-penetrating radar images. The performance of a Mask R-CNN-based model shows that the average precision is higher than 85% for reinforcement bar segmentation. We also evaluate the generalization capabilities of the model. In a second step, different criteria (reinforcement bar locations and their normalized magnitudes) are computed from the extracted masks. These criteria are analysed in relation to images of the structure surface that had been classified as either healthy or damaged (i.e. with cracks).
For semiconductor applications, billions of objects are manufactured for a single device such as a central processing unit (CPU), storage drive, or graphical processing unit (GPU). To obtain functional devices, each element of the device has to follow precise dimensional and physical specifications at the nanoscale. Generally, the pipeline consists of annotating an object in an image and then taking measurements of the object. Manually annotating images is extremely time-consuming. In this paper, we propose a robust and fast semi-automatic method to annotate an object in a microscopy image. The approach is a deep-learning contour-based method that first detects the object and then finds its contour thanks to a constrained loss function. This constraint follows the physical meaning of electron microscopy images. It improves the quality of boundary detail at the vertices of each object by matching the predicted vertices to the most likely contour. The loss is computed during training for each object in a proximal manner on our dataset. The approach was tested on 3 different types of datasets. The experiments showed that our approach can achieve state-of-the-art performance on several microscopy image datasets.
In this study, we present our work on measuring the relief of a liquid film driven by the rotation of a roller. This object of study presents both theoretical and practical interest: this coating flow approaches the Landau-Levich-Deryaguin theory (a Newtonian fluid film driven by a flat plate) [1]. The free-surface measurement also presents real scientific and technical constraints that condition the measurement method to be used. While the thinness of the fluid film at low rotation speed requires precise measurement, the dynamics of the free surface at high rotation speed require fast measurement and the use of a minimal number of images for estimation. Several methods based on the principles of fringe projection profilometry were developed and applied to an opaque fluid. A shadowgraphy method was employed to validate the obtained results. Although used in a specific case, the method employed has potential for various applications where a free-surface measurement is required.
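The fringe-projection side of such a setup typically recovers a wrapped phase map from phase-shifted fringe images; with the classic four-step scheme (90-degree shifts, assumed here for illustration, since the paper's constraint is precisely to use few images) this is a one-liner:

```python
import numpy as np

def four_step_phase(i1, i2, i3, i4):
    # Wrapped phase from four fringe images with 90-degree phase shifts,
    # i.e. I_k = A + B * cos(phi + (k - 1) * pi / 2).
    return np.arctan2(i4 - i2, i1 - i3)
```

The wrapped phase is then unwrapped and converted to height via the system's triangulation geometry.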
The unique properties of terahertz (THz) radiation include, among others, the ability to penetrate electrical insulators such as ceramics, plastics, or plastic composites. This makes it possible to analyze materials with internal cavities non-destructively and contact-free, in both transmission and reflection configurations. The quality of results provided by commercially available low-power sources, however, still falls short of expectations, so studies are being directed at optimizing the experimental setup. In the presented work, a comparison between two experimental setups operated at frequencies of 100 GHz and 300 GHz is described. The studies were performed in transmission mode on a selected composite material and then compared to results obtained with common pulsed thermography. Practical applications in non-destructive testing and possible improvements of the described methods are discussed.
This paper describes a combined machine vision and deep learning method for quality control in an industrial environment. The innovative approach behind the proposed solution leverages low-cost, compact hardware and yields extremely high evaluation accuracy with limited computational time. As a result, the developed system runs entirely on a portable smart camera: it requires no additional sensors, such as photocells, and relies on no external computation.
Subtle changes in emotional expression occur more frequently than rich ones, which makes evaluating an individual's emotional response challenging. In this study, we focus on near-expressionless facial images, characterized by low arousal and valence values. We investigated which facial landmarks are crucial for estimating subtle emotion through a novel feature selection method named Random Combination Selection with Iterative Step (RACSIS) [1]. By combining appearance and geometrical features while reducing the feature points by up to 93.8%, the Mean Absolute Error (MAE) for Arousal = [-4, 8] and Valence = [-7, 6] was reduced to 54.95% and 46.39% for the full emotional spectrum and for subtle emotion, respectively. We then tested the performance of RACSIS in estimating the emotional response of participants undertaking audio-visual activities. We conclude that: (1) appearance features played the greater role in reducing the MAE; (2) feature selection by RACSIS achieved lower MAE values than correlation-based selection.
After more than three decades of research on robot manipulation problems, a considerable level of maturity has been reached across the related sub-problems. Many high-performance object pose tracking methods exist; one of their main problems is robustness against occlusion during in-hand manipulation. This work presents a new multimodal perception approach to estimate the pose of an object during in-hand manipulation. We propose a novel learning-based approach that recovers the pose of an in-hand object using a regression method. In particular, we fuse vision-based tactile information and depth information to overcome the occlusion problems commonly encountered during robot manipulation tasks. Our method is trained and evaluated in simulation. We compare the proposed method against different state-of-the-art approaches to show its robustness in hard scenarios. The results show a reliable increase in performance and are obtained on a benchmark so as to be replicable and comparable.
RGB-D 6D pose estimation has recently gained significant research attention due to the complementary information provided by depth data. However, in real-world scenarios, especially in industrial applications, the depth and color images are often noisier [1, 2]. Existing methods typically employ fusion designs that average RGB and depth features equally, which may not be optimal. In this paper, we propose a novel fusion design that adaptively merges RGB-D cues. Our approach assigns two learnable weights, α1 and α2, to adjust the RGB and depth contributions with respect to the network depth. This improves robustness against low-quality depth input in a simple yet effective manner. We conducted extensive experiments on the 6D pose estimation benchmark and demonstrated the effectiveness of our method. We evaluated our network in conjunction with DenseFusion on two datasets (LineMod [3] and YCB [4]) using similar noise scenarios to verify the usefulness of reinforcing the fusion with the α1 and α2 parameters. Our experiments show that our method outperforms existing methods, particularly in low-quality depth input scenarios. We plan to make our source code publicly available for future research.
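One way to read the proposed fusion is as a convex combination of modality features governed by the learnable scalars; a minimal sketch (the softmax normalization of α1 and α2 is our assumption, not necessarily the paper's exact parameterization):

```python
import numpy as np

def fuse(rgb_feat, depth_feat, a1, a2):
    # Adaptive fusion: learnable scalars a1, a2 are softmax-normalized so
    # the RGB and depth contributions form a convex combination; training
    # can then down-weight a noisy depth branch by shrinking a2.
    w = np.exp(np.array([a1, a2]))
    w = w / w.sum()
    return w[0] * rgb_feat + w[1] * depth_feat
```

With a1 = a2 the sketch degenerates to the equal averaging the paper criticizes; a large a1 pushes the output toward the RGB branch.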
Annotation is a labor-intensive task in deep learning, which requires large amounts of training data. Active learning reduces the annotation effort: by annotating step by step, the performance of the model is improved without annotating all the data. In this study, we propose a method that incorporates a curriculum learning framework into active learning, improving the model by learning first from samples that are easy to identify. Experimental results show that the proposed method achieves a 20% reduction in total annotations compared to random sampling on CIFAR-10.
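A curriculum-flavoured acquisition function might, for example, request easy (high-confidence) samples in early annotation rounds and uncertain ones later; the two-phase schedule below is purely illustrative, not the paper's exact criterion:

```python
import numpy as np

def next_batch(probs, step, total_steps, batch_size):
    # Curriculum-style acquisition: early annotation rounds favour easy
    # samples (high classifier confidence), later rounds favour uncertain
    # ones. probs is the (N, num_classes) softmax output on unlabeled data.
    conf = probs.max(axis=1)
    ease = conf if step < total_steps // 2 else 1.0 - conf
    return np.argsort(-ease)[:batch_size]
```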
Since 2020 in the USA [1] and 2021 in Europe, all medical devices have to be marked with a Unique Device Identification (UDI) code to ensure their traceability. UDI codes are laser-marked, but the engraving process is error-prone due to laser-related or external conditions. Defects may be assessed visually, but this process is costly and gives rise to human errors. Using machine vision to perform this task for large batches of UDI codes may be challenging due to alterations in readability caused by marking defects or image quality. We therefore tested several learned methods with two goals: correctly recognizing characters and identifying marking defects on UDI codes. As the codes were engraved on cylindrical metallic surfaces with a metallic paint effect, we had to address specular and stray reflections through the development of a tailor-made lighting engine. Our image-grabbing and processing pipeline comprises an imaging device designed to prevent reflections on the engraved codes; an Optical Character Recognition (OCR) algorithm (multilayer perceptron, support vector machine, or classical image segmentation); and a probabilistic model to detect faulty characters that need to be further qualified by a human operator. Our results show that multilayer perceptron (MLP) and support vector machine (SVM) recognition performances are very close to each other and above classical image segmentation.
This paper proposes Sparse Matrix Deep Compressed Sensing (SM-DCS), which leverages compressive sensing and deep learning techniques for 3D X-ray Microscopy (XRM)-based applications. It enables up to an 85% reduction in the number of pixels to be measured while maintaining reasonably accurate image quality. Unlike other direct compressed-sensing approaches, SM-DCS can be applied with existing measurement equipment. SM-DCS works by measuring a subset of the image pixels and then performing a compressed-sensing recovery process to recover each image slice. Experimental results demonstrate that SM-DCS produces reconstructed images comparable to a direct compressed-sensing measurement approach on various performance metrics, without the need to change the existing equipment.
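The first stage, measuring only a subset of pixels, amounts to applying a sparse sampling mask to each slice; a sketch of generating such a mask at the paper's up-to-85% reduction level (the uniform-random sampling pattern is an assumption for illustration):

```python
import numpy as np

def measurement_mask(h, w, keep=0.15, seed=0):
    # Random binary mask retaining ~15% of pixels (an up-to-85% reduction);
    # only the retained pixels would be measured, and the recovery stage
    # reconstructs the full slice from them.
    rng = np.random.default_rng(seed)
    return rng.random((h, w)) < keep
```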
We introduce in this paper a novel approach to the design of buckling tests in which the deformation of the sample is studied using a single camera mounted on a six-axis robot arm. This setup makes it possible to pre-determine multiple deformation configurations of the sample using a virtual model of the experiment. The calibration of the digital camera parameters is also studied, introducing a 3D target containing ChArUco boards. Early results are given.
We have researched a hierarchical lossless encoding method using cellular neural networks (CNNs) as predictors. In our method, which belongs to the family of hierarchical lossless coding methods, prediction accuracy is improved by adaptively using different CNN predictors depending on the direction of image edges. The prediction error obtained by CNN prediction is encoded by adaptive arithmetic coding using multiple probabilistic models based on context modeling. In previous research [1], a new approach was introduced in which the prediction errors of each predictor are encoded separately by arithmetic coding. Although this improves the performance of encoding prediction errors, the growing side information became an issue. Therefore, to reduce the side information of the arithmetic coders, we propose a grouping algorithm that groups the prediction errors corresponding to each predictor based on predictor utilization.
Detailed identification of the visual impressions of objects, attribute by attribute, can be leveraged to develop products and improve customer satisfaction. In this study, we propose a method to estimate Kansei (affective) information, i.e., the visual impression received from an image, for each attribute. For each attribute, we created a dataset with Kansei indices. By fine-tuning on this dataset a ResNet18 pretrained on ImageNet, combining attribute information with its output to predict the indices, we confirmed that the correlation coefficients for multiple item ratings were higher than those of a deep learning model without attribute information.
This contribution presents a novel method to extract skin physical parameters such as geometry, colour and gloss with photometric stereo. Our method is based on a QNN (Quaternion Neural Network) that estimates the surface geometry from images taken from a fixed viewpoint under varying surface illumination, i.e. photometric stereo. To that end, we assume that the surface BRDF (Bidirectional Reflectance Distribution Function) can be separated into diffuse and specular components. Once the geometry is estimated, colour is estimated from geometry, and gloss is finally computed. This method results in multiple gloss maps, which are used to compute features that characterise surface gloss. Unlike other approaches, our method does not require polarising filters, which entail more complex light modelling. We demonstrate the effectiveness of our approach through experiments on renderings, cow leather and ex-vivo skin samples. The proposed method has potential for various real-world applications such as evaluating the appearance of skin care products or assessing skin health.
Computerized Maintenance Management Systems (CMMS) assist in organising maintenance, both proactive and reactive, as well as technical operations. They usually work alongside constant surveillance and monitoring of equipment through repetitive and time-consuming tasks. AI can ease maintenance activities by reducing the time spent on these repetitive tasks, allocating more time to decision-making. In this article we present our work on automating part of the intervention request handling in Berger-Levrault’s CMMS. We designed a pipeline of computer vision operations to predict the type of intervention needed from a picture of the situation at hand. The pipeline is essentially a decision tree that combines different computer vision models and funnels images between them according to their respective outputs. Each of these models is trained separately on a specific task. To validate our approach, we performed a topic modeling analysis on the maintenance request forms to identify the ten most common topics of intervention. We show that our pipeline performs better than direct prediction by a scene recognition model, with a five-point increase in global F1 score (40% vs. 45%), and even more so for the classes with fewer training examples (23% vs. 37%).
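The funneling of images through a decision tree of models can be sketched as follows. The tree shape, labels and toy "models" below are illustrative assumptions; in the described pipeline each node would be a trained vision model.

```python
def run_pipeline(node, image):
    """Walk a decision tree of vision models: each node's model labels
    the image, the label selects the child node the image is funneled
    into, and a leaf yields the predicted intervention type."""
    while not node.get("leaf"):
        label = node["model"](image)
        node = node["children"][label]
    return node["intervention"]

# Toy stand-in models (real ones would be trained CNNs).
scene = lambda img: "indoor" if img["indoor"] else "outdoor"
obj   = lambda img: img["object"]

tree = {
    "model": scene,
    "children": {
        "indoor": {"model": obj, "children": {
            "radiator": {"leaf": True, "intervention": "heating"},
            "lamp":     {"leaf": True, "intervention": "electrical"},
        }},
        "outdoor": {"leaf": True, "intervention": "grounds"},
    },
}
result = run_pipeline(tree, {"indoor": True, "object": "lamp"})
```

Training each node on its narrower sub-task is what lets the pipeline outperform a single flat scene classifier, especially on rare classes.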
In this paper, we propose XRANet, a Deep Convolutional Neural Network (DCNN) architecture for semantic segmentation. The recent advancements in deep learning and convolutional neural networks have greatly improved the accuracy of segmentation tasks. XRANet builds on the widely used U-Net architecture and adds several improvements to increase performance. The eXtra-wide mechanism in the encoder, combined with residual connections and an attention mechanism in both the encoder and decoder, enhances feature extraction and reduces the activation of pixels outside the regions of interest. The proposed architecture was evaluated on various public datasets, and the results were measured using the Dice coefficient metric, obtaining promising quantitative and qualitative results.
Standard imaging techniques capture less information from a scene than light-field imaging. Light-field (LF) cameras can measure the light intensity reflected by an object and, most importantly, the direction of its light rays. This information can be used in different applications, such as depth estimation, in-plane refocusing, creating fully focused images, etc. However, standard key-point detectors often employed in computer vision applications cannot be applied directly to plenoptic images due to the nature of raw LF images. This work presents an approach for key-point detection dedicated to plenoptic images. Our method allows the use of conventional key-point detectors and forces the detection of each key-point in a set of micro-images of the raw LF image. Obtaining this large number of key-points is essential for applications that require finding additional correspondences in the raw space, such as disparity estimation, indirect visual odometry techniques, and others. The approach is put to the test by modifying the Harris key-point detector.
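A much-simplified sketch of the idea: split the raw LF image into its grid of micro-images and run a Harris-style corner response in each one, so every responsive micro-image contributes a key-point. The minimal response formula below (gradients summed over the whole patch, key-point placed at the patch centre) is a toy simplification of the actual detector.

```python
import numpy as np

def harris_response(patch, k=0.04):
    """Minimal Harris corner response for one micro-image patch:
    det(M) - k*trace(M)^2 with M built from summed gradients."""
    gy, gx = np.gradient(patch.astype(float))
    Ixx, Iyy, Ixy = (gx * gx).sum(), (gy * gy).sum(), (gx * gy).sum()
    return Ixx * Iyy - Ixy ** 2 - k * (Ixx + Iyy) ** 2

def detect_in_micro_images(raw, grid, size):
    """Tile the raw light-field image into grid x grid micro-images
    and keep one key-point (here simply the patch centre) for each
    micro-image with a positive corner response."""
    keypoints = []
    for i in range(grid):
        for j in range(grid):
            patch = raw[i * size:(i + 1) * size, j * size:(j + 1) * size]
            if harris_response(patch) > 0:
                keypoints.append((i * size + size // 2, j * size + size // 2))
    return keypoints

raw = np.zeros((16, 16))
raw[0:4, 0:4] = 1.0   # a corner structure inside the first micro-image
kps = detect_in_micro_images(raw, grid=2, size=8)
```

Flat or edge-only micro-images yield non-positive responses and are skipped, while cornered ones each supply a raw-space correspondence.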
We have been developing a helmet-type spatial perception support system that provides the user with vibration stimuli in response to variations in the distance to an obstacle. The purpose of this research is to propose a method for generating vibration stimuli that appropriately represent two environmental elements, walls and apertures, and to verify its effectiveness for aperture passage perception. Five vibro-motors are positioned at directional angles of 0 degrees (front) and 30 and 60 degrees to the left and right, and generate vibration stimuli with an intensity calculated by assigning appropriate damping weights. We set the distance-dependent damping weights separately for each directional angle in the calculation of the vibration intensity to be generated for each motor. Experimental results demonstrate that the subjects were able to pass through the aperture in approximately 91% of trials. This suggests that the developed system and the proposed vibration stimuli generation method are effective for perceiving space from vibration stimuli provided to the head.
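The per-angle intensity computation might look like the following sketch. The specific damping weights, sensing range and linear distance mapping are invented for illustration; the paper's tuned values are not given in the abstract.

```python
def vibration_intensity(distance, angle, max_intensity=255):
    """Map obstacle distance to a motor intensity, with a damping
    weight that depends on the motor's directional angle. All
    numeric values here are illustrative assumptions."""
    damping = {0: 1.0, 30: 0.8, 60: 0.6}[abs(angle)]
    if distance >= 2.0:                  # beyond sensing range: no vibration
        return 0
    strength = (2.0 - distance) / 2.0    # nearer obstacle -> stronger stimulus
    return int(max_intensity * damping * strength)

# One intensity per vibro-motor at the five directional angles.
intensities = {a: vibration_intensity(1.0, a) for a in (-60, -30, 0, 30, 60)}
```

Damping the lateral motors relative to the frontal one lets a symmetric aperture produce a symmetric, centre-weighted stimulus pattern.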
Recent major developments in the understanding of human social interactions have greatly contributed to the development of computers with social interaction capabilities. Many studies have investigated the understanding of human interaction from cameras. Identifying people across multiple videos is important for exploring human social interactions in group activities. We propose a framework for person segmentation and identification across videos captured by multiple wearable cameras. The proposed method comprises a local tracking module for tracking people in a single video and a global matching module for matching people across multiple videos. The method uses global consistency to identify people across multiple videos and ensures spatio-temporal consistency within a single video. We demonstrate the effectiveness of our proposed method in comparison with a baseline method on public datasets and our own dataset.
Vibration study, also called modal analysis, plays an important role in the structural health monitoring of mechanical structures. During the last decade, video-based modal analysis methods have emerged to provide dense vibration estimation using each pixel as a contactless sensor. Dense subpixel motion is estimated and then processed by a modal analysis algorithm to extract the modal basis composed of natural frequencies, damping ratios, and mode shapes. This paper introduces a new single-subband phase-based method for subpixel motion estimation. It is compared with state-of-the-art motion estimation methods on synthetic and experimental videos of a cantilever beam.
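The principle behind phase-based subpixel motion estimation can be shown in one dimension: demodulate a single complex subband in each frame and convert the phase difference into a displacement. This toy sketch (windowed demodulation at one hand-picked frequency) is an illustrative analogue, not the paper's method.

```python
import numpy as np

def subband_phase_shift(frame_a, frame_b, freq):
    """Estimate a subpixel displacement between two 1-D signals from
    the phase difference of a single complex subband."""
    n = len(frame_a)
    x = np.arange(n)
    carrier = np.exp(-2j * np.pi * freq * x)   # demodulate one subband
    window = np.hanning(n)
    c_a = np.sum(frame_a * window * carrier)
    c_b = np.sum(frame_b * window * carrier)
    dphi = np.angle(c_b * np.conj(c_a))        # subband phase difference
    return -dphi / (2 * np.pi * freq)          # phase -> displacement

x = np.arange(256)
freq = 0.05
frame_a = np.cos(2 * np.pi * freq * x)
frame_b = np.cos(2 * np.pi * freq * (x - 0.3))   # shifted by 0.3 pixels
shift = subband_phase_shift(frame_a, frame_b, freq)
```

Because phase varies continuously with position, displacements far below one pixel remain measurable, which is what makes each pixel usable as a vibration sensor.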
Non-destructive testing (NDT) is employed by companies to assess the features of a material in order to identify variations or anomalies in its properties without causing any damage to the original object. In this context of industrial visual inspection, new technologies, and especially deep supervised learning, are nowadays required to reach a very high level of performance. Data labelling, which is essential to reach such performance, can be tedious and tricky, and only experts can label the possible material defects. Considering classification problems, this paper addresses the issue of handling noisy labels in datasets. We first review existing work related to the problem and our general idea of how to handle it, then present our proposed method in detail along with the obtained results, which reach more than 0.96 and 0.88 accuracy on noisified MNIST and CIFAR-10 respectively with a 40% noise ratio. Finally, we present some potential perspectives for future work.
We focus on conformity control of complex aeronautical mechanical assemblies, typically an aircraft engine at the end or in the middle of the assembly process. Our overall system should ensure that all the mechanical parts are present and well mounted. A 3D scanner carried by a robot arm provides acquisitions of 3D point clouds, which are further processed. A Computer-Aided Design (CAD) model of the mechanical assembly is available. In this paper, we concentrate on detecting the absence of mechanical elements. We previously developed a rendering pipeline for creating realistic synthetic 3D point cloud data from the CAD model, taking into account occlusion and self-occlusion of mechanical parts. In this paper, an existing deep neural network for 3D segmentation is experimentally chosen and trained on these synthetic data. The model is then evaluated on real data acquired by a 3D scanner and shows good quantitative results according to a segmentation metric. Finally, a threshold is applied to the segmentation result to make the final decision on the absence/presence problem. The achieved accuracy is 98.7%. Our research work is being carried out within the framework of the joint research laboratory "Inspection 4.0" between IMT Mines Albi/ICA and the company Diota, specialized in the development of numerical tools for Industry 4.0. This research is a continuation of the work presented at the QCAV'2021 conference [1].
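The final thresholding step can be sketched simply: threshold the per-point segmentation scores for an expected element and declare it absent when too few of its points are recognised. The threshold values and the `element_absent` helper are illustrative assumptions, not the paper's tuned settings.

```python
def element_absent(scores, threshold=0.5, min_present_ratio=0.1):
    """Decide absence of one expected mechanical element from the
    per-point segmentation scores of its region: count the points
    labelled as the element and compare against a minimum ratio."""
    labelled = sum(1 for s in scores if s >= threshold)
    return labelled / len(scores) < min_present_ratio

bolt_present = [0.9, 0.85, 0.7, 0.95]    # most points match the element
bolt_missing = [0.05, 0.1, 0.02, 0.08]   # only background-level scores
```

Tuning the two thresholds trades missed absences against false alarms, which is how a single accuracy figure like 98.7% is obtained on the binary decision.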
Structured illumination combined with a monocular 3D camera leveraging the estimation of defocus blur has been proposed in the literature for industrial surface inspection. The accuracy of such an active depth-from-defocus (ADFD) system depends on the camera/projector and processing parameters. Here, we propose to optimize the settings of an ADFD system using a performance model that predicts the theoretical depth estimation accuracy for a given set of optical/projector/processing parameters. The accuracy of the optimized system is then experimentally evaluated. Moreover, we provide experimental results on real objects, including metallic parts, compared to a reference depth map obtained with an active stereoscopic camera.
Structural health monitoring (SHM) is a crucial process that enables diagnosis of the health state of civil and industrial smart structures through autonomous, in-situ, non-destructive measurements. Our study focuses on the damage classification step in the aeronautic context, where the primary objective is to distinguish between different damage types in composite plates. To achieve this, we considered three experimental damage types - impact, delamination, and magnet - on an aeronautic composite plate embedded with a piezoelectric array, which we excited using ultrasonic guided Lamb waves. We recorded signals resulting from pristine and damaged states and used three methods to create images from the raw recorded data. These methods employ Damage Indexes (DI) that compare signals in the healthy and damaged states for each actuator/sensor path. In the first two methods, images are created directly as pixel maps depicting the DI distribution over the actuator/receiver pairs of the plate. The last method applies the classical RAPID damage localization algorithm, generating damage localization maps associated with a given DI. The generated datasets were fed into a Convolutional Neural Network (CNN) for damage classification. Our study demonstrated that the best accuracy of the introduced methods was above 92% for different hyperparameter configurations, indicating their ability to perform the desired SHM damage classification task. The DI-based approach was much more efficient than the RAPID-based method, which was not intuitively expected. These findings contribute to the development of effective SHM techniques for aeronautic composite plates, paving the way for further improvements in this critical field.
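A DI pixel map as described above can be sketched with a common correlation-based Damage Index; the abstract does not specify which DI definitions were used, so the formula below is one standard choice, not necessarily the paper's.

```python
import numpy as np

def damage_index(healthy, damaged):
    """Correlation-based Damage Index: ~0 when the current signal
    matches the healthy baseline, growing as the two decorrelate."""
    r = np.corrcoef(healthy, damaged)[0, 1]
    return 1.0 - abs(r)

def di_pixel_map(baseline, current, n_act, n_sen):
    """Arrange one DI per actuator/sensor path into an image that a
    CNN can classify. baseline/current: signals indexed [act][sen]."""
    img = np.zeros((n_act, n_sen))
    for a in range(n_act):
        for s in range(n_sen):
            img[a, s] = damage_index(baseline[a][s], current[a][s])
    return img

t = np.linspace(0.0, 1.0, 200)
pristine = np.sin(2 * np.pi * 5 * t)
shifted = np.sin(2 * np.pi * 5 * t + 1.0)   # stand-in for a damaged path
```

Paths whose waves cross the damage show high DI pixels, giving the CNN a spatial pattern to classify.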
This paper proposes a solution to the problem of visual mechanical assembly inspection by processing point cloud data acquired via a 3D scanner. The approach is based on deep Siamese neural networks for 3D point clouds. To overcome the requirement for a large amount of labeled training data, only synthetically generated data are used for training and validation. Real acquired point clouds are used only in the testing phase.
This paper presents the results of applying optimization techniques, most notably neural architecture search (NAS) and hyperparameter optimization (HPO) strategies, to a known state-of-the-art deep learning model for surface defect detection in industry. We show that it is possible to achieve a significant reduction in model latency and number of parameters while incurring only a negligible drop in accuracy. The main motivation is the deployment of surface defect detection models on edge devices with very limited computational capabilities, e.g. a Raspberry Pi. Such deployment requirements are becoming increasingly ubiquitous, as it is very expensive to install and maintain many high-end machines in industrial environments.
In this study, we propose a new framework to perform visual simultaneous localization and mapping (SLAM) with RGB images artificially generated from thermal images in low-light environments where an optical camera cannot be used. We applied contrastive unpaired translation (CUT) and the enhanced super-resolution generative adversarial network (ESRGAN), which are image translation methods, to generate a clear, realistic RGB image from a thermal image. Oriented FAST and rotated BRIEF (ORB)-SLAM was performed on the super-resolved fake RGB images to generate a 3D point cloud. Experimental results showed that our thermography-based visual SLAM can generate a 3D temperature distribution map in low-light environments.
This paper proposes underwater simultaneous localization and mapping (SLAM) with 3D reconstruction by applying YOLOv7 to acoustic images. In underwater exploration, acoustic cameras, called the next generation of ultrasonic sensors, are gradually being adopted, and underwater SLAM technologies based on 3D reconstruction with acoustic cameras have been proposed. However, the accuracy of the resulting maps remains limited. In this study, we propose a novel approach to improve SLAM accuracy by applying YOLOv7 detection results on acoustic images to the 3D reconstruction. We utilize the objects detected by YOLOv7 as feature information for iterative closest point (ICP)-based SLAM.
Being able to identify defects is an essential step in manufacturing processes. Yet not all defects are necessarily known and sufficiently well described in image databases. The challenge we address in this paper is to detect any defect by fitting a model using only normal samples of industrial parts. For this purpose, we propose to test the fast AnoGAN (f-AnoGAN) approach, based on a generative adversarial network (GAN). The method is an unsupervised learning algorithm that consists of two phases: first, we train a generative model using only normal images, which provides a fast mapping of new data into the latent space; second, we add and train an encoder to reconstruct images. The anomaly score is defined by the reconstruction error between the defective data and the reconstruction, plus the residual error of the discriminator. For our experiments, we use two sets of industrial data: the MVTec Anomaly Detection dataset and a private dataset based on thermal-wave imaging used for non-destructive testing. This technique has been used in research for the evaluation of industrial materials. Applying f-AnoGAN in this domain yields high anomaly detection accuracy.
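The two-part anomaly score described above (reconstruction error plus discriminator feature residual) can be sketched with toy stand-in models. The lambdas below replace the trained generator, encoder and discriminator features purely for illustration; only the score's structure follows f-AnoGAN.

```python
import numpy as np

def anomaly_score(x, generator, encoder, features, kappa=1.0):
    """f-AnoGAN-style score: image reconstruction error plus the
    residual in the discriminator's feature space, computed on the
    encoded-and-regenerated sample."""
    x_hat = generator(encoder(x))
    img_err = np.mean((x - x_hat) ** 2)
    feat_err = np.mean((features(x) - features(x_hat)) ** 2)
    return img_err + kappa * feat_err

# Toy stand-ins: this "generator" can only produce normal images,
# so a defect cannot be reconstructed and inflates the score.
gen = lambda z: np.ones_like(z)
enc = lambda x: x
feat = lambda x: 2.0 * x       # stand-in discriminator features
normal = np.ones(16)
defect = normal.copy()
defect[0] = 5.0
score_normal = anomaly_score(normal, gen, enc, feat)
score_defect = anomaly_score(defect, gen, enc, feat)
```

Because the generator was trained only on normal images, anything it cannot reproduce shows up as residual error, which is exactly what makes the approach usable without defect labels.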
Accurate detection of spheres in images holds significant value for photometric 3D vision techniques such as photometric stereo.1 These techniques require precise calibration of lighting, and sphere detection can help in the calibration process. Our proposed approach trains neural networks to automatically detect spheres of three different material classes: matte, shiny and chrome. We obtain fast and accurate segmentation of spheres in images, outperforming manual segmentation in speed while maintaining comparable accuracy.
Attribution and authentication of paintings are difficult tasks, often based on human expertise. In this work, we present SpectrumArt: a new dataset of multispectral (13 channels) image patches of paintings acquired at very high resolution (800 pixels per mm²). We train deep neural networks on SpectrumArt for attribution (i.e., authorship classification) and authentication (i.e., whether of undisputed origin). For attribution, we obtain an accuracy of 92% on a test set of patches coming from unseen paintings. We also propose two classification metrics for attributing full paintings based on the predictions for their patches: majority vote and entropy-weighted vote. Both metrics lead to an attribution score of 100% on unseen paintings. For authenticity testing, our model agrees with the experts’ conclusions on genuine and fake paintings, and provides new insights into the authenticity of paintings where the expert community is divided, by proposing a spectral matching score between a painting and an artist. To validate the key advantage of our data collection method, we show that using 13 channels instead of 3 and the high resolution of the data significantly improve the accuracy of our models.
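The two patch-aggregation metrics can be sketched directly. Majority vote counts the per-patch argmax labels; entropy-weighted vote discounts uncertain patches, so a few confident patches can outweigh many near-coin-flip ones. The probability values below are illustrative.

```python
import math
from collections import Counter

def majority_vote(patch_preds):
    """Attribute a painting to the artist predicted for most patches."""
    return Counter(patch_preds).most_common(1)[0][0]

def entropy_weighted_vote(patch_probs):
    """Weight each patch vote by its confidence: low-entropy
    (confident) patch predictions count more than uncertain ones.
    patch_probs: one per-artist probability dict per patch."""
    scores = Counter()
    for probs in patch_probs:
        h = -sum(p * math.log(p) for p in probs.values() if p > 0)
        weight = 1.0 - h / math.log(len(probs))   # 1 = fully confident
        scores[max(probs, key=probs.get)] += weight
    return scores.most_common(1)[0][0]

patch_probs = [
    {"A": 0.9, "B": 0.1},     # one confident vote for artist A
    {"A": 0.49, "B": 0.51},   # two near-coin-flip votes for B
    {"A": 0.49, "B": 0.51},
]
patch_preds = [max(p, key=p.get) for p in patch_probs]
```

On this example the two metrics disagree, which illustrates why reporting both is informative.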
We present a pipeline to recover precisely the geometry of a convex polyhedral object from multiple views under circular motion. It is based on the extraction of visible polyhedron vertices from silhouette images and matching across a sequence of images. Compared to standard structure-from-motion pipelines, the method is well suited to the 3D-reconstruction of low-textured and non-Lambertian materials. Experiments on synthetic and real datasets show the efficacy of the proposed framework.
First-time spectators of fencing competitions cannot understand the complicated rules, making it difficult for them to enjoy the game. Players cannot be equipped with sensors or other devices, as these would interfere with the match. Therefore, in this paper, we propose a system that detects "phrases" using skeleton point information extracted from videos and displays the game situation. We evaluate the system on actual videos of fencing to confirm its performance.
“Flying spot” laser infrared thermography (FST) is a non-destructive testing technique able to detect small defects by scanning surfaces with a laser heat source. Defects, such as cracks on metallic parts, are revealed by the disturbance of heat propagation measured by an infrared camera. Deep learning approaches are now very efficient at automatically analysing and using contextual information from data, and can be used for crack detection. However, in the literature only a few works deal with the use of deep learning for crack detection in FST. Indeed, obtaining a large amount of data from FST examinations can be expensive and time-consuming. We propose here to build a generic, open-access dataset of laser thermography for defect detection. This database can be used by the community to develop new crack detection methods benchmarked on the same database, as well as for pretraining networks for similar application tasks. We also present results of state-of-the-art detection networks trained on the proposed database. These models give a baseline for future work. The dataset, called FLYD (FLYing spot thermography Dataset), will be available at https://github.com/kevinhelvig/FLYD/.
This paper deals with the detection and characterization of surface damage (a dent, a crack, etc.) on mechanical surfaces using 2D/3D vision (a 3D scanner and/or a 2D RGB camera). The main innovative aspect lies in the exploitation of the Computer-Aided Design model, when it is available, with two possible scenarios: "manual control" via a hand-held 3D scanner carried by an operator, or "automated control" via a 3D scanner carried by a cobot. This research work has been carried out within the joint research laboratory "Inspection 4.0" between IMT Mines Albi/ICA and the DIOTA company, specialized in the development of numerical tools for Industry 4.0.
Thermography is a highly beneficial non-invasive and non-contact tool that finds applications in various fields, such as building inspection, industrial equipment monitoring, quality control, and medical evaluations. Analyzing the surface temperature of an object at different points in time, and under varying conditions, can help detect defects, cracks, and anomalies in industrial components. In this study, we propose a framework for reproducible and quantitative measurement of surface temperature changes over time using thermal 3D models created with low-cost, portable devices. We present the application of this framework in two cases: analyzing temperature changes over time in a plastic container, and analyzing temperature changes before and after medical treatment of a chronic wound. The results show that our approach for multi-temporal registration of thermal 3D models could be a cost-effective and practical solution for studying temperature changes in various applications.