This paper addresses the limited adaptability and reliability of traditional methods by improving the precision of goods counting in logistics warehousing environments. Recent advances in deep learning offer promising solutions to these challenges. Leveraging visual technology, our study proposes enhancements to the Low-shot Object Counting network with iterative prototype Adaptation (LOCA). Specifically, we introduce the aspect ratio as an additional extracted feature, which strengthens the model’s ability to capture object characteristics. In the image feature extraction module, we introduce Vision Mamba, which compresses visual representations using a bidirectional state space model (SSM). Furthermore, we augment the loss function by combining the Structural Similarity Index (SSIM) with the existing Mean Squared Error (MSE) loss, enabling the model to maintain pixel-level accuracy while preserving crucial structural information in the images. Experimental results on the test set show improvements of over 30% in both Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE), validating the effectiveness and generalization capability of the enhanced model. Notably, the introduction of Vision Mamba, aspect ratios, and the SSIM loss term drives the model’s improved performance, enabling more accurate and reliable goods counting. The dataset used in this study originates from real-world warehouse environments and comprises over 1000 annotated images. The annotations are of two types, points and bounding boxes, both of which play a crucial role in developing few-shot counting models. By integrating these features and loss functions, the enhanced model offers a more accurate counting solution for warehouse goods, with potential applications in high-precision inventory audits.
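The combined SSIM-plus-MSE objective described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the window size, data range, stability constants, and the weighting factor `lam` are assumptions, since the abstract does not specify them.

```python
import numpy as np

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Global (single-window) SSIM between two maps in [0, 1].

    c1 and c2 are the usual stability constants; a full implementation
    would typically use a sliding Gaussian window instead.
    """
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den

def combined_loss(pred, target, lam=1.0):
    """MSE keeps pixel-level accuracy; (1 - SSIM) preserves structure.

    lam balances the two terms (hypothetical value; tuned per task).
    """
    mse = np.mean((pred - target) ** 2)
    return mse + lam * (1.0 - ssim(pred, target))
```

For identical predicted and ground-truth density maps the loss is zero (SSIM is 1 and MSE is 0), and it grows as the maps diverge in either pixel values or local structure.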