Paper
7 December 2023
Heuristic deep reinforcement learning for online packing
Chengbo Yang, Yonggui Lü, Jing Du, Ting Liu, Ronggui Dao
Proceedings Volume 12941, International Conference on Algorithms, High Performance Computing, and Artificial Intelligence (AHPCAI 2023); 1294158 (2023) https://doi.org/10.1117/12.3011479
Event: Third International Conference on Algorithms, High Performance Computing, and Artificial Intelligence (AHPCAI 2023), 2023, Yinchuan, China
Abstract
To improve the utilization of loading space, we establish a model of the online packing problem and introduce a technique for distilling heuristic knowledge into reinforcement learning (RL) agents. Our approach uses a two-stage learning process in which a teacher model, trained with heuristics, guides a student model as it acquires knowledge directly from the environment through reinforcement learning. To assess the method, we applied it to a logistics loading scenario, addressing an online packing problem in simulated environments. Experimental results show that our method outperforms both the heuristic and conventional RL models, indicating that the heuristic knowledge distillation architecture is effective at improving RL agent performance across diverse settings.
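The abstract does not state the training objective, so the following is only a minimal sketch of how a teacher-guided RL loss of this general kind could look, not the authors' implementation. It assumes a toy one-dimensional packing setting, a hypothetical best-fit heuristic used directly as the teacher (the paper's teacher is a model trained with heuristics), a REINFORCE-style policy-gradient term, and a KL distillation term weighted by an assumed coefficient `beta`. All function names and parameters are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def best_fit_teacher(bin_free, item, eps=0.05):
    """Hypothetical heuristic teacher (assumption, not from the paper).

    Puts most of the probability on the feasible bin that leaves the
    least free space after placing the item (best fit).
    """
    feasible = [i for i, free in enumerate(bin_free) if free >= item]
    if not feasible:
        raise ValueError("item fits in no bin")
    best = min(feasible, key=lambda i: bin_free[i] - item)
    n = len(bin_free)
    # Smoothed one-hot over all bins so the KL term below stays finite.
    probs = [eps / n] * n
    probs[best] += 1.0 - eps
    return probs


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distilled_loss(student_logits, teacher_probs, action, advantage, beta=0.5):
    """Policy-gradient loss plus a teacher-matching penalty (assumed form)."""
    student_probs = softmax(student_logits)
    # REINFORCE-style term: learn from the environment's reward signal.
    pg_loss = -advantage * math.log(student_probs[action])
    # Distillation term: stay close to the heuristic teacher's choice.
    distill = kl_divergence(teacher_probs, student_probs)
    return pg_loss + beta * distill


if __name__ == "__main__":
    free_space = [0.5, 0.2, 0.9]  # remaining capacity of three open bins
    teacher = best_fit_teacher(free_space, item=0.2)
    print(distilled_loss([0.0, 5.0, 0.0], teacher, action=1, advantage=1.0))
```

In a sketch like this, setting `beta` to zero recovers plain policy-gradient learning, while a larger `beta` pulls the student toward the heuristic. How the paper actually schedules or weights the teacher's guidance is not given in the abstract.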
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Chengbo Yang, Yonggui Lü, Jing Du, Ting Liu, and Ronggui Dao "Heuristic deep reinforcement learning for online packing", Proc. SPIE 12941, International Conference on Algorithms, High Performance Computing, and Artificial Intelligence (AHPCAI 2023), 1294158 (7 December 2023); https://doi.org/10.1117/12.3011479
KEYWORDS
Education and training
Online learning
Machine learning
Computer simulations
Deep learning
Evolutionary algorithms
Performance modeling