In this paper, a queue batch sampling algorithm is proposed to address the low learning efficiency caused by high sampling correlation in deep reinforcement learning for continuous action control tasks. The proposed sampling algorithm reduces the correlation among sampled transitions while using the forward value of each sample as a measure to guarantee sampling quality. First, a starting sample is randomly selected as the sampling starting point from the memory bank generated by the robot's interaction with the environment. Second, to save computational resources, the repository is traversed only once: the correlation between the sampling starting point and every other sample, together with the forward multi-step reward of every sample except the starting point, is computed, and each sample receives a score based on its ranking position in the weighted cumulative results. Finally, the top-scoring samples, up to the sampling batch size, are selected to train the deep reinforcement learning model. The proposed queue batch sampling algorithm is fused with the deep proximal policy optimization algorithm and applied to the forward motion task of a bipedal robot. In simulation, the improved algorithm is compared with the original deep proximal policy optimization algorithm under the actor-critic framework. Because queue batch sampling reduces the correlation of sampling while ensuring its quality, the learning efficiency of the deep neural network is improved, and the learning performance of the deep proximal policy optimization algorithm based on queue batch sampling is significantly better; that is, the proposed queue batch sampling algorithm can effectively improve the efficiency with which the bipedal robot learns to walk forward.
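The selection step described above can be illustrated with a minimal sketch. The abstract does not fix the correlation measure, the ranking weights, or the number of forward steps, so the cosine-similarity proxy, the weights `w_corr`/`w_ret`, and `n_steps` below are all illustrative assumptions rather than the paper's exact method:

```python
import numpy as np

def queue_batch_sample(states, rewards, batch_size,
                       n_steps=5, gamma=0.99, w_corr=0.5, w_ret=0.5,
                       rng=None):
    """Sketch of queue batch sampling over a replay memory.

    Scores every stored transition by (a) low correlation with a
    randomly chosen starting sample and (b) high forward multi-step
    reward, then returns the indices of the top `batch_size` samples.
    All concrete choices here (cosine similarity as the correlation
    proxy, equal weights, 5 forward steps) are assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(states)
    start = rng.integers(n)                       # random sampling starting point

    # Correlation proxy: cosine similarity of each state with the start;
    # lower similarity means a more diverse (more valuable) sample.
    s0 = states[start]
    sims = states @ s0 / (np.linalg.norm(states, axis=1)
                          * np.linalg.norm(s0) + 1e-8)
    diversity = 1.0 - sims

    # Forward n-step discounted reward for every sample.
    returns = np.zeros(n)
    for i in range(n):
        for k in range(min(n_steps, n - i)):
            returns[i] += gamma**k * rewards[i + k]

    # Rank-based score: rank 0 = best on each criterion; the weighted
    # sum of the two ranks gives the cumulative score (lower = better).
    div_rank = np.argsort(np.argsort(-diversity))
    ret_rank = np.argsort(np.argsort(-returns))
    score = w_corr * div_rank + w_ret * ret_rank

    order = np.argsort(score)
    order = order[order != start]                 # start is kept separately
    return np.concatenate(([start], order[:batch_size - 1]))
```

The returned indices would then be used to draw a minibatch from the memory bank for one gradient update of the policy and value networks.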