In recent years, Graph Convolutional Networks (GCNs) have attracted considerable attention in the field of action recognition. However, existing methods typically extract only simple spatiotemporal features of individual joints: they fail to capture comprehensive spatiotemporal information about the entire human body and are limited in modeling short-term spatiotemporal dependencies. To address these issues, this paper proposes a Graph Convolutional Network with short-term spatiotemporal feature fusion and attention. The method learns temporal features through a short-term spatiotemporal feature fusion module, enhances the temporal representation of actions by incorporating whole-body spatiotemporal information, and refines spatial skeleton features through keypoint attention modeling. Finally, multi-scale temporal convolution is used for long-term information exchange, and the scores of four streams are fused for classification. Experimental results demonstrate that the method outperforms existing approaches on the NTU RGB+D and NTU RGB+D 120 datasets.
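For illustration, the sketch below shows the score-level fusion step in its most common form, assuming the four streams are the joint, bone, joint-motion, and bone-motion streams typical of skeleton-based action recognition; the stream composition and the fusion weights are assumptions for this example, not values taken from the paper.

```python
import numpy as np

def fuse_four_streams(joint, bone, joint_motion, bone_motion,
                      weights=(0.6, 0.6, 0.4, 0.4)):
    """Fuse per-class score matrices (N samples x C classes) from four
    streams by a weighted sum, then predict the highest-scoring class.
    The weights are illustrative assumptions, not the paper's values."""
    fused = (weights[0] * joint + weights[1] * bone
             + weights[2] * joint_motion + weights[3] * bone_motion)
    return fused.argmax(axis=-1)

# Example: random scores for 4 samples over the 60 classes of NTU RGB+D.
rng = np.random.default_rng(0)
streams = [rng.random((4, 60)) for _ in range(4)]
print(fuse_four_streams(*streams))  # predicted class index per sample
```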