Open Access Paper
Spatio-temporal multi-attention graph network for traffic forecasting
Qinzheng Li, Wenxing Zhu
Proceedings Volume 12260, International Conference on Computer Application and Information Security (ICCAIS 2021); 122600F (24 May 2022) https://doi.org/10.1117/12.2637523
Event: International Conference on Computer Application and Information Security (ICCAIS 2021), 2021, Wuhan, China
Abstract
Traffic forecasting is one of the most important problems in intelligent transportation systems and a key link in transportation services and navigation. However, urban traffic has its own characteristics, and the complexity, nonlinearity, and stochasticity of the traffic system make forecasting very difficult. Although many previous methods achieve high predictive performance, existing research has not fully exploited the influence of spatial and temporal characteristics on prediction. In this article, we propose a new model, the spatio-temporal multi-attention graph network. Taking into account the day-to-day similarity of traffic flow and the interactions within the road network structure, the model exploits the internal dependence between the dynamic spatial network and the temporal dimension to improve forecasting accuracy. Experimental results show that our model outperforms the others and achieves higher prediction accuracy.

1. INTRODUCTION

With the increasing complexity of actual traffic problems, the theories and methods of traffic prediction are still under constant renewal and development. Traffic forecasting is a very important issue in traffic control: it refers to analyzing a large amount of historical data to predict future traffic conditions as accurately as possible, helping with traffic decisions so as to better control traffic and reduce congestion.

Long-term traffic flow forecasting is a very challenging task, owing to its high complexity, nonlinear temporal correlation, dynamic spatial correlation, and long-term accumulation of errors. With the development of science and technology, we can now obtain a large volume of traffic time series data from information collection equipment on expressways, which provides a good foundation for forecasting with traffic big data. In the field of time series forecasting, traditional time-series analysis such as the Autoregressive Integrated Moving Average (ARIMA) [1] is still very popular, but it struggles with unstable and nonlinear data. In recent years, the rapid growth of deep learning has brought more possibilities: many researchers have begun to use convolutional neural networks (CNN) for feature extraction, but these lose sight of the spatio-temporal correlation. Defferrard et al. [2] look for relations in the data with Graph Convolutional Networks (GCN), but only for undirected graphs. Li et al. [3] skilfully apply diffusion convolution to extract spatial features well, but the extraction of temporal features remains imperfect.

To address the above problems, we propose a Spatio-Temporal Multi-Attention Graph Network (STMAGN). We extract the features of historical traffic data through an encoder, and the decoder consumes the output sequences of the preceding structure. To reduce the impact of error propagation, we add a conversion layer before decoding. In this work, we use two attention mechanisms to model the connections in time and space and gate them together to fuse the information features. The multi-head attention discovers the inherent correlations of the time series from different angles. The model effectively captures the dynamic features and improves the prediction accuracy.

2. PRELIMINARY

We define the road network structure as a directed graph 𝒢 = (𝒱, ℰ, W). Here, 𝒱 represents the collection of all nodes with |𝒱| = N, ℰ is the set of edges indicating the connectivity among nodes, and W is the adjacency matrix representing the relationships among roads. The target of traffic forecasting, a classic problem, is to use a large amount of historical data to predict various traffic parameters in the future. Assuming we have the information collected by the sensors on the road, we use X_t ∈ R^{N×C} to represent the observed traffic flow information, where C is the number of status channels of the road.

Given the observations (X_{t−P+1}, …, X_t) of the historical P time steps at N vertices, our goal is to learn a function F(·) to connect the future Q time steps with the historical P time steps:

(X̂_{t+1}, …, X̂_{t+Q}) = F(X_{t−P+1}, …, X_t).
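The P-in/Q-out formulation above corresponds to slicing the raw series into sliding windows. A minimal sketch of this preprocessing step follows; `make_windows` is an illustrative helper, not code from the paper:

```python
import numpy as np

def make_windows(series, P, Q):
    """Slice a (T, N, C) traffic tensor into (history, horizon) training pairs.

    series: observations X_t over T time steps, N sensors, C channels.
    Returns inputs of shape (num, P, N, C) and targets of shape (num, Q, N, C).
    """
    T = series.shape[0]
    xs, ys = [], []
    for t in range(P, T - Q + 1):
        xs.append(series[t - P:t])   # historical P steps (X_{t-P+1}, ..., X_t)
        ys.append(series[t:t + Q])   # future Q steps (X_{t+1}, ..., X_{t+Q})
    return np.stack(xs), np.stack(ys)
```

F(·) is then learned as a map from each input window to its target window.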

3. SPATIO-TEMPORAL MULTI-ATTENTION GRAPH NETWORK

Figure 1 presents the overall structure of the STMAGN model proposed in this article.

Figure 1. Spatio-temporal multi-attention graph network.

It is composed of an encoder-decoder structure and a conversion layer. The encoder includes a spatio-temporal attention module with residual connections [4] and an information fusion structure. The decoder includes a masked multi-head attention mechanism and a gating structure. The conversion layer between them is responsible for converting the features extracted by the encoder into inputs for the decoder.

Suppose we want to predict the data for a specific period of time; we extract the time series at the week, day, and hour scales simultaneously and model each of them, so as to fully capture the periodicity of traffic flow. These inputs are encoded by the encoder, transmitted to the conversion layer, and finally decoded by the decoder to obtain the output.

3.1 Spatio-temporal embedding

In practice, the evolution of the traffic state is affected by the underlying traffic network structure, so it is necessary to model the network structure and feed it into the prediction model. As shown in Figure 2, we model the spatial dependence by associating traffic flow with a diffusion process, which explicitly captures the stochasticity of the road network.

Figure 2. Spatio-temporal embedding.

The diffusion process is characterized by a random walk on the graph with restart probability α and state transition matrix D_O^{−1}W, where D_O = diag(W1) is the out-degree matrix. The walk converges to a stationary distribution 𝒫 ∈ R^{N×N}. Teng et al. [5] pointed out that this stationary state can be obtained by the following:

𝒫 = Σ_{k=0}^{∞} α(1 − α)^k (D_O^{−1}W)^k,

where k is the diffusion step. In this work, we truncate the sum at a finite K steps and model the spatial dependence by bidirectional diffusion. Hence, we can express the spatial embedding in the following form:

e_S = Σ_{k=0}^{K} θ_{k,1}(D_O^{−1}W)^k + θ_{k,2}(D_I^{−1}W^T)^k,

where θ_{k,1}, θ_{k,2} are learnable parameters and D_I is the in-degree matrix of the reversed diffusion.
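The transition matrix and the K-step truncation of the restart random walk can be sketched in NumPy as follows. This is a minimal illustration under the assumptions stated in the text (D_O holds the row sums of W, the stationary distribution is a geometric series in the transition matrix); the function names are hypothetical. The backward direction of the bidirectional diffusion is obtained by applying the same construction to W transposed:

```python
import numpy as np

def transition_matrix(W):
    # Forward random-walk transition matrix D_O^{-1} W, where
    # D_O = diag(W 1) holds the out-degrees (row sums) of the weighted adjacency.
    d_out = W.sum(axis=1)
    d_out[d_out == 0] = 1.0           # guard against isolated nodes
    return W / d_out[:, None]

def truncated_diffusion(W, K, alpha=0.1):
    # K-step truncation of the restart random walk:
    #   P ~ sum_{k=0}^{K} alpha * (1 - alpha)^k * (D_O^{-1} W)^k
    T = transition_matrix(W)
    P = np.zeros_like(W, dtype=float)
    Tk = np.eye(W.shape[0])           # (D_O^{-1} W)^0
    for k in range(K + 1):
        P += alpha * (1 - alpha) ** k * Tk
        Tk = Tk @ T                   # advance one diffusion step
    return P
```

For the bidirectional case, `truncated_diffusion(W.T, K)` gives the reverse-direction term.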

The spatial embedding only provides a static representation and cannot capture dynamic correlations. Therefore, we additionally encode the time dimension as a vector. We divide a day into T parts, encode the time-of-day into R^T and the day-of-week into R^7 by one-hot coding, and splice them into a vector e_t ∈ R^{T+7}.

In our model, we unify both features into R^D through a fully connected neural module and fuse them as the spatio-temporal embedding (STE). The STE therefore includes both the road network structure and the time features.
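The one-hot temporal encoding described above can be sketched as follows, assuming 288 time-of-day slots (one day at 5-minute steps, matching the sampling interval used in the experiments); `temporal_embedding` is an illustrative name:

```python
import numpy as np

def temporal_embedding(day_of_week, slot_of_day, slots_per_day=288):
    # One-hot code for the time-of-day slot (R^T) and the day-of-week (R^7),
    # spliced into a single R^{T+7} vector.
    tod = np.zeros(slots_per_day)
    tod[slot_of_day] = 1.0
    dow = np.zeros(7)
    dow[day_of_week] = 1.0
    return np.concatenate([tod, dow])
```

A fully connected layer would then project this vector (and the spatial embedding) into R^D before fusion.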

3.2 Multi-head attention

Since the attention mechanism was proposed, it has been applied extensively in many fields. It can discover the relationships within raw data and extract the most important features. Multi-head attention computes attention over the data in different subspaces while keeping the total number of parameters unchanged, and finally merges the attention information from the different subspaces [7]. In this way the dimension of each vector is reduced when computing the attention of each head, and over-fitting is also mitigated; because the attention has different parameters in each subspace, multi-head attention in fact looks for correlations between sequences from different angles.

For the next state of node i at time t, we update it with the attention-weighted sum over all nodes, which can be expressed as follows:

H_i^l = Σ_{j∈𝒱} α_{ij} · H_j^{l−1},

where α_{ij} is the attention score indicating the significance of node j to node i, H^{l−1} denotes the last hidden state, and H^l denotes the current state. We adopt the scaled dot-product approach [7] to learn the attention scores:

λ_{ij} = ⟨f_1(H_i^{l−1} ‖ e_i), f_2(H_j^{l−1} ‖ e_j)⟩ / √d,

where ‖ indicates the splicing operation, e is the spatio-temporal embedding vector, f_1 and f_2 are activation functions with different parameters, and d is the dimension after vector splicing. Then λ_{ij} is normalized to α_{ij} by softmax [8]. By splicing K attention heads with differently learned parameters, we can get:

H_i^l = f_3( ‖_{k=1}^{K} Σ_{j∈𝒱} α_{ij}^{(k)} · H_j^{l−1} ),

where f_3 is another activation function. In this way, we capture the inner spatial relationships among the nodes.
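A minimal NumPy sketch of this spatial multi-head attention, assuming linear maps for f_1, f_2 and per-head value projections (all matrix names here are illustrative, not from the paper):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax used to normalise attention scores.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_spatial_attention(H, E, Wq, Wk, Wv):
    """One round of spatial attention over N nodes.

    H:  (N, D) hidden states; E: (N, D) spatio-temporal embeddings.
    Wq, Wk, Wv: lists of K per-head projection matrices, each (2D, d).
    Returns (N, K*d): the K heads spliced together.
    """
    X = np.concatenate([H, E], axis=-1)            # splice hidden state with STE
    heads = []
    for q_w, k_w, v_w in zip(Wq, Wk, Wv):
        Q, Km, V = X @ q_w, X @ k_w, X @ v_w
        scores = Q @ Km.T / np.sqrt(Q.shape[-1])   # scaled dot-product scores
        alpha = softmax(scores, axis=-1)           # normalise per node
        heads.append(alpha @ V)                    # attention-weighted sum
    return np.concatenate(heads, axis=-1)          # splice the K heads
```

A final activation (f_3 in the text) and projection would map the spliced heads back to R^D.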

3.3 Gated fusion

In order to further integrate the spatio-temporal relationship, we design a gated fusion module that adaptively fuses the spatio-temporal information, as shown in Figure 3.

Figure 3. Fusing spatial attention and temporal attention information together.

After the input passes through the spatial and temporal attention mechanisms, their outputs, denoted H_S and H_T, are merged into:

z = σ(H_S W_1 + H_T W_2 + b_1),  H = z ⊙ H_S + (1 − z) ⊙ H_T,

where W_1, W_2 and b_1 are different learnable parameters, σ is the sigmoid function, and H_S and H_T are the results of the spatial and temporal attention.
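A minimal sketch of this gated fusion, assuming the GMAN-style form with two weight matrices and one bias (the parameter names are illustrative):

```python
import numpy as np

def gated_fusion(HS, HT, W1, W2, b):
    # z decides, element-wise, how much of each representation comes from the
    # spatial branch versus the temporal branch: z = sigmoid(HS W1 + HT W2 + b).
    z = 1.0 / (1.0 + np.exp(-(HS @ W1 + HT @ W2 + b)))
    return z * HS + (1.0 - z) * HT   # convex combination of the two branches
```

With zero weights the gate defaults to 0.5, averaging the two branches; training moves it toward whichever branch is more informative per element.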

3.4 Gating

In order to make the final results retain long-term characteristics, we introduce a gating mechanism to correct the attention results, as shown in Figure 4, so as to reduce the prediction error.

Figure 4. Gating to control information transmission.

In this way, the network model can not only remember past information but also selectively forget inessential information and shape long-term relationships. The calculation process is as follows:

P, Q = split(H),  H′ = tanh(P) ⊙ σ(Q),

where split separates the output into two halves, tanh is another activation function, and σ is the sigmoid gate. The gating mechanism combines the obtained output with historical information to get more accurate results.
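Under the GLU-style reading of the formula above (tanh shapes the candidate signal, the sigmoid half gates how much of it passes on), a sketch could look like this; `output_gate` is a hypothetical name:

```python
import numpy as np

def output_gate(X):
    # Split the features into two halves P, Q along the last axis, then gate:
    # tanh(P) is the candidate signal, sigmoid(Q) decides how much passes,
    # letting the model selectively forget inessential information.
    P, Q = np.split(X, 2, axis=-1)
    return np.tanh(P) * (1.0 / (1.0 + np.exp(-Q)))
```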

4. EXPERIMENTS

4.1 Datasets

We evaluate our model on the real-world dataset PEMS-BAY. In this dataset, we take the traffic speed every five minutes and normalize the data to the interval [0, 1]. To build the road network, we calculate the pairwise road distances and construct the adjacency matrix with a thresholded Gaussian kernel [9]:

W_ij = exp(−dist(v_i, v_j)² / σ²) if dist(v_i, v_j) ≤ δ, otherwise W_ij = 0,

where W_ij is the weight in the adjacency matrix, dist(v_i, v_j) is the distance between sensors, σ is the standard deviation indicating the dispersion of the distances, and δ is the threshold that removes negligible connections.
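The thresholded Gaussian kernel construction can be sketched directly from its formula; `gaussian_adjacency` is an illustrative helper:

```python
import numpy as np

def gaussian_adjacency(dist, delta):
    # W_ij = exp(-dist_ij^2 / sigma^2) if dist_ij <= delta, else 0,
    # where sigma is the standard deviation of the pairwise distances.
    sigma = dist.std()
    W = np.exp(-np.square(dist) / sigma ** 2)
    W[dist > delta] = 0.0   # prune edges beyond the distance threshold
    return W
```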

4.2 Result analysis

In order to make the model results easier to understand, we visualize the experimental results. Figure 5 shows the model's results on the real traffic dataset. From these figures, we conclude that when the average road speed fluctuates little, the model generates a relatively smooth curve that fits the road traffic conditions. Even when road conditions change suddenly, the model captures the change through the spatio-temporal characteristics and generates a more accurate prediction curve.

Figure 5. Results on real data from PEMS-BAY.

4.3 Baselines and comparison

We compare our model with other baselines. Table 1 shows the average traffic forecasting performance over the next hour. Our STMAGN performs well on all evaluation metrics, and in the long-term prediction stage in particular it achieves far better results than the other models. In addition, we observe that the traditional sequence approaches are generally imperfect, which shows that these methods have limited capacity for increasingly complex transportation systems. By comparison, the deep-learning-based methods have generally achieved good results.

Table 1. Comparison of results between STMAGN and other models on PEMS-BAY.

            Horizon 3               Horizon 6               Horizon 12
            MAE   RMSE  MAPE       MAE   RMSE  MAPE       MAE   RMSE  MAPE
ARIMA       1.61  3.30  3.50%      2.32  4.78  5.42%      3.40  6.51  8.32%
SVR         1.87  3.60  3.81%      2.50  5.17  5.52%      3.29  7.09  8.02%
VAR         1.72  3.16  3.60%      2.30  4.26  5.02%      2.94  5.46  6.50%
FNN         2.22  4.40  5.20%      2.28  4.65  5.41%      2.46  4.97  5.91%
FC-LSTM     2.07  4.21  4.80%      2.21  4.55  5.22%      2.35  4.95  5.72%
STGCN       1.37  2.96  2.89%      1.81  4.25  4.17%      2.49  5.70  5.81%
DCRNN       1.37  2.95  2.91%      1.75  3.99  3.91%      2.07  4.72  4.92%
ASTGCN      1.52  3.13  3.22%      2.01  4.27  4.48%      2.61  5.42  6.00%
STMAGN      1.48  3.02  3.42%      1.73  3.73  4.14%      1.93  4.19  4.79%

4.4 Effectiveness of each module

In order to study the impact of each module, we evaluate three ablations: removing the conversion layer, the STE, or the gating from the model. Figure 6 shows the impact of removing these components, thus demonstrating the effectiveness of each module.

Figure 6. Effectiveness of each part of the model.

The experimental results show that removing any module has a noticeable impact on the final prediction accuracy; in particular, without the STE the accuracy drops sharply.

5. CONCLUSION

In this work, we proposed a spatio-temporal multi-attention graph network (STMAGN) to predict traffic conditions. Specifically, we use spatial and temporal attention mechanisms to model intricate traffic situations, and propose bidirectional diffusion convolution and one-hot encoding to capture the dynamic spatio-temporal features more effectively and integrate them together. In addition, we apply a conversion layer to avoid error accumulation and a gating mechanism to obtain more accurate output. Experiments on a real dataset show that the model achieves good prediction accuracy.

ACKNOWLEDGMENTS

This work is supported by the National Natural Science Foundation of China (Grant No. 61773243), the Major Technology Innovation Projects of Shandong Province (Grant No. 2019TSLH0203) and the National Key Research and Development Program of China (Grant No. 2020YFB1600501).

REFERENCES

[1] Williams, B. M. and Hoel, L. A., "Modeling and forecasting vehicular traffic flow as a seasonal ARIMA process: Theoretical basis and empirical results," Journal of Transportation Engineering, 129(6), 664–672 (2003). https://doi.org/10.1061/(ASCE)0733-947X(2003)129:6(664)
[2] Defferrard, M., Bresson, X. and Vandergheynst, P., "Convolutional neural networks on graphs with fast localized spectral filtering," NIPS, 3837–3845 (2016).
[3] Li, Y., Yu, R., Shahabi, C. and Liu, Y., "Diffusion convolutional recurrent neural network: Data-driven traffic forecasting," in Proc. of ICLR (2017).
[4] He, K., Zhang, X., Ren, S. and Sun, J., "Deep residual learning for image recognition," in 2016 IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 770–778 (2016).
[5] Teng, S. H., "Scalable algorithms for data and network analysis," Foundations and Trends in Theoretical Computer Science, 12(1–2), 1–274 (2016). https://doi.org/10.1561/0400000051
[6] Zheng, C., Fan, X., Wang, C. and Qi, J., "GMAN: A graph multi-attention network for traffic prediction," in Proc. of the AAAI Conf. on Artificial Intelligence, 1234–1241 (2020).
[7] Vaswani, A., Shazeer, N., Parmar, N., et al., "Attention is all you need," NeurIPS, 5998–6008 (2017).
[8] Wang, X., Ma, Y., Wang, Y., Jin, W. and Yu, J., "Traffic flow prediction via spatial temporal graph neural network," in Proc. of the Web Conf., 1082–1092 (2020).
[9] Shuman, D. I., Narang, S. K., Frossard, P., et al., "The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains," IEEE Signal Processing Magazine, 30(3), 83–98 (2013). https://doi.org/10.1109/MSP.2012.2235192