1. INTRODUCTION

With the increasing complexity of real traffic problems, the theories and methods of traffic prediction are still under constant renewal and development. Traffic forecasting is a very important issue in traffic control: it refers to analyzing a large amount of historical data to predict future traffic conditions as accurately as possible, supporting decisions that better control traffic and reduce congestion. Long-term traffic flow forecasting is a very challenging task, owing to its high complexity, nonlinear temporal correlation, dynamic spatial correlation, and long-term accumulation of errors. With the development of science and technology, we can now obtain a large amount of traffic time series data from the information collection equipment on expressways, which provides a good foundation for traffic big data forecasting. In the field of time series forecasting, traditional time-series analysis such as the Autoregressive Integrated Moving Average (ARIMA) model [1] is still very popular, but it struggles with unstable and non-linear data. In recent years, the rapid growth of deep learning has brought more possibilities: many researchers have begun to use convolutional neural networks (CNNs) for feature extraction, but these lose sight of spatiotemporal correlation. Defferrard et al. [2] mine latent relations in the data with Graph Convolutional Networks (GCNs), but only for undirected graphs. Li et al. [3] skilfully apply diffusion convolution to extract spatial features well, but their extraction of temporal features is imperfect. To attack the above problems, we propose a Spatio-Temporal Multi-Attention Graph Network (STMAGN), which has an appropriate architecture and achieves good results. We extract the features of historical traffic data through an encoder, and the decoder consumes the output sequences of the preceding structure.
In order to reduce the impact of error propagation, we add a conversion layer before decoding. In this work, we use two attention mechanisms to model temporal and spatial connections and gate them together to fuse the information features. Multi-head attention discovers the inherent correlations of the time series from different angles. The model effectively captures dynamic features and improves prediction accuracy.

2. PRELIMINARY

We define the road network structure as a directed graph 𝒢 = (𝒱, ε, W). Here, 𝒱 represents the collection of all nodes with |𝒱| = N, ε is the set of edges indicating the connectivity among nodes, and W is the adjacency matrix representing the relationships among roads. The target of traffic forecasting is to use a large amount of historical data to predict various traffic parameters in the future, which is a long-standing question. Assuming we have the information collected by the sensors on the road, we use X_t ∈ R^{N×C} to represent the observed traffic flow information at time t, where C counts the various status features of the road. Given the observations of the historical P time steps, the goal is to learn a function that predicts the traffic conditions of the next Q time steps.

3. SPATIO-TEMPORAL MULTI-ATTENTION GRAPH NETWORK

Figure 1 presents the overall structure of the STMAGN model. It is composed of an encoder-decoder structure and a conversion layer. The encoder includes a spatio-temporal attention module with residual connections [4] and an information fusion structure. The decoder includes a masked multi-head attention mechanism and a gating structure. The conversion layer between them is responsible for converting the features extracted by the encoder into inputs for the decoder. When predicting a specific period of time, we extract the corresponding week, day and hour data simultaneously and model them respectively, so as to fully capture the periodicity of traffic flow.
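To make the data layout of these periodic inputs concrete, it can be sketched as follows; all sizes here (N, C, P, Q) are hypothetical illustration values, not settings from the paper:

```python
import numpy as np

# Hypothetical sizes for illustration (not taken from the paper):
N, C = 5, 2        # N road-network nodes, C status features per node
P, Q = 12, 6       # P observed steps, Q steps to predict

# One observation X_t lies in R^{N x C}; recent, daily-periodic and
# weekly-periodic segments of P steps each serve as model inputs.
X_hour = np.random.rand(P, N, C)   # the P steps just before the target
X_day  = np.random.rand(P, N, C)   # same clock time on previous days
X_week = np.random.rand(P, N, C)   # same clock time in previous weeks

# W: weighted adjacency matrix of the directed road graph
W = np.random.rand(N, N)

# The forecasting task maps these inputs to the next Q steps,
# i.e. an output of shape (Q, N, C).
Y_shape = (Q, N, C)
```

This is only a shape sketch: each periodic segment is a stack of P observations X_t ∈ R^{N×C}, and the model maps them to the next Q steps.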
These inputs are encoded by the encoder, transmitted through the conversion layer, and finally decoded by the decoder to obtain the output.

3.1 Spatial-temporal embedding

In practice, the evolution of the traffic state is affected by the underlying traffic network structure, so it is necessary to model the network structure and feed it into the prediction model. As shown in Figure 2, we model spatial dependence by associating traffic flow with a diffusion process, which explicitly captures the stochastic nature of the road network. The diffusion process is characterized as a random walk on the graph with restart probability α; following the diffusion-convolution formulation [3], its stationary distribution can be written as

P = Σ_{k=0}^{∞} α(1 − α)^k (D_O^{−1} W)^k,

where k is the number of diffusion steps and D_O is the diagonal out-degree matrix. In this work, we truncate the process to a finite number of steps and model the spatial dependence by bidirectional diffusion, so that the spatial embedding incorporates both the forward and the backward random walk on the graph. The spatial embedding only provides a static representation and cannot capture dynamic correlations. Therefore, we additionally encode the time dimension as a vector. We divide a day into N_d parts, encode the time-of-day step into R^{N_d} and the day-of-week into R^7 by one-hot coding, and splice them into a vector in R^{N_d+7}. In our model, we unify both features into R^D through a fully connected module and fuse them as the spatiotemporal embedding (STE). Therefore, the STE includes both the road network structure and the time features.

3.2 Multi-head attention

Since the attention mechanism was proposed, it has been applied extensively in many fields. It can discover the relationships within raw data and extract the most important features. Multi-head attention computes attention over the data in different subspaces while keeping the total number of parameters unchanged, and finally merges the attention information from the different subspaces [7].
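This subspace splitting can be sketched as a toy NumPy computation; the random (rather than learned) projections and the scaled dot-product form follow the cited attention literature, not the paper's exact implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, h):
    """Scaled dot-product attention split across h heads.

    X: (n, d) sequence of n tokens with model dimension d.
    Each head works in a subspace of size d // h, so the per-head
    vectors are smaller while the total width stays d.
    Illustrative sketch only; weights are random, not learned.
    """
    n, d = X.shape
    dk = d // h
    rng = np.random.default_rng(0)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    heads = []
    for i in range(h):
        q = Q[:, i * dk:(i + 1) * dk]
        k = K[:, i * dk:(i + 1) * dk]
        v = V[:, i * dk:(i + 1) * dk]
        scores = softmax(q @ k.T / np.sqrt(dk))  # (n, n) attention weights
        heads.append(scores @ v)                 # (n, dk) per-head output
    return np.concatenate(heads, axis=1)         # (n, d): heads spliced back

out = multi_head_attention(np.random.rand(4, 8), h=2)
```

Each head attends within a d/h-dimensional subspace, and splicing the head outputs restores the original width d.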
In this way the dimension of each vector is reduced when computing the attention of each head, and over-fitting is also mitigated; because attention has different parameters in each subspace, multi-head attention in fact looks for correlations between sequences from different angles. For the next state of node i at time t, we update it with the weighted sum over all nodes:

h_i^{(l)} = Σ_j α_{ij} · h_j^{(l−1)},

where α_{ij} is the attention score indicating the significance of node j to node i, H^{l−1} denotes the last hidden state and H^l denotes the current state. The attention scores are learned with the scaled dot-product approach [7]. The outputs of the different heads are then spliced together and passed through another activation function f_3, where || denotes the splicing operation. In this way, we successfully capture the inner spatial relationships among the nodes.

3.3 Gate fusion

In order to further integrate the spatio-temporal relationship, we design a gated fusion module that adaptively fuses the spatiotemporal information, as shown in Figure 3. After the input passes through the spatial and temporal attention mechanisms, yielding outputs H_S and H_T, the fused representation is

z = σ(H_S W_1 + H_T W_2 + b_1),    H = z ⊙ H_S + (1 − z) ⊙ H_T,

where W_1, W_2 and b_1 are different learnable parameters and σ is the sigmoid function.

3.4 Gating

In order to make the final results reflect long-period characteristics, we introduce a gating mechanism to correct the attention results, as shown in Figure 4, so as to reduce the prediction error. In this way, the network can not only remember past information but also selectively forget inessential information and shape long-term relationships. In the calculation, split separates the output results and tanh is another activation function; the gating mechanism combines the obtained output with historical information to produce more accurate results.

4. EXPERIMENTS

4.1 Datasets

We evaluate our model on the real-world data set PeMS-Bay.
In this data set, the traffic speed is sampled every five minutes, and we normalize the data to the interval [0, 1]. To build the road network, we compute the pairwise road distances and apply a threshold to construct the adjacency matrix [9].

4.2 Result analysis

To make the model results easier to understand, we visualize the experimental results. Figure 5 shows the model's results on the real traffic data set. From these figures, we conclude that when the average road speed fluctuates little, the model generates a relatively smooth curve that fits the road traffic conditions. Even when the road conditions change suddenly, the model can capture the change from the spatio-temporal characteristics and generate an accurate prediction curve.

4.3 Baselines and comparison

We compare our model with other baselines. Table 1 reports the average performance for traffic forecasting over the next hour. Our STMAGN performs well on all evaluation indicators and, especially in the long-term prediction stage, achieves far better results than the other models. In addition, we observe that the results of traditional sequence approaches are generally imperfect, which shows that these methods have limited capacity for increasingly complex transportation systems. By comparison, the methods based on deep learning generally achieve good results.

Table 1. Comparison of results between STMAGN and the others.
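The thresholded adjacency construction mentioned in Section 4.1 is commonly realized with a Gaussian kernel over the pairwise road distances; the sketch below assumes that common form, with illustrative values for the kernel width sigma and the threshold kappa:

```python
import numpy as np

def build_adjacency(dist, sigma, kappa):
    """Thresholded Gaussian-kernel adjacency from pairwise road distances.

    W_ij = exp(-dist_ij^2 / sigma^2) if it exceeds the threshold kappa,
    else 0. Sketch of the common construction from the graph signal
    processing literature; sigma and kappa here are assumptions.
    """
    W = np.exp(-(dist ** 2) / sigma ** 2)
    W[W < kappa] = 0.0  # sparsify: drop weak (distant) connections
    return W

dist = np.array([[0.0, 1.0, 5.0],
                 [1.0, 0.0, 2.0],
                 [5.0, 2.0, 0.0]])
W = build_adjacency(dist, sigma=2.0, kappa=0.1)
```

Thresholding keeps the graph sparse, so only sufficiently close road segments are connected.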
4.4 Effectiveness of each module

To study the impact of each module, we evaluate three ablated variants of the model, removing the conversion layer, the STE, or the gating. Figure 6 shows the impact of removing these components, confirming the effectiveness of each module. The experimental results show that removing any module degrades the final prediction accuracy, and the variant without the STE degrades the fastest.

5. CONCLUSION

In this work, we put forward a spatio-temporal multi-attention graph network (STMAGN) to predict traffic conditions. Specifically, we use spatial and temporal attention mechanisms to model intricate traffic situations, and propose bidirectional diffusion convolution and one-hot encoding to capture the dynamic spatio-temporal features more effectively and integrate them together. In addition, we apply a conversion layer to avoid error accumulation and a gating mechanism to obtain more accurate output. Experiments on a real data set show that the model achieves good prediction accuracy.

ACKNOWLEDGMENTS

This work is supported by the National Natural Science Foundation of China (Grant No. 61773243), Major Technology Innovation Projects of Shandong Province (Grant No. 2019TSLH0203) and the National Key Research and Development Program of China (Grant No. 2020YFB1600501).

REFERENCES

[1] Williams, B. M. and Hoel, L. A., "Modeling and forecasting vehicular traffic flow as a seasonal ARIMA process: Theoretical basis and empirical results," Journal of Transportation Engineering, 129(6), 664-672 (2003). https://doi.org/10.1061/(ASCE)0733-947X(2003)129:6(664)
[2] Defferrard, M., Bresson, X. and Vandergheynst, P., "Convolutional neural networks on graphs with fast localized spectral filtering," NIPS, 3837-3845 (2016).
[3] Li, Y., Yu, R., Shahabi, C. and Liu, Y., "Diffusion convolutional recurrent neural network: Data-driven traffic forecasting," in Proc. of ICLR, (2018).
[4] He, K., Zhang, X., Ren, S. and Sun, J., "Deep residual learning for image recognition," in 2016 IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 770-778 (2016).
[5] Teng, S. H., "Scalable algorithms for data and network analysis," Foundations and Trends® in Theoretical Computer Science, 12(1-2), 1-274 (2016). https://doi.org/10.1561/0400000051
[6] Zheng, C., Fan, X., Wang, C. and Qi, J., "GMAN: A graph multi-attention network for traffic prediction," in Proc. of the AAAI Conf. on Artificial Intelligence, 1234-1241 (2020).
[7] Vaswani, A., Shazeer, N., Parmar, N., et al., "Attention is all you need," NeurIPS, 5998-6008 (2017).
[8] Wang, X., Ma, Y., Wang, Y., Jin, W. and Yu, J., "Traffic flow prediction via spatial temporal graph neural network," in Proc. of the Web Conf., 1082-1092 (2020).
[9] Shuman, D. I., Narang, S. K., Frossard, P., et al., "The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains," IEEE Signal Processing Magazine, 30(3), 83-98 (2013). https://doi.org/10.1109/MSP.2012.2235192