| Air quality prediction is an important direction in the field of time series prediction. In recent years, machine learning methods have been widely used in air quality prediction models. Accurate prediction of the air quality index is difficult because the data show poor short-term regularity and are jointly influenced by many factors. Air quality prediction therefore faces two problems. First, Long Short-Term Memory (LSTM) networks are increasingly used in air quality prediction models. Vanilla LSTM (VLSTM) is the most widely used structure, but the original VLSTM adapts poorly to air quality datasets because the data are irregular over short periods. VLSTM also has a large number of parameters, which weakens its convergence. Second, many prediction models feed a wide range of air-quality-related factors into the model for training. Although air quality is affected by many factors, each factor influences it to a different degree, and existing models do not quantitatively measure or screen the validity of their input data. As a result, harmful data can enter the training process and may degrade the network.
To address these problems, this paper optimizes the internal structure of VLSTM and proposes a method for selecting valid data. (1) To address the large number of VLSTM parameters and its poor adaptability to air quality data, this paper redesigns the internal structure of VLSTM. First, to reduce the number of parameters, the input gate and forget gate of VLSTM are merged, following an existing improvement to the LSTM without peephole connections, which reduces the number of weight parameters in the overall structure. Second, to cope with the irregularity of air quality data, this paper redesigns the peephole connections of VLSTM: the peephole connection from the updated cell state to the output gate is deleted, and the peephole connections are then further optimized by feeding the cell state from before the update into the output gate. Once the input gate and forget gate are merged, the two gates share one set of parameters, which halves the number of parameters between them. This reduction improves convergence during training and improves prediction performance to some extent. The two peephole optimizations make the improved network more resistant to interference during training, so it performs better on real-world air quality data. The three resulting structures, structure A, structure B and structure C, are validated through ablation experiments. With the same number of iterations, the Improved VLSTM (IVLSTM) model proposed in this paper outperforms VLSTM and GRU on multiple evaluation metrics.
(2) In practice, the validity of the air quality training data is also an important factor in a model's final prediction performance. To improve the validity of the training data, this paper proposes a data input and output method that selects valid air quality data and quantifies its validity. The method consists of two sub-models: the Multiple Channels Data Input (MCDI) model and the Multiple Routes Result Output (MRRO) model. The MCDI model selects air-quality-related data from all monitoring stations at different levels, and this paper proposes a Linear-Similar-Attention-based Dynamic Time Warping (LSA-DTW) algorithm to quantify temporal and spatial correlation. Because the multiple sets of inputs to the MCDI model produce multiple sets of outputs, this paper also proposes the MRRO model, which selects an appropriate channel combination as the result integration path according to the characteristics of the target station and the prediction time interval, and outputs each result separately. Experiments that train on the data of each MCDI layer individually and in combination show that the LSA-DTW algorithm selects spatio-temporal stations effectively, and they confirm that the MCDI model is necessary. Comparative experiments show that the MRRO model improves the accuracy of the overall air quality prediction model. |
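The abstract describes the IVLSTM cell only in words. The following is a minimal NumPy sketch of one time step under stated assumptions, not the thesis's exact equations. The input gate is coupled to the forget gate as `i = 1 - f`, the forget gate keeps a peephole on `c_{t-1}`, and the output gate's peephole reads the pre-update state `c_{t-1}` instead of the updated `c_t`. The parameter names (`W_f`, `p_o`, and so on), the initialisation, and the random input data are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_in, n_hidden, seed=0):
    """Random parameters for the sketched cell (hypothetical naming and init)."""
    rng = np.random.default_rng(seed)
    w_shape = (n_hidden, n_hidden + n_in)
    return {
        # Forget gate; the input gate has no weights of its own (i = 1 - f).
        "W_f": rng.normal(0.0, 0.1, w_shape), "b_f": np.ones(n_hidden),
        "p_f": rng.normal(0.0, 0.1, n_hidden),
        # Cell candidate.
        "W_g": rng.normal(0.0, 0.1, w_shape), "b_g": np.zeros(n_hidden),
        # Output gate with a peephole on the pre-update cell state c_{t-1} (assumed).
        "W_o": rng.normal(0.0, 0.1, w_shape), "b_o": np.zeros(n_hidden),
        "p_o": rng.normal(0.0, 0.1, n_hidden),
    }

def cell_step(x_t, h_prev, c_prev, p):
    """One step of a coupled-gate LSTM whose output-gate peephole reads c_{t-1}."""
    z = np.concatenate([h_prev, x_t])                       # [h_{t-1}; x_t]
    f = sigmoid(p["W_f"] @ z + p["p_f"] * c_prev + p["b_f"])
    i = 1.0 - f                                             # coupled input gate shares the forget gate's parameters
    g = np.tanh(p["W_g"] @ z + p["b_g"])                    # candidate cell state
    c = f * c_prev + i * g                                  # updated cell state c_t
    # Output gate peeks at c_prev (before the update), not at the updated c.
    o = sigmoid(p["W_o"] @ z + p["p_o"] * c_prev + p["b_o"])
    h = o * np.tanh(c)
    return h, c

n_in, n_hidden = 6, 8
params = init_params(n_in, n_hidden)
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
sequence = np.random.default_rng(1).normal(size=(24, n_in))  # synthetic stand-in: 24 steps x 6 features
for x_t in sequence:
    h, c = cell_step(x_t, h, c, params)
print(h.shape, c.shape)  # (8,) (8,)
```

Because `i = 1 - f`, each update `c_t` is a convex combination of `c_{t-1}` and the candidate `g`. Starting from a zero state, `|c_t|` therefore stays below 1. This is one intuition for why coupling can stabilise training, though the thesis's own justification may differ.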
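The claim that merging the input and forget gates halves the parameters between the two gates can be checked with simple arithmetic. The sketch below assumes a VLSTM in the Greff et al. style, with four weight blocks and three peephole vectors, and a coupled variant with three blocks and two peephole vectors. The exact counts for structures A, B and C in the thesis may differ, and the layer sizes are illustrative.

```python
def gate_block_params(n_in: int, n_hidden: int) -> int:
    # Input weights + recurrent weights + bias for one gate or the cell candidate.
    return (n_in + n_hidden) * n_hidden + n_hidden

def vanilla_lstm_params(n_in: int, n_hidden: int) -> int:
    # Input, forget and output gates + cell candidate, plus three peephole vectors.
    return 4 * gate_block_params(n_in, n_hidden) + 3 * n_hidden

def coupled_lstm_params(n_in: int, n_hidden: int) -> int:
    # Forget gate (input gate derived as 1 - f), output gate, cell candidate,
    # plus two peephole vectors (forget and output gates; an assumption).
    return 3 * gate_block_params(n_in, n_hidden) + 2 * n_hidden

n_in, n_hidden = 6, 8
# Parameters belonging to the input/forget gate pair, peepholes included.
pair_vanilla = 2 * (gate_block_params(n_in, n_hidden) + n_hidden)
pair_coupled = gate_block_params(n_in, n_hidden) + n_hidden

print(vanilla_lstm_params(n_in, n_hidden))  # 504
print(coupled_lstm_params(n_in, n_hidden))  # 376
print(pair_vanilla, pair_coupled)           # 256 128
```

Only the input/forget pair is halved. The output gate and cell candidate are unchanged, so the whole cell shrinks by roughly a quarter rather than a half.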
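LSA-DTW builds on Dynamic Time Warping, but the abstract does not define its linear-similar-attention weighting. The sketch below therefore shows only plain DTW with an absolute-difference cost, used to rank candidate stations by how closely their series track the target station's series. The station names, the data, and the top-`k` selection rule are hypothetical, and the attention weighting is deliberately left out.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic DTW with |a_i - b_j| as the local cost (no attention weighting)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def select_stations(target, candidates, k):
    """Return the k candidate stations whose series are DTW-closest to the target."""
    scores = {name: dtw_distance(target, series) for name, series in candidates.items()}
    return sorted(scores, key=scores.get)[:k]

target = [10, 20, 30, 25]                  # hypothetical pollutant readings at the target station
candidates = {
    "S1": [10, 20, 20, 30, 25],            # same shape, locally stretched in time
    "S2": [40, 5, 50, 0],                  # unrelated pattern
    "S3": [12, 22, 31, 24],                # similar values, small offsets
}
print(select_stations(target, candidates, k=2))  # ['S1', 'S3']
```

Station `S1` scores a distance of 0 even though its series is one step longer. That tolerance to time shifts is why DTW suits pollutant series that rise and fall at slightly different times at different stations.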
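The abstract says MRRO picks a channel combination per target station and prediction interval, but it does not give the selection criterion. As a hypothetical stand-in, this sketch exhaustively tries every combination of channel outputs for one (station, horizon) pair. It averages each combination, keeps the one with the lowest mean absolute error on a validation set, and then reuses that route to fuse new predictions. The channel names, the averaging rule, and the MAE criterion are all assumptions.

```python
from itertools import combinations
import numpy as np

def best_route(val_preds, val_true):
    """Pick the channel combination whose averaged output has the lowest validation MAE.

    val_preds: dict mapping a (hypothetical) channel name to its validation
    predictions for one target station and one prediction horizon.
    """
    names = sorted(val_preds)
    y = np.asarray(val_true, dtype=float)
    best, best_err = None, np.inf
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            fused = np.mean([val_preds[n] for n in combo], axis=0)
            err = float(np.mean(np.abs(fused - y)))
            if err < best_err:
                best, best_err = combo, err
    return best, best_err

def fuse(preds, route):
    """Combine new channel predictions along a previously selected route."""
    return np.mean([preds[n] for n in route], axis=0)

val_true = [50, 60, 70]
val_preds = {
    "station": [52, 58, 71],   # single-station channel
    "region": [48, 62, 69],    # nearby-stations channel
    "city": [70, 80, 90],      # city-wide channel (biased here)
}
route, err = best_route(val_preds, val_true)
print(route, err)  # ('region', 'station') 0.0
print(fuse({"station": [55.0], "region": [45.0], "city": [90.0]}, route))  # [50.]
```

Exhaustive search costs `2**n - 1` evaluations for `n` channels. That is cheap for the handful of channels an MCDI-style input produces, but it would need pruning for many channels.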