Research on Machine Learning Methods in Indoor Pollution Source Identification

Posted on: 2019-09-26 | Degree: Master | Type: Thesis
Country: China | Candidate: H G Zhang
GTID: 2381330572955480 | Subject: Architecture
Abstract/Summary:
Real-time identification of the location of a pollution source can effectively prevent the loss of human life and property caused by the release of harmful chemicals in emergencies. Existing source identification methods encounter many difficulties in practical applications. For example, the QR method needs to know the spatial distribution of the pollutant concentration in advance as its initial condition, and the adjoint probability method needs to know the exact time of the release; both are difficult to obtain in practice. Machine learning methods, represented by the artificial neural network, can accurately predict the location of the pollution source using only the pollutant concentrations measured by sensors as input.

In a 7-room building, we assume that there is only one pollution source. Independent and identically distributed sampling was performed on the characteristics of the source (location, release mass, release duration), and each sample was then simulated using CONTAM. This produced two databases: one with constant meteorological parameters and one with meteorological parameters following a certain distribution. Using these two databases, we compared artificial neural network (ANN), support vector machine (SVM), k-nearest neighbors (KNN), and naive Bayes (NB) classifiers under different model parameters, feature vector selections, numbers of training samples, meteorological parameters, and sensor layouts. The comparison of these four classification algorithms led to the following conclusions:

1. Feature selection. Classification models trained with the change in sensor readings over 5 minutes as input outperformed models that received only the instantaneous concentration as input. The prediction performance of all four models improved greatly. For the ANN and SVM models, the prediction accuracy can be increased to 100%. For the KNN model, the prediction accuracy increases from about 70% to nearly 100% across different numbers and layouts of sensors. When more than 5 sensors are used, the prediction accuracy of the NB model also increases to about 90%.

2. Number of training samples. For the ANN model, the 20-fold cross-validation accuracy was close to 100% with 390 training samples. For the SVM model, the 20-fold cross-validation accuracy increases gradually with the number of training samples, and the rate of increase slows down. For the KNN model, the prediction accuracy increases significantly with the number of samples; with 1950 samples, the 20-fold cross-validation accuracy reaches 91%. The KNN model therefore needs more training samples than ANN or SVM to reach the same accuracy. For the NB model, increasing the number of training samples does not improve the prediction accuracy as significantly as it does for KNN.

3. Meteorological parameters and number of sensors. When the meteorological parameters are constant, the ANN algorithm has the highest prediction accuracy (100%), followed by the SVM, KNN, and NB algorithms. The ANN and SVM classifiers are almost unaffected by the number of sensors: even with only two sensors, the accuracy of source location identification still reaches 100% and 99%, respectively. When the meteorological parameters follow a certain distribution, the 20-fold cross-validation accuracy of the classification models is more strongly affected by the number of sensors, and the prediction accuracy decreases nearly linearly as the number of sensors decreases. Whether or not meteorological parameters are introduced as variables, the performance ranking of the four classification models is ANN > SVM > KNN > NB.

4. Sensor layout. Some sensor layouts are inherently better than others, regardless of which classification model is used. In other words, a layout that gives higher prediction accuracy with one classification algorithm also gives higher prediction accuracy with the others. The closer the number of sensors is to the number of building zones, the smaller the influence of the layout on the classification model. The fewer the sensors, the more pronounced the differences between layouts.

Using the database with constant meteorological parameters, we also studied multiple linear regression, artificial neural network, and support vector regression models for predicting the mass released by the pollution source. The following conclusions were obtained:

1. Using the change in pollutant concentration over 5 minutes as input predicts the released mass more accurately than using the instantaneous concentration read by the sensors.

2. With the 5-minute change in sensor readings as input, the prediction accuracy is higher when the actual release mass is small (below 40 mg) than when it is large (above 60 mg). In the former case the predicted value is almost equal to the actual value, while in the latter case the predicted value differs greatly from the actual value. A possible reason is that when the actual release mass is relatively small, air exchange between the building and the outside transfers less pollutant to the outdoor atmosphere. When the release mass is large, more pollutant is transferred outside, so the predicted values tend to be lower than the actual values.

3. The SVM regression model predicts the release mass of indoor pollution sources poorly compared with the multiple linear regression and artificial neural network models.

4. Identification of the release mass differs from identification of the source location, where more sensors give higher prediction accuracy. For the ANN model, the mean square error is lowest with five sensors, and for linear regression it is lowest with four sensors. As the number of sensors decreases, the mean square error of the prediction slowly decreases until it reaches a minimum; if the number of sensors is reduced further, the mean square error rises sharply. This minimum is what matters in practice, because it gives the best prediction accuracy at the lowest sensor cost.

The whole thesis contains about 38,500 words and 63 figures and charts.
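
As a rough illustration of the classification setup described in the abstract (change in sensor readings over 5 minutes as the feature vector, source room as the label, accuracy estimated by 20-fold cross-validation), the following Python sketch runs a k-nearest-neighbors classifier on synthetic data. It is not the thesis code. The data generator `make_sample`, its distance-decay response, the mass range, the noise level, the scaling by the peak reading, and the choice of k = 3 are all assumptions made for this example, and the accuracy it prints says nothing about the CONTAM results.

```python
import math
import random
from collections import Counter

N_ZONES = 7  # the abstract's 7-room building; one sensor per room is assumed here


def make_sample(rng, source):
    """Toy stand-in for one simulated release (hypothetical, not CONTAM output)."""
    mass = rng.uniform(10.0, 100.0)  # assumed release mass range, in mg
    before = [rng.uniform(0.0, 5.0) for _ in range(N_ZONES)]
    # Assumed response: the 5-minute rise falls off with distance from the source room.
    after = [b + mass / (1 + abs(z - source)) ** 2 + rng.gauss(0.0, 0.1)
             for z, b in enumerate(before)]
    delta = [a - b for a, b in zip(after, before)]  # change over 5 minutes
    peak = max(delta)
    # Scaling by the peak removes most of the dependence on release mass (an assumption).
    return [d / peak for d in delta], source


def knn_predict(train, x, k=3):
    """Majority vote among the k training samples closest to x (Euclidean distance)."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_val_accuracy(samples, n_folds=20, k=3, seed=0):
    """Fraction of samples classified correctly when each fold is held out once."""
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)
    folds = [order[i::n_folds] for i in range(n_folds)]
    correct = 0
    for held_out in folds:
        held = set(held_out)
        train = [samples[i] for i in order if i not in held]
        correct += sum(knn_predict(train, samples[i][0], k) == samples[i][1]
                       for i in held_out)
    return correct / len(samples)


rng = random.Random(42)
# 280 synthetic releases, 40 per room, so every room appears in every training split.
samples = [make_sample(rng, i % N_ZONES) for i in range(280)]
accuracy = cross_val_accuracy(samples)
print(f"20-fold CV accuracy on toy data: {accuracy:.3f}")
```

Replacing `delta` with the `after` readings in `make_sample` gives a quick way to try the instantaneous-value feature on the same toy data; any difference observed there reflects the toy generator only.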
Keywords/Search Tags: Source location identification, machine learning, multi-zone model, artificial neural network, support vector machine, k-nearest neighbors algorithm, naive Bayes classification