With rapid economic and social growth in recent years, water environment problems closely related to human production and daily life have drawn extensive attention from government and society. Remote sensing can monitor water quality rapidly and without contact, and modeling is an important part of water color remote sensing. Although traditional empirical models have the advantages of simple structure and clear spectral characteristics, their fixed mapping relationship limits their application in complex inland Case 2 waters. Machine learning models, in contrast, handle non-linear problems well because they do not require a fixed band combination. However, they need a large number of representative training samples because of their complex network structure, and they focus on data mining rather than on exploiting spectral characteristics that are known in advance. For chlorophyll a (Chl-a) and total suspended particulate (TSP), whose spectral characteristics are clear, it is not established whether machine learning models perform better. In this paper, the bands used in traditional empirical statistical models, which have clear spectral significance, are used to construct both empirical statistical models and back propagation neural network (BPNN) models, and the two types of retrieval models for the concentrations of Chl-a and TSP are evaluated in detail. Moreover, because the spectral information in multi-band combinations is limited, and the combination of hyperspectral data and machine learning methods offers a new approach to water color remote sensing monitoring, other characteristic spectral bands containing more water color information and a Random Forest (RF) model are introduced to monitor the concentration of Chl-a. The main conclusions are as follows: (1) In multispectral modeling, both traditional empirical models and machine learning models estimate the concentrations of Chl-a and TSP well, and the machine learning models are not always better than the traditional models. Because machine learning models cost more than traditional empirical models, the empirical models are recommended for retrieving Chl-a and TSP concentrations in multispectral modeling, given their clear spectral characteristics and mature empirical algorithms. (2) For the optimized band combinations, both traditional empirical models and machine learning models maintain relatively good consistency across concentration ranges. For the two-band combination (R555 and R750) for TSP and the three-band combination (R670, R710 and R750) for Chl-a, the mean relative error (MRE) in the high concentration range, where samples are sparse, was close to that in the low and middle concentration ranges, where samples are not sparse, but the root mean square error (RMSE) was highest in the high concentration range. (3) The validation accuracy of both traditional empirical models and machine learning models decreases as the range of the training dataset narrows. When the training range was changed, the machine learning models had higher RMSE in the high concentration range beyond the training threshold, while the traditional empirical models had higher MRE in the low concentration range beyond the training threshold. Based on this finding, a layered model was constructed, and the accuracy of the Chl-a concentration retrieval improved by 22%. (4) Considering the variable importance of all bands, the spectral characteristics of Chl-a, and the band composition of the traditional empirical Chl-a models, nine characteristic bands for Chl-a were selected. These bands improved the accuracy of the BPNN model, though not significantly, and significantly improved the accuracy of the RF model. However, more bands did not mean higher accuracy: the machine learning models based on all 218 bands were less accurate than those based on the nine bands.
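Conclusion (2) rests on the difference between the two error measures, which the following minimal sketch illustrates. It assumes the usual definitions, MRE as the mean of |predicted - observed| / observed and RMSE as the square root of the mean squared difference; the abstract does not give the formulas. The function names and the sample concentrations are hypothetical and are not data from the study.

```python
import math

def mre(observed, predicted):
    """Mean relative error: mean of |pred - obs| / obs (assumed definition)."""
    return sum(abs(p - o) / o for o, p in zip(observed, predicted)) / len(observed)

def rmse(observed, predicted):
    """Root mean square error: sqrt of the mean squared difference (assumed definition)."""
    return math.sqrt(
        sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )

# Hypothetical concentrations (illustrative only): every prediction is 10% too high.
low_obs, low_pred = [5.0, 10.0, 20.0], [5.5, 11.0, 22.0]   # low range, more samples
high_obs, high_pred = [100.0, 200.0], [110.0, 220.0]       # high range, sparse samples

mre_low, mre_high = mre(low_obs, low_pred), mre(high_obs, high_pred)
rmse_low, rmse_high = rmse(low_obs, low_pred), rmse(high_obs, high_pred)

print(f"low range:  MRE={mre_low:.3f}  RMSE={rmse_low:.2f}")
print(f"high range: MRE={mre_high:.3f}  RMSE={rmse_high:.2f}")
```

With the same relative error in both ranges, MRE is identical, while RMSE grows with the absolute concentration. A similar MRE together with a higher RMSE in the high concentration range is therefore consistent with comparable relative accuracy across ranges.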