
Study On Computational Accuracy Of Time-Dependent Density Functional Theory Of Molecular Excitation Energy Based On Ensemble Machine Learning Model

Posted on: 2020-12-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J X Cui
GTID: 1361330620452294
Subject: Physical chemistry
Abstract/Summary:
The study of molecular excitation energies is both a focus and a persistent difficulty in theoretical and computational chemistry. Because the excitation energy reflects the intrinsic structure and electronic properties of a molecule, accurate prediction of excited-state properties, including electronic-transition absorption energy and emission wavelength, has become a key issue in the field. After years of research and application, quantum chemical methods have moved beyond quantitative verification of experimental observations and are now used to predict ground-state properties, excited-state properties, and chemical reactions in cases where experimental values cannot be obtained or are inaccurate. However, not all calculated results agree closely with experimental values, especially those for the excited states of larger molecules. The reason is the high computational cost of excited-state properties for complex molecules and macromolecular systems, particularly when a given level of accuracy is required. Limited computational resources and the inherent approximations of the computational methods are the main causes.

To address these problems, artificial intelligence methods offer simple and effective strategies for correcting the errors of theoretical calculations, improving their accuracy and widening their range of application. In this dissertation, machine learning ensemble algorithms are combined with quantum chemical calculations to improve the efficiency and accuracy of excited-state calculations. First, models based on AdaBoost and Bagging, two typical ensemble architectures, were built and applied to a data set of 433 organic molecules to improve the accuracy of DFT-calculated absorption energies of electronic spectra. The two ensemble models were then applied to a data set of 113 fluorescent molecules to improve the regression accuracy of the emission wavelength. The approach provides an effective and efficient alternative for accurate prediction of molecular properties, improves the reliability of the theoretical method, and extends its scope. The work can be summarized in three parts:

1. A strategy combining the Time-Dependent Density Functional Theory (TDDFT) quantum chemistry method with machine learning was adopted to build an accurate, robust, and efficient ensemble correction model for absorption energy calculations. The model is built on the AdaBoost framework and integrates support vector machine (SVM), generalized regression neural network (GRNN), and random forest (RF) as the regression methods. Correction by the ensemble model significantly improves the accuracy of TDDFT (TD-B3LYP/STO-3G, 6-31G*, 6-311G**) results. For the minimal STO-3G basis set, the mean absolute error (MAE) and root mean square error (RMSE) were reduced from 0.62 and 0.79 eV to 0.11 and 0.14 eV, respectively. The validation parameters of the correction model reach R² (0.97), Q² (0.98), and Q²cv (0.99), indicating good fitting and prediction performance. The results show that the proposed model needs only a minimal-basis-set TDDFT calculation to reach the accuracy of a large-basis-set calculation, and that the computation time of the model itself is small compared with that of TDDFT.

2. A regression model based on a linear-fitting, cosine-angle-distance ensemble rule was explored. It is built on the Bagging framework and integrates multiple base regression learners, including GBDT, GRNN, ELM, RF, and SVM. Within the Bagging framework, the ensemble correction model handles high-dimensional data and generalizes well, and it significantly improves TDDFT excited-state calculations. With the same minimal computational resources (TD-B3LYP/STO-3G), the MAE and RMSE of the absorption energy (λmax) regression results are reduced from 0.62 to 0.09 eV and from 0.79 to 0.12 eV, respectively. In addition, because the proposed method uses a weighted-average Bagging algorithm to combine the results of several single base-learner regression models, its time complexity is the same as that of a single base learner, and it is more concise than the AdaBoost model while maintaining high accuracy and efficiency. This indicates that the Bagging ensemble is one of the better correction tools for reducing high computational cost.

3. Given the successful correction of absorption energies by the AdaBoost and Bagging ensemble models, the two models were further applied to a data set of 162 samples containing 113 near-infrared fluorescent molecules to correct the emission wavelength. The results show that the ensemble model reduces the MAE and RMSE of the emission wavelength calculated by TDDFT/STO-3G from 1.094 to 0.014 eV and from 1.375 to 0.017 eV, respectively. This further demonstrates the applicability and validity of the ensemble model.
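
The correction idea described in the abstract can be illustrated with a minimal sketch. It is not the dissertation's model. It assumes synthetic data in place of the TDDFT and experimental energies, uses a simple least-squares line as a stand-in for the base learners (SVM, GRNN, RF, and so on), and weights the bagged models by inverse training MAE, which is an assumed rule rather than the cosine-angle-distance rule named above. All function and variable names are hypothetical.

```python
import math
import random


def mae(pred, ref):
    """Mean absolute error between predictions and reference values."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


def rmse(pred, ref):
    """Root mean square error between predictions and reference values."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def fit_line(x, y):
    """Ordinary least squares for y = a*x + b (stand-in base learner)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx if sxx > 0 else 0.0  # guard against a degenerate resample
    return a, my - a * mx


def bagged_correction(x, y, n_models=25, seed=0):
    """Fit base learners on bootstrap resamples; return a weighted-average predictor."""
    rng = random.Random(seed)
    n = len(x)
    models, weights = [], []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        a, b = fit_line([x[i] for i in idx], [y[i] for i in idx])
        err = mae([a * xi + b for xi in x], y)
        models.append((a, b))
        weights.append(1.0 / (err + 1e-12))  # assumed rule: inverse training MAE
    total = sum(weights)
    weights = [w / total for w in weights]

    def predict(xs):
        return [sum(w * (a * xi + b) for w, (a, b) in zip(weights, models))
                for xi in xs]

    return predict


# Synthetic stand-in data (NOT from the dissertation): "experimental" energies
# in eV and "calculated" values with a systematic bias plus random noise.
rng = random.Random(42)
exp_ev = [rng.uniform(2.0, 5.0) for _ in range(200)]
calc_ev = [1.15 * e + 0.4 + rng.gauss(0.0, 0.1) for e in exp_ev]

# Train the correction on the first 150 samples, evaluate on the remaining 50.
correct = bagged_correction(calc_ev[:150], exp_ev[:150])
test_calc, test_exp = calc_ev[150:], exp_ev[150:]
corrected = correct(test_calc)

mae_raw, rmse_raw = mae(test_calc, test_exp), rmse(test_calc, test_exp)
mae_corr, rmse_corr = mae(corrected, test_exp), rmse(corrected, test_exp)
print(f"raw:       MAE={mae_raw:.3f} eV  RMSE={rmse_raw:.3f} eV")
print(f"corrected: MAE={mae_corr:.3f} eV  RMSE={rmse_corr:.3f} eV")
```

Because the synthetic error here is mostly a systematic bias, a linear correction removes most of it. Real TDDFT errors depend on molecular structure, which is presumably why the work uses nonlinear base learners with molecular descriptors as inputs.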
Keywords/Search Tags:TDDFT, Excited Energy, Absorption Energy, Ensemble Learning, AdaBoost, Bagging