Applications Of K-means Clustering And Random Forest For Charge Predictions Of Multi-spin-state Force Fields Of The Heme-model

Posted on: 2020-09-03
Degree: Master
Type: Thesis
Country: China
Candidate: Q Li
GTID: 2370330572484758
Subject: Bioinformatics
Abstract/Summary:
The binding and dissociation of heme and oxygen are important biological phenomena. As oxygen approaches, the charge polarization of heme is pronounced, and the substrate strongly affects the spin state of heme, so the system undergoes spin crossover. Traditional force fields use a fixed-charge model, which is not suitable for heme, and so far little work has been done on force fields for the spin crossover of heme. To address this problem, our research group proposed variable-charge force fields for multiple spin states. The core of this proposal is to predict the charges of heme in each spin state accurately from conformational changes, so that the energy changes of each spin state caused by those conformational changes are described correctly. In recent years, charge-prediction methods based on K-means clustering and random forest have become increasingly important and have been applied successfully. The core idea of this thesis was to develop a variable-charge force field for multiple spin states using these methods and to test their performance in that setting.

The research proceeded as follows. First, the complex of the heme model and oxygen was simulated by non-adiabatic dynamics. Nearly 40,000 heme-model structures were extracted from the trajectories, and the ESP charges of the system in each spin state were calculated with the density functional method. We then used K-means clustering and random forest to test charge prediction. For each method we built two data sets. One contained structures without geometry optimization, with the system in a high-energy state. The other contained optimized structures, with the system at a local energy minimum. We introduced distance matrices, symmetry functions and manually selected structural parameters as descriptors and compared them systematically in order to optimize the prediction methods. For the K-means clustering method, we added distance matrices as a descriptor, compared predictions based on structural parameters with predictions based on distance matrices, and identified 11 structural parameters describing the heme-model system with which the two kinds of prediction performed equally well. For the random forest method, we combined symmetry functions describing atomic chemical environments with random forest regression to predict charges, and we also used the 11 structural parameters as descriptors.

The main conclusions of this thesis were as follows:
1. Prediction based on K-means clustering did not require building a complex analytical model to obtain the charges of different structures in different spin states. In addition, the K-means clustering method relies on ensemble averaging, which is more fault-tolerant than a single model.
2. The random forest regression model was not only computationally efficient but also able to predict the charge of a single atom, update the charges of important atoms in real time, and provide variable-charge force field parameters for the required atoms in multiple spin states.
3. Analyses of the mean absolute error, root mean square error, correlation coefficient and other measures showed that both methods gave accurate predictions and that both are effective ways to obtain the parameters of a variable-charge force field for multiple spin states.

In conclusion, we used K-means clustering and random forest to predict the charges of the heme model in each spin state and the multi-spin-state charges of the system after geometry optimization, and then compared and analyzed the performance of the two methods. Our work is only a preliminary attempt at studying force fields for heme and therefore has many shortcomings. In future work, we will further improve the methods for calculating the parameters of the variable-charge force field for multiple spin states. We believe that this thesis can also serve as a reference for the development of other force fields based on machine learning.
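
The cluster-average idea behind the K-means prediction can be illustrated with a minimal sketch. It is not the thesis implementation: the three-atom toy structures, the synthetic charges, the choice of two clusters, and all function names (`distance_descriptor`, `kmeans`, `cluster_mean_charges`, `predict_charges`, `mae_rmse`) are assumptions made for illustration. Each structure is described by its flattened distance matrix, structures are grouped with a plain Lloyd's K-means, and the charges of a new structure are taken as the mean reference charges of its nearest cluster.

```python
import math
import random


def distance_descriptor(coords):
    """Upper triangle of the interatomic distance matrix, flattened."""
    n = len(coords)
    return [math.dist(coords[i], coords[j])
            for i in range(n) for j in range(i + 1, n)]


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)),
               key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        labels = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its old centroid
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


def cluster_mean_charges(charges, labels, k):
    """Average the reference charges over each non-empty cluster."""
    means = {}
    for c in range(k):
        rows = [q for q, lab in zip(charges, labels) if lab == c]
        if rows:
            means[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return means


def predict_charges(coords, centroids, means):
    """Return the mean charges of the nearest non-empty cluster."""
    d = distance_descriptor(coords)
    c = min(means, key=lambda i: math.dist(d, centroids[i]))
    return means[c]


def mae_rmse(pred, ref):
    """Mean absolute error and root mean square error over all atoms."""
    errs = [p - r for ps, rs in zip(pred, ref) for p, r in zip(ps, rs)]
    mae = sum(abs(e) for e in errs) / len(errs)
    rmse = math.sqrt(sum(e * e for e in errs) / len(errs))
    return mae, rmse


# Synthetic toy data (assumption): a linear three-atom model whose
# charges depend on the first bond length d; two conformational groups.
rng = random.Random(1)
structures, charges = [], []
for center in (1.8, 3.0):
    for _ in range(20):
        d = center + rng.uniform(-0.1, 0.1)
        structures.append([(0.0, 0.0, 0.0), (d, 0.0, 0.0), (d + 1.2, 0.0, 0.0)])
        charges.append([0.5 + 0.1 * d, -0.3 - 0.1 * d, -0.2])

descriptors = [distance_descriptor(s) for s in structures]
centroids, labels = kmeans(descriptors, k=2)
means = cluster_mean_charges(charges, labels, k=2)

predicted = [predict_charges(s, centroids, means) for s in structures]
mae, rmse = mae_rmse(predicted, charges)
print(f"MAE = {mae:.4f} e, RMSE = {rmse:.4f} e")
```

In a real workflow, one such model would be built per spin state from the computed ESP charges, and a library implementation such as scikit-learn's `KMeans` (or `RandomForestRegressor` for the regression route) would likely replace the hand-written loop. The stdlib version is used here only to keep the sketch self-contained.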
Keywords/Search Tags: spin crossover, density functional method, geometry optimization, distance matrices, multi-spin-state charges, machine learning