Study on Construction of Multi-layer Sweet Prediction System Based on Machine Learning

Posted on: 2022-09-28 | Degree: Master | Type: Thesis
Country: China | Candidate: Z F Yang
GTID: 2481306338988559 | Subject: Food Engineering
Abstract/Summary:
Evolution has given humans a love of sweetness. Sweeteners, used as sugar substitutes, have become an important source of sweetness in the human diet. Although sweeteners have been developed for many years, most remain controversial because they struggle to balance sweetness, taste quality, and health. These problems have greatly increased the demand for new sweeteners with good properties. In recent years, researchers have begun to use computer-aided methods, such as quantitative structure-activity relationship (QSAR) modeling and machine learning, to explore the relationship between compound structure and sweet taste. However, there is still much room for improvement in data, methods, and applications. This thesis analyzes the current state of sweetener development and the different levels of demand. At the data level, it aims to reconstruct a data system tailored to specific application scenarios. At the method level, it uses popular machine learning algorithms to evaluate both the sweet taste (whether a compound is sweet) and the sweetness (how sweet it is) of different types of sweeteners. Finally, a multi-layer sweetness prediction system was constructed. The specific contents are as follows:
(1) Construction of a new data system. First, according to the prediction endpoint, compounds with taste labels were collected into Taste DB and compounds with sweetness values were collected into LogSw DB. After a specially designed data processing workflow, these two datasets contained 2861 and 463 molecules, respectively. Then, according to the practical development needs of researchers, Taste DB was divided into six sub-datasets (natural, artificial, carbohydrate, non-carbohydrate, nutritive, and non-nutritive) containing 1660, 1200, 458, 2631, 494, and 2613 molecules, respectively. Furthermore, basic statistical methods, molecular cloud analysis, and principal component analysis (PCA) were used to explore the six sub-datasets. The structural differences found between the datasets show that constructing the new data system is meaningful. They also show that complex machine learning models are needed to identify sweet molecules, because simple methods cannot determine the category of a compound. The resulting data system consists of seven datasets with different structures and contents.
(2) Construction of a multi-layer sweetness prediction system. First, on the basis of the new data system, models were built by combining different molecular descriptors with multiple algorithms, and they were compared using several evaluation metrics. The classification models used to evaluate whether compounds are sweet achieved accuracies from 0.805 to 0.934 and AUC values from 0.920 to 0.974. For the model predicting the relative sweetness (logSw) of a compound, the test-set R² reached 0.847. These results indicate good predictive performance. In addition, a Y-randomization test confirmed that the predictive models are reliable and not the result of chance correlation. Comparison identified the best-performing model for each dataset: MOE2d-XGBoost (natural), MACCS-RF (artificial), Atompairs-XGBoost (carbohydrate), MOE2d-XGBoost (non-carbohydrate), MOE2d-XGBoost (nutritive), MOE2d-XGBoost (non-nutritive), and Atompairs-SVR (logSw). These seven models were then packaged as local models to build the multi-layer sweetness prediction system. Moreover, feature selection and matched molecular pair analysis (MMPA) were used to explore the structural factors that affect the sweetness of molecules in LogSw DB. Feature importance analysis showed that solubility, van der Waals surface area, the number of nitrogen atoms, charge, and similar properties were the main features affecting the sweetness of compounds. MMPA yielded 80 structural transformation rules that affect sweetness.
In summary, this work achieved its expected goals. A new data system for specific sweetener application scenarios was constructed, and a multi-layer sweetness prediction system was built from the best-performing models. In addition, convenient and practical local models were provided to help other researchers use the system. We hope that this prediction system can serve as an important reference for screening sweet compounds and for the targeted research and development of sweeteners.
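The Y-randomization test mentioned above can be illustrated with a minimal sketch. The sketch assumes a single hypothetical descriptor, synthetic data, and an ordinary least-squares line as a stand-in for the Atompairs-SVR model. None of the function names come from the thesis. The idea is to refit the same modeling procedure many times on shuffled labels and check that the real test-set R² clearly exceeds the R² values obtained from the scrambled labels.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x (stand-in for the real regressor)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(y_true, y_pred):
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def holdout_r2(x_train, y_train, x_test, y_test):
    """Train on the training split and score R^2 on the held-out test split."""
    a, b = fit_line(x_train, y_train)
    return r_squared(y_test, [a + b * x for x in x_test])


def y_randomization(x_train, y_train, x_test, y_test, n_rounds=100, seed=0):
    """Return the real test R^2 and the test R^2 of models trained on shuffled labels."""
    rng = random.Random(seed)
    real = holdout_r2(x_train, y_train, x_test, y_test)
    scrambled = []
    for _ in range(n_rounds):
        y_shuffled = list(y_train)
        rng.shuffle(y_shuffled)  # break the link between structure and label
        scrambled.append(holdout_r2(x_train, y_shuffled, x_test, y_test))
    return real, scrambled


# Synthetic example data (hypothetical, not taken from LogSw DB).
data_rng = random.Random(42)
xs = [data_rng.uniform(0, 10) for _ in range(60)]
ys = [0.5 * x + data_rng.gauss(0, 0.5) for x in xs]
real, scrambled = y_randomization(xs[:45], ys[:45], xs[45:], ys[45:])
print(f"real R2 = {real:.3f}, mean scrambled R2 = {statistics.fmean(scrambled):.3f}")
```

A model passes this check when its real R² stands well above the scrambled distribution. In practice the same procedure would be repeated with the actual descriptors and learner rather than this single-feature line.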
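The two classification metrics reported above, accuracy and AUC, can be computed as follows. This is a small, dependency-free sketch. The labels and scores are made-up illustrative values, and the 0.5 decision threshold is an assumption. AUC is computed through its rank interpretation: the probability that a randomly chosen sweet compound receives a higher score than a randomly chosen non-sweet one, with ties counted as one half.

```python
def accuracy(y_true, y_pred):
    """Fraction of compounds whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the Mann-Whitney pairwise win rate of positives over negatives."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# 1 = sweet, 0 = non-sweet (illustrative values only).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 threshold

print(accuracy(y_true, y_pred))  # 4 of 6 correct
print(roc_auc(y_true, scores))   # 8 of 9 positive/negative pairs ranked correctly
```

The example shows why the two metrics can diverge. Accuracy depends on the chosen threshold, whereas AUC depends only on how the scores rank sweet compounds against non-sweet ones. The pairwise loop costs O(n_pos × n_neg) comparisons, so a sort-based rank formula would scale better on datasets the size of Taste DB.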
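The transformation rules obtained by MMPA can be pictured with the toy sketch below. The sketch assumes that a cheminformatics toolkit has already performed the fragmentation step, so each molecule arrives as a hypothetical (identifier, core, substituent, logSw) record produced by a single cut. Molecules that share a core form matched pairs, and each substituent swap is summarized by how often it occurs and by its mean change in logSw. The records, the `min_pairs` cutoff, and the function name are illustrative assumptions, not the thesis's settings.

```python
from collections import defaultdict
from itertools import permutations
from statistics import fmean


def mmpa_rules(records, min_pairs=2):
    """Group single-cut fragments by core and summarize each R-group swap.

    records: iterable of (molecule_id, core, r_group, log_sw) tuples.
    Returns {"R_a>>R_b": (number_of_pairs, mean_delta_logSw)}.
    """
    by_core = defaultdict(list)
    for _mol_id, core, r_group, log_sw in records:
        by_core[core].append((r_group, log_sw))

    deltas = defaultdict(list)
    for members in by_core.values():
        # Every ordered pair sharing a core is a matched molecular pair.
        for (r_a, sw_a), (r_b, sw_b) in permutations(members, 2):
            if r_a != r_b:
                deltas[f"{r_a}>>{r_b}"].append(sw_b - sw_a)

    # Keep only transformations supported by enough pairs.
    return {rule: (len(d), fmean(d)) for rule, d in deltas.items() if len(d) >= min_pairs}


# Hypothetical pre-fragmented records (placeholder cores, illustrative logSw values).
records = [
    ("m1", "core1", "H", 1.0),
    ("m2", "core1", "Cl", 1.5),
    ("m3", "core2", "H", 2.0),
    ("m4", "core2", "Cl", 2.7),
    ("m5", "core2", "OH", 1.8),
]
rules = mmpa_rules(records)
for rule, (n_pairs, mean_delta) in sorted(rules.items()):
    print(rule, n_pairs, round(mean_delta, 3))
```

Real MMPA pipelines usually allow multiple cuts and compare canonicalized cores. This sketch only shows the pairing and aggregation logic that turns matched pairs into candidate sweetness rules.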
Keywords/Search Tags: sweetener, machine learning, prediction, sweetness, molecular cloud, matched molecular pair analysis