| The rapid development of the modern food industry inevitably exposes consumers to a variety of food additives through their diet in a long-term, low-dose, and repetitive manner. This protracted exposure may culminate in chronic toxic effects on human health. The development of food additives still depends predominantly on conventional serendipitous discovery and extensive experimental evaluation. Because safety and stability are not predicted in the initial phase of research and development, the repetitive, time-consuming, and costly trial-and-error experiments in the later phase substantially impede the efficiency and success rate of developing novel food additives. This study employs artificial intelligence techniques to explore new assessment methods for the safety and stability of food additives, using various molecular representations and machine learning algorithms to capture the structure-activity relationships between chemical structures and a series of safety and stability indicators, and ultimately constructing accurate predictive models. The aim is to provide tools for the virtual screening of the safety and stability of food additives. The specific work is as follows: (1) Prediction of food additive stability: the stability of food additives was described in two aspects: their important physicochemical properties and their thermal stability. First, we manually collected datasets of solubility (logS), boiling point, melting point, exothermic decomposition, and exothermic onset temperature (To) from the literature and databases. We then designed a data processing workflow tailored to each dataset, ultimately obtaining datasets containing 5220, 5758, 9383, 744, and 377 unique structures with their true labels, respectively. Second, the highest temperatures commonly employed in heat treatment during food processing were used as thresholds to divide the To 
dataset into positive and negative sets. By combining multiple molecular representation methods with machine learning algorithms, we constructed models and identified the optimal model by comparing predictive performance and robustness. Finally, we developed six predictive models for the important physicochemical properties and thermal stability of food additives, achieving efficient prediction of food additive stability. (2) Prediction of food additive safety: we manually collected and compiled safety endpoint data from various sources and designed data processing workflows appropriate to each modeling task, obtaining datasets containing structure information and endpoint values for the no observed adverse effect level (NOAEL), median lethal dose (LD50), Ames test, and bioconcentration factor (BCF). Regression and classification models were established for sub-chronic and subacute NOAEL. Based on descriptors and substructure fingerprints, we found that oxygen atoms, oxygen-containing functional groups, sulfur atoms, heterocycles, and aromatic rings are crucial to the NOAEL models on the test set. In addition, species and exposure-duration extrapolation models were built from NOAEL values measured under multiple experimental conditions. Finally, a comprehensive NOAEL prediction solution was developed. To further enrich the safety assessment system, we built prediction models for the LD50, Ames, and BCF indicators, which characterize acute toxicity, mutagenicity, and environmental hazard, respectively. The top-performing models achieved accuracy scores exceeding 0.82 on the independent test set. In summary, this work established a virtual evaluation system for the safety and stability of food additives based on machine learning methods, comprising 13 robust machine learning models. Moreover, we validated the effectiveness and practicality of the models through cross-validation, external testing, and Y-randomization testing. Representative examples were selected to elaborate and demonstrate the 
potential value of the virtual evaluation system in the pre-assessment and screening of additives by comparing its predictions with actual application scenarios. |
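
The Y-randomization test named in the abstract can be illustrated with a minimal sketch. This is not the study's code. It assumes a single hypothetical descriptor, synthetic data, and a one-variable least-squares fit standing in for the actual machine learning models. The idea is that a model fitted to the true labels should score far higher than the same model fitted to randomly shuffled labels. If it does not, the apparent structure-activity relationship is likely a chance correlation.

```python
import random
import statistics


def r_squared(x, y):
    """In-sample R^2 of a one-descriptor ordinary least-squares fit."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def y_randomization(x, y, n_rounds=100, seed=0):
    """Score the model on the true labels and on n_rounds shuffled copies."""
    rng = random.Random(seed)
    true_score = r_squared(x, y)
    shuffled_scores = []
    for _ in range(n_rounds):
        y_perm = list(y)
        rng.shuffle(y_perm)  # break the link between descriptor and label
        shuffled_scores.append(r_squared(x, y_perm))
    return true_score, shuffled_scores


# Synthetic data (an assumption for illustration): one descriptor that is
# linearly related to the endpoint, plus Gaussian noise.
data_rng = random.Random(42)
x = [data_rng.uniform(0.0, 10.0) for _ in range(200)]
y = [2.0 * xi + 1.0 + data_rng.gauss(0.0, 1.0) for xi in x]

true_r2, shuffled_r2 = y_randomization(x, y)
print(f"R^2 on true labels:       {true_r2:.3f}")
print(f"mean R^2 on shuffled sets: {statistics.fmean(shuffled_r2):.3f}")
```

In practice the same comparison would be made with the chosen model and a cross-validated metric rather than an in-sample R^2.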