Objective: To compare the accuracy and generalization of three machine learning prediction models built on the relative expression levels of 35 genes related to skeletal muscle injury repair, and to select the best mathematical model for predicting skeletal muscle injury time from multidimensional nucleic acid indexes. The study offers a new approach to building machine-learning-based mathematical models for inferring wound time in forensic practice. Methods: A total of 65 Sprague–Dawley rats were divided randomly into a control group and contusion groups at 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, and 48 h post-injury (n = 6 per group). Skeletal muscle contusion was produced by a counterpoise falling freely through a clear Lucite guide tube onto the right posterior limb. The expression levels of the target mRNAs were calculated using the statistical model $(1 + \text{Eff.})^{-\Delta\Delta Ct}$, normalized to the geometric mean of the reference gene (RPL13 and RPL32 mRNA) levels. Multivariate statistical analysis of the relative expression of the 35 genes at the different time points and model building were performed in Python to assess the feasibility and accuracy of different machine learning methods for estimating wound time. Classification labels were determined by linear discriminant analysis (LDA). Three supervised classification models for joint multi-index analysis were built: recursive feature elimination combined with Logistic Regression, Random Forest, and Multinomial Naive Bayes. The Logistic Regression model classified samples by comparing its computed output with a threshold, whereas the Random Forest and Multinomial Naive Bayes models assigned classes directly through their classification algorithms. The accuracy of each method in inferring injury time was assessed by internal validation. An additional 13 rats served as the external validation group (randomly divided into the
control group and the injury groups, 1 rat in each group), and relative mRNA expression was measured by the same method; the data were entered into the established mathematical models to assess their generalization ability. Results: The 35 genes involved in wound healing were differentially expressed after skeletal muscle contusion in rats and correlated well with wound age. Three wound time inference models were built in Python. Among them, Recursive Feature Elimination (RFE) combined with Logistic Regression (LR) yielded a simpler classification prediction model (25 genetic features). Its validation accuracy was 100% and its prediction accuracy 92%, with an area under the receiver operating characteristic (ROC) curve (AUC) of 0.99 and an F1 score of 0.92. The Random Forest model had a validation accuracy of 85% and a prediction accuracy of 77%, with an AUC of 0.92 and an F1 score of 0.84. The Multinomial Naive Bayesian model had a validation accuracy of 62% and a prediction accuracy of 54%, with an AUC of 0.87 and an F1 score of 0.36. Conclusion: In this study, combining mathematical modeling and machine learning algorithms with the expression of 35 genes at different times after injury, we established three mathematical models. Through feature selection and comparative optimization, we found that joint multi-index inference of injury time is more accurate and complete than inference from a single index, and that machine learning models improve the accuracy and objectivity of wound time inference. Among the three supervised models, the logistic regression model had the highest accuracy, the best predictive ability on unknown samples, and the simplest set of genetic features. Based on the temporal changes of related genes after injury, this model is more
suitable for the prediction of early injury time. Applying machine learning algorithms to build mathematical models for multi-index, multidimensional data analysis is convenient and provides a new research approach for wound time inference in forensic science.
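
The three-model comparison described in the Methods can be sketched roughly as follows. This is an illustrative outline only, not the authors' code: it assumes scikit-learn, uses synthetic data in place of the gene expression matrix, and assumes three classes, 5-fold cross-validation as the internal validation step, and a 65/13 split mirroring the training and external groups. The helper names `make_demo_data`, `build_models`, and `compare_models` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler


def make_demo_data(seed=0):
    """Synthetic stand-in (NOT the study's data): 78 samples x 35 features, 3 classes."""
    return make_classification(
        n_samples=78, n_features=35, n_informative=10, n_redundant=5,
        n_classes=3, random_state=seed,
    )


def build_models(n_features=25, seed=0):
    """Return the three candidate classifiers keyed by a display name."""
    return {
        # RFE keeps the n_features highest-ranked features, then LR is refit on them.
        "RFE + LR": make_pipeline(
            StandardScaler(),
            RFE(LogisticRegression(max_iter=1000), n_features_to_select=n_features),
            LogisticRegression(max_iter=1000),
        ),
        "Random Forest": RandomForestClassifier(n_estimators=200, random_state=seed),
        # MultinomialNB expects non-negative inputs, so rescale to [0, 1] first
        # (an assumption needed only because the synthetic data can be negative).
        "Multinomial NB": make_pipeline(MinMaxScaler(clip=True), MultinomialNB()),
    }


def compare_models(X, y, n_holdout=13, seed=0):
    """Cross-validate each model on the training part, then score it on the held-out part."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=n_holdout, stratify=y, random_state=seed
    )
    results = {}
    for name, model in build_models(seed=seed).items():
        cv_acc = cross_val_score(model, X_tr, y_tr, cv=5).mean()
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)
        proba = model.predict_proba(X_te)
        results[name] = {
            "validation_accuracy": float(cv_acc),
            "prediction_accuracy": float(accuracy_score(y_te, pred)),
            "macro_f1": float(f1_score(y_te, pred, average="macro")),
            "auc_ovr": float(roc_auc_score(y_te, proba, multi_class="ovr")),
        }
    return results


if __name__ == "__main__":
    X, y = make_demo_data()
    for name, metrics in compare_models(X, y).items():
        print(name, {k: round(v, 2) for k, v in metrics.items()})
```

Because the data here are synthetic, the printed metrics say nothing about the reported results; the sketch only shows how validation accuracy, held-out prediction accuracy, AUC, and F1 could be collected for the three model types under the stated assumptions.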