Research On The Establishment Of Prediction Models Of IVF-ET Treatment Outcomes And Analysis Of Prediction Characteristics Based On Random Forest Algorithm

Posted on: 2023-01-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L L Li
GTID: 1524306806455324
Subject: Cell biology
Abstract/Summary:
As China has introduced universal two-child and three-child policies, more and more women of advanced age will face fertility problems. The infertility rate among people of childbearing age is about 1/7 to 1/8 and is increasing year by year. As the main treatment for infertility, IVF-ET gives most infertile couples hope of conceiving their own offspring. The success rate of IVF-ET has improved significantly with the development of assisted reproductive technology, but success still cannot be guaranteed. In fact, only about 1/3 of patients achieve a live birth, and most patients still end in failure after one or even several treatments. The outcomes of IVF-ET treatment are affected by many factors, and at present the impact of each factor on the outcomes is unclear. Clinically, the success rate for an IVF-ET patient is mostly predicted from the patient's age and the reproductive center's previous average success rate, and the prediction accuracy is poor. The random forest algorithm is simple in principle, easy to implement, and computationally inexpensive. It is an ensemble algorithm with good predictive performance and high classification accuracy. In particular, its bagging technique can effectively alleviate the over-fitting of complex models, and feature importance scores can be calculated so that the prediction features of a model can be ranked by importance. Therefore, using the random forest algorithm to explore the main features affecting IVF-ET treatment outcomes, and to establish an individualized outcome prediction model based on each patient's own characteristics, will be of great significance for clinical diagnosis and treatment and for patient counseling.

Objective: The purpose of this study was to establish prediction models for cumulative clinical pregnancy and cumulative live birth in IVF-ET patients using the random forest algorithm, based on the patients' medical data before and after the cycle. At the same time, because the random forest algorithm can calculate an importance score for each included feature, the features predicting cumulative clinical pregnancy and cumulative live birth were ranked by importance to identify the main features affecting treatment outcomes, so as to provide a reference for clinical diagnosis and treatment.

Methods:
(1) Infertile couples treated with IVF-ET at the Reproductive Medicine and Prenatal Genetics Center of the First Hospital of Jilin University from July 2015 to December 2019 were enrolled. According to the screening criteria, a total of 4249 IVF-ET treatment cycles in 3841 couples were included, and 38 basic features, including age, were collected. Whether cumulative clinical pregnancy and cumulative live birth were achieved served as the prediction labels used to build the data sets.
(2) Data preprocessing and feature analysis were as follows. Samples were grouped according to the prediction labels. Because the groups were imbalanced, random oversampling was used to balance the samples and construct balanced data sets. SPSS 23.0 was used to compare differences between groups, and features with statistically significant differences were selected. Pearson correlation coefficients between features were calculated to screen out highly linearly correlated features and remove multicollinearity. A supervised discretization method was used to discretize the continuous features in the data sets, and all prediction features were binned and assigned values.
(3) Based on the prediction objectives, prediction models for IVF-ET treatment outcomes were constructed on the data sets before and after balancing. The data sets were divided into training sets and test sets at a ratio of 8:2 to test the performance of the prediction models. Prediction models for cumulative clinical pregnancy and cumulative live birth before and after the cycle were established using the random forest algorithm in MATLAB. The accuracy, recall, specificity, precision, F1 score, and other performance indexes of each model were calculated, the receiver operating characteristic curve was drawn, and the area under the curve (AUC) was calculated.
(4) The feature importance scores of the random forest algorithm were collected, the prediction features in each model were ranked by score, and the key features and main features for predicting IVF-ET treatment outcomes were identified by analysis.

Results:
(1) A total of 4249 cycles were included in the original data set. The cumulative clinical pregnancy rate was 70.79% and the cumulative live birth rate was 64.06%. The "pregnancy balanced data set" and the "live birth balanced data set" were constructed by balancing the samples. After balancing, the cumulative clinical pregnancy rate and the cumulative live birth rate were both 50%, and the samples were balanced.
(2) A total of 19 features related to treatment outcomes were selected for the construction of the random forest prediction models, including 14 features generated before the cycle and 5 features generated after the cycle. These features were the number of cycles, female age, type of infertility, years of infertility, causes of infertility, delivery history of the woman, male hepatitis B result, sperm concentration, PR, female BMI, basal FSH, basal FSH/LH, basal E2, AMH, ovulation stimulation protocol, Gn dosage, Gn days, number of oocytes retrieved, and number of available embryos.
(3) Comparing the prediction performance of the different models for cumulative clinical pregnancy, the post-cycle pregnancy balanced set model performed best, followed by the pre-cycle pregnancy balanced set model, then the post-cycle original data set model, and finally the pre-cycle original data set model. The AUCs of the four models on the test set were 0.9671, 0.8926, 0.8735, and 0.7001; the recalls (i.e., the sensitivities) were 81.70%, 78.37%, 91.18%, and 90.85%; and the specificities were 96.17%, 83.19%, 62.50%, and 38.17%.
(4) In the pre-cycle cumulative clinical pregnancy prediction model, the 14 included features were ranked by importance, and female age and AMH were the two key features for predicting pregnancy. Basal FSH, basal FSH/LH, basal E2, years of infertility, and causes of infertility were the main features for predicting cumulative pregnancy.
(5) In the post-cycle cumulative clinical pregnancy prediction model, the 19 included features were ranked by importance, and the number of available embryos and female age were the two key features for predicting pregnancy. Five features, namely the number of oocytes retrieved, AMH, Gn days, Gn dosage, and years of infertility, were the main features for predicting cumulative pregnancy.
(6) Comparing the prediction performance of the different models for cumulative live birth, the post-cycle live birth balanced set model performed best. The post-cycle original data set model and the pre-cycle live birth balanced set model performed about the same. The pre-cycle original data set model performed worst. The AUCs of the four models on the test set were 0.9386, 0.8342, 0.8316, and 0.6906; the recalls were 80.88%, 85.11%, 75.92%, and 82.90%; and the specificities were 90.44%, 65.57%, 73.11%, and 46.89%.
(7) In the pre-cycle cumulative live birth prediction models, the 14 included features were ranked by importance, and female age and AMH were the two key features for predicting live birth. Basal FSH, causes of infertility, years of infertility, and basal FSH/LH were the main features for predicting cumulative live birth.
(8) In the post-cycle cumulative live birth prediction models, the 19 included features were ranked by importance, and the number of available embryos and female age were the two key features for predicting live birth. Seven features, namely the number of oocytes retrieved, AMH, sperm concentration, female BMI, years of infertility, Gn days, and Gn dosage, were the main features for predicting cumulative live birth.

Conclusions:
(1) Random oversampling was used to balance the imbalanced original data set collected in this study. This gave the random forest prediction model more opportunities to learn from minority-class samples and improved its predictions for those samples, thereby improving the overall prediction performance of the model.
(2) Post-cycle features such as the number of available embryos played an important role in the outcomes of IVF-ET treatment. Adding post-cycle features could further improve the prediction performance of the model.
(3) Before the cycle, female age and AMH were the key features for predicting the outcomes of IVF-ET treatment. Four features, namely basal FSH, causes of infertility, years of infertility, and basal FSH/LH, were the main features for predicting the treatment outcomes.
(4) After the cycle, the number of available embryos and female age were the key features for predicting the outcomes of IVF-ET treatment. Five features, namely the number of oocytes retrieved, AMH, years of infertility, Gn days, and Gn dosage, were the main features for predicting the treatment outcomes.
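
The sample balancing, 8:2 split, and performance indexes described in the Methods can be illustrated with a minimal sketch. It is written in plain Python rather than the MATLAB and SPSS tools used in the study, and it omits the random forest training itself. The function names (`random_oversample`, `split_80_20`, `binary_metrics`), the seeds, and the toy labels are assumptions made for illustration only, not the author's implementation.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until all
    classes are as large as the largest one (random oversampling)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def split_80_20(rows, labels, seed=0):
    """Shuffle the samples and split them 8:2 into training and test sets."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    cut = len(idx) * 8 // 10  # integer arithmetic avoids float rounding
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


def binary_metrics(y_true, y_pred):
    """Accuracy, recall (sensitivity), specificity, precision and F1
    for labels coded as 1 (positive) and 0 (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        return num / den if den else 0.0

    recall = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Toy data (hypothetical): 7 positive and 3 negative cycles.
rows = [[i] for i in range(10)]
labels = [1] * 7 + [0] * 3
balanced_rows, balanced_labels = random_oversample(rows, labels, seed=1)
x_train, y_train, x_test, y_test = split_80_20(balanced_rows, balanced_labels, seed=1)
print(Counter(balanced_labels), len(x_train), len(x_test))
```

The sketch follows the order given in the abstract, balancing first and splitting afterwards. Duplicates created before the split can land in both the training and the test set, which tends to inflate test scores; oversampling only the training portion is a common way to avoid this.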
Keywords/Search Tags: IVF-ET, prediction model, random forest, cumulative clinical pregnancy, cumulative live birth, imbalanced samples