
Fundamental Theory And Application Study On Large For Gestational Age Infants Using Machine Learning Techniques

Posted on: 2021-09-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: FAHEEM AKHTAR
Full Text: PDF
GTID: 1484306470471034
Subject: Software Engineering
Abstract/Summary:
A large for gestational age (LGA) fetus is a newborn whose weight is above the 90th percentile for fetuses of the same gestational age and sex. Excessive fetal weight is associated with severe neonatal and maternal complications. It tends to complicate delivery and can have adverse consequences for the mother and newborn throughout the antepartum period, with an increased risk of infant morbidity and mortality. Neonates born LGA are prone to perinatal asphyxia; obesity and overweight; shoulder dystocia; cardio-metabolic diseases, including hypertension, insulin resistance, and type 2 diabetes; and metabolic syndrome later in life. Mothers of LGA neonates are at increased risk of cesarean section, prolonged labor, postpartum bleeding, and traumatic injury during delivery; later in life they also face an increased risk of being overweight and of breast cancer. Researchers and pediatricians have therefore sought an efficient and reliable prediction scheme for the early diagnosis and prognosis of an LGA fetus based on its most determinative risk factors. Over the last several decades they have established numerous predictive models, but these require monitoring several biochemical indicators beyond the routine check-up items, and some depend closely on the pediatrician's expertise and experience. Moreover, most were observational or retrospective studies built on small samples, which raises questions about the reliability, applicability, and practicality of these models in a general clinical environment.

In response to this problem, we obtained an LGA dataset from the "National PrePregnancy Examination Program" of China containing 248,501 records with 371 attributes, of which 230,190 (92.6%) were live births and 18,311 (7.4%) were stillbirths, abortions, and miscarriages. The program was officially launched in 2010 and covered 220 pilot counties in all 31 provinces, including hospitals in all regions, municipalities, cities, and districts of China. On the basis of this dataset, we conducted a scientific study of infants born LGA to identify the most determinative biochemical indicators and to lay a foundation for early intervention and prevention using concrete ML techniques, which had not been applied to this problem before. Feature selection and feature extraction play a vital role in developing such a model. Our principal focus was therefore on extracting the most suitable feature subset and on recommending a suitable LGA predictive model in consensus with domain experts. We also propose a concrete mechanism that highlights the importance of data preprocessing, missing value imputation, handling of imbalanced data, and detection and extraction of the most suitable feature subset, together with an appropriate predictive model, to help pediatricians and researchers intervene early, diagnose the condition, and understand LGA in a variety of aspects. This study contributes a fundamental theory and application for LGA fetuses, with the selection and extraction of the most determinative associated risk factors.

The main contributions of this dissertation are as follows.

1. To the best of our knowledge, this research is the first to use machine learning techniques to establish an efficient LGA prognosis process for the overall Chinese population, covering 220 pilot counties in all 31 provinces of China, including hospitals in all regions, municipalities, cities, and districts. Previous studies were limited to a specific region or hospital, and most were observational or retrospective studies that used simple statistical tests or linear or multivariate logistic regression to establish an LGA prognosis process.

2. A novel algorithm for handling imbalanced datasets is proposed, because directly applying machine learning techniques to an imbalanced dataset can mislead classifiers during classification. The proposed scheme improved LGA classification performance and overcame the problems of over-fitting and under-fitting.

3. An algorithm is proposed to create a Master Feature Vector (MFV) that counters the adverse effects on the classification system that may arise from data inconsistencies, missing values, and classification-related issues. The proposed MFV raised LGA prediction performance scores and addressed data inconsistencies, missing values, and the data imbalance problem.

4. An algorithm for a semi-supervised feature selection scheme is proposed that combines expert knowledge with statistical tests to extract a determinative feature subset of practical use. The proposed scheme was compared with seven automated feature selection schemes to establish its value, and with twenty ranked features it performed best on the performance metrics. In addition, intersecting the features common to the seven automated feature selection schemes showed that eleven features were among the twenty selected by the expert-driven scheme, which combines expert knowledge with a statistical test.

5. Several supervised, semi-supervised, and unsupervised feature selection and extraction schemes were proposed to identify suitable risk factors for developing an efficient LGA prognosis process. They include an expert-driven feature selection scheme; a Cluster-based Feature Selection scheme (CFS); several automated feature selection schemes (i.e., Information Gain, Chi-square, Pearson Correlation, Stepwise Logistic Regression, Random Forest with Gini Index, and Boruta); a Grid Search based Recursive Feature Elimination with cross-validation (RFECV) feature selection scheme; a Grid Search based RFECV + Information Gain (IG) feature selection scheme; and a Grid Search based RFECV + IG + Stacking feature scheme. The highest prediction precision, recall, accuracy, Area under the Curve (AUC), specificity, and F1 scores of 92%, 87%, 92%, 95%, 95%, and 89%, respectively, were obtained using the Grid Search based RFECV + IG + Stacking feature scheme with a ranked subset of ten features, which is suggested as the best model for developing an efficient, reliable, and accurate LGA prognosis process with less computational overhead.
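
The six scores reported above are standard quantities for a binary classifier. As a reading aid, the sketch below shows how they are conventionally computed, with LGA treated as the positive class. It is not the dissertation's code: the names `ConfusionCounts`, `classification_metrics`, and `auc`, and all counts and scores in the example, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts, with LGA as the positive class."""
    tp: int  # LGA predicted as LGA
    fp: int  # non-LGA predicted as LGA
    tn: int  # non-LGA predicted as non-LGA
    fn: int  # LGA predicted as non-LGA


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Threshold-based metrics; assumes every denominator is non-zero."""
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)        # also called sensitivity
    specificity = c.tn / (c.tn + c.fp)   # recall of the negative class
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1_from(precision, recall),
    }


def auc(scores_pos: list[float], scores_neg: list[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case gets the higher score; ties count as half a win."""
    wins = sum(
        (p > n) + 0.5 * (p == n)
        for p in scores_pos
        for n in scores_neg
    )
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical counts and scores, NOT taken from the dissertation.
example = ConfusionCounts(tp=80, fp=10, tn=90, fn=20)
metrics = classification_metrics(example)
example_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(f"auc: {example_auc:.3f}")
```

Because F1 is the harmonic mean of precision and recall, the reported values are internally consistent: `f1_from(0.92, 0.87)` is about 0.894, which matches the reported F1 of 89%.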
Keywords/Search Tags: Large for Gestational Age, Data Preprocessing, Feature Selection, Feature Extraction, Data Imbalance, Missing Values, Expert Driven Feature Selection Scheme, Automated Feature Selection Schemes, Stacking, Prediction Model