
Research and Application of a Missing Data Imputation Method Considering Biological Regulation Mechanisms

Posted on: 2022-07-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X S Dong
Full Text: PDF
GTID: 1484306740463854
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Background: Biomedical research is moving into a multi-omics era. In general, disease development results from environmental exposure, genetic variation, and gene-environment interaction. Single-omics studies are limited in describing the molecular mechanisms of disease, which drives the need for multi-omics research. However, it is difficult to obtain every omics data type for the same sample, and stitching multiple omics datasets together rarely yields a complete dataset. This so-called “block missing” phenomenon in multi-omics data integration hinders the application of traditional analysis procedures. Deleting incomplete records is a routine and easy-to-understand approach but, not surprisingly, considerably reduces statistical power.

Aim: To impute block-missing datasets efficiently, we propose in Part I a trans-omics block missing data imputation (TOBMI) method, which imputes a block-missing gene expression matrix using external information obtained from DNA methylation datasets. We then seek to improve TOBMI methodologically and to implement multiple imputation frameworks.

Methods: TOBMI is based on k-nearest neighbors (kNN), which is used to search an external matrix, such as a DNA methylation matrix, for individuals similar to each block-missing sample. We then improved the TOBMI algorithm using Markov chain Monte Carlo (MCMC), random forest (RF), and linear stepwise regression (LR). We compared the imputation performance of TOBMI with that of mean imputation, missing case deletion, and multi-hot deck imputation. The series of TOBMI algorithms is then used to impute block-missing datasets for glioma and acute respiratory distress syndrome.

Results: TOBMI outperforms mean imputation, missing case deletion, and multi-hot deck imputation. Among the modified algorithms, TOBMI-MCMC outperforms TOBMI-RF and TOBMI-LR, but it is time-consuming. TOBMI-RF ranks second only to TOBMI-MCMC, and its imputation process is time-saving.

Conclusion: Our study shows that the TOBMI algorithm can serve as a well-developed algorithm for block-missing data imputation.
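
The kNN step described in the Methods can be illustrated with a minimal Python sketch. This is not the dissertation's implementation: the function name `knn_cross_omics_impute`, the Euclidean distance, the inverse-distance weighting, and the default `k` are all assumptions chosen for illustration. The sketch only shows the general idea of finding neighbors in the methylation matrix and borrowing their expression values.

```python
import numpy as np


def knn_cross_omics_impute(methylation, expression, k=3):
    """Fill block-missing expression rows using neighbors found in methylation space.

    methylation: (n_samples, n_probes) array, observed for every sample.
    expression:  (n_samples, n_genes) array; a block-missing sample is an all-NaN row.
    Distance metric and weighting are illustrative assumptions, not the thesis's choices.
    """
    methylation = np.asarray(methylation, dtype=float)
    imputed = np.array(expression, dtype=float)  # work on a copy

    missing = np.isnan(imputed).all(axis=1)      # rows with no expression data at all
    donors = np.flatnonzero(~missing)            # samples that have expression data
    if donors.size == 0:
        raise ValueError("no sample with observed expression data")
    k = min(k, donors.size)

    for i in np.flatnonzero(missing):
        # Euclidean distance from sample i to every donor, measured on methylation only
        dist = np.linalg.norm(methylation[donors] - methylation[i], axis=1)
        nearest = np.argsort(dist)[:k]
        # Closer donors contribute more; the small constant avoids division by zero
        weights = 1.0 / (dist[nearest] + 1e-8)
        imputed[i] = np.average(imputed[donors[nearest]], axis=0, weights=weights)

    return imputed
```

Called with a methylation matrix and an expression matrix that share the same sample order, the function returns a copy of the expression matrix in which each all-NaN row is replaced by a weighted average of its k nearest donors' expression profiles. Observed rows are left unchanged.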
Keywords/Search Tags:multi-omics, block missing, DNA methylation, gene expression, multiple imputations