Research And Development Of Feature Optimization Algorithm For Heterogeneous Health Big Data Diagnosis And Treatment Model

Posted on: 2020-05-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Feng
GTID: 1364330575481195
Subject: Computer application technology
Abstract/Summary:
The advent of the era of health big data has greatly promoted the establishment of medical diagnostic models and enabled faster, better, and more accurate precision medicine, but it has also brought machine learning and data mining researchers new challenges in computing time and learning efficiency. The biomarker detection problem in the medical field is equivalent to the feature selection problem in machine learning. Health big data mining is an important research direction of big data mining technology and a research hotspot in both the computer and medical fields. The inherent "large p small n" nature of biomedical data, the high degree of correlation among features, and medical researchers' requirement that data mining results be intelligible make traditional data mining algorithms difficult to apply directly to bio-health big data mining tasks. "Large p small n" means "high-dimensional small sample": the data set has many features and very few samples. In this case, feature selection is usually adopted to remove a large number of phenotype-independent biomedical features, which shortens model running time, reduces the time and space complexity of the model, and yields a stable classification or regression model that is independent of the specific data set, improving the model's generalization ability.

To address these problems, this dissertation studies multi-level integrated modeling algorithms and feature fusion for heterogeneous health big data. Because bio-omics, imaging-omics, and electronic medical record data describe the state of biological systems at different temporal and spatial scales, they are markedly heterogeneous and multimodal, and they are the main source of biomedical information for medical modeling. Feature extraction, feature selection, and data fusion algorithms are therefore studied for these three categories of health big data, classification and regression models are established, and a big data visualization system is developed. The main research contents are as follows.

For the classification of bio-omics data, this dissertation proposes hierarchical classification modeling and studies the detection of biomarkers in breast cancer and autism. In the breast cancer study, the transcriptomics and methylation data sets were divided into multiple age groups. Applying individual models to patients of different age groups improved the accuracy of the disease diagnosis model, which shows the importance of age for breast cancer staging modeling. In the autism study, a descending-based feature screening strategy applied to methylation data from peripheral blood samples found 678 optimal methylomics biomarkers associated with autism detection.

For the regression problem on bio-omics data, a new feature selection algorithm with a regression optimization goal is proposed. Existing work ignores many continuous-valued phenotypes, such as the continuous relationship among the stages of cancer development. Biomarkers related to cancer staging were found by regression, allowing cancer staging to be predicted accurately. Compared on goodness of fit, classification accuracy, and other evaluation indicators, the proposed regression biomarker detection algorithm is superior to 10 existing biomarker detection algorithms.

For medical image data, a rotation-invariant Triz algorithm and a REDE algorithm for identifying fatigue state are proposed. The Triz algorithm implements computer-aided diagnosis of gastric disease and effectively handles the four-class problem of distinguishing gastric polyps, gastric cancer, gastric ulcer, and normal (disease-free) cases. The REDE algorithm integrates the morphological features of the eyes and mouth in the face region and examines the fatigue detection problem in terms of the number of features, the classifier, and the modeling parameters. The experimental results show that the REDE algorithm is superior to four recently published fatigue detection algorithms in both fatigue detection accuracy and running time. In addition, a Python-based health-related image visualization engineering system, pyHIVE, was developed. The validity and generalization of pyHIVE were verified on the Outex texture database and the Salvador gastrointestinal video endoscopy database.

For clinical electronic medical record data, this dissertation proposes multi-class integrated modeling of how the tumor area changes with chemotherapy. Personalized regression models process the data for the three earlier tumor masses and predict the data for the last three masses. The three predicted mass areas were used to refine the classification model for pathological complete remission (pCR) after clinical breast cancer neoadjuvant chemotherapy, achieving a higher pCR prediction accuracy.

Finally, the health big data visualization system kSolutionVis was developed, and modeling on fused multi-omics data was explored. Visualizing big data results can help medical researchers understand the results of data mining and discover new patterns and algorithms. kSolutionVis provides a user-friendly graphical interface that assists biomedical research in detecting multiple feature subsets, paving the way for biomedical researchers to explore multiple solutions for biomarker detection. After the feature selection algorithms for bio-omics data were modeled, modeling analysis of fused multi-omics data was explored. It showed that multi-omics fusion classification performs better than single-omics classification for the multi-class problem of breast cancer staging.
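As a rough illustration of feature selection in the "large p small n" setting described in the abstract, the following minimal sketch ranks features with a generic filter method (an absolute Welch t-statistic between two classes) and keeps the top k. It is not the dissertation's algorithm. The function names `t_score`, `select_top_k`, and `make_toy_data`, the synthetic data, and the choice of three informative features are all assumptions made for this example.

```python
import random
import statistics


def t_score(values_a, values_b):
    """Absolute Welch t-statistic between two groups of feature values."""
    mean_a, mean_b = statistics.fmean(values_a), statistics.fmean(values_b)
    var_a, var_b = statistics.variance(values_a), statistics.variance(values_b)
    denom = (var_a / len(values_a) + var_b / len(values_b)) ** 0.5
    return abs(mean_a - mean_b) / denom if denom > 0 else 0.0


def select_top_k(samples, labels, k):
    """Score every feature by t_score between class 0 and class 1, keep the top k."""
    n_features = len(samples[0])
    scores = []
    for j in range(n_features):
        group0 = [row[j] for row, y in zip(samples, labels) if y == 0]
        group1 = [row[j] for row, y in zip(samples, labels) if y == 1]
        scores.append((t_score(group0, group1), j))
    scores.sort(reverse=True)
    return sorted(j for _, j in scores[:k])


def make_toy_data(n_samples=20, n_features=500, informative=(3, 42, 117), seed=0):
    """Hypothetical data: many noise features, few samples, three shifted features."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_samples)]
    samples = []
    for y in labels:
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 8.0 * y  # class 1 is shifted on the informative features
        samples.append(row)
    return samples, labels


samples, labels = make_toy_data()  # 20 samples, 500 features: p >> n
selected = select_top_k(samples, labels, k=3)
print(selected)
```

A filter of this kind scores each feature independently, so it is cheap but ignores the high correlation among biomedical features that the abstract mentions. Handling that correlation would call for wrapper or embedded selection methods.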
Keywords/Search Tags: health big data, feature selection, bio-omics data, medical image data, electronic medical record data