Analysis of Semi-Supervised Learning Algorithms for Disease Prediction

Posted on: 2019-04-30
Degree: Master
Type: Thesis
Country: China
Candidate: Y Pan
Full Text: PDF
GTID: 2404330572951503
Subject: Cryptography

Abstract/Summary:
In recent years, clinical big data has drawn wide public attention. Mining massive clinical data for meaningful information makes it possible to predict diseases, including cancer, early enough to help patients. Taking cerebral palsy prediction as its starting point, this thesis uses big data mining to seek a learning algorithm for disease prediction that achieves higher precision and lower total model loss. The thesis focuses on semi-supervised learning algorithms and on improving the model for disease prediction. The research achievements are as follows:

1. A set of data preprocessing methods is designed for clinical datasets, because raw clinical data cannot be analyzed or modeled mathematically directly. First, the thesis cleans dirty data by filling in missing values, resolving inconsistent data, and detecting outliers. Second, it performs object matching and schema integration across datasets from different clinical databases, then selects features based on redundancy and correlation analysis. Finally, it applies feature scaling and dimensionality reduction. Experiments are then designed to show that these preprocessing methods make the data usable and improve computational efficiency during model analysis.

2. Because the training dataset contains few labeled samples and a large amount of unlabeled data, the thesis applies semi-supervised learning algorithms to the prediction of cerebral palsy and other diseases. Supervised learning algorithms have already been applied to disease prediction, so the thesis compares three supervised learning algorithms (Gaussian mixture model, support vector machine, and graph model) with their semi-supervised counterparts (semi-supervised Gaussian mixture model, semi-supervised support vector machine, and semi-supervised graph model), providing theoretical derivations and a comparative analysis of the techniques. In experiments on these models using eight datasets and three labeled-data proportions, the mean values of three evaluation metrics confirm that semi-supervised learning algorithms have a performance advantage in predicting cerebral palsy and other diseases. The semi-supervised support vector machine performs best.

3. To address three problems in clinical prediction, the thesis proposes an improved learning model based on the semi-supervised support vector machine. First, because positive and negative misclassifications incur unequal losses, the thesis assigns different misclassification weights, in particular a heavier penalty for misclassified positives, to reduce total loss. Second, because the dataset is extremely class-imbalanced, the thesis constrains the proportion of positive samples in the unlabeled dataset so that it is closer to the true label distribution. Third, because label imbalance affects the empirical risk objective, the thesis scales the empirical risk terms for labeled and unlabeled samples separately to reduce prediction error. Following the theoretical derivation, a set of comparative experiments verifies that these three improvements raise classification precision while reducing total loss.
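
The three modifications to the semi-supervised support vector machine can be illustrated with a small sketch. The abstract does not give the thesis's actual formulation, so everything below is an assumption made for illustration: a linear decision function, a hinge loss on labeled samples weighted by class-dependent costs (`c_pos`, `c_neg`), a symmetric hinge loss on unlabeled samples, each risk term averaged over its own sample count as one possible form of separate scaling, and the predicted positive fraction reported so it can be compared with a target class proportion. The function and parameter names are hypothetical.

```python
from typing import List, Sequence, Tuple


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear decision function f(x) = w . x + b (an assumed model form)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def s3vm_objective(
    w: Sequence[float],
    b: float,
    labeled: List[Tuple[Sequence[float], int]],  # (features, label in {+1, -1})
    unlabeled: List[Sequence[float]],
    c_pos: float = 5.0,  # hypothetical cost of a misclassified positive
    c_neg: float = 1.0,  # hypothetical cost of a misclassified negative
    c_unl: float = 0.5,  # hypothetical weight of the unlabeled term
) -> Tuple[float, float]:
    """Return (objective value, predicted positive fraction on unlabeled data)."""
    # Regularizer: 0.5 * ||w||^2
    reg = 0.5 * sum(wi * wi for wi in w)

    # Labeled risk: hinge loss weighted by a class-dependent cost,
    # averaged over the number of labeled samples.
    labeled_risk = 0.0
    for x, y in labeled:
        cost = c_pos if y == 1 else c_neg
        labeled_risk += cost * max(0.0, 1.0 - y * decision_value(w, b, x))
    labeled_risk /= len(labeled)

    # Unlabeled risk: symmetric hinge loss penalizes unlabeled points that
    # fall inside the margin, averaged over the number of unlabeled samples.
    scores = [decision_value(w, b, x) for x in unlabeled]
    unlabeled_risk = sum(max(0.0, 1.0 - abs(s)) for s in scores) / len(scores)

    # Fraction of unlabeled points predicted positive; a class-proportion
    # constraint would require this to stay close to a target ratio.
    pos_fraction = sum(1 for s in scores if s > 0) / len(scores)

    return reg + labeled_risk + c_unl * unlabeled_risk, pos_fraction


# Toy data, invented purely for illustration.
labeled = [([2.0, 0.0], 1), ([-2.0, 0.0], -1), ([0.5, 0.0], 1)]
unlabeled = [[3.0, 0.0], [-3.0, 0.0], [0.2, 0.0], [-0.5, 0.0]]
objective, pos_fraction = s3vm_objective([1.0, 0.0], 0.0, labeled, unlabeled)
```

In this sketch, raising `c_pos` increases the objective whenever a positive sample lies inside the margin, which pushes an optimizer toward boundaries that misclassify fewer positives. An actual solver would minimize this value over `w` and `b` while keeping `pos_fraction` near the expected positive rate.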
Keywords/Search Tags: data preprocessing, semi-supervised learning, unequal misclassification loss, imbalanced dataset, label imbalance