Multi-platform Microarray Data Integration In Tumor Classification

Posted on:2018-05-23

Degree:Master

Type:Thesis

Country:China

Candidate:L Lu

Full Text:PDF

GTID:2334330542960055

Subject:Computer Science and Technology

Abstract/Summary:


In recent decades, tumors have remained among the diseases that are hardest to cure and that most seriously endanger human health, and no broadly effective targeted treatment has yet been found. Treating cancer first requires correctly determining the tumor subtype, a task known in bioinformatics as tumor classification. High-throughput technology has produced a large volume of tumor data from different platforms and different laboratories. These data are costly to produce, yet each data set typically has few samples and a very large number of genes, and data from a single platform or a single laboratory may be one-sided and unreliable. To address the limitations of single-source data and the poor accuracy of tumor classification, this thesis studies the following two aspects.

(1) An analysis of current mainstream data integration algorithms shows that the ComBat fusion algorithm performs well on all evaluation indexes, especially for small samples (<25). However, ComBat moves all batches toward the overall mean, so when it transforms data that carry a validated, fixed gene signature, the signature on those data is shifted as well. To address this, instead of adjusting with the traditional overall mean and population variance, this thesis computes the mean and variance from a single batch, cycles through the batches to be fused as candidate reference samples, and selects the best batch as the reference. In addition, current fusion algorithms process each batch as a whole and do not take into account the characteristics of the samples within the batch. To address this, the thesis divides each batch according to the first principal component (F1) of its features: a threshold k splits the batch into two parts (k+ for samples above the threshold, k- for samples below it), and the same is done for the other batches. The parts above the threshold are then fused across batches, the parts below the threshold are fused across batches, and finally the two fused parts are merged. Comparative experiments show that the improved method performs very well relative to four other fusion algorithms.

(2) Building on the fusion algorithm, this thesis finds that using fused data can improve results in many applications, such as the selection of specific genes and the construction of regulatory networks. Because current tumor classification accuracy is low, the thesis presents a framework for tumor classification. It first analyzes classification on a single data set and then improves classification accuracy through data fusion. Because a single data set and a combination of several data sets differ in sample size and cannot be compared directly, the thesis also presents a simple comparison method. Finally, two groups of experiments on real breast cancer data, classified with common machine learning algorithms, verify that tumor classification based on cross-platform data fusion is effective.
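The two ideas in part (1) can be illustrated with a minimal Python sketch. It is not the thesis implementation. It assumes each batch is a NumPy array of shape (samples, genes), it replaces ComBat's empirical Bayes estimation with a plain gene-wise location and scale adjustment, and the function names `adjust_to_reference` and `split_by_first_pc` are hypothetical. The abstract does not say how the best reference batch is scored, so that selection step is left out.

```python
import numpy as np


def adjust_to_reference(batches, ref_idx):
    """Shift and scale each batch gene-wise to match one reference batch.

    Simplified stand-in for a reference-batch adjustment: no empirical
    Bayes shrinkage and no covariates, unlike full ComBat.
    """
    ref = batches[ref_idx]
    ref_mean = ref.mean(axis=0)
    ref_std = ref.std(axis=0, ddof=1)
    adjusted = []
    for i, batch in enumerate(batches):
        if i == ref_idx:
            # The reference batch is left untouched, so a signature
            # validated on it keeps its original scale.
            adjusted.append(batch.copy())
            continue
        mean = batch.mean(axis=0)
        std = batch.std(axis=0, ddof=1)
        std = np.where(std == 0, 1.0, std)  # guard against constant genes
        adjusted.append((batch - mean) / std * ref_std + ref_mean)
    return adjusted


def split_by_first_pc(batch, k=0.0):
    """Split samples by their score on the batch's first principal component.

    Returns (k_plus, k_minus); samples with a score exactly equal to k
    are placed in k_minus.
    """
    centered = batch - batch.mean(axis=0)
    # Rows of vt are the principal directions; vt[0] is the first one.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[0]
    return batch[scores > k], batch[scores <= k]
```

Under these assumptions, each batch would be split with `split_by_first_pc`, the k+ parts and the k- parts would each be passed to `adjust_to_reference`, and the two adjusted groups would then be combined. Note that the sign of a principal component is arbitrary, so k+ and k- parts from different batches would need to be aligned before fusing; the abstract does not describe how this is handled.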

Keywords/Search Tags:

Tumor classification, Cross-platform data fusion, ComBat, Machine learning


Related items

1	Tumor Gene Expression Data Classification Based On Manifold Learning
2	Processing And Analysis Of Gene Expression Data Based On Machine Learning Algorithm
3	Research On Analysis Method Of Tumor Gene Expression Data Based On Machine Learning
4	Multi-feature Fusion Based On Deep Transfer Learning For Classification And Recognition Of Microscopic Cell Images
5	Research On Medical Data Classification Algorithm Based On Machine Learning
6	Research On Medical Data Classification Based On Machine Learning
7	Research On Multimodal Image Classification Based On Machine Learning
8	Tumor Gene Expression Profile Data Mining Based On Machine Learning And Intelligent Optimization
9	Research On Missing Modality Image Classification Based On Machine Learning
10	Research Of The Reliable Platform Of Distributed Machine Learning For Medical Data Based On Blockchain