
Gene Microarray Missing Value Estimation Technology

Posted on: 2014-08-29
Degree: Master
Type: Thesis
Country: China
Candidate: F C Meng
Full Text: PDF
GTID: 2250330401973733
Subject: Computer application technology
Abstract/Summary:
Gene microarray technology has been widely used in biological experiments and has become an important research branch of bioinformatics. Gene microarray data take the form of large matrices; however, owing to the limitations of microarray experiments, these matrices often contain missing values, and estimating those values has become an active research topic in bioinformatics. Current gene microarray estimation methods exploit only a single feature of the dataset (global homogeneity or local heterogeneity). To tackle this problem, this paper analyses the performance of sparse representation in estimating gene microarray missing values, proposes three novel estimation methods based on local least squares (LLS) and Bayesian principal component analysis (BPCA), and proposes an automatic method selection model. The main research contents and results are as follows:
(1) Investigates the performance of sparse representation in the LLS framework and analyses the low-rank structure of the data by computing the singular value decomposition (SVD) of the dataset. The results show that the low-rank structure of gene microarray data does not meet the requirements of sparse representation, so sparse representation is not suitable for estimating gene microarray missing values.
(2) Solves the matrix completion (MC) problem with the inexact augmented Lagrange multiplier (IALM) method to obtain a complete matrix, which can replace the row-average matrix in LLS. This technique makes full use of both the global correlation and the local similarity structure of the dataset, and it obtains lower estimation error than BPCA and LLS.
(3) Performs BPCA on the neighborhood formed by the K nearest neighbors of the target gene. This scheme avoids the drawback that BPCA does not perform well on datasets with high local similarity structure, and it obtains lower estimation error on all kinds of datasets (time series, non-time series and mixed datasets).
(4) Identifies the genes (rows) and experimental conditions (columns) most correlated with the missing entry to form a bicluster, then performs BPCA on the bicluster. This method has an advantage at high missing rates but has the drawback of high computational cost.
(5) Proposes an automatic method selection model based on estimating simulated missing values with different methods. Validation tests show that the model can choose the method with the lowest estimation error, which helps in selecting an estimation method for a given dataset.
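The selection idea in item (5) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the 5% hiding rate, the NRMSE error measure, and the restriction of neighbors to genes with no missing entries are all assumptions made for illustration. The two candidate estimators are a row-average fill and a simplified LLS-style regression on the k nearest complete genes; the thesis methods (IALM-based completion, local BPCA, bicluster BPCA) are not reproduced here.

```python
import numpy as np

def row_average_impute(X):
    """Fill each missing entry with the mean of the observed values in its row (gene)."""
    filled = X.copy()
    row_means = np.nanmean(X, axis=1)
    rows, cols = np.where(np.isnan(X))
    filled[rows, cols] = row_means[rows]
    return filled

def lls_impute(X, k=5):
    """Simplified LLS-style fill (assumption: neighbors are drawn only from complete genes)."""
    filled = X.copy()
    has_nan = np.isnan(X).any(axis=1)
    complete = X[~has_nan]                      # candidate neighbor genes
    for i in np.where(has_nan)[0]:
        miss = np.isnan(X[i])
        obs = ~miss
        # Euclidean distance computed on the columns observed in the target gene
        dist = np.linalg.norm(complete[:, obs] - X[i, obs], axis=1)
        nbrs = complete[np.argsort(dist)[:k]]
        # Least squares weights w minimising || nbrs[:, obs].T @ w - X[i, obs] ||
        w, *_ = np.linalg.lstsq(nbrs[:, obs].T, X[i, obs], rcond=None)
        filled[i, miss] = nbrs[:, miss].T @ w
    return filled

def nrmse(estimate, truth, mask):
    """Root mean squared error on the hidden entries, normalised by their standard deviation."""
    diff = estimate[mask] - truth[mask]
    return np.sqrt(np.mean(diff ** 2)) / np.std(truth[mask])

def select_method(X, methods, rate=0.05, seed=0):
    """Hide a fraction of the observed entries, impute with each method, and
    return the name of the method with the lowest error plus all errors."""
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(X)
    mask = observed & (rng.random(X.shape) < rate)   # simulated missing entries
    X_sim = X.copy()
    X_sim[mask] = np.nan
    errors = {name: nrmse(fn(X_sim), X, mask) for name, fn in methods.items()}
    return min(errors, key=errors.get), errors
```

A call such as `select_method(X, {"row_average": row_average_impute, "lls": lls_impute})` returns the name of the lowest-error method together with the per-method errors. The sketch assumes that `X` has at least a few complete rows and that no gene loses all of its entries after hiding.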
Keywords/Search Tags: gene microarray missing values, sparse representation, Bayesian estimation, least squares, biclustering