
Research on Screening Differentially Expressed Genes in the Two-Sample Problem

Posted on: 2013-09-01
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Wang
Full Text: PDF
GTID: 2240330374487588
Subject: Probability theory and mathematical statistics
Abstract/Summary:
To identify differentially expressed genes (DEGs), this paper proposes a new method that takes the multidimensional structure of microarray data into account. The method penalizes previously rejected genes, and genes are called DEGs according to their frequency of occurrence. Two data sets are analyzed with our method, the commonly used SAM (Significance Analysis of Microarrays) method, and Bradley Efron's method.
In the simulated data, whose first 20 variables are DEGs, SAM identifies 17 variables, all of which are target variables (true DEGs); three target variables (the 12th, 18th and 19th) are not identified. Under the condition fdr < 0.2, Efron's method identifies 18 variables; the 12th and 18th are not identified. In our method, the first 17 variables in the result are all target variables. The 19th, 18th and 12th variables are in the 19th, 22nd and 24th places respectively. Besides the target variables, 4 other variables (the 841st, 371st, 140th and 492nd) appear in our result. According to the p-values of the t-statistic, the p-values of those 4 variables are smaller than those of the 12th and 18th variables, and the p-values of the other 18 target variables rank in the first 18. All of this supports the soundness of our method.
When a penalty factor is added to the probability of a rejected variable, the target variables occur 34 times in total and only 3 of them do not appear. All 12 variables that occur more than once are target variables. Without the penalty, by contrast, the number of unidentified target variables reaches 9, and only 5 of the 7 variables that occur more than once are target variables.
The other data set is Golub's leukemia data set. SAM identifies 76 genes. Efron's method identifies 6 genes when fdr < 0.2, but 23 when fdr < 0.4. We select 74 genes whose frequencies are larger than 38; of these 71 genes, 41 appear in the result of SAM. X95735_at, M27891_at, M23197_at, Y12670_at and M16038_at rank in the first 5 places, and their p-values rank in the first 6 places. Of the first 20 genes, 8 appear in the result of Efron's method.
When a penalty factor is added, 70% of the genes that occur more than once are listed in the first 20, and 7 genes appear in the result of Efron's method. Without the penalty, there are 21 genes that occur more than once; only 38% of them are listed in the first 20, and 6 genes appear in the result of Efron's method.
When an SVM classifier is built with the genes identified by SAM, the error is 0 and the number of support vectors (SVs) reaches 31. With the genes identified by Efron's method under fdr < 0.2, the error is 2.63% and there are 19 SVs; when fdr < 0.4, there is no classification error but there are 21 SVs. The same analysis was carried out on the first 20 genes identified by our method: the error is also 0, but the number of SVs is only 12.
The analysis shows that our method not only identifies target genes more accurately but also ranks genes by their level of differential expression. Adding a penalty factor improves the accuracy. On the Golub data, the SVM classifier built on the genes from our method has an error of 0 and the fewest SVs.
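The abstract checks its selections against the ranking of t-statistic p-values. The following is a minimal sketch of that baseline ranking only, not of the penalized method proposed in the thesis. Several details are assumptions that the abstract does not state: the pooled-variance form of the two-sample t-statistic, and a hypothetical simulation with 1000 variables, 15 samples per group, and a mean shift of 1.5 for the first 20 variables. Under the pooled test every variable has the same degrees of freedom, so sorting by |t| gives the same order as sorting by two-sided p-value.

```python
import math
import random
import statistics


def pooled_t(x, y):
    """Two-sample t-statistic with pooled variance (equal-variance assumption)."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * statistics.variance(x)
                  + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(
        pooled_var * (1 / n1 + 1 / n2))


def rank_by_abs_t(group_a, group_b):
    """Return 0-based variable indices sorted from largest to smallest |t|.

    group_a and group_b are lists of samples; each sample is a list of
    expression values, one per variable.
    """
    n_vars = len(group_a[0])
    scores = []
    for j in range(n_vars):
        t = pooled_t([s[j] for s in group_a], [s[j] for s in group_b])
        scores.append((abs(t), j))
    scores.sort(key=lambda pair: -pair[0])
    return [j for _, j in scores]


def simulate(n_vars=1000, n_deg=20, n_per_group=15, shift=1.5, seed=0):
    """Hypothetical simulation (all parameters are illustrative assumptions):
    the first n_deg variables have their mean shifted in group B."""
    rng = random.Random(seed)
    group_a = [[rng.gauss(0, 1) for _ in range(n_vars)]
               for _ in range(n_per_group)]
    group_b = [[rng.gauss(shift if j < n_deg else 0, 1) for j in range(n_vars)]
               for _ in range(n_per_group)]
    return group_a, group_b


group_a, group_b = simulate()
ranking = rank_by_abs_t(group_a, group_b)
top20 = ranking[:20]
hits = sum(1 for j in top20 if j < 20)
print(f"true DEGs among the top 20 by |t|: {hits} of 20")
```

A ranking of this kind is what a method's selected variables can be compared against, as the abstract does for the 12th and 18th variables.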
Keywords/Search Tags: differentially expressed, feature selection, support vector machine