Category Encoding Method To Select Feature Genes For The Classification Of RNA-seq Data

Posted on:2021-02-07

Degree:Master

Type:Thesis

Country:China

Candidate:L Zhang

Full Text:PDF

GTID:2494306131981959

Subject:Statistics

Abstract/Summary:

MicroRNA-seq and single-cell RNA-seq (scRNA-seq) data have become an important basis for biological and medical research. Selecting feature genes from large volumes of gene expression data for classification has attracted extensive attention from researchers, and using microRNA-seq or scRNA-seq data to diagnose disease types is an effective approach in medical research. Statistical classification methods for such sequencing data include Poisson linear discriminant analysis (PLDA), negative binomial linear discriminant analysis (NBLDA), and zero-inflated Poisson logistic discriminant analysis (ZIPLDA).

Gene expression data typically contain thousands of genes but only dozens of samples. Not all genes contribute to classification, and the data contain many redundant and irrelevant genes. Feature gene selection is therefore a typical step in processing gene expression data, and finding the genes that play a decisive role in sample classification is very important for the subsequent classification work. Removing irrelevant genes and detecting important feature genes improves classification accuracy and reduces computing time. The BSS/WSS method (the ratio of between-class to within-class sums of squares) is widely used at present, but it assumes that the data are normally distributed, so it may not be suitable for microRNA-seq and scRNA-seq data.

To solve these problems, this thesis proposes a method that encodes the categories and selects differentially expressed genes using the Spearman correlation coefficient. A correlation coefficient reflects the direction and degree of the joint trend of two variables, and the Spearman correlation coefficient measures the strength of the monotonic relationship between paired data. We recode the class labels of the samples according to the magnitude of the sample observations in each class, obtaining a new class code for each class. We then compute the correlation coefficient between each gene and the new class codes and select the genes with the larger coefficients. This ensures that the selected genes vary little within a class and differ strongly between classes, which improves the efficiency and accuracy of classification. We also prove the sure screening property and ranking consistency of the proposed ENTC method.

We compare the ENTC method with existing feature gene selection methods. Simulations show that in many cases the ENTC method selects feature genes more accurately than the other methods and yields a lower misclassification rate. An analysis of real data also shows that the ENTC method performs better than the existing methods.
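The selection procedure described above can be illustrated with a minimal Python sketch. It is not the thesis implementation. The abstract does not state the exact recoding rule, so the sketch assumes that, for each gene, the classes are relabeled 1..K in increasing order of their within-class mean expression. The function names (`average_ranks`, `spearman`, `recode_labels`, `select_genes`) and the toy counts are hypothetical and chosen only for illustration.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector carries no ranking information
    return cov / (var_x * var_y) ** 0.5


def recode_labels(expression, labels):
    """Assumed rule: relabel classes 1..K by increasing within-class mean."""
    classes = sorted(set(labels))
    class_mean = {
        c: mean(v for v, lab in zip(expression, labels) if lab == c) for c in classes
    }
    ordered = sorted(classes, key=lambda c: class_mean[c])
    new_code = {c: code for code, c in enumerate(ordered, start=1)}
    return [new_code[lab] for lab in labels]


def select_genes(counts, labels, top_k):
    """counts maps gene name -> per-sample counts; return the top_k gene names."""
    scores = {}
    for gene, expression in counts.items():
        codes = recode_labels(expression, labels)
        scores[gene] = spearman(expression, codes)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy data: three classes with three samples each (values are made up).
labels = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
counts = {
    "gene_signal": [9, 11, 10, 0, 1, 2, 30, 28, 35],  # class means clearly differ
    "gene_noise": [5, 0, 9, 5, 0, 9, 5, 0, 9],        # same pattern in every class
}
selected = select_genes(counts, labels, top_k=1)
print(selected)
```

Because the score depends only on ranks, it makes no normality assumption about the counts, which is the motivation the abstract gives for preferring it to BSS/WSS on count data.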

Keywords/Search Tags:

Classification, Differential expression, Feature genes, RNA-seq data, The correlation coefficient

Related items

1	The Research Of Gastric Cancer Feature Genes Selection Based On Gene Expression Data
2	An Integrated Prediction Method For Cancer Classification Based On Gene Expression Data And MiRNA Expression Profiles
3	Health And Medical Care Statistics Data Publishing Technology Based On Differential Privacy
4	Studies Of Tumor Information Mining Algorithms Based On Multiple-omics Data
5	Study On Characteristic Genes Of Pancreatic Cancer Classification Based On Multiple Data Sets
6	Classification Of Cancer Subtypes Based On Gene Expression Data
7	Research On Tumor Feature Gene Selection Method Based On DNA Microarray Data
8	The Research Of Cancer Feature Genes Selection Based The Gene Expression Data
9	Research On Feature Selection And Classification Method Of FMRI Data Based On Statistical Information
10	Research On Classification Method Of FMRI Data Based On Broad Learning System