Differential Analysis And Application Of Biological Data

Posted on: 2020-02-25
Degree: Master
Type: Thesis
Country: China
Candidate: J X Zhou
Full Text: PDF
GTID: 2370330578463930
Subject: Applied Mathematics
Abstract/Summary:
With the development of high-throughput sequencing technology, biological data have accumulated rapidly. Mining the information contained in these data has become one of the most active topics in scientific research. This work mainly involves studying the structure and function of molecular sequences, such as nucleic acids, proteins, DNA and genes. Within this field, molecular evolution and phylogenetic analysis are both important: they aim to explore the evolutionary relationships between and within species through differential analysis of biological molecules. In addition, genetic marker studies of complex diseases, based on differential expression analysis of gene expression data, help to clarify the mechanisms by which cancer occurs, as well as the clinical prognosis and treatment options for complex diseases such as cancer. Working with two data types, protein sequences and gene expression data, this thesis explores what differential analysis of influenza virus data and liver cancer data can reveal about disease pathogenesis, development, diagnosis, prevention, treatment, and treatment effect. The main work of the thesis is summarized as follows:

1) A 40-dimensional feature vector was extracted from the HA protein sequences of influenza virus based on the physicochemical properties of amino acids, and the optimal number of clusters for each year was calculated with a hierarchical clustering method based on the optimal hierarchical evaluation index. The biological diversity of influenza virus was investigated using the entropy value of each year, and the variation of the virus was further analyzed using a mutation evolution map and the population entropy change rate. The results show that the population entropy reflects the biological diversity of influenza virus well, and that the population entropy change rate reflects its mutation rate. These findings can provide evidence and support for influenza prediction.

2) Differential analysis was performed on liver cancer gene expression data from a tumor genomic map database to identify significantly differentially expressed genes. A weighted correlation network algorithm was then used to construct co-expression modules of the differentially expressed genes, and the correlation coefficient between each co-expression module and the clinicopathological stage of liver cancer was calculated. The gene interaction network of the module strongly correlated with pathological staging was selected, and the module genes most highly correlated with the pathological T, N, and M stages were submitted to enrichment analysis and pathway analysis in the DAVID database. Molecular interactions were annotated and visualized with Cytoscape. The results show that the abnormally expressed genes in this module play an important role in cell division, sister chromatid cohesion, DNA repair, and the G1/S transition of the mitotic cell cycle. These genes are also enriched in the cell cycle, oocyte meiosis, and p53 signaling pathways. By examining the closeness centrality of the interaction network together with the research literature, eight biomarkers were found (CKAP2, TPX2, CDCA8, KIFC1, MELK, SGO1, RACGAP1, and KIAA1524), and their biological mechanisms were confirmed to be related to liver cancer. The abnormal expression of these eight genes can therefore be used as a marker for the pathological diagnosis of liver cancer.

3) Using gene expression data from the four clinicopathological stages of liver cancer, differential analysis was applied to identify the differentially expressed genes of each stage. Given the characteristics of the expression data, logistic regression was selected to find genes with statistically significant effects on the pathogenesis of liver cancer. For stages i, ii, iii, and iv, 192, 149, 224, and 112 significantly differentially expressed genes were obtained, respectively. Genetic biomarkers for the different pathological stages of liver cancer were then identified by molecular interaction network analysis, and the results were verified by survival analysis and literature research. Four gene biomarkers (MELK, KIFC1, CDCA8, and RACGAP1) are consistent with the results of the previous work, and new gene biomarkers such as HJURP, TROAP, NDC80, KIF4A and COLEC10 were also found. The results show that MELK, HJURP and CDCA8 can be used as biomarkers of stage i, TROAP and NDC80 as biomarkers of stage ii, KIF4A as a biomarker of stage iii, and RACGAP1 as the first staging biomarker.
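The abstract does not give a formula for the population entropy or its change rate, so the following is only a minimal sketch under stated assumptions. It assumes the population entropy of a year is the Shannon entropy of that year's cluster-size distribution (after the hierarchical clustering step), and that the change rate is the entropy difference between consecutive sampled years divided by the number of years between them. The function names `population_entropy` and `entropy_change_rate` are hypothetical and may not match the thesis's actual definitions.

```python
import math
from collections import Counter


def population_entropy(cluster_labels):
    """Shannon entropy (in bits) of a cluster-size distribution.

    ASSUMPTION: each sequence of a given year carries one cluster label,
    and "population entropy" means the entropy of the label frequencies.
    """
    counts = Counter(cluster_labels)
    total = sum(counts.values())
    # Sum p * log2(1/p) over clusters; an empty input yields 0.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def entropy_change_rate(entropy_by_year):
    """Per-year change in entropy between consecutive sampled years.

    ASSUMPTION: the rate is (H_later - H_earlier) / (later - earlier),
    keyed by the later year.
    """
    years = sorted(entropy_by_year)
    return {
        later: (entropy_by_year[later] - entropy_by_year[earlier]) / (later - earlier)
        for earlier, later in zip(years, years[1:])
    }
```

Under these assumptions, a year whose sequences all fall into one cluster has entropy 0, and a year split evenly across k clusters has entropy log2(k), so a rising value corresponds to a more diverse viral population.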
Keywords/Search Tags: influenza virus, differential analysis, clustering analysis, molecular interaction network, genetic biomarkers