
Visual Analysis Of Gene Data

Posted on: 2019-12-27
Degree: Master
Type: Thesis
Country: China
Candidate: J W Liu
GTID: 2370330593951088
Subject: Software engineering
Abstract/Summary:
As the focus of human genome research has shifted to functional genomics, the focus of bioinformatics research has shifted from accumulating biological data to processing it and extracting information from it. Traditional bioinformatics often uses artificial intelligence, machine learning, data mining and other methods to collect, process and utilize biological data. As research into the function and expression of genes and proteins deepens, the amount of gene expression data grows exponentially. How to analyze massive, high-dimensional genetic data, and how to interpret, understand, assess and reason about biological information from the global to the local level, have become important and urgent issues in the field.

In this paper, we develop a target gene mining algorithm based on gSpan. The algorithm aims to mine the target genes from massive gene data. Gene data is usually divided into an experimental group and a control group. Because a target gene should show small differences within the same group and large differences between groups, we use variance to select genes. We then calculate the correlation coefficient between each pair of samples in the same group; for pairs with a small correlation coefficient, we find the gene fragments that differ little. We also calculate the correlation coefficient between each pair of samples from different groups; for pairs with a large correlation coefficient, we find the gene fragments that differ greatly. Finally, we put all the gene fragments found in these steps into one set. For this set, we use mutual information to calculate the correlation between gene fragments. We thus organize the gene data of every sample into a graph in which gene fragments are nodes and their correlations are edges. We then apply the gSpan (graph-based substructure pattern mining) algorithm to the graphs for subgraph mining and matching. The resulting subgraph is the experimental result.

Existing gene visualization work mainly uses visualization to present the final result, or develops a visualization framework; visual tools are rarely used to assist decision-making. In this paper, grayscale images are used to visualize the genetic data and the results of the algorithm, and visual methods such as line charts and grayscale images are also used to assist the analysis and the algorithm's decision-making.

Our experimental data set is genetic data collected for human dental caries. After verification, the results show that the coverage rate of target genes is 100%, which is similar to previous algorithms. The proportion of target genes is 33.3%, which is higher than the 15%-20% achieved by previous algorithms. Throughout the experiment, grayscale images serve as the visualization of the genetic data, both to present results and to assist decision-making.
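The selection and graph-construction steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the scoring formula in `variance_scores`, the equal-width binning in `discretize`, the edge threshold, and all function names are assumptions made for illustration. The sketch also computes mutual information across the samples of one group, which is only one possible reading of the abstract, and it stops before the gSpan mining step.

```python
import math
from collections import Counter
from statistics import mean, pvariance


def variance_scores(experimental, control):
    """Score each gene: squared gap between group means over summed within-group variance.

    Assumed scoring rule; the abstract only says variance is used to select genes.
    Each argument is a list of samples, each sample a list of expression values.
    """
    n_genes = len(experimental[0])
    scores = []
    for g in range(n_genes):
        exp_vals = [sample[g] for sample in experimental]
        ctl_vals = [sample[g] for sample in control]
        within = pvariance(exp_vals) + pvariance(ctl_vals)
        between = (mean(exp_vals) - mean(ctl_vals)) ** 2
        scores.append(between / (within + 1e-9))  # epsilon avoids division by zero
    return scores


def discretize(values, bins=3):
    """Map values to equal-width bin indices (an assumed preprocessing step)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(xs, ys):
    """Mutual information in bits between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c * n) / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def build_graph(samples, gene_ids, threshold):
    """Return edges (gene_a, gene_b, mi) for gene pairs whose MI exceeds threshold."""
    columns = {g: discretize([s[g] for s in samples]) for g in gene_ids}
    edges = []
    for idx, a in enumerate(gene_ids):
        for b in gene_ids[idx + 1:]:
            mi = mutual_information(columns[a], columns[b])
            if mi > threshold:
                edges.append((a, b, mi))
    return edges
```

In this sketch, genes with the highest `variance_scores` would be passed as `gene_ids` to `build_graph`, and the resulting edge lists would be the input to a frequent-subgraph miner such as gSpan.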
Keywords/Search Tags: Gene data analysis, gSpan, Correlation coefficient, Mutual information