Visual Analysis Of Gene Data

Posted on:2019-12-27

Degree:Master

Type:Thesis

Country:China

Candidate:J W Liu

GTID:2370330593951088

Subject:Software engineering

Abstract/Summary:

As the focus of human genome research has shifted to functional genomics, the focus of bioinformatics research has quietly shifted from accumulating biological data to processing it and extracting information from it. Traditional bioinformatics often uses artificial intelligence, machine learning, data mining and other methods to collect, process and use biological data. As research into the functions and expression of genes and proteins deepens, the amount of gene expression data is growing exponentially. Two problems have become important and urgent in bioinformatics. The first is how to analyze massive, high-dimensional genetic data. The second is how to interpret, understand, assess and reason about the biological information it contains, at every level from global to local.

In this thesis, we develop a target gene mining algorithm based on gSpan, which aims to mine target genes from massive gene data. Gene data is usually divided into an experimental group and a control group. We start from the premise that a target gene should differ little between samples of the same group and greatly between groups, so we first use variance to select genes. Next, we calculate the correlation coefficient between pairs of samples in the same group; for pairs with a small correlation coefficient, we find the gene fragments that differ little. We also calculate the correlation coefficient between pairs of samples from different groups; for pairs with a large correlation coefficient, we find the gene fragments that differ greatly. All gene fragments found in this step are then merged into one set. For this set, we use mutual information to measure the correlation between gene fragments. We then organize each sample's gene data into a graph in which gene fragments are nodes and their correlations are edges. Finally, we apply the gSpan (graph-based substructure pattern mining) algorithm to these graphs for subgraph mining and matching; the resulting subgraph is the experimental result.

Existing gene visualization work mainly uses visualization either to present final results or to build visualization frameworks; visual tools are rarely used to assist decision-making. In this thesis, grayscale images are used not only to visualize the genetic data and the results of the algorithm, but also, together with line charts, to assist the analysis and the algorithm's decision-making.

Our experimental data set is genetic data collected for human dental caries. Verification shows that the coverage rate of target genes is 100%, similar to previous algorithms. The proportion of target genes in the result is 33.3%, higher than the 15%-20% of previous algorithms. In the experiment, grayscale images serve as the visual representation of the genetic data, both to visualize the results and to assist decision-making.
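
The abstract says only that variance is used to select genes that differ little within a group and greatly between groups. Below is a minimal Python sketch of one possible reading. It assumes a score equal to the variance of the two group means divided by the average within-group variance. This is an F-statistic-like ratio, not necessarily the thesis's formula. The names `variance_score` and `select_genes`, the dict-of-lists data layout and the cutoff `k` are all hypothetical:

```python
from statistics import mean, pvariance

def variance_score(exp_vals, ctrl_vals):
    """Hypothetical score: between-group spread over within-group spread.

    Large when the two group means are far apart and each group is tight.
    """
    within = (pvariance(exp_vals) + pvariance(ctrl_vals)) / 2
    between = pvariance([mean(exp_vals), mean(ctrl_vals)])
    # Tiny epsilon avoids division by zero when both groups are constant.
    return between / (within + 1e-12)

def select_genes(experimental, control, k):
    """Return the k gene names with the highest score.

    experimental / control: dict mapping gene name -> values, one per sample
    in that group (assumed layout).
    """
    scores = {g: variance_score(experimental[g], control[g]) for g in experimental}
    return sorted(scores, key=scores.get, reverse=True)[:k]

experimental = {"g1": [5.0, 5.1, 4.9], "g2": [1.0, 3.0, 2.0], "g3": [2.0, 2.1, 2.0]}
control = {"g1": [1.0, 1.1, 0.9], "g2": [1.5, 2.5, 2.0], "g3": [2.0, 2.0, 2.1]}
print(select_genes(experimental, control, k=1))  # ['g1']
```

Here `g1` is tight within each group but far apart between groups, so it ranks first. `g2` and `g3` have equal group means and score zero.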
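
The two sample-pair screening rules can be sketched as follows. This sketch makes several assumptions. "Correlation coefficient" is read as Pearson's r. Each sample is a list of fragment values in a shared order. The thresholds `corr_low`, `corr_high`, `diff_small` and `diff_large` are hypothetical, since the abstract gives no values:

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def screen_fragments(group_a, group_b, corr_low=0.5, corr_high=0.9,
                     diff_small=0.1, diff_large=1.0):
    """Collect fragment indices using the abstract's two rules (hypothetical thresholds)."""
    selected = set()
    # Rule 1: same-group pairs with LOW correlation -> fragments that still agree.
    for group in (group_a, group_b):
        for s, t in combinations(group, 2):
            if pearson(s, t) < corr_low:
                selected |= {i for i, (a, b) in enumerate(zip(s, t)) if abs(a - b) < diff_small}
    # Rule 2: cross-group pairs with HIGH correlation -> fragments that still disagree.
    for s in group_a:
        for t in group_b:
            if pearson(s, t) > corr_high:
                selected |= {i for i, (a, b) in enumerate(zip(s, t)) if abs(a - b) > diff_large}
    return selected

group_a = [[1.0, 2.0, 3.0, 4.0], [4.0, 2.05, 1.0, 3.0]]
group_b = [[2.5, 3.5, 3.8, 4.5]]
print(sorted(screen_fragments(group_a, group_b)))  # [0, 1]
```

The returned set plays the role of the merged fragment set that feeds the mutual-information step.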
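
The abstract does not say how mutual information is estimated or when two fragments are joined by an edge. A minimal sketch makes three assumptions: equal-width binning of each fragment's values across samples, a plug-in (histogram) estimate in bits, and a hypothetical edge `threshold`. The sketch builds a single graph from a set of samples. The thesis instead builds one graph per sample, by a procedure the abstract does not detail:

```python
from collections import Counter
from itertools import combinations
from math import log2

def discretize(values, bins=3):
    """Map continuous values to equal-width bin indices 0..bins-1 (assumed scheme)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant fragment
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in bits from two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def build_mi_graph(fragments, threshold=0.5, bins=3):
    """fragments: dict name -> values across samples. Returns edges (u, v) with u < v."""
    binned = {name: discretize(vals, bins) for name, vals in fragments.items()}
    return {(u, v) for u, v in combinations(sorted(binned), 2)
            if mutual_information(binned[u], binned[v]) >= threshold}

fragments = {
    "f1": [0, 0, 3, 3, 6, 6],
    "f2": [10, 10, 13, 13, 16, 16],  # moves in lockstep with f1
    "f3": [0, 6, 0, 6, 0, 6],        # independent of f1 after binning
}
print(build_mi_graph(fragments))  # {('f1', 'f2')}
```

Mutual information is used rather than a linear correlation because it also captures non-linear dependence between fragments.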
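
Full gSpan (DFS codes with minimum-code pruning) is too long to reproduce here. The sketch below is a much simpler stand-in, not gSpan. It assumes every node is a uniquely named gene fragment. Under that assumption, a pattern occurs in a graph exactly when its edges are a subset of that graph's edges. Intersecting the edge sets of all sample graphs and splitting the result into connected components therefore yields the maximal connected subgraphs common to every graph. The function names are hypothetical:

```python
from collections import defaultdict

def common_edges(graphs):
    """Edges present in every graph (each graph: set of (u, v) with u < v)."""
    return set.intersection(*graphs) if graphs else set()

def connected_components(edges):
    """Split an edge set into connected components, each returned as a node set."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adj[node] - component)
        seen |= component
        components.append(component)
    return components

graphs = [
    {("f1", "f2"), ("f2", "f3"), ("f4", "f5")},
    {("f1", "f2"), ("f2", "f3"), ("f4", "f5"), ("f3", "f6")},
    {("f1", "f2"), ("f2", "f3"), ("f4", "f5"), ("f1", "f6")},
]
print([sorted(c) for c in connected_components(common_edges(graphs))])
# [['f1', 'f2', 'f3'], ['f4', 'f5']]
```

Real gSpan also handles repeated node labels and a support threshold below 100%, which this shortcut does not.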
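
The abstract does not give the mapping behind the grayscale visualization, so the following is an assumption. One simple scheme min-max scales a matrix (rows as samples, columns as gene fragments) to gray levels 0 to 255. It then serializes the result as a plain-text PGM (P2) image:

```python
def to_grayscale(matrix):
    """Min-max scale a 2-D list of numbers to integer gray levels 0..255."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # constant matrix -> all zeros
    return [[round(255 * (v - lo) / span) for v in row] for row in matrix]

def to_pgm(gray):
    """Serialize gray levels as a plain-text (P2) PGM image string."""
    height, width = len(gray), len(gray[0])
    lines = ["P2", f"{width} {height}", "255"]
    lines += [" ".join(str(v) for v in row) for row in gray]
    return "\n".join(lines) + "\n"

matrix = [[0.0, 2.0, 4.0], [1.0, 3.0, 4.0]]  # hypothetical 2 samples x 3 fragments
gray = to_grayscale(matrix)
print(gray)  # [[0, 128, 255], [64, 191, 255]]
```

Scaling over the whole matrix, rather than per row, keeps gray levels comparable across samples.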
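
The abstract does not define its two reported metrics. The sketch below assumes coverage rate is the share of known target genes the method recovers (recall). It assumes proportion is the share of reported genes that are known targets (precision). Under those definitions, both can be computed as follows. The gene sets are hypothetical, not data from the thesis:

```python
def coverage_rate(found, known_targets):
    """Share of known target genes that appear in the mined result (recall)."""
    return len(found & known_targets) / len(known_targets)

def target_proportion(found, known_targets):
    """Share of mined genes that are known targets (precision)."""
    return len(found & known_targets) / len(found)

# Hypothetical example sets, not data from the thesis.
known = {"t1", "t2"}
found = {"t1", "t2", "x1", "x2", "x3"}
print(coverage_rate(found, known), target_proportion(found, known))  # 1.0 0.4
```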

Keywords/Search Tags:

Gene data analysis, gSpan, Correlation coefficient, Mutual information

Related items

1	Reconstruction Of Gene Regulatory Networks From Gene Expression Data
2	Analysis System For Gene Co-expression Correlation In Human Immune Cells Based On Linear And Nonlinear Correlation
3	Several Mutual Information-based Models For Measuring Co-evolution Between Protein Residues
4	Study On Algorithms For Reconstruction Of Gene Regulatory Networks
5	Research And Application Of Correlation Algorithm For Massive Data
6	Research Of Gene Network Construction Method Based On Mixed Entropy Optimizing Mutual Information
7	Research On Variable Selection Algorithm Based On High-dimensional Complex Data
8	Several New Proofs Of The Nature Of Mutual Exclusion
9	Reconstruction Of Gene Regulatory Networks Based On Mutual Information
10	Statistical Analysis Of Several DNA Sequences In Human Genome