| Advances in high-throughput sequencing technologies have brought us into the era of the individual genome. Massive genome sequencing data enable studies of both individual genomes and multiple genomes. Because genome research involves complex knowledge and massive data, annotation and visualization help investigators understand and interpret big data more efficiently and are therefore essential components of genome research. Current genome annotation and visualization methods are designed to annotate and display general knowledge about the human genome. These methods are difficult to use in individual-centered genetic studies and clinical applications, so developing methods for individual genome annotation and visualization remains a critical challenge. Building on current general genome annotation and visualization methods, this thesis developed an individual genome annotation model, an individual genome visualization method, and a multiple genome visualization method to meet the new requirements of individual genome studies. These methods can greatly facilitate the application of individual and multiple genome sequencing data in medical research, personal healthcare, and disease prediction, diagnosis, and treatment. The main content includes: (1) A model for individual genome annotation was proposed. Individual genome studies mainly aim to investigate individual specificity. An individual genome annotation model should therefore focus on individual specificity, mining individual-related knowledge and eliminating unrelated information from the mass of general genome knowledge and data. Current genome variant annotation models consider the functions of variants in isolation and are not suited to annotating individual genome variants. This thesis proposed an annotation model for annotating individual genome specificity. 
The model analyzes individual genome variants, infers deleterious genetic variants and their potential functions in molecular traits, and then extracts the diseases and drugs related to the individual's variants and molecular traits. In particular, the model considers functional interactions between variants, which current models do not, making the inference more accurate. (2) An individual genome visualization method was designed. An individual genome visualization method should highlight individual genome specificity, i.e., variants and their functions in individual inheritance, development, metabolism, and other life activities. Visualization helps investigators and physicians browse and analyze individual genome data efficiently. Individual genome data are being generated at an increasing rate, yet no visualization method is dedicated to individual genome specificity. This thesis described an individual genome visualization framework based on the individual genome annotation model. Visual elements for individual variants, molecular traits, and phenotypes were designed, and an individual genome visualization system was developed. The system provides comprehensive and intuitive visualization and analysis of individual genomes. (3) A multiple genome visualization method was designed. Multiple genome research focuses on modelling the relationships among multiple individuals and comparing individual differences. Visualization of multiple genomes can help researchers analyze and compare differences among multiple individual genomes intuitively. However, multiple genomes are difficult to visualize in a limited display space owing to the huge number of variants, and most genome variants carry little information. This thesis analyzed data dimension reduction strategies for multiple genome visualization, proposed an algorithm for computing multiple genome similarity based on the LDA model and KL divergence, and designed a dimension reduction method for multiple genome visualization. 
The phase 3 datasets of the 1000 Genomes Project were used to verify the effectiveness and reliability of the above methods. (4) A family genome visualization method was designed. Individual genome studies mainly aim at the prediction, diagnosis, and therapy of genetic disorders. In genetic studies, family genome data are often used to investigate disease inheritance patterns. Family genomes are related to each other but also differ, so genotypes and phenotypes should be considered simultaneously. Traditional genome visualization frameworks cannot handle family genome analysis well. This thesis designed a reference-individual-centered family genome visualization method, proposed three comparison modes for family genomes, and developed a family genome visualization system. The system displays relationships between family members intuitively and highlights the influence of family relationships at the variant level. The family genome browser enables researchers to discover similarities and differences among massive family genome variants based on pedigree information. |
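
The similarity step in item (3) can be illustrated with a minimal sketch. It assumes that LDA refers to latent Dirichlet allocation and that each genome has already been summarized as a topic-mixture vector; the abstract gives no details of the thesis's actual algorithm, so the symmetrized KL divergence, the function names, and the sample vectors below are all illustrative assumptions rather than the author's method.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length.

    eps is a small smoothing constant (an assumption of this sketch)
    that avoids division by zero when q has zero entries.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def symmetric_kl(p, q):
    """Average of KL(p || q) and KL(q || p), so the order of arguments does not matter."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def pairwise_distances(topic_mixtures):
    """Return {(name_a, name_b): distance} for every unordered pair of genomes."""
    names = sorted(topic_mixtures)
    return {
        (a, b): symmetric_kl(topic_mixtures[a], topic_mixtures[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical topic mixtures for three genomes (made-up numbers, not real data).
mixtures = {
    "g1": [0.7, 0.2, 0.1],
    "g2": [0.6, 0.3, 0.1],
    "g3": [0.1, 0.2, 0.7],
}
distances = pairwise_distances(mixtures)
```

A smaller value indicates more similar topic mixtures; a distance table of this kind could then drive the grouping or ordering of genomes in a reduced-dimension display.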