
Gene Identification Via Phenotype Sequencing

Posted on: 2016-09-07
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Zhu
Full Text: PDF
GTID: 2180330470971799
Subject: Biological information
Abstract/Summary:
In recent years, next-generation sequencing has developed rapidly, with increasing throughput and falling cost. Studies that use sequencing to identify the functional genes underlying a Mendelian phenotype are called phenotype sequencing. A complete phenotype sequencing protocol involves several steps: sample sequencing, read alignment, variant calling, variant filtering and candidate gene calling. The success of a phenotype sequencing study depends on many factors, such as sequencing quality and depth and the stringency of the parameters used for variant calling and filtering. It remains difficult for an investigator to measure the impact of these factors, to optimize the phenotype sequencing protocol and to obtain the functional gene related to the phenotype with high probability.

In this study, taking into account the whole phenotype sequencing analysis protocol and real sequencing data, a probabilistic framework was established based on variant calling sensitivity and variant calling specificity. Four measurements of study effectiveness were computed to support iterative optimization of a study protocol: a) the chance of reporting true phenotype-associated genes; b) the number of random genes expected to meet the reporting criterion; c) the significance of each reported gene's association with the phenotype; and d) the significance of violating the Mendelian assumption, if no gene passes the reporting criterion or all reported genes are confirmed as false positives in follow-up validation. A Java software package named GIPS (Gene Identification via Phenotype Sequencing) was developed to compute these measurements. The user manual and source code are available on the GIPS Google project page. With GIPS, users can optimize their analysis protocol and maximize the chance of identifying the functional gene.

GIPS was applied to both real and simulated data. First, the impact of parameter choices, such as sequencing depth and base quality, on variant calling sensitivity was examined. In addition, using GIPS, the Kabuki syndrome-related gene MLL2 was identified and ranked second among all candidate genes. A phosphate transporter gene was identified as the only candidate gene related to phosphorus deficiency in rice. The calculations also indicated that MRKH syndrome is mainly caused by mutations in multiple genes or by mutations in non-coding regions. Website: https://code.google.com/p/gips/...
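
To make measurements a) and b) concrete, the following is a minimal Python sketch under deliberately simplified assumptions; it is not the model implemented in GIPS, which the abstract does not specify. It assumes that every affected sample carries a causal variant in the true gene, that each such variant is detected independently with probability equal to the variant calling sensitivity, and that a gene is reported when variants are detected in at least a chosen number of samples. The function names and the `background_rate` parameter (the per-sample probability that an unrelated gene carries a variant surviving filtering) are hypothetical and introduced only for illustration.

```python
from math import comb


def binom_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def chance_true_gene_reported(n_samples: int, min_samples: int,
                              sensitivity: float) -> float:
    """Measurement a), simplified: probability that the true gene is reported.

    Assumption: each affected sample carries one causal variant in the true
    gene, detected independently with probability `sensitivity`.
    """
    return binom_tail(n_samples, min_samples, sensitivity)


def expected_random_genes(n_genes: int, n_samples: int, min_samples: int,
                          background_rate: float) -> float:
    """Measurement b), simplified: expected number of unrelated genes reported.

    Assumption: each unrelated gene independently shows a passing variant in
    a given sample with probability `background_rate` (hypothetical value).
    """
    return n_genes * binom_tail(n_samples, min_samples, background_rate)


if __name__ == "__main__":
    # Illustrative numbers only; none of them come from the thesis.
    print(chance_true_gene_reported(n_samples=10, min_samples=8, sensitivity=0.9))
    print(expected_random_genes(n_genes=20000, n_samples=10, min_samples=8,
                                background_rate=0.05))
```

Under these assumptions, relaxing the reporting criterion (a smaller `min_samples`) raises the chance of reporting the true gene but also raises the expected number of random genes, which is the kind of trade-off the four measurements are meant to expose during protocol optimization.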
Keywords/Search Tags: Sequencing, Variant calling, Phenotype sequencing, Sensitivity, Specificity