Research On Identification Algorithms For Genomic Islands And Their Application

Posted on: 2017-04-15
Degree: Master
Type: Thesis
Country: China
Candidate: C Wang
GTID: 2180330482480735
Subject: Bioinformatics
Abstract/Summary:
Genomic islands (GIs) are genomic segments closely associated with horizontal gene transfer. They have a distinctive structure and usually carry genes related to pathogenicity, drug resistance, and adaptation. By moving between species, genomic islands spread these genes across species boundaries, contributing to species evolution and the acquisition of new functions. Genomic islands are a hot topic in microbial research. This thesis focuses on identification algorithms for genomic islands and studies information extraction, feature selection, and prediction methods. The main work is as follows:

1. We reviewed the information extraction methods used in GI prediction, including GC content, codon usage, k-mer, tRNA, ORF, REPuter, and others. GI identification algorithms such as Centroid, Alien_Hunter, and Sighunt are introduced in detail. Information extraction and identification methods are the two key aspects of GI prediction and form the theoretical and practical foundation of this study.

2. We proposed a prediction method for genomic islands based on the two-sample t-test. First, we select host regions using confidence intervals on the windows' variances. Then we use kurtosis to select core signatures. Finally, we score each window with a two-sample t-test. Based on significance testing, we constructed a prediction model for genomic islands. Using AUC, we further examined how different information extraction, feature selection, and scoring methods affect the prediction model. The experimental results show that the model that selects signatures by kurtosis and scores windows with the two-sample t-test achieved the best performance, with an AUC at least 5% higher than that of the existing method.

3. We proposed a prediction method for genomic islands based on a multi-scale statistical testing model. First, we used small-scale t-testing with large-scale feature selection to quantify compositional differences from the host genome. Then, we applied large-scale statistical testing to identify multi-window segments using dynamic signals from small-scale feature selection. Finally, we refined the boundaries of the selected genomic islands using GC content and Markovian Jensen-Shannon divergence (CG-MJSD). We evaluated the proposed method on four real biological data sets and compared it with the available competing prediction methods. The results indicate that the proposed method, MTGIpick, achieved the highest precision and recall among the evaluated methods, and that the lengths of its predicted genomic islands were closest to the true lengths.

4. We designed a graphical user interface (GUI) in Java. We introduce the basics of GUI development and describe the design process of the MTGIpick software. Users can choose an algorithm and the corresponding parameters to predict genomic islands according to their needs.
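The window-scoring idea in items 2 and 3 can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the exact features or test variant, so this sketch assumes GC content of small sub-windows as a stand-in for the selected k-mer signatures, and uses Welch's form of the two-sample t-test. All function names, window sizes, and the synthetic genome are hypothetical.

```python
import math
import random
from statistics import mean, variance


def gc_content(seq):
    """Fraction of G or C bases in seq."""
    return sum(base in "GC" for base in seq) / len(seq)


def sub_window_features(seq, sub_size):
    """GC content of each non-overlapping sub-window of length sub_size."""
    return [gc_content(seq[i:i + sub_size])
            for i in range(0, len(seq) - sub_size + 1, sub_size)]


def welch_t(sample_a, sample_b):
    """Welch's two-sample t statistic (variances not assumed equal)."""
    diff = mean(sample_a) - mean(sample_b)
    se = math.sqrt(variance(sample_a) / len(sample_a)
                   + variance(sample_b) / len(sample_b))
    if se == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def score_windows(genome, win_size, sub_size):
    """Return one |t| score per non-overlapping window of length win_size.

    Each window's sub-window features are tested against the features
    of all other windows (an assumed stand-in for the host background).
    """
    starts = range(0, len(genome) - win_size + 1, win_size)
    feats = [sub_window_features(genome[s:s + win_size], sub_size)
             for s in starts]
    scores = []
    for i, window_feats in enumerate(feats):
        background = [x for j, f in enumerate(feats) if j != i for x in f]
        scores.append(abs(welch_t(window_feats, background)))
    return scores


# Synthetic example: a GC-rich 2 kb "island" inserted at position 6000
# of an otherwise AT-rich 16 kb sequence (made-up data, fixed seed).
rng = random.Random(0)


def random_dna(n, gc):
    """Random DNA of length n with expected GC fraction gc."""
    return "".join(rng.choice("GC") if rng.random() < gc else rng.choice("AT")
                   for _ in range(n))


genome = random_dna(6000, 0.40) + random_dna(2000, 0.65) + random_dna(8000, 0.40)
scores = score_windows(genome, win_size=2000, sub_size=200)
top = max(range(len(scores)), key=scores.__getitem__)
print("highest-scoring window index:", top)
```

Windows with a large |t| differ compositionally from the rest of the sequence and are candidate islands. A real pipeline would presumably replace the GC feature with selected k-mer signatures and follow the scoring step with boundary refinement, as the abstract describes.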
Keywords/Search Tags: Genomic islands, prediction and identification, k-mer, t-test, multi-scale testing