
Study On Pattern Recognition Of Eucalyptus Gene Sequencing Data In Single Nucleotide Polymorphisms

Posted on: 2017-02-06
Degree: Master
Type: Thesis
Country: China
Candidate: W S Lin
GTID: 2323330509961667
Subject: Computer application technology
Abstract/Summary:
Although new sequencing technologies have been developed and are widely used, traditional PCR sequencing remains very important. Single nucleotide polymorphisms (SNPs) and insertions/deletions (InDels) represent the new generation of DNA molecular marker technology, so an efficient algorithm is required to verify their analysis. Because the software supplied by sequencer manufacturers identifies only the base corresponding to the highest peak at each sequence position, recognizing the lower peak at double-peak positions still requires third-party software. Third-party software, however, needs a reference sequence, so it cannot be used for some sequence analyses, and it is more complicated to operate. This research therefore uses pattern recognition methods to construct an automatic detection system for SNPs and InDels. The main results are as follows:
1. Sequence signals are extracted from traditional sequencing file formats. Haar, Symlets, Coiflets, and Reverse Biorthogonal wavelets are used to remove impurity-peak signals, and the filtering results of the four wavelet functions are compared, providing high-quality sequences for reliable interpretation of double-peak bases. The four denoised base signals are then combined into complete Eucalyptus gene data, and the distance, height, and fluctuation ratio of the peaks are extracted as characteristic parameters for detecting SNP sites. A fuzzy inference engine is then used to generate the sample data for training the SNP site classifiers.
2. Detection algorithms for SNPs and InDels are studied. Using the training data, a BP neural network classifier based on the LM algorithm, a support vector machine classifier, and a sparse classifier are each used to detect SNP loci, and the three pattern recognition algorithms are analyzed and compared.
The PrimeIndel algorithm and a corresponding dislocation-based mathematical algorithm are used for InDel detection and analysis.
3. LabWindows/CVI 9.0 and MATLAB 2012 are adopted as the development platforms for the Eucalyptus sequencing data system. Following the principle of mixed LabWindows and MATLAB programming, a pattern-recognition-based system for detecting SNP and InDel polymorphisms in diploid individuals is built. The system integrates data display, manual adjustment, and data storage.
4. The accuracy of the resulting system, DiSNPIndel, in detecting SNP loci and InDel fragments is verified and compared with existing software. The experimental results show that the SNP recognition rate of DiSNPIndel is 88.5%, exceeding novoSNP by 1.5% and Mutation Surveyor by 17%. The InDel fragment recognition rate of DiSNPIndel is 53.1%, higher than that of PrimeIndel (6.1%), novoSNP (7.4%), and Mutation Surveyor (6.8%). These results indicate that DiSNPIndel is more accurate than the other software at detecting SNP loci and InDel fragments in diploid individuals without reference sequences.
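The signal-processing idea in result 1 can be illustrated with a minimal Python sketch. It is not the thesis implementation, which was built in MATLAB and LabWindows and is not reproduced in this abstract. The sketch assumes a single-level Haar transform with soft thresholding as the denoising step, and it replaces the fuzzy inference and trained classifiers with a fixed cutoff on the ratio of the second-highest to the highest base peak. The function names, the dictionary layout of the four base traces, and the `min_ratio` value of 0.3 are all hypothetical choices made for illustration.

```python
import math


def haar_denoise(signal, threshold):
    """Single-level Haar wavelet shrinkage (illustrative stand-in only).

    Splits the signal into pairwise averages (approximation) and pairwise
    differences (detail), soft-thresholds the detail coefficients, and
    reconstructs the signal.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    root2 = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    # Soft thresholding: shrink each detail coefficient toward zero.
    shrunk = [math.copysign(max(abs(d) - threshold, 0.0), d) for d in detail]
    restored = []
    for a, d in zip(approx, shrunk):
        restored.append((a + d) / root2)
        restored.append((a - d) / root2)
    return restored


def peak_height_ratio(channels, position):
    """Ratio of the second-highest to the highest base intensity at a position."""
    heights = sorted((trace[position] for trace in channels.values()), reverse=True)
    if heights[0] <= 0:
        return 0.0
    return heights[1] / heights[0]


def is_candidate_snp(channels, position, min_ratio=0.3):
    """Flag a double peak; min_ratio is an assumed cutoff, not a thesis value."""
    return peak_height_ratio(channels, position) >= min_ratio


# Hypothetical four-channel trace with a double peak (A and C) at index 1.
traces = {
    "A": [0.0, 100.0, 0.0],
    "C": [0.0, 50.0, 0.0],
    "G": [0.0, 5.0, 0.0],
    "T": [0.0, 2.0, 0.0],
}
print(peak_height_ratio(traces, 1))
print(is_candidate_snp(traces, 1))
```

A fixed cutoff keeps the example short. The abstract describes a richer feature set (peak distance, height, and fluctuation ratio) fed to trained classifiers, which a real system would need in order to separate heterozygous sites from residual noise.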
Keywords/Search Tags: single nucleotide polymorphisms (SNP), InDel, Data Processing, Pattern Recognition, system construction