
Research On Weighted Sequence Similarity Algorithm Based On K-MER Position Information

Posted on: 2023-03-23 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: Y Q Wu
GTID: 1520307103976599 | Subject: Statistics
Abstract/Summary:
Phylogenetic analysis can reveal the evolutionary history of a community of species. Traditional phylogenetic analysis relies mainly on sequence alignment, which has several shortcomings: alignment accuracy drops rapidly once sequence identity falls below a critical point; homologous sequence alignment algorithms assume that homologous sequences are made up of linearly ordered segments, yet the amount and order of genetic information vary greatly across living environments; alignment-based approaches typically consume more memory and time; and computing an accurate multiple sequence alignment is an NP-hard problem. Alignment-free algorithms, by contrast, have a relatively low computational cost and have therefore attracted considerable interest from researchers. Traditional alignment-free methods mainly use the frequency of k-mers (nucleobase subsequences of length K) and do not make full use of the positions of k-mers within the sequence. This thesis proposes three alignment-free methods based on k-mer position and tests them on multiple data sets. The main contents of the thesis are as follows:

1. Combining k-mer position information with k-mer frequency information across the sequence set. We propose an alignment-free method based on whole-genome sequences, called IEPWRMKmer. First, the relative positions of the k-mers within each sequence are calculated to obtain a position-weighted measure of the k-mers. Then, the frequency of each k-mer across the whole sequence set is calculated to obtain an information-entropy weighting of the k-mer frequency information. Combining the position-weighted measure with the frequency-entropy weighting yields the feature vector of the k-mers. The Manhattan distance is used to calculate the distance between each pair of sequences. Finally, the neighbor-joining (NJ) method is used to construct the phylogenetic tree. The IEPWRMKmer method is validated on two data sets, and the results show that it is reliable.

2. Obtaining more phylogenetic information by adding reversed sequences. Building on the IEPWRMKmer method, we reverse each sequence and apply the same feature extraction to it. We call this method Rev-seq-IEPWRMKmer. First, the feature vector of a sequence is obtained with the IEPWRMKmer method. Then, the sequence is reversed and its feature vector is calculated in the same way. The two feature vectors are combined to form a new feature vector. We obtain N × 4^K new feature vectors for N sequences. Finally, these new feature vectors are used to construct the phylogenetic tree. We tested the approach on three data sets: the two used in the IEPWRMKmer work and a further data set containing 82 whole-genome sequences of HCV viruses. The results show that the method is effective.

3. Converting k-mer position information into point coordinates through the chaos game representation (CGR) of sequences to obtain new feature vectors. With CGR, the position information of the k-mers is converted into the coordinates of points in an image. The new feature vector is then obtained by calculating the relative distances of the points in the CGR image and combining them with the information-entropy weighting of the k-mer frequency information. This method is called IE-CGRPKmer. We tested the approach on three data sets containing 48 HEV viruses, 82 HCV viruses, and 152 HBV viruses, respectively. The results show that the method is effective and optimal.
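The pipeline described in item 1 (position measure, entropy weighting, Manhattan distance) can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so everything here is an assumption made for illustration: the position measure is taken to be the mean relative start position of each k-mer, the entropy weight is taken to be one minus the normalised Shannon entropy of the k-mer's counts across the sequence set, and all function names are hypothetical. It is not the IEPWRMKmer implementation.

```python
from itertools import product
from math import log

ALPHABET = "ACGT"  # assumed nucleotide alphabet


def all_kmers(k):
    """All 4**k possible k-mers in a fixed order."""
    return ["".join(p) for p in product(ALPHABET, repeat=k)]


def kmer_positions(seq, k):
    """Map each k-mer to the 1-based start positions where it occurs."""
    positions = {}
    for i in range(len(seq) - k + 1):
        positions.setdefault(seq[i:i + k], []).append(i + 1)
    return positions


def position_measure(seq, k, kmers):
    """Assumed measure: mean relative start position of each k-mer (0 if absent)."""
    n_windows = len(seq) - k + 1
    pos = kmer_positions(seq, k)
    return [sum(pos[m]) / (len(pos[m]) * n_windows) if m in pos else 0.0
            for m in kmers]


def entropy_weights(seqs, k, kmers):
    """Assumed weighting: 1 - normalised Shannon entropy of counts across sequences."""
    per_seq = [kmer_positions(s, k) for s in seqs]
    n = len(seqs)
    weights = []
    for m in kmers:
        col = [len(p.get(m, [])) for p in per_seq]
        total = sum(col)
        if total == 0 or n < 2:
            weights.append(0.0)  # k-mer never seen, or entropy undefined
            continue
        h = -sum((c / total) * log(c / total) for c in col if c > 0)
        weights.append(1.0 - h / log(n))
    return weights


def feature_vectors(seqs, k):
    """Combine the position measure with the entropy weight, k-mer by k-mer."""
    kmers = all_kmers(k)
    w = entropy_weights(seqs, k, kmers)
    return [[p * wj for p, wj in zip(position_measure(s, k, kmers), w)]
            for s in seqs]


def manhattan(u, v):
    """Manhattan (L1) distance between two feature vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def distance_matrix(seqs, k):
    """Pairwise Manhattan distances between the sequences' feature vectors."""
    feats = feature_vectors(seqs, k)
    return [[manhattan(a, b) for b in feats] for a in feats]
```

The resulting distance matrix is the kind of input a neighbor-joining implementation would take to build the tree; that step is left out of the sketch.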
Keywords/Search Tags:Phylogenetic tree, Alignment-free approach, k-mer position, Shannon Entropy, Reverse sequence, CGR
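The chaos game representation listed in the keywords and used in item 3 can be sketched as follows. This shows only the standard CGR construction, in which each nucleotide moves the current point halfway toward that nucleotide's corner of the unit square. The corner assignment is one common convention and is assumed here, the function names are hypothetical, and the relative-distance measure that IE-CGRPKmer derives from these points is not specified in the abstract, so it is not reproduced.

```python
# Assumed corner convention for the unit square; other assignments are in use.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return one CGR point per nucleotide, starting from the square's centre."""
    x, y = 0.5, 0.5
    points = []
    for base in seq:
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward the nucleotide's corner.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def kmer_points(seq, k):
    """Group CGR points by the k-mer that ends at each position."""
    pts = cgr_points(seq)
    groups = {}
    for end in range(k, len(seq) + 1):
        groups.setdefault(seq[end - k:end], []).append(pts[end - 1])
    return groups
```

In this construction, every point reached after reading a given k-mer falls in the same sub-square of side 2^-k, which is why CGR coordinates carry both k-mer identity and positional context.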