| DNA methylation, an important epigenetic modification, plays a crucial role in embryonic development, chromosome structure and stability, X-chromosome inactivation, genomic imprinting, cell differentiation, and the formation of cancer and other diseases. Regions whose methylation differs between biological samples under different conditions may be involved in regulating gene expression and may thereby affect gene function. Identifying differentially methylated regions differs significantly from general feature selection: feature selection usually assumes that features are unrelated to one another, whereas CpG sites are correlated by position in genomic space. Previous studies have indicated that identifying whole regions is more valuable than identifying single sites. Identifying differentially methylated regions is therefore an important and novel research question in biology. However, existing methods for identifying differentially methylated regions have several problems: for instance, they discard too many methylation sites whose differences are only weakly significant, they limit the length of the identified regions, and they cannot handle multi-class data directly. To solve these problems, we propose three algorithms for identifying differentially methylated regions. The main research results are as follows. First, to handle multi-class problems directly, we propose a new algorithm that identifies differentially methylated regions using a sliding window and a KNN classifier. The sliding window and the KNN classifier are used to select candidate regions of positionally associated genomic sites, and differentially methylated regions are obtained by merging the candidate regions that satisfy the classification error rate condition. Experiments on real data show that the algorithm's classification performance and clustering indices are significantly better than those of the control algorithm, that it extends the length of the differentially methylated regions identified by the control algorithm, and that it identifies some differentially methylated regions that the control algorithm cannot find.
Second, the first method loses adjacent sites when two candidate differentially methylated regions do not meet the region-merging condition. To address this, this paper proposes a differentially methylated region identification algorithm based on a greedy strategy. The method first uses a sliding window and a KNN classifier to build a model for screening candidate regions, and then uses a greedy strategy to extend the candidate regions into differentially methylated regions. Experimental analysis comparing the effectiveness and accuracy of the algorithms shows that this method performs better. Finally, because both previously published algorithms and the first two algorithms have problems such as dependence on a classifier and the need to set experimental parameters in advance, this paper presents an algorithm for identifying differentially methylated regions based on a clustering validation technique. The method uses the clustering validation technique to construct a model for identifying differentially methylated regions, then uses a greedy heuristic to search the genome for subsets of differentially methylated sites such that the classes are well separated in the space spanned by the subset's dimensions, and obtains differentially methylated regions from the resulting subsets of sites. The experimental results show that this method achieves the best performance; it also has no parameters and is easy to use. |
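
As a rough illustration of the first algorithm's idea (slide a window over position-ordered CpG sites, score each window with a KNN classifier, and merge the windows whose error rate is low enough), a minimal Python sketch follows. It is not the authors' implementation. The function names, the leave-one-out error estimate, the Euclidean distance, the toy data, and the parameter values `window`, `max_error` and `k` are all assumptions made for this example.

```python
from collections import Counter
from math import dist


def loo_knn_error(rows, labels, k=3):
    """Leave-one-out error rate of a k-nearest-neighbour classifier."""
    errors = 0
    for i, row in enumerate(rows):
        # Distance from sample i to every other sample, nearest first.
        neighbours = sorted(
            (dist(row, other), labels[j])
            for j, other in enumerate(rows)
            if j != i
        )[:k]
        predicted = Counter(lab for _, lab in neighbours).most_common(1)[0][0]
        errors += predicted != labels[i]
    return errors / len(rows)


def candidate_windows(beta, labels, window=3, max_error=0.1, k=3):
    """Return (start, end) site windows whose KNN error is at most max_error.

    beta[s][p] is the methylation level of sample s at CpG site p, with
    sites ordered by genomic position; end is exclusive.
    """
    n_sites = len(beta[0])
    hits = []
    for start in range(n_sites - window + 1):
        rows = [sample[start:start + window] for sample in beta]
        if loo_knn_error(rows, labels, k) <= max_error:
            hits.append((start, start + window))
    return hits


def merge_windows(windows):
    """Merge overlapping or touching windows into longer regions."""
    regions = []
    for start, end in sorted(windows):
        if regions and start <= regions[-1][1]:
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions


# Toy data (hypothetical): three classes, two samples each, eight sites.
# Only sites 3-5 separate the classes; the other sites vary within a class.
labels = ["A", "A", "B", "B", "C", "C"]
beta = [
    [0.1, 0.1, 0.1, 0.10, 0.12, 0.10, 0.1, 0.1],
    [0.9, 0.9, 0.9, 0.12, 0.10, 0.12, 0.9, 0.9],
    [0.1, 0.1, 0.1, 0.50, 0.52, 0.50, 0.1, 0.1],
    [0.9, 0.9, 0.9, 0.52, 0.50, 0.52, 0.9, 0.9],
    [0.1, 0.1, 0.1, 0.90, 0.88, 0.90, 0.1, 0.1],
    [0.9, 0.9, 0.9, 0.88, 0.90, 0.88, 0.9, 0.9],
]

hits = candidate_windows(beta, labels, window=2, max_error=0.1, k=1)
regions = merge_windows(hits)
print(regions)  # [(3, 6)]
```

Because the KNN classifier votes among class labels, the same code handles two or more classes without modification, which is the property the first algorithm is said to target. The greedy extension and the clustering-validation variant described above are not covered by this sketch.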