Motif discovery is a central challenge in bioinformatics. It identifies the characteristic motifs hidden in sequences by finding similar segments across different sequences, and so reveals the biological meaning those sequences carry. Recent biological experiments suggest that significant dependencies exist among positions in some motifs, yet many current motif discovery algorithms do not take these dependencies into account.

In this paper, we first briefly present the background, purpose, and significance of the topic, and then describe several typical motif discovery algorithms. We then present a Gibbs sampling based algorithm that models dependencies between the multinomial distributions of different motif positions. Its implementation is named SimiMotif. SimiMotif uses the χ² test or Fisher's exact test to determine the dependence between motif positions. Each motif is described by a position weight matrix together with a one-dimensional array named Simi_link that represents the dependencies, and a new score function is used to discover motifs. The algorithm outputs the motifs that differ most significantly from the background. We further extend the dependency model to multiple dimensions; this implementation is named MultMotif.

Finally, SimiMotif and MultMotif are tested on the benchmarks of Tompa et al. (2005) and Sandve et al. (2007). The results are compared with a number of existing methods, and with a method similar to SimiMotif that does not consider position dependence. The results show that SimiMotif and MultMotif describe motifs better and, to some extent, improve discovery accuracy. On the main evaluation metric, SimiMotif and MultMotif ranked sixth and fifth, respectively, among the seventeen methods tested on the benchmark of Tompa et al. Improving the accuracy and efficiency of the algorithm remains future work.
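To illustrate how a χ² test can detect dependence between two motif positions, here is a minimal sketch. It assumes that the aligned motif instances are equal-length DNA strings. The names `chi2_sf`, `dependency_test`, and `build_simi_link` are hypothetical and are not taken from SimiMotif. The layout of Simi_link used here is also an assumption: each position stores the index of its most dependent partner, or -1 if no partner passes the significance threshold. Only the χ² test is shown. Fisher's exact test, which is generally preferred when expected counts are small, is omitted.

```python
import math
from collections import Counter
from itertools import combinations

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df >= 1."""
    z = x / 2.0
    if df % 2 == 0:
        q, s = math.exp(-z), 1.0                # Q(1, z) = e^{-z}
    else:
        q, s = math.erfc(math.sqrt(z)), 0.5     # Q(1/2, z) = erfc(sqrt(z))
    # Recurrence for the regularized upper incomplete gamma:
    # Q(s + 1, z) = Q(s, z) + z^s e^{-z} / Gamma(s + 1)
    while s < df / 2.0:
        q += math.exp(s * math.log(z) - z - math.lgamma(s + 1)) if z > 0 else 0.0
        s += 1.0
    return min(q, 1.0)

def dependency_test(instances, i, j):
    """Chi-square test of independence between motif columns i and j."""
    pairs = Counter((seq[i], seq[j]) for seq in instances)
    rows = Counter(seq[i] for seq in instances)   # nucleotide counts at position i
    cols = Counter(seq[j] for seq in instances)   # nucleotide counts at position j
    n = len(instances)
    stat = 0.0
    # Only observed nucleotides enter the table, so every expected count is > 0.
    for a in rows:
        for b in cols:
            expected = rows[a] * cols[b] / n
            stat += (pairs[(a, b)] - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(cols) - 1)
    if df == 0:
        return stat, 1.0   # a constant column carries no dependency information
    return stat, chi2_sf(stat, df)

def build_simi_link(instances, alpha=0.05):
    """Hypothetical Simi_link: entry i is the most dependent partner of i, or -1."""
    width = len(instances[0])
    best = [(-1, alpha)] * width   # (partner index, p-value) per position
    for i, j in combinations(range(width), 2):
        _, p = dependency_test(instances, i, j)
        if p < best[i][1]:
            best[i] = (j, p)
        if p < best[j][1]:
            best[j] = (i, p)
    return [partner for partner, _ in best]

if __name__ == "__main__":
    # Toy motif: position 0 and position 2 are coupled (A<->G, C<->T);
    # position 1 varies independently of both.
    instances = ["AAG", "ACG", "CAT", "CCT"] * 10
    print(build_simi_link(instances))   # → [2, -1, 0]
```

Storing a single partner index per position keeps the structure one-dimensional, matching the description of Simi_link as a one-dimensional array. How SimiMotif actually fills that array, and how its score function uses it, is not specified here.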