| Regulation of gene expression is key to understanding the genetic mechanisms of biology. Transcription is a crucial step of gene expression, and identifying and annotating transcription factor binding sites plays a key role in studying transcriptional regulation and constructing expression regulatory networks. With deepening biological research and the development of computer technology, computational discovery algorithms have become a powerful auxiliary tool for traditional experimental annotation. Accurate identification algorithms can help identify the target genes associated with different transcription factor binding sites, providing accurate data that supports and guides biological experiments. Existing algorithms generally fall into two categories: those based on consensus sequences and those based on position weight matrices. However, these algorithms tend to become trapped in local optima and have difficulty reaching the global optimum. This paper proposes two transcription factor binding site discovery algorithms. The first improves on the traditional genetic algorithm, aiming to reach the global optimal solution; the second combines the genetic algorithm with the Gibbs sampling algorithm and uses a position weight matrix model. This algorithm is suitable for various kinds of biological data. (1) The first method is designed for sequences that contain several transcription factor binding sites. We define a new fitness function that includes the variable 'appear number', so that sequences containing multiple transcription factor binding sites receive higher scores. (2) The second method combines the genetic algorithm with the Gibbs sampling algorithm and uses a position weight matrix model. The position weight matrix model has several advantages, such as a simple calculation process, few parameters, and robustness to background noise. 
Combining the genetic algorithm with the position weight matrix, we first generate a position weight matrix at random from the initial sequences, then obtain a converged position weight matrix through the genetic algorithm. We then discover the transcription factor binding sites using this converged position weight matrix. Finally, the proposed methods are verified and analyzed experimentally. Comparing the experimental results with those of existing methods and with the annotations in TRANSFAC and DBTSS demonstrates the correctness and effectiveness of the proposed methods. |
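
The final step described above, locating binding sites with a converged position weight matrix, can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names `build_pwm`, `score_window`, and `best_site`, the pseudocount, the uniform background frequency of 0.25, and the toy sequences are all assumptions made for illustration, and the genetic algorithm and Gibbs sampling steps that would produce the converged matrix are omitted.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from equal-length aligned sites.

    The pseudocount and the uniform background are illustrative assumptions.
    """
    width = len(sites[0])
    pwm = []
    for pos in range(width):
        # Start every base at the pseudocount to avoid log(0).
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        # Log-odds of the observed frequency against the background.
        pwm.append({base: math.log2(counts[base] / total / background)
                    for base in BASES})
    return pwm

def score_window(pwm, window):
    """Sum the per-position log-odds scores of one candidate site."""
    return sum(column[base] for column, base in zip(pwm, window))

def best_site(pwm, sequence):
    """Slide the matrix along the sequence and return the top-scoring window."""
    width = len(pwm)
    scored = [(score_window(pwm, sequence[i:i + width]), i)
              for i in range(len(sequence) - width + 1)]
    best_score, best_start = max(scored)
    return best_start, sequence[best_start:best_start + width], best_score

# Toy data (hypothetical): four aligned sites and one sequence to scan.
sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
pwm = build_pwm(sites)
start, site, score = best_site(pwm, "GGCCTATAATGGCC")
print(start, site, round(score, 2))
```

In a full pipeline the matrix passed to `best_site` would be the converged matrix returned by the search procedure rather than one built directly from known sites.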