The study of the occurrences of words in sequences are of interest in many fields, and in biological sequence analysis in particular. Given a sequence S and a collection of d words, O, it is of interest in many applications to obtain information on the multivariate distribution of the vector of counts U = (N(S, w1) ,..., N(S , wd)), where N (S, w) is the number of times a word w ∈ O appears in the sequence S. I obtain explicit bounds on the error made when approximating the distribution of U by the multivariate normal, when the underlying sequence is i.i.d or first-order stationary Markov over a finite alphabet. One application of our results involves the distribution of joint occurrences of restriction enzyme sites of a DNA sequence. I also prove that in order for U to have a non-degenerate covariance matrix, it is necessary and sufficient that the counted word set O is not full, that is, that O is not the collection of all possible words of some length k over the given finite alphabet. To supply the bounds on the error, I use a version of Stein's method. |