Bounds for the errors in word count distributional approximations

Posted on:2002-02-26

Degree:Ph.D

Type:Dissertation

University:University of Southern California

Candidate:Huang, Haiyan

GTID:1465390011499289

Subject:Mathematics

Abstract/Summary:

The study of the occurrences of words in sequences is of interest in many fields, and in biological sequence analysis in particular. Given a sequence S and a collection O of d words, many applications require information on the multivariate distribution of the vector of counts U = (N(S, w1), ..., N(S, wd)), where N(S, w) is the number of times a word w ∈ O appears in the sequence S. I obtain explicit bounds on the error made when approximating the distribution of U by the multivariate normal, when the underlying sequence is i.i.d. or a stationary first-order Markov chain over a finite alphabet. One application of my results involves the distribution of joint occurrences of restriction enzyme sites in a DNA sequence. I also prove that, for U to have a non-degenerate covariance matrix, it is necessary and sufficient that the counted word set O is not full, that is, that O is not the collection of all possible words of some length k over the given finite alphabet. To obtain the bounds on the error, I use a version of Stein's method.
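As a minimal illustration of the count vector U, the following Python sketch counts occurrences of each word in a short sequence. It is not taken from the dissertation: the function names `count_word` and `count_vector`, the example sequence, and the example words are all hypothetical, and it assumes that overlapping occurrences are counted. The last lines show why a full word set is a problem: when O contains every word of length k, the counts always sum to n - k + 1 for a sequence of length n, so one linear combination of the components of U is constant and the covariance matrix cannot be of full rank.

```python
from itertools import product


def count_word(seq: str, word: str) -> int:
    """Return N(S, w): occurrences of word in seq, overlaps included (assumed convention)."""
    k = len(word)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)


def count_vector(seq: str, words: list[str]) -> list[int]:
    """Return U = (N(S, w1), ..., N(S, wd)) for the given word list."""
    return [count_word(seq, w) for w in words]


# Hypothetical example: a short DNA sequence and two made-up "sites".
seq = "ACGTACGGTACC"
sites = ["GTAC", "ACG"]
U = count_vector(seq, sites)

# Full word set: every word of length k = 2 over the alphabet {A, C, G, T}.
k = 2
full_set = ["".join(p) for p in product("ACGT", repeat=k)]
total = sum(count_vector(seq, full_set))

print(U)                         # count vector for the two example words
print(total, len(seq) - k + 1)   # the two numbers agree for any sequence
```

The sketch only illustrates the "necessary" direction of the non-degeneracy result; the dissertation's proof that a non-full word set is also sufficient is not reproduced here.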

Keywords/Search Tags:

Bounds, Word, Sequence, Distribution

Related items

1	English Word Ambiguities And Upper And Lower Bounds Of WSD In English-Chinese Machine Translation
2	Quantitative Studies Of Chinese Word Length
3	A Corpus-based Study On The Change Of The English Word Classes
4	On The Principles Of Spatial And Temporal Sequence Demonstrated By Chinese And English Word Orders
5	The Distribution Of Parts Of Speech In Syntactic Structure From Prototype
6	Errors Analysis Of South Korean Students Studying Chinese Language In The Sequence Learning
7	The Study On The Sequence Priority Of "Ruguo&Name" Conditional Sentence
8	Comparison Of Syllabic Word Length And Frequency Distribution Between Mee And CET4
9	From hoodoo women to robber queens: Breaking the bounds of ethnography and female subjectivity in Zora Neale Hurston's circum-Caribbean Marvelous Real
10	The Philosophical Problems Of The Bounds Of Cognition