Nonstationary time series modeling with applications to speech signal processing

Posted on: 2011-06-19

Degree: Ph.D.

Type: Thesis

University: Harvard University

Candidate: Rudoy, Daniel

GTID: 2448390002467464

Subject: Applied Mathematics

Abstract/Summary:

We develop statistical methods for the analysis of nonstationary time series and apply them to a variety of problems in speech signal processing. Information-carrying natural sound signals such as speech exhibit a degree of controlled nonstationarity: their statistical properties vary slowly over time. Faithfully modeling these temporal variations is valuable for a wide range of applications, and it can be accomplished by relying on well-understood acoustic models of speech production, which motivate many of the methods developed in this thesis.

First, we make a number of contributions to the classical problem of formant tracking, in which vocal tract resonances are estimated under the assumption that they are invariant on a 15-30 ms scale. Next, we relax this piecewise-stationarity constraint and model the temporal dynamics of the vocal tract using time-varying autoregressive (TVAR) models. We develop their algebraic and geometric properties, introduce several new estimators, and use TVAR models to construct a hypothesis test for detecting vocal tract variation in speech waveform data. We study the asymptotic properties of this test and illustrate its practical efficacy by detecting vocal tract changes across different timescales of speech dynamics.

We then explore how standard fixed-resolution short-time Fourier representations may be generalized to adapt to the time-frequency structure of a speech signal. To this end, we introduce a family of adaptive, linear time-frequency representations termed superposition frames, and show that they are invertible and numerically stable and admit fast overlap-add reconstruction akin to standard short-time Fourier techniques. The general construction proceeds via a local, signal-adaptive modification of a Gabor frame.
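The superposition frames themselves are not reproduced here. As a point of reference only, the following minimal sketch shows the fixed-resolution baseline they generalize: a Gabor (short-time Fourier) analysis with a Hann window, followed by weighted overlap-add reconstruction. Because the window is no longer than the DFT length, this is the "painless" case in which dividing by the summed squared windows amounts to synthesizing with the canonical dual window, so the signal is recovered up to rounding error. The function names, the 32-sample window, the hop of 8, and the test signal are illustrative assumptions, and a naive O(N^2) DFT stands in for an FFT to keep the example dependency-free.

```python
import cmath
import math

# Illustrative helper names; none of these come from the thesis.


def hann(n_win):
    """Periodic Hann window of length n_win."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n_win) for k in range(n_win)]


def dft(frame):
    """Naive O(N^2) discrete Fourier transform (stands in for an FFT)."""
    n = len(frame)
    return [sum(frame[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
            for f in range(n)]


def idft(spec):
    """Inverse of dft()."""
    n = len(spec)
    return [sum(spec[f] * cmath.exp(2j * math.pi * f * k / n) for f in range(n)) / n
            for k in range(n)]


def stft(x, win, hop):
    """Gabor/STFT coefficients: one DFT per windowed frame, frames start every `hop` samples."""
    n_win = len(win)
    assert n_win % hop == 0, "this sketch assumes the hop divides the window length"
    pad = n_win - hop                  # leading zeros so edge samples get full coverage
    tail = pad + (-len(x)) % hop       # trailing zeros so the last frame fits exactly
    xp = [0.0] * pad + list(x) + [0.0] * tail
    n_frames = (len(xp) - n_win) // hop + 1
    return [dft([win[k] * xp[m * hop + k] for k in range(n_win)]) for m in range(n_frames)]


def istft(coeffs, win, hop, n_out):
    """Weighted overlap-add: re-window each inverse-DFT frame, sum, divide by summed squared windows."""
    n_win = len(win)
    pad = n_win - hop
    length = (len(coeffs) - 1) * hop + n_win
    num = [0.0] * length
    den = [0.0] * length
    for m, spec in enumerate(coeffs):
        frame = idft(spec)
        for k in range(n_win):
            num[m * hop + k] += win[k] * frame[k].real
            den[m * hop + k] += win[k] ** 2
    return [num[pad + i] / den[pad + i] for i in range(n_out)]


# Example: a two-tone test signal, 32-sample Hann window, 75% overlap (hop = 8).
x = [math.sin(2 * math.pi * 0.05 * n) + 0.3 * math.cos(2 * math.pi * 0.21 * n)
     for n in range(200)]
win = hann(32)
C = stft(x, win, hop=8)
y = istft(C, win, hop=8, n_out=len(x))
print(len(C), max(abs(a - b) for a, b in zip(x, y)))  # 28 frames; error at rounding level
```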
Two signal-dependent schemes for selecting an appropriate superposition frame for signal analysis are given, and the framework is illustrated in the context of speech enhancement.

Finally, we introduce a joint model of the vocal tract and the source waveform in order to account for the quasi-periodic temporal variations of the source during voicing. We incorporate an estimate of the source waveform into the traditional linear prediction framework via nonparametric wavelet regression; the resulting semi-parametric model is applied to various speech analysis problems, including formant estimation, source-harmonics-to-noise ratio estimation, inverse filtering, and voicing detection.
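The semi-parametric source model (wavelet regression of the source waveform) is the thesis's own and is not reproduced here. As a minimal sketch of the traditional linear prediction baseline it extends, the code below fits an autocorrelation-method LPC model with the Levinson-Durbin recursion, inverse-filters the signal to obtain the prediction residual, and reads a resonance frequency and bandwidth off the pole angle and radius. The synthetic signal (one resonance at 500 Hz with 100 Hz bandwidth, 8 kHz sampling, white-noise excitation), the order-2 model, and all function names are illustrative assumptions; real speech analysis would use a higher order and a general polynomial root finder.

```python
import cmath
import math
import random

# Illustrative helper names and a hypothetical test signal; not from the thesis.


def autocorr(x, max_lag):
    """Unnormalized autocorrelation r[0..max_lag] (autocorrelation method, rectangular window)."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations; returns A(z) coefficients [1, a1, ..., ap] and final error."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def inverse_filter(x, a):
    """Apply A(z) to x; the output is the linear-prediction residual."""
    p = len(a) - 1
    return [sum(a[k] * x[n - k] for k in range(p + 1) if n - k >= 0) for n in range(len(x))]


def resonance_from_ar2(a, fs):
    """Centre frequency and bandwidth (Hz) of the complex pole pair of 1 + a1 z^-1 + a2 z^-2."""
    pole = (-a[1] + cmath.sqrt(a[1] ** 2 - 4.0 * a[2])) / 2.0
    freq = abs(cmath.phase(pole)) * fs / (2.0 * math.pi)
    bandwidth = -math.log(abs(pole)) * fs / math.pi
    return freq, bandwidth


# Hypothetical signal: white noise driving one two-pole resonance (a single "formant").
fs, f_true, bw_true = 8000.0, 500.0, 100.0
radius = math.exp(-math.pi * bw_true / fs)
phi1 = 2.0 * radius * math.cos(2.0 * math.pi * f_true / fs)
phi2 = -radius ** 2
rng = random.Random(0)
x, y1, y2 = [], 0.0, 0.0
for _ in range(16000):
    y = phi1 * y1 + phi2 * y2 + rng.gauss(0.0, 1.0)
    x.append(y)
    y1, y2 = y, y1

a, _ = levinson_durbin(autocorr(x, 2), order=2)
freq, bw = resonance_from_ar2(a, fs)
residual = inverse_filter(x, a)
print(round(freq), round(bw))  # expected near 500 and 100
```

With a whole second or two of stationary data the estimates land close to the true values; the thesis's point is precisely that real speech is only approximately stationary over much shorter spans.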

Keywords/Search Tags:

Speech, Time, Model, Vocal tract

Related items

1	On Vocal Tract Characteristics Of Chinese Whispered Speech And Its Applications In Perceptual Study
2	Modeling Of 3D Geometry Vocal Tract In The Procession Of Speech Production
3	Nonstationary time series modeling with applications to speech signal processing
4	Research On The Vocal Tract Model Based On Machine Learning Methods Of Speech Inversion
5	The Study Of Vocal Tract Model And Its Control Mechanism Based On The Speech Production And Acquisition Of DIVA Model
6	The Research On Vocal Tract Spectrum And Transition Methods In Voice Conversion
7	Articulatory speech synthesis and speech production modelling
8	The Research Of Voice Conversion Based On The Spectral Parameters Of Vocal Tract
9	Numerical Research On The Acoustic Features Of The Vocal Tract In Horseshoe Bat
10	An auditory feedback-based model of speech production in the developing child