Research On Technologies Of Low Bit Rate Speech Coding Based On The Multi-Band Excitation Model

Posted on: 2012-04-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Li
GTID: 1488303362951189
Subject: Military communications science
Abstract/Summary:
With the continuous development of technology and the growing demand for wireless communications, wireless communication has become widely used and increasingly popular. Among the many wireless communication services, speech is the most fundamental and common. To reduce transmission costs, improve transmission efficiency, and deliver good-quality speech under low-rate transmission or limited-storage conditions, analog speech signals are usually digitized and transmitted as bit streams. This process is called speech coding. Parametric coding is one class of speech coding algorithms. It can achieve low coding rates, so it is widely used in secure communication, satellite communication, voice mail, and similar applications. The parameters of a speech signal are the fundamental period, the spectral envelope, and the voiced/unvoiced decisions. The extraction and quantization of these parameters, whose precision directly affects the quality of the synthesized speech, have been studied extensively in recent years. Building on existing research, this dissertation focuses on Multi-Band Excitation (MBE) coding, parameter extraction, and parameter quantization, and proposes several improved algorithms. The main contributions are as follows:

1. An MBE algorithm based on adaptive forward-backward quantization of linear predictive coefficients (LPCs) is proposed. The method consists of two modules: the MBE algorithm and the adaptive forward-backward quantization. The MBE algorithm estimates the speech parameters. The adaptive forward-backward quantization exploits the similarity between the statistical characteristics of adjacent frames to realize variable-rate coding of the LPCs and reduce the encoding bit rate.
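The forward-backward idea can be illustrated with a minimal Python sketch. The abstract does not give the dissertation's selection rule, codebook design, or bit allocation, so everything below is an assumption made for illustration: the names `encode_frames` and `bits_used`, the squared-error threshold test, and the choice to build the backward codebook from the most recent forward-quantized vectors. The only point shown is that a frame resembling a recently decoded frame can be coded with a short backward index instead of a full forward index.

```python
import numpy as np

def nearest(codebook, x):
    """Return (index, squared error) of the codebook row closest to x."""
    d = np.sum((codebook - x) ** 2, axis=1)
    i = int(np.argmin(d))
    return i, float(d[i])

def encode_frames(frames, forward_cb, backward_size=4, threshold=0.01):
    """Choose backward or forward quantization for each parameter vector.

    Hypothetical rule: use the backward codebook when its best entry is
    within `threshold` squared error of the frame, else fall back to the
    fixed forward codebook. The backward codebook holds only previously
    decoded vectors, so a decoder could rebuild it from the stream.
    """
    backward_cb = []
    stream = []
    for frame in frames:
        x = np.asarray(frame, dtype=float)
        use_backward = False
        if backward_cb:
            bi, bd = nearest(np.array(backward_cb), x)
            use_backward = bd <= threshold
        if use_backward:
            stream.append(("B", bi))
        else:
            fi, _ = nearest(forward_cb, x)
            stream.append(("F", fi))
            # Only forward-coded vectors refresh the backward codebook.
            backward_cb.append(forward_cb[fi])
            backward_cb = backward_cb[-backward_size:]
    return stream

def bits_used(stream, forward_bits, backward_bits):
    """Total bits: one mode flag per frame plus the index for that mode."""
    return sum(1 + (backward_bits if mode == "B" else forward_bits)
               for mode, _ in stream)
```

With a 10-bit forward codebook and a 2-bit backward codebook, for example, every frame coded in backward mode costs 3 bits instead of 11 under this sketch, which is how a higher backward usage rate lowers the average coding rate.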
An improved version of the MBE algorithm based on adaptive forward-backward quantization of LPCs is also proposed. The improved algorithm additionally builds an adaptive backward codebook of line spectral frequency (LSF) coefficients and linearly interpolates between the LSF coefficients of adjacent frames in that codebook, to improve the accuracy of the adaptive backward codebook of LPCs. Objective tests show that, compared with the MBE algorithm based on adaptive forward-backward quantization of LPCs, the improved method increases the average usage frequency of backward quantization and further reduces the average coding rate.

2. To eliminate the effect of linear distortion on the real-time true envelope (TE) estimator, an improved real-time TE algorithm with correction factors that remove the linear distortion is proposed. First, the accumulating TE algorithm and its real-time version are derived. Second, using the accumulating real-time TE algorithm, it is shown that the linear distortion of the sub-band inverse fast Fourier transform reduces the precision of the envelopes extracted by the real-time TE algorithm and slows down the iteration. Third, the error formula for the linear distortion of the sub-band inverse fast Fourier transform and the formula for calculating the correction factors are derived. An improved real-time TE-linear predictive coding (TELPC) algorithm based on Mel-warping is also introduced. The human ear is more sensitive to the low-frequency part of speech signals. Based on this auditory property and the improved real-time TE algorithm, the Mel-warped TELPC algorithm performs the linear predictive analysis of the spectral envelope on the Mel scale and reduces the dynamic range of the warped spectral envelope through compression.
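The Mel-warping step described above can be sketched in Python as follows. The Hz-to-Mel conversion uses the widely used formula mel = 2595 log10(1 + f/700). The abstract does not state which Mel formula, resampling method, or compression law the dissertation uses, so the linear interpolation, the power-law compression, the exponent `gamma`, and the function name `warp_envelope` are all assumptions for illustration.

```python
import numpy as np

def hz_to_mel(f):
    """Common Hz-to-Mel mapping: 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def warp_envelope(envelope, sample_rate, gamma=0.5):
    """Resample a magnitude envelope onto a uniform Mel grid and compress it.

    `envelope` is assumed to hold non-negative magnitudes sampled uniformly
    from 0 Hz to the Nyquist frequency. `gamma` < 1 is a hypothetical
    power-law exponent that reduces the dynamic range.
    """
    envelope = np.asarray(envelope, dtype=float)
    n = len(envelope)
    nyquist = sample_rate / 2.0
    hz_grid = np.linspace(0.0, nyquist, n)
    # Points equally spaced in Mel are denser at low frequencies in Hz.
    mel_grid = np.linspace(0.0, float(hz_to_mel(nyquist)), n)
    warped = np.interp(mel_to_hz(mel_grid), hz_grid, envelope)
    return warped ** gamma
```

At an 8 kHz sampling rate, half of the Mel-spaced points fall below roughly 1.1 kHz, so a linear predictive model fitted to the warped envelope spends more of its resolution on the low-frequency band.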
Objective tests show that, compared with the improved real-time TELPC algorithm, the Mel-warped algorithm noticeably reduces the spectral distortion in the low-frequency band.

3. Trajectory compression of LSF coefficients by minimizing a time-weighted mean square error is proposed for low bit rate speech coding. The human ear is more sensitive to auditorily important frames, that is, frames that are voiced or have high energy. Based on this property and on trajectory compression, the new method weights the mean square error with frame-dependent factors, which improves the fit of the function to the LSF coefficients of the auditorily important frames and therefore the speech quality. With MBE-LPC as the test system, the time-weighted trajectory compression of LSF coefficients raises the mean opinion score (MOS) of the reconstructed speech by 0.022 compared with unweighted trajectory compression.

4. An adaptive filtering method in the empirical mode decomposition (EMD) domain is proposed for edge detection. The algorithm combines the EMD method with spatially adaptive filtering. The EMD method is used to obtain the first derivatives of the residuals of the original signal at each scale. Using the spatial correlation function, the spatially adaptive filtering enhances the peaks in the first derivative of a residual that correspond to edges in the original signal, and suppresses the noise. Two methods of estimating the noise power threshold are proposed: one is based on the consistent EMD method, and the other is based on the median absolute deviation. Simulations indicate that the new adaptive filtering method can accurately detect the edges of a signal and can be used for pitch determination. A pitch determination algorithm based on the multi-scale product in the ensemble EMD domain is also proposed.
This method multiplies the first derivatives of the residuals at adjacent scales in the ensemble EMD domain to enhance the peaks at the glottal closure instants, and computes the correlation function of the multi-scale product to estimate the pitch. Objective tests show that the new pitch determination method performs well in both clean and noisy environments.
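The multi-scale product step can be sketched in Python under some simplifying assumptions. The ensemble EMD itself is not implemented here: the residuals at adjacent scales are assumed to be supplied as rows of an array. The function names, the use of a plain autocorrelation as the correlation function, and the 60 to 400 Hz search range are illustrative choices, not details taken from the dissertation.

```python
import numpy as np

def multiscale_product(residuals):
    """Multiply the first derivatives of residuals across the given scales.

    `residuals` has shape (scales, samples). Peaks that line up across
    scales (such as glottal closure instants) reinforce each other in the
    product, while uncorrelated noise tends to cancel.
    """
    derivs = np.diff(np.asarray(residuals, dtype=float), axis=1)
    return np.prod(derivs, axis=0)

def estimate_pitch(msp, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate pitch (Hz) from the autocorrelation of the product signal."""
    x = np.asarray(msp, dtype=float)
    x = x - np.mean(x)
    # Keep non-negative lags only.
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    lo = int(sample_rate / fmax)
    hi = min(int(sample_rate / fmin), len(ac) - 1)
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return sample_rate / lag
```

As a quick check, two staircase-shaped residuals that jump every 80 samples at an 8 kHz sampling rate produce a product signal with impulses 80 samples apart, and `estimate_pitch` picks the autocorrelation peak at that lag.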
Keywords/Search Tags: Speech coding, Parametric coding, Adaptive forward-backward quantization, Spectral envelope extraction, Mean square error criterion, Empirical mode decomposition, Edge detection, Pitch determination algorithm