
The Study Of Speech Codec Based On Perceptual Quality

Posted on: 2011-01-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Yang
Full Text: PDF
GTID: 1118360305992264
Subject: Computer system architecture
Abstract/Summary:
Speech communication is an efficient way of sharing information. In practice, the speech signal is always affected by the acoustic environment, and network transport introduces further interference. With the rapid development of mobile communication systems, a variety of standards have been proposed, including variable-rate speech coding and stereo speech coding. Achieving better sound quality involves speech coding, speech enhancement, post-processing and other aspects. Speech quality is ultimately judged by the human auditory system, which is a complex combination of psychological, physiological and physical processes. How to obtain clean speech and improve its intelligibility is the focus of recent research. The related work and innovative ideas include the following aspects:

Previous studies have pointed out that perceivable residual noise can be effectively reduced by taking into account the masking effect of the human auditory system: residual noise is not perceived if it lies below the masking threshold. The subspace enhancement method first removes the noise subspace. The clean speech is then recovered in the remaining signal subspace by optimally weighting the signal coefficients in that subspace. Since the masking properties are tied to the critical frequency bands, which derive from the characteristics of the human cochlea, incorporating the masking threshold into a subspace technique requires a transformation between the frequency domain and the eigen domain. The eigen-decomposition of the autocorrelation matrix yields the eigenvalues and the corresponding eigenvector matrix. The masking threshold is obtained from the pre-estimated power spectral density. The eigen-filter gain is used to compute the linear estimation matrix, and the enhanced speech signal is obtained by the inverse transformation.
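The chain of steps above (eigen-decomposition, gain, linear estimator, and the frequency-to-eigen-domain mapping) can be written out as a short sketch. The notation is an assumption made here for illustration and is not taken from the thesis: a white-noise model with variance \sigma_w^2, a trade-off factor \mu, a K x K Toeplitz autocorrelation matrix R_y of the noisy speech, a masking threshold T(\omega) given in the frequency domain, and a generic Wiener-like gain that stands in for whatever gain rule the thesis actually uses.

```latex
\begin{align}
  % Eigen-decomposition of the noisy-speech autocorrelation matrix
  R_y &= U \Lambda_y U^{T}, &
  \Lambda_y &= \mathrm{diag}(\lambda_{y,1}, \dots, \lambda_{y,K}) \\
  % Clean-speech eigenvalues; directions with \lambda_{x,k} = 0 form the noise subspace
  \lambda_{x,k} &= \max\!\left(\lambda_{y,k} - \sigma_w^2,\; 0\right) \\
  % Eigen-filter gain and the resulting linear estimation matrix
  g_k &= \frac{\lambda_{x,k}}{\lambda_{x,k} + \mu\,\sigma_w^2}, &
  H &= U\,\mathrm{diag}(g_1, \dots, g_K)\,U^{T}, \qquad \hat{x} = H y \\
  % Frequency-to-eigen-domain mapping: V_k is the DTFT of eigenvector u_k
  V_k(\omega) &= \sum_{n=0}^{K-1} u_k[n]\, e^{-j\omega n}, &
  \theta_k &= \frac{1}{2\pi} \int_{-\pi}^{\pi} T(\omega)\, \lvert V_k(\omega) \rvert^{2}\, d\omega \\
  % Under the white-noise assumption, residual noise along u_k stays below the mapped threshold if
  g_k^{2}\, \sigma_w^{2} &\le \theta_k
\end{align}
```

The mapping in the fourth line mirrors the identity u_k^T R u_k = (1/2\pi) \int \Phi(\omega) |V_k(\omega)|^2 d\omega, which holds for any Toeplitz autocorrelation matrix R with power spectral density \Phi(\omega). This is why a threshold defined per frequency can be carried over to each eigen-direction.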
Combining subspace decomposition with the masking characteristics of human hearing achieves a satisfactory result.

Voiced speech contains pitch and harmonic components with obvious periodicity. The Adaptive Multi-Rate (AMR) codec obtains the optimal excitation for the linear prediction filter in a closed-loop manner by minimizing a perceptually weighted error criterion. The auditory system has a limited ability to detect noise in frequency bands in which the speech signal has high energy, near the formant peaks. Before the open-loop pitch search, the input speech signal is passed through the weighting filter. The coefficients are used to modify the frequency response of the filter. Speech frames can generally be classified as voiced or unvoiced, mainly on the basis of energy. By setting a threshold, a comb filter can be used to enhance the speech components and suppress noise.

A speech codec should maintain good quality under various conditions, such as diverse channels, different speakers and background noise. When the transmission environment is poor and channel coding cannot effectively control errors, error concealment should be applied. Generally speaking, error concealment is based on extrapolation or repetition, in which the speech coding parameters are extrapolated or repeated from the parameters of the surrounding good frames. The adaptive codebook introduces a strong interframe dependency and makes the decoder vulnerable to frame erasures. Constraining the contribution of the adaptive codebook to modeling the pitch excitation can improve speech quality over the standard codec in the case of frame erasures. When consecutive error frames are received, the pitch lag is made to fluctuate instead of increasing continuously. This avoids excessive periodicity, which can produce an annoying sound.
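The lag handling described above can be illustrated with a minimal sketch. Everything named here is hypothetical and chosen only for illustration: the `FrameParams` fields, the lag range, the jitter pattern `LAG_OFFSETS`, the decay factor and the gain cap are assumptions, not values from the thesis or from the AMR specification. The sketch repeats the last good frame's parameters, caps and attenuates the adaptive-codebook gain, and lets the lag oscillate around its last good value rather than grow on every lost frame.

```python
from dataclasses import dataclass


@dataclass
class FrameParams:
    pitch_lag: int      # adaptive-codebook delay, in samples
    pitch_gain: float   # adaptive-codebook gain
    code_gain: float    # fixed-codebook gain


# Hypothetical constants, for illustration only.
LAG_MIN, LAG_MAX = 20, 143      # assumed legal lag range
LAG_OFFSETS = (0, 1, 0, -1)     # small cyclic jitter around the last good lag
GAIN_DECAY = 0.9                # per-frame attenuation while frames are lost
MAX_PITCH_GAIN = 0.8            # cap on the adaptive-codebook contribution


def conceal(last_good: FrameParams, n_lost: int) -> FrameParams:
    """Substitute parameters for the n_lost-th consecutive erased frame (1-based)."""
    # Fluctuate the lag instead of increasing it on every erased frame.
    offset = LAG_OFFSETS[(n_lost - 1) % len(LAG_OFFSETS)]
    lag = min(LAG_MAX, max(LAG_MIN, last_good.pitch_lag + offset))
    # Limit and progressively attenuate the gains to avoid over-periodic output.
    decay = GAIN_DECAY ** n_lost
    pitch_gain = min(last_good.pitch_gain, MAX_PITCH_GAIN) * decay
    return FrameParams(lag, pitch_gain, last_good.code_gain * decay)


good = FrameParams(pitch_lag=60, pitch_gain=1.0, code_gain=2.0)
lags = [conceal(good, n).pitch_lag for n in range(1, 6)]
```

With this pattern the substituted lag stays within one sample of the last good value for any number of lost frames, so the concealed excitation never locks onto a steadily drifting period.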
In addition, when a run of bad frames ends and good frames are received again, the codebook gains cause the energy to fluctuate. The coefficient is modified to smooth this effect.

Embedded technology is well suited to the current trend of the terminal market. Given the limited resources of mobile devices, the computational complexity of applications needs to be reduced. The implementation and optimization of AMR on a mobile device is presented. The algebraic codebook search is one of the most complex parts of the AMR codec. The pulses are located in different tracks. The algebraic codebook is searched by minimizing the mean square error between the weighted input speech and the weighted synthesized speech. The nested search requires a large amount of computation. A more efficient search algorithm is used to find the pulse positions and reduce the coding time. Instruction-level optimization has also been considered. Experiments demonstrate that the method greatly reduces computational complexity with little sacrifice in speech quality. The research helps to achieve high efficiency when implementing multimedia applications on embedded devices.
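To make the cost of the nested search concrete, here is a toy sketch of an algebraic codebook search with one unit pulse per interleaved track. Minimizing the weighted mean square error with an optimal gain is equivalent to maximizing C^2 / E, where C = d^T c and E = c^T Phi c. The greedy track-by-track search shown next to the exhaustive one is only a generic reduced-complexity stand-in and is not the algorithm developed in the thesis. The helper names, the 8-sample frame, the two-track layout and the decaying impulse response are likewise assumptions made for illustration.

```python
from itertools import product


def impulse_correlations(h, target):
    """Return d = H^T x and phi = H^T H for the weighted impulse response h."""
    n = len(target)
    d = [sum(target[k] * h[k - i] for k in range(i, n)) for i in range(n)]
    phi = [[sum(h[k - i] * h[k - j] for k in range(max(i, j), n))
            for j in range(n)] for i in range(n)]
    return d, phi


def score(positions, d, phi):
    """C^2 / E for unit pulses at `positions`, each signed like d at that position."""
    signs = [1 if d[p] >= 0 else -1 for p in positions]
    c = sum(s * d[p] for s, p in zip(signs, positions))
    e = sum(si * sj * phi[pi][pj]
            for si, pi in zip(signs, positions)
            for sj, pj in zip(signs, positions))
    return c * c / e


def tracks(n, n_tracks):
    """Interleaved tracks: track t holds positions t, t + n_tracks, ..."""
    return [list(range(t, n, n_tracks)) for t in range(n_tracks)]


def full_search(d, phi, trk):
    """Nested (exhaustive) search: every combination of one position per track."""
    return max(product(*trk), key=lambda pos: score(pos, d, phi))


def greedy_search(d, phi, trk):
    """Fix one pulse per track in turn, keeping the earlier choices."""
    chosen = []
    for t in trk:
        chosen.append(max(t, key=lambda p: score(chosen + [p], d, phi)))
    return tuple(chosen)


# Toy frame: the target is an exact two-pulse code word filtered by h.
h = [0.5 ** k for k in range(8)]
code = [0, 0, 1, 0, 0, -1, 0, 0]
target = [sum(code[i] * h[n - i] for i in range(n + 1)) for n in range(8)]
d, phi = impulse_correlations(h, target)
trk = tracks(8, 2)
best_full = full_search(d, phi, trk)
best_greedy = greedy_search(d, phi, trk)
```

With T tracks of P positions each, the nested search evaluates P^T candidates while the greedy pass evaluates T * P, which is where the saving comes from. The trade-off is that the greedy result can score lower than the exhaustive optimum.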
Keywords/Search Tags: Speech Codec, Perceptual Weighting, Subspace, Post Filtering, Embedded Platform