
Researches On The Compression And Reconstruction Methods Of The Mixed Audio Signals

Posted on: 2016-06-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S X Jiang
GTID: 1108330503493693
Subject: Signal and Information Processing
Abstract/Summary:
Audio signals typically comprise speech signals, music signals, and mixtures of the two in any proportion (i.e., mixed audio signals). Because current audio compression methods are designed either for speech or for music, this thesis presents an in-depth study of compression and reconstruction methods that can be adapted to all types of audio signals, especially mixed audio signals. The compression and reconstruction of mixed audio signals can usually be divided into three steps: sparse representation; compression and reconstruction; and quantization and coding. Our research focuses on these three areas. Step two is usually realized in one of two ways: the unified speech and audio coding (USAC) scheme, which is based on traditional Shannon sampling theory, or compressed sensing (CS), which is based on CS theory. USAC, developed by the Moving Picture Experts Group (MPEG) in 2012, can handle all types of audio signals, especially mixed audio signals, and its performance is no worse than that of the current best speech and music coding standards. The CS method merges compression and sampling into a single process; as a result, sampling of mixed audio signals is extremely simple and, in a sense, breaks through the limitations of the Shannon sampling theorem. It should be noted that, as technology develops, unified compression and sampling methods that can be adapted to all types of audio signals will quickly occupy an important place in our lives. Methods based on the traditional Shannon sampling theorem usually offer a mature structure, good compatibility, and similar advantages, and will remain dominant now and in the near future.
However, owing to their high complexity, limited universality and other shortcomings, these methods are bound to be replaced by CS methods as the latter become increasingly sophisticated. In this thesis, we first study sparse representation, quantization and coding, and USAC, and then explore CS-based compression and reconstruction methods for mixed audio signals. The main contributions of this thesis are as follows:
· The transform-domain coefficients of mixed audio signals are usually quantized by a vector quantizer, but the storage requirements of the quantizer grow exponentially with the vector dimension. To resolve this conflict, we propose a finite-state entropy-constrained vector quantizer (FS-ECVQ). FS-ECVQ estimates the statistical properties of the current vector from the adjacent, previously coded vectors in the current frame and in previous frames, thereby effectively eliminating intra-frame and inter-frame redundancy among the transform coefficients and significantly improving quantization performance. Experiments show that, compared with the corresponding algorithm in the final version of USAC, FS-ECVQ keeps the rate-distortion (R/D) performance unchanged while reducing storage requirements by 14.6%.
· The typical bandwidth extension (BWE) methods, Spectral Band Replication (SBR) and Harmonic Bandwidth Extension (HBE), are both inefficient for mixed audio signals. To address this, we propose an adaptive BWE (aBWE) method that combines SBR and HBE and selects the better of the two for each frame according to the frame's spectral characteristics. As a result, the method applies to a variety of audio signals, especially mixed audio signals. Experiments show that for speech and music signals aBWE is not inferior to SBR and HBE, and for mixed audio signals it performs significantly better than both.
· Sparse representation of the signal is the premise and foundation of CS theory, but mixed audio signals are difficult to decompose sparsely in a single orthogonal basis. To address this difficulty, we propose a method that uses a structured Least Absolute Shrinkage and Selection Operator (LASSO) for the sparse approximation of mixed audio signals. In this method, an audio signal is regarded as the sum of tonal, transient and noise components, and the structured LASSO decomposes it into a tonal layer and a transient layer. Because these two layers are sparse in the frequency and time domains, respectively, the method achieves a near-optimal sparse decomposition of audio signals, providing a prerequisite and foundation for applying CS theory to audio.
· General audio CS methods divide the signal into frames and compress and reconstruct each frame independently, ignoring the temporal correlation between consecutive frames. We therefore propose a reconstruction algorithm for mixed audio signals based on dynamic CS. The algorithm builds on a statistical model of the mixed audio signal and, through an information-passing mechanism, makes full use of the temporal correlation between consecutive frames, thereby improving the performance of the reconstruction system.
In this thesis, we first address certain key issues in traditional audio compression methods and then apply CS theory to audio sampling. Our research improves the traditional methods and can also benefit CS methods, so the work has both theoretical and practical value.
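The FS-ECVQ contribution can be illustrated with a minimal Python sketch, assuming NumPy. Each state owns a small codebook and a table of index probabilities; the encoder picks the index that minimizes squared error plus λ times the ideal code length -log2 p, and the next state is derived only from indices already sent, so the decoder can track it without side information. The codebooks, probabilities, λ and the `next_state` rule are hypothetical placeholders; the thesis's estimator, which draws on adjacent vectors in the current and previous frames, is not reproduced here.

```python
import numpy as np

def ecvq_encode(x, codebook, probs, lam):
    """Entropy-constrained VQ: choose the index minimising distortion + lam * rate,
    where rate is the ideal code length -log2(p) in bits (probs must be > 0)."""
    dist = np.sum((codebook - x) ** 2, axis=1)   # squared error to every codeword
    rate = -np.log2(probs)                       # bits needed to send each index
    return int(np.argmin(dist + lam * rate))

def fs_ecvq_encode(vectors, codebooks, probs, lam, next_state, s0=0):
    """Finite-state ECVQ (sketch): the current state selects which codebook and
    probability table are used; the state is then updated from the sent index.
    `next_state` is a hypothetical state-transition rule supplied by the caller."""
    s, indices = s0, []
    for x in vectors:
        i = ecvq_encode(np.asarray(x, dtype=float), codebooks[s], probs[s], lam)
        indices.append(i)
        s = next_state(s, i)
    return indices

def fs_ecvq_decode(indices, codebooks, next_state, s0=0):
    """The decoder replays the same state sequence from the received indices."""
    s, out = s0, []
    for i in indices:
        out.append(codebooks[s][i])
        s = next_state(s, i)
    return np.array(out)
```

In a toy two-state setup with `next_state = lambda s, i: i`, the index of the previous vector selects between a low-amplitude and a high-amplitude codebook, so each per-state table can stay small; raising λ pushes the encoder toward more probable (cheaper) indices.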
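The per-frame tool selection in the aBWE contribution can be sketched as follows, assuming NumPy. The thesis states only that the choice depends on each frame's spectral characteristics; the spectral-flatness criterion, the 0.3 threshold and the mapping (tonal frames to HBE, noise-like frames to SBR) are assumptions made for illustration, and the SBR and HBE processing itself is not shown.

```python
import numpy as np

def spectral_flatness(frame, eps=1e-12):
    """Geometric mean / arithmetic mean of the power spectrum: close to 0 for
    tonal frames, roughly 0.56 for white noise."""
    power = np.abs(np.fft.rfft(frame)) ** 2 + eps
    return float(np.exp(np.mean(np.log(power))) / np.mean(power))

def choose_bwe(frame, threshold=0.3):
    """Hypothetical decision rule: tonal/harmonic frames -> HBE,
    noise-like frames -> SBR."""
    return "HBE" if spectral_flatness(frame) < threshold else "SBR"

def abwe_decisions(signal, frame_len=1024, threshold=0.3):
    """Split the signal into non-overlapping frames and pick a BWE tool per frame."""
    n_frames = len(signal) // frame_len
    return [choose_bwe(signal[k * frame_len:(k + 1) * frame_len], threshold)
            for k in range(n_frames)]
```

A pure tone followed by white noise would be labelled `["HBE", "SBR"]` under this rule; a real encoder would compute the feature on the transmitted low band and add hysteresis so the tool does not toggle on every frame.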
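The tonal/transient split described in the structured-LASSO contribution can be approximated by a simpler, plainly named stand-in: an ordinary ℓ1-penalized (LASSO) decomposition over the union of an orthonormal DCT basis (tonal layer, sparse in frequency) and the identity basis (transient layer, sparse in time), solved by block-coordinate descent. The structure that the thesis's structured LASSO imposes on the coefficients is not modelled here, and the residual plays the role of the noise component; all function names are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C (row k = k-th basis vector), so C @ C.T = I."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (t + 0.5) * k / n)
    c[0, :] /= np.sqrt(2.0)
    return c

def soft(v, thr):
    """Soft thresholding, the proximal operator of thr * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - thr, 0.0)

def tonal_transient_lasso(x, lam=0.1, n_iter=200):
    """Minimise 0.5*||x - C.T @ a - b||^2 + lam*(||a||_1 + ||b||_1) by
    block-coordinate descent. Both bases are orthonormal, so each block
    update is an exact soft-thresholding step."""
    c = dct_matrix(len(x))
    a = np.zeros(len(x))   # tonal coefficients (DCT domain)
    b = np.zeros(len(x))   # transient coefficients (time domain)
    for _ in range(n_iter):
        a = soft(c @ (x - b), lam)       # best tonal layer given the transient layer
        b = soft(x - c.T @ a, lam)       # best transient layer given the tonal layer
    tonal = c.T @ a
    return tonal, b, x - tonal - b       # tonal layer, transient layer, residual
```

For a 64-sample frame built as 5 times DCT atom 3 plus a spike of height 4 at sample 20, the tonal layer captures the atom and the transient layer the spike, each shrunk slightly (to about 4.88 and 3.88) by the ℓ1 penalty.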
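The benefit of exploiting inter-frame correlation in the dynamic-CS contribution can be sketched with a support-aided orthogonal matching pursuit (OMP), a simpler technique swapped in for the thesis's statistical-model and information-passing algorithm. Each column of `Y` holds the compressive measurements y_t = A θ_t of one frame, where `A` is assumed to combine the random sensing matrix with a sparsifying basis; the support estimated for frame t-1 warm-starts the greedy search for frame t, and near-zero atoms are pruned before being passed on. The function names and the `prune_tol` parameter are hypothetical.

```python
import numpy as np

def omp(y, A, max_atoms, init_support=(), tol=1e-10):
    """Orthogonal matching pursuit, optionally warm-started with a prior support."""
    support = [int(i) for i in init_support]
    coef = np.zeros(0)
    residual = y.copy()
    if support:
        # least-squares fit on the inherited support before any greedy step
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    while len(support) < max_atoms and np.linalg.norm(residual) > tol:
        corr = np.abs(A.T @ residual)
        corr[support] = 0.0                      # do not re-select active atoms
        support.append(int(np.argmax(corr)))     # greedy step: best-matching atom
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    theta = np.zeros(A.shape[1])
    theta[support] = coef
    return theta

def dynamic_cs(Y, A, k, prune_tol=1e-3):
    """Recover frames (columns of Y) in order; each frame is warm-started with
    the support found for the previous frame, plus up to k new atoms."""
    prior = []
    frames = []
    for y in Y.T:
        theta = omp(y, A, max_atoms=len(prior) + k, init_support=prior)
        prior = np.flatnonzero(np.abs(theta) > prune_tol).tolist()  # prune inactive atoms
        frames.append(theta)
    return np.column_stack(frames)
```

When the support changes slowly, most frames are solved by a single least-squares fit on the inherited support, and only atoms that newly appear have to be found greedily; this is where fewer measurements per frame can suffice compared with independent frame-by-frame recovery.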
Keywords/Search Tags: Compressed sensing, LASSO, sparse approximation, VQ, audio compression