| Object-based 3D audio systems have attracted much attention because of their realistic sound and support for personalized interaction. However, the audio data size of such a system grows linearly with the number of audio objects, so how to compress multiple audio objects is a valuable research topic. Audio object coding is an effective way to address this problem: it compresses the audio objects into downmix signals plus a small amount of side information, significantly reducing the amount of object-based audio data. The classical audio object coding method extracts side information in each subband and uses it to recover each independent object from the downmix signal during decoding. Since frequency bins in the same subband share a common parameter, this method can recover the total energy in the frequency domain with only a small amount of data. The problem is that the energy of each individual frequency bin cannot be accurately restored, and the decoded object signal contains frequency components of other objects (i.e., aliasing distortion), which degrades the listening experience. Several methods to reduce aliasing distortion have been proposed in recent years, such as subband subdivision, autoencoder transform, residual compensation, and decorrelation. However, existing research still has the following problems. First, under a limited bit rate, the object signals decoded by current methods still contain aliasing distortion that the human ear can perceive, which affects the subsequent interactive rendering experience. Second, when the network fluctuates, the bit rate of the side information needs to be adjusted adaptively, but existing methods cannot compress the side information according to the perceptual characteristics of aliasing distortion, so the decoded sound quality degrades seriously after the bit rate is adjusted. In view of these problems, this dissertation carries out research on the following three aspects to achieve audio object
coding without perceptual aliasing distortion: (1) Research on time-frequency shifting for audio object coding. The aliasing distortion of the decoded object signal can still be perceived by the human ear. To solve this problem, this dissertation proposes moving the frequency energy from the aliasing region to the non-aliasing region, so that only one object is active in any given frame and subband. To record the shifting information under a limited bit rate, several consecutive frequency bins are combined and shifted together according to the distribution of the perceivable components. Compared with the high-frequency resolution method, the proposed method improves the signal-to-distortion ratio (SDR) by 259%, improves the signal-to-interference ratio (SIR) by 254%, and improves subjective listening quality by 18 points, while reducing the bit rate by 19.1%. According to listeners' reports, the proposed method achieves quality free of perceptible aliasing distortion. (2) Research on residual compensation for audio object coding. As the number of audio objects increases, the space available for time-frequency shifting becomes insufficient. Residual compensation can effectively suppress the aliasing distortion of the remaining objects that cannot be moved, but the current method can compensate only one target object. Thus, this dissertation proposes a step-by-step downmix method to extract the residuals of all objects and uses matrix decomposition to avoid the bit rate surge caused by multi-object residuals. This dissertation also finds that the downmix order affects the subsequent decoding results. Based on an understanding of aliasing distortion, the optimal order in cyclic downmixing is determined so that aliasing distortion is adequately compensated. Compared with the classical two-step residual compensation method, the proposed method lowers the bit rate by 49%, improves SDR by 96% and SIR by 83%, and achieves an overall average score 20 points higher than that of
the two-step residual compensation method. (3) Side information and bit rate control under non-perceptual aliasing. Existing audio object coding methods cannot preferentially transmit perceptually important information, and the sound quality degrades seriously when the bit rate is adjusted in response to network fluctuation. Therefore, this dissertation adopts ideas from perceptual coding and studies side information compression and bit rate control based on auditory perceptual features. The perceptual threshold of aliasing distortion is studied on the basis of the human auditory threshold and the auditory masking effect. According to the perceptual characteristics of aliasing distortion, priorities are set to realize side information compression and bit rate control, which ensures that crucial perceptual information is transmitted as the network fluctuates. Compared with time-frequency shifting and multi-step residual compensation, the proposed method uses auditory perception characteristics to reduce the shift information bit rate by 32% and the residual bit rate by 15%, while the average score in subjective listening tests remains above 90. During bit rate control, the sound quality remains stable at side information bit rates of 11 kbpso to 20 kbpso (spectrum shifting) and 9 kbpso to 13 kbpso (multi-step residual compensation). |
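
As an illustration of the classical subband-parameter scheme summarized above, the following is a minimal sketch, not the dissertation's implementation. It assumes a single frame, a sum downmix, and one energy-ratio gain per object and subband; the function names `encode` and `decode`, the toy spectra, and the band layout are all hypothetical. It shows that the per-subband energy of an object is recovered exactly, while individual bins pick up components of the other object (aliasing distortion).

```python
import numpy as np

def encode(objects, band_edges):
    """Downmix one frame and extract per-subband side information.

    objects: (n_obj, n_bins) complex spectra (hypothetical toy input).
    Returns the downmix spectrum and one gain per object and subband.
    """
    downmix = objects.sum(axis=0)
    n_bands = len(band_edges) - 1
    gains = np.zeros((objects.shape[0], n_bands))
    for b in range(n_bands):
        lo, hi = band_edges[b], band_edges[b + 1]
        e_obj = np.sum(np.abs(objects[:, lo:hi]) ** 2, axis=1)
        e_dmx = np.sum(np.abs(downmix[lo:hi]) ** 2)
        # Assumed parameterization: square root of the energy ratio.
        gains[:, b] = np.sqrt(e_obj / max(e_dmx, 1e-12))
    return downmix, gains

def decode(downmix, gains, band_edges):
    """Recover each object by scaling the downmix with its subband gain."""
    n_obj, n_bands = gains.shape
    out = np.zeros((n_obj, downmix.shape[0]), dtype=complex)
    for b in range(n_bands):
        lo, hi = band_edges[b], band_edges[b + 1]
        # All bins of a subband share the same gain.
        out[:, lo:hi] = gains[:, [b]] * downmix[lo:hi]
    return out

# Two toy objects that occupy different bins of one 4-bin subband.
objects = np.array([[1, 0, 2, 0],
                    [0, 3, 0, 1]], dtype=complex)
band_edges = [0, 4]

downmix, gains = encode(objects, band_edges)
decoded = decode(downmix, gains, band_edges)

orig_energy = np.sum(np.abs(objects[0]) ** 2)   # subband energy of object 0
dec_energy = np.sum(np.abs(decoded[0]) ** 2)    # matches orig_energy
leak = np.abs(decoded[0, 1])                    # bin where object 0 was silent
```

In this toy case the decoded object 0 has the same subband energy as the original, yet bin 1, where object 0 was silent, now carries energy that came from object 1. The time-frequency shifting and residual compensation described above target exactly this kind of leakage.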