Simultaneous interpretation, which translates a speaker's source-language speech while the speaker is still talking, significantly improves the efficiency of cross-lingual communication. With the rapid development of communication and Internet technology, especially the mobile Internet, and increasingly frequent global exchange, the demand for cross-lingual communication keeps growing. As the fastest and most convenient form of speech translation, simultaneous interpretation is widely used. Because human simultaneous interpreters are difficult to train and the work is highly demanding, there is growing demand for automatic, computer-based simultaneous speech translation. Automatic speech translation for simultaneous interpretation has therefore become an important research direction in machine translation. Cascaded simultaneous speech translation systems suffer from a mismatch between automatic speech recognition (ASR) output and machine translation (MT) input, while end-to-end systems are mainly limited by the amount of available training data. In addition, every simultaneous speech translation system must balance latency against translation quality, and maintaining translation quality under low-latency conditions is especially difficult. To address these problems, this thesis studies robust modeling, semantic unit segmentation, end-to-end speech translation modeling, and latency optimization for simultaneous translation, and designs and implements an automatic simultaneous speech translation system. The main work and innovations of this thesis are as follows:

1) A robust modeling method for cascaded speech translation. To address the mismatch between ASR output and MT input patterns in cascaded speech translation systems, a robust modeling method based on de-structuring and clause insensitivity is proposed. Combined with an improved clause segmentation strategy, it effectively alleviates error propagation through the cascade. Translation quality improves significantly in the three main application scenarios: public speeches, conferences, and daily dialogue.

2) A simultaneous speech translation framework based on semantic unit segmentation. To address the latency and quality problems of simultaneous translation systems that segment the input into clauses, a greedy semantic unit segmentation method is proposed, together with a history-constrained semantic unit translation method that resolves the contextual incoherence caused by translating each clause independently. Significant improvements over the baseline are achieved on both text and speech simultaneous translation tasks.

3) An end-to-end speech translation modeling method based on data augmentation. To stabilize end-to-end speech translation training, a training method combining pre-training with sequence-level knowledge distillation is proposed. To address the limited scale of speech translation corpora, a data augmentation method based on speech synthesis and machine translation is proposed, which makes effective use of the large-scale training data available for speech recognition and machine translation. On several experimental datasets from different scenarios, the proposed method achieves performance comparable to that of the cascaded speech translation system.

4) A Cross Attention Augmented Transducer (CAAT) network for simultaneous translation. To jointly optimize the simultaneous policy and the translation model, the CAAT network is proposed, which significantly improves the trade-off between latency and translation quality in simultaneous translation. Compared with state-of-the-art methods such as wait-k and MMA (monotonic multihead attention), CAAT achieves an improvement of more than 5 BLEU under low-latency conditions.

5) An automatic simultaneous speech translation system. Building on the above research, an automatic simultaneous speech translation system is designed and implemented. To meet the differing latency and quality requirements of application scenarios such as cross-lingual conferences, public speeches, and daily conversation, two simultaneous translation schemes, one based on semantic unit segmentation and one based on CAAT, are adopted to provide simultaneous interpretation at different latency levels, with good performance in practice.

This work was supported by the Science and Technology Innovation 2030 "New Generation of Artificial Intelligence" Major Project "Multilingual Automatic Translation Research with Chinese as the Core Language", sponsored by iFlyTek. The automatic simultaneous translation system designed and implemented in this thesis has been deployed in a cross-lingual conference system and a handheld translation device, demonstrating the effectiveness and practical value of the above techniques.
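Contribution 1 targets the mismatch between unpunctuated ASR output and the well-formed text that MT models are usually trained on. The abstract does not spell out what de-structuring removes, so the sketch below assumes one plausible reading: strip casing and punctuation from the MT training source so that training inputs look like ASR transcripts, while keeping the original pair as well. `destructure` and `build_robust_pairs` are hypothetical names, not functions from the thesis.

```python
import string

# Assumption (not the thesis's exact recipe): ASR output carries no casing or
# punctuation, so the MT training source is normalized the same way.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))

def destructure(text: str) -> str:
    """Lowercase, replace ASCII punctuation with spaces, and collapse whitespace."""
    return " ".join(text.lower().translate(_PUNCT_TO_SPACE).split())

def build_robust_pairs(pairs):
    """Keep every (source, target) pair and add a copy with a de-structured source."""
    robust = []
    for src, tgt in pairs:
        robust.append((src, tgt))                 # original, well-formed input
        robust.append((destructure(src), tgt))    # ASR-like input, same target
    return robust
```

Leaving the target side punctuated lets the MT model learn to produce well-formed output from unstructured input. Clause insensitivity (for example, training on re-segmented sources) is not covered by this sketch.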
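Contribution 2 segments the input greedily into semantic units. The abstract gives no details of the algorithm, so the following is only a minimal sketch under stated assumptions: a hypothetical boundary scorer (for example, a small classifier) estimates whether the currently open unit is complete, and the segmenter cuts at the first position where that score crosses a threshold, with a length cap to bound latency. `greedy_segment`, `threshold`, and `max_len` are illustrative names and parameters.

```python
from typing import Callable, List, Sequence

# Hypothetical boundary scorer: given the tokens of the open unit, returns the
# probability that a semantic unit ends after its last token.
BoundaryScorer = Callable[[Sequence[str]], float]

def greedy_segment(tokens: Sequence[str], score: BoundaryScorer,
                   threshold: float = 0.5, max_len: int = 20) -> List[List[str]]:
    """Close a unit at the first position where it looks complete.

    A unit ends as soon as score(unit) >= threshold, or when it reaches
    max_len tokens (a latency cap). Only the prefix seen so far is used,
    so the procedure can run incrementally on streaming ASR output.
    """
    units: List[List[str]] = []
    current: List[str] = []
    for tok in tokens:
        current.append(tok)
        if score(current) >= threshold or len(current) >= max_len:
            units.append(current)
            current = []
    if current:  # flush the trailing partial unit at the end of the stream
        units.append(current)
    return units
```

Because each cut is final once made, the segmenter never waits for a full sentence. This is where the latency gain over clause-based segmentation would come from, at the cost of occasionally cutting too early.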
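Contribution 2 also translates each semantic unit under a history constraint rather than independently. One plausible reading, assumed here and not confirmed by the abstract, is prefix-constrained decoding: re-translate all source units seen so far while forcing the decoder to keep the target text already shown to the user, then emit only the new suffix. `Translator` is a hypothetical interface, not a specific toolkit's API.

```python
from typing import Callable, List

# Hypothetical prefix-constrained MT interface: translate(source, prefix) must
# return a full target string that starts with `prefix`.
Translator = Callable[[str, str], str]

def translate_with_history(units: List[str], translate: Translator) -> List[str]:
    """Emit one translation increment per semantic unit, constrained by history.

    Each step re-translates the accumulated source while keeping the committed
    target text fixed, and emits only the newly generated suffix, so successive
    outputs stay coherent and are never revised.
    """
    source, committed, increments = "", "", []
    for unit in units:
        source = f"{source} {unit}".strip()
        hypothesis = translate(source, committed)
        if not hypothesis.startswith(committed):
            raise ValueError("translator ignored the committed prefix")
        increments.append(hypothesis[len(committed):].strip())
        committed = hypothesis
    return increments
```

In the test below, a toy translator that simply upper-cases its input stands in for a real MT model.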
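Contribution 3 builds extra speech translation data from ASR and MT corpora using machine translation and speech synthesis. The sketch below shows the two directions under simple assumptions: an ASR pair (audio, transcript) gains a machine-translated target, and an MT pair (source, target) gains synthetic source audio. `MTModel`, `TTSModel`, and `STExample` are hypothetical stand-ins for whatever models and data structures the thesis actually used.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Tuple

@dataclass
class STExample:
    audio: Any         # waveform or features; representation left open here
    transcript: str    # source-language text
    translation: str   # target-language text

# Hypothetical model interfaces, not a specific toolkit's API.
MTModel = Callable[[str], str]     # source text -> target text
TTSModel = Callable[[str], Any]    # source text -> synthetic audio

def augment_from_asr(asr_data: Iterable[Tuple[Any, str]], mt: MTModel) -> List[STExample]:
    """ASR corpus of (audio, transcript): machine-translate the transcript."""
    return [STExample(audio, text, mt(text)) for audio, text in asr_data]

def augment_from_mt(mt_data: Iterable[Tuple[str, str]], tts: TTSModel) -> List[STExample]:
    """MT corpus of (source, target): synthesize speech for the source side."""
    return [STExample(tts(src), src, tgt) for src, tgt in mt_data]
```

The ASR direction keeps real audio but has a noisier, machine-generated target. The MT direction keeps a human target but uses synthetic audio, so the two sources of augmented data tend to complement each other.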
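The "sequential knowledge distillation" in contribution 3 most likely refers to sequence-level knowledge distillation. Assuming so, the idea is to replace each human reference with the output of a stronger text MT teacher run on the source transcript, and to train the end-to-end student on those outputs. The teacher callable and the `keep_references` option are illustrative assumptions.

```python
from typing import Any, Callable, List, Tuple

# Hypothetical teacher: a text MT model (e.g. decoded with beam search)
# mapping a source transcript to a target translation.
Teacher = Callable[[str], str]

def distill(triples: List[Tuple[Any, str, str]], teacher: Teacher,
            keep_references: bool = False) -> List[Tuple[Any, str]]:
    """Sequence-level KD: build (audio, teacher output) training pairs.

    triples holds (audio, transcript, reference). The teacher's translation of
    the transcript replaces the human reference as the student's target;
    optionally the original (audio, reference) pair is kept as well.
    """
    pairs = []
    for audio, transcript, reference in triples:
        pairs.append((audio, teacher(transcript)))
        if keep_references:
            pairs.append((audio, reference))
    return pairs
```

Teacher outputs are usually more consistent than human references, which is one common explanation for why distillation stabilizes end-to-end training.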
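Contribution 4 compares CAAT against wait-k, whose fixed policy makes the latency side of the trade-off concrete. The sketch below implements the standard wait-k schedule g(t) = min(k + t - 1, |x|), which gives the number of source tokens read before writing target token t, and the Average Lagging (AL) metric as commonly defined for it. The abstract does not name its latency metric, so using AL here is an assumption, and neither CAAT nor MMA is implemented.

```python
from typing import List

def wait_k_schedule(k: int, src_len: int, tgt_len: int) -> List[int]:
    """g(t) = min(k + t - 1, |x|) for t = 1..|y|."""
    return [min(k + t - 1, src_len) for t in range(1, tgt_len + 1)]

def to_actions(g: List[int]) -> List[str]:
    """Expand a monotone schedule into READ ("R") / WRITE ("W") actions."""
    actions: List[str] = []
    read = 0
    for need in g:
        actions.extend("R" * (need - read))  # read until `need` tokens are available
        read = max(read, need)
        actions.append("W")                  # then write one target token
    return actions

def average_lagging(g: List[int], src_len: int, tgt_len: int) -> float:
    """AL = (1/tau) * sum_{t=1..tau} (g(t) - (t - 1) / r), with r = |y| / |x|.

    tau is the first target position whose write happens after the whole
    source has been read.
    """
    r = tgt_len / src_len
    tau = next((t for t, gt in enumerate(g, start=1) if gt >= src_len), len(g))
    return sum(g[t - 1] - (t - 1) / r for t in range(1, tau + 1)) / tau
```

When source and target have equal length, wait-k yields AL = k, which is why k serves as the latency knob. "Low-latency conditions" then correspond to small k, the regime where the abstract reports the largest CAAT gains.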