
Design And Implementation Of Visual Semantic Enhanced Machine Translation System

Posted on: 2023-09-03
Degree: Master
Type: Thesis
Country: China
Candidate: C Wang
Full Text: PDF
GTID: 2568307061953949
Subject: Computer technology

Abstract/Summary:
The semantically parallel and complementary relationships among multi-modal, high-dimensional, heterogeneous information can improve the completeness of translation context and reduce the uncertainty of semantic prediction. A dynamic architecture for the parallel fusion of multi-modal semantics in translation agents is designed to make the multi-modal fusion components extensible and pluggable. At the same time, to address missing modalities and high noise in natural scenes, two multi-modal semantic fusion methods, visual topic guidance and visual detail intervention, are implemented within the system architecture: a visual topic-enhanced encoding structure and a visual detail-fusion decoding structure. Through multi-granularity progressive fusion from topic to detail, the translation system adapts to natural environments with low-quality multi-modal information. The specific work includes:

1. A translation architecture for parallel multi-modal semantic fusion: The adaptability requirements and critical functions of translation systems in low-quality multi-modal real-world scenarios are analyzed. An extensible, pluggable translation architecture for the parallel fusion of modal semantics is proposed; it accommodates multiple modes of interaction between modalities in the domain and enhances the modal autonomy of the translation system.

2. A visual topic-enhanced encoding structure: Visual topic semantics guide the model to attend to the parallel semantics of heterogeneous modalities, and the structure remains effective when the visual modality is absent. Multi-modal fusion at the encoding stage is realized as multi-modal representation learning: images related to the text's topics are retrieved from search engines, and the topic consistency between modalities is used to construct a cross-lingual, cross-modal semantic space during translation. The encoder integrates syntactic features and semantic information simultaneously; its ability to extract semantic features reduces mistranslation and omission. Compared with similar studies, the structure has better semantic representation learning ability and modal adaptability, and achieves higher translation quality under the same conditions.

3. A visual detail-fusion decoding structure: Visual detail semantics mine higher-level semantic relationships between modalities and are effective in high-noise scenes. Multi-modal fusion at the decoding stage is realized as multi-modal alignment and fusion. A visual-semantic cross-modal attention mechanism combining modal attention and adaptive attention is proposed to provide visual semantic support when context is lost or ambiguous. Modal attention learns semantic alignment between images and text and integrates object-level image features into decoding; adaptive attention uses a gating mechanism to suppress the fusion of unrelated image-text information and reduce visual noise. Compared with similar studies, the structure achieves better cross-modal fusion efficiency in realistic high-noise environments.

4. Detailed design and implementation of the translation system: Following the translation architecture and fusion methods, the preprocessing, translation, and interaction modules are designed in detail, and the efficiency of each module is optimized for actual application requirements. The system is tested in a real environment; the results show that the visual-semantic-enhanced machine translation system outperforms mono-modal machine translation in translation quality and meets users' core functional and performance requirements.
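The extensible, pluggable fusion architecture described in item 1 can be illustrated with a minimal sketch. This is not the thesis's actual code; the class names (`FusionComponent`, `TopicFusion`, `DetailFusion`, `TranslationPipeline`) and the simplified numeric "features" are assumptions introduced for illustration only. The point is the design: each fusion method implements one common interface and registers with the pipeline, so components can be plugged in or removed without changing the pipeline itself.

```python
class FusionComponent:
    """Interface every fusion component implements. Components are
    registered by name, so fusion methods can be swapped or removed
    without modifying the translation pipeline (pluggability)."""
    name = "base"

    def fuse(self, text_features, visual_features):
        raise NotImplementedError


class TopicFusion(FusionComponent):
    """Encoder-side stand-in: append a pooled visual topic vector
    (here just the mean of the visual features)."""
    name = "topic"

    def fuse(self, text_features, visual_features):
        topic = sum(visual_features) / len(visual_features)
        return text_features + [topic]


class DetailFusion(FusionComponent):
    """Decoder-side stand-in: add the strongest visual detail to
    each text feature."""
    name = "detail"

    def fuse(self, text_features, visual_features):
        strongest = max(visual_features)
        return [t + strongest for t in text_features]


class TranslationPipeline:
    """Runs all registered fusion components in registration order."""

    def __init__(self):
        self._components = {}

    def register(self, component):
        self._components[component.name] = component

    def unplug(self, name):
        # Removing a component leaves the rest of the pipeline intact.
        self._components.pop(name, None)

    def run(self, text_features, visual_features):
        for comp in self._components.values():
            text_features = comp.fuse(text_features, visual_features)
        return text_features
```

Because the pipeline only depends on the `FusionComponent` interface, a missing-modality fallback or a new fusion strategy is just another registered component, which matches the "extensible and pluggable" goal stated above.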
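The gated cross-modal attention of item 3 can likewise be sketched in simplified form, assuming plain-list vectors in place of learned tensors; the function names and the scalar gate parameterization (`gate_weight`, `gate_bias`) are illustrative assumptions, not the thesis's implementation. Modal attention aligns the decoder text state with object-level image regions; the adaptive gate then scales the resulting visual context so that unrelated image-text pairs contribute little visual signal.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gated_cross_modal_attention(text_state, image_regions, gate_weight, gate_bias):
    """Attend from a decoder text state over object-level image region
    features (modal attention), then scale the visual context by a
    sigmoid gate (adaptive attention) to suppress visual noise."""
    # Modal attention: scaled dot-product alignment of text vs. regions.
    d = len(text_state)
    scores = [dot(text_state, r) / math.sqrt(d) for r in image_regions]
    weights = softmax(scores)
    visual_ctx = [
        sum(w * r[i] for w, r in zip(weights, image_regions)) for i in range(d)
    ]
    # Adaptive gate: a learned projection of the text state squashed to (0, 1);
    # near 0 the (possibly unrelated) visual context is effectively ignored.
    gate = 1.0 / (1.0 + math.exp(-(dot(gate_weight, text_state) + gate_bias)))
    fused = [t + gate * v for t, v in zip(text_state, visual_ctx)]
    return fused, gate
```

In a trained model the gate parameters would be learned jointly with the translation objective, so the network itself decides how much visual detail to admit at each decoding step.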
Keywords/Search Tags: Multi-modal machine translation, Topic-enhancement, Cross-modal attention