| With the acceleration of global information exchange and the rapid development of multimedia technology, the amount of multilingual information on the Internet is increasing day by day. Compared with single-language information, multilingual information can provide richer content. For example, journalists in different countries report on the same news topic from different angles and express different viewpoints, which helps users better understand the whole picture of an event. Cross-lingual summarization uses a computer to produce, from multilingual text, a summary that reflects the main idea of that text. However, many of the world's languages are low-resource, so large-scale parallel datasets involving low-resource languages are scarce, and there is not enough training data to train cross-lingual summarization models. To address these problems, this paper studies methods for cross-lingual subject summarization, with the aim of mitigating the imbalance of data resources and improving summary quality. We adopt a pipeline-based cross-lingual summarization method: an extractive summarization method based on the combination of a heterogeneous graph and Multi-GCN for the summarization task, and a Transformer-based summary rewriting method for the translation task. The research work completed in this paper is as follows: 1. An extractive summarization method based on the combination of a heterogeneous graph and Multi-GCN. Existing RNN and LSTM models cannot use the valuable relationship information between words, and a large number of GCN layers leads to a loss of node feature diversity. To address these problems, this paper constructs an extractive summarization method based on the combination of a heterogeneous graph and Multi-GCN. This method introduces more semantic units as additional nodes to enhance the relationships between words. At the same time, a GCN is used to encode the initialized sentence feature vectors, which 
integrates the semantic relations between words and compensates for the inability of RNN and LSTM models to use this relationship information. In addition, two skip connection methods are introduced to address the over-smoothing caused by too many GCN layers or excessive use of the convolution operation. Experiments show that this method integrates keywords into sentences and captures the semantic similarity between words, so that it extracts higher-quality sentences more accurately, reduces repeated content, and improves the coverage of the extracted summaries. 2. An improved summary rewriting method based on Transformer. A Transformer-based summary rewriting method is built to solve the problem that traditional machine translation methods cannot accurately identify translation errors. The method uses a Transformer decoder to form a locator and a modifier. The locator marks the error positions in the translation, and the modifier corrects the content at the marked positions. To further improve translation accuracy, the method adopts an iterative strategy, repeatedly applying the locator and the modifier to generate more accurate target-language summaries. Experimental results show that, compared with traditional methods, the proposed method effectively improves the accuracy of the translated summaries and reduces the number of iterations. |
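
The skip-connection idea in the first contribution can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the abstract does not specify its two skip variants, so the sketch assumes a plain residual connection (layer output plus layer input), and the names `normalize_adjacency` and `gcn_layer` are hypothetical. Adding the input back is one common way to keep node features from collapsing toward each other as layers are stacked.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} commonly used in GCNs."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm, h, w, skip=False):
    """One graph convolution with ReLU.

    With skip=True the layer input is added to the output (an assumed
    residual-style skip connection); this requires w to be square so that
    input and output dimensions match.
    """
    out = np.maximum(a_norm @ h @ w, 0.0)   # propagate, transform, ReLU
    return out + h if skip else out


# Toy graph: three sentence nodes connected in a chain.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
a_norm = normalize_adjacency(adj)

h = np.arange(6, dtype=float).reshape(3, 2)  # initial node features
w = np.eye(2)                                # identity weights for illustration

plain = gcn_layer(a_norm, h, w)
skipped = gcn_layer(a_norm, h, w, skip=True)
```

In this sketch `skipped` differs from `plain` by exactly the input features `h`, which is what lets each node retain part of its own representation after neighborhood averaging.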