Automatic text summarization is a basic task in natural language processing. It compresses and refines document content, helps users retrieve the information they need from massive collections, avoids the redundant and one-sided results that search engine retrieval can produce, and thereby alleviates information overload. Because existing automatic text summarization techniques for Chinese and English are difficult to transfer to Khmer, and in order to enrich the theory and application of natural language processing for Khmer, this paper studies automatic text summarization for Khmer using a multi-document extractive method based on deep active learning. The main work of this paper is as follows:

(1) An extractive summarization method for Khmer single documents based on deep active learning. Deep learning approaches to single-document extraction suffer from the shortage of annotated Khmer corpora. To address this, the paper proposes a method that combines active learning with deep learning, so that a neural network can be trained despite the limited amount of labeled Khmer data. First, a fixed number of documents are selected with an active learning sampling strategy and annotated by experts to form the training set. Then an extractive model is trained using the encoder-decoder architecture from deep learning. The experimental results show that the method effectively improves the quality of single-document extraction in Khmer: even when the training corpus is clearly under-labeled, the values of R1, R2, RL and [...] increase by 5.15%, 4.23%, 9.48% and 7.65%, respectively.

(2) A multi-document extractive summarization method for Khmer based on layered Maximal Marginal Relevance. Traditional multi-document extractive summarization methods cannot make effective use of the semantic information shared between documents, and the resulting summaries contain too much redundant content. To address this for Khmer, the paper proposes a multi-document extractive summarization method based on layered Maximal Marginal Relevance. First, the Khmer documents are fed into the trained deep active learning model to extract a summary of each single document. Then all single-document summaries are merged iteratively in a layered, waterfall-like scheme, and the final multi-document summary is obtained through Maximal Marginal Relevance. The experimental results show that combining deep learning with the Maximal Marginal Relevance algorithm improves the accuracy of multi-document summaries while preserving the diversity of the summary sentences, and effectively improves the quality of Khmer multi-document summaries: the values of R1, R2, R3 and RL increase by 5.15%, 8.23%, 9.48% and 7.65%, respectively. ...
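
The layered merging step described in (2) can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the similarity measure, the relevance reference, or the exact merging order, so this sketch assumes bag-of-words cosine similarity, relevance measured against the centroid of the candidate pool, and a left-to-right merge in which each single-document summary is folded into the running summary and re-filtered by Maximal Marginal Relevance. The names `mmr_select`, `layered_merge`, `k` and `lam` are hypothetical, and sentences are assumed to be already segmented into tokens (Khmer word segmentation is taken as an upstream step).

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(sentences, k, lam=0.7):
    """Pick up to k sentences by Maximal Marginal Relevance.

    sentences: list of token lists (assumed pre-segmented).
    Relevance is the similarity to the centroid of the whole pool
    (an assumption; the abstract does not name the reference).
    """
    vecs = [Counter(s) for s in sentences]
    centroid = Counter()
    for v in vecs:
        centroid.update(v)

    selected = []
    remaining = list(range(len(sentences)))
    while remaining and len(selected) < k:
        def score(i):
            # Penalize similarity to anything already in the summary.
            redundancy = max(
                (cosine(vecs[i], vecs[j]) for j in selected), default=0.0
            )
            return lam * cosine(vecs[i], centroid) - (1 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)

    # Return the chosen sentences in their original pool order.
    return [sentences[i] for i in sorted(selected)]


def layered_merge(summaries, k, lam=0.7):
    """Fold single-document summaries into one summary, layer by layer.

    Each layer pools the running summary with the next single-document
    summary and keeps at most k sentences chosen by MMR.
    """
    merged = []
    for summary in summaries:
        merged = mmr_select(merged + list(summary), k, lam)
    return merged


if __name__ == "__main__":
    # Toy token lists standing in for segmented Khmer sentences.
    doc_summaries = [
        [["a", "b", "c"], ["d", "e"]],
        [["a", "b", "c"], ["f", "g"]],
    ]
    print(layered_merge(doc_summaries, k=2, lam=0.5))
```

In this sketch `lam` trades relevance against redundancy: values near 1 favor sentences close to the pool centroid, while lower values push the selection toward sentences that differ from those already chosen. With the toy input, the duplicated sentence appears only once in the merged result.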