Research On Extractive Summarization Methods For Cambodian Language Multi-documents

Posted on:2021-10-01

Degree:Master

Type:Thesis

Country:China

Candidate:B B Yu

Full Text:PDF

GTID:2518306200453364

Subject:Control Engineering

Abstract/Summary:

As a basic task in natural language processing, automatic text summarization compresses and refines document information, helps users retrieve the relevant information they need from massive amounts of data, avoids the redundant and one-sided results that search engine retrieval may return, and alleviates the problem of information overload. Because conventional automatic summarization techniques for Chinese and English are difficult to transfer to Khmer, and in order to enrich the theory and application of natural language processing for Khmer, this thesis studies automatic text summarization for Khmer using a multi-document extractive method based on deep active learning. The main work of this thesis is as follows:

(1) An extractive summarization method for single Khmer documents based on deep active learning. Deep-learning-based single-document extraction suffers from a shortage of annotated Khmer corpora, so this thesis proposes a method that combines active learning with deep learning to address the lack of labeled data for training the neural network. First, a set number of documents is selected by an active learning sampling strategy and annotated by experts to form the training set. Then an extraction model is trained using the encoder-decoder model from deep learning. The experimental results show that the method effectively improves the quality of single-document extraction for Khmer: even when the training corpus is clearly under-labeled, the values of R₁, R₂, R_L and a fourth metric not named in this text increased by 5.15%, 4.23%, 9.48% and 7.65%, respectively.

(2) A multi-document extractive summarization method for Khmer based on layered Maximal Marginal Relevance (MMR). Traditional multi-document extractive summarization methods cannot effectively use the semantic information between documents and leave too much redundant content in the resulting summary. To address this, the thesis proposes a multi-document extractive summarization method for Khmer based on layered Maximal Marginal Relevance. First, the Khmer multi-document text is fed into the trained deep active learning model to extract a summary of each single document; then all single-document summaries are merged iteratively, following a procedure similar to a layered waterfall method, and the final multi-document summary is obtained through Maximal Marginal Relevance. Experimental results show that combining deep learning with the Maximal Marginal Relevance algorithm improves the accuracy of multi-document summaries while preserving the diversity and distinctness of the summary sentences, and effectively improves the quality of Khmer multi-document summaries: the values of R₁, R₂, R₃ and R_L increased by 5.15%, 8.23%, 9.48% and 7.65%, respectively. ...
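The merging step in item (2) can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and several parts are assumptions made for illustration: the function names `mmr_select` and `waterfall_merge`, the bag-of-words cosine similarity, the use of the pooled candidate sentences as the relevance query, the trade-off weight `lam`, and reading "waterfall" as a sequential cascade in which each new single-document summary is merged into a running summary. It also assumes sentences are already word-segmented and space-separated, which raw Khmer text is not.

```python
import math
from collections import Counter


def _vec(sentence):
    # Bag-of-words vector; assumes the sentence is already word-segmented.
    return Counter(sentence.lower().split())


def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mmr_select(sentences, k, lam=0.7):
    """Pick up to k sentences by Maximal Marginal Relevance.

    Relevance is similarity to the pooled candidate text (an assumption);
    redundancy is the highest similarity to any sentence already chosen.
    """
    vecs = [_vec(s) for s in sentences]
    query = Counter()
    for v in vecs:
        query.update(v)

    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = _cosine(vecs[i], query)
            redundancy = max(
                (_cosine(vecs[i], vecs[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    # Return the chosen sentences in their original order.
    return [sentences[i] for i in sorted(selected)]


def waterfall_merge(summaries, k, lam=0.7):
    """Merge single-document summaries one at a time, applying MMR at each step."""
    running = []
    for summary in summaries:
        running = mmr_select(running + list(summary), k, lam)
    return running
```

Here each element of `summaries` stands in for the output of the single-document model from item (1). Applying MMR after every merge keeps the running summary at no more than `k` sentences and drops near-duplicates early, at the cost of favoring documents that arrive later in the cascade.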

Keywords/Search Tags:

Khmer, active learning, extractive summarization, deep learning, MMR

Related items

1	Extractive Summarization For Long Documents Without Manual Annotation And Low-resource Scenarios
2	Research On Extractive Text Summarization Method Based On Unsupervised Ensemble Learning
3	Research On Extractive Multi-document Summarization Using Supervised Deep Learning
4	Research On Extractive Summarization Of Scientific And Technological Information Text Based On Deep Learning
5	Research And Application Of Extractive Text Summarization Method Based On Contrastive Learning
6	Research On Meeting Text-oriented Extractive Summarization
7	Research On Extractive Sentence Compression Techniques For Cross-domain
8	Research On Automatic Text Summarization Method Based On Deep Learning
9	Research On Text Summarization Method Based On Representation Learning
10	Research And Implementation Of Automatic Extractive Summarization On Medical Papers