
Research On Text Summarization Technology Based On Word And Paragraph Vectorization Representation

Posted on: 2019-01-13
Degree: Master
Type: Thesis
Country: China
Candidate: Q Q Shen
GTID: 2428330611993380
Subject: Computer application technology
Abstract/Summary:
With the advent of the big data era, the volume of information from news, microblogs, and newspapers has exploded, which has largely satisfied people's reading needs. At the same time, media content is inevitably repetitive, text quality is not guaranteed, and mismatches between titles and content are common. This poses great challenges for reading and information acquisition. The best solution is text summarization. For large volumes of text, manual summarization is time-consuming, labor-intensive, and inefficient, so automatic text summarization has drawn attention from both society and researchers. The technology uses a computer to extract the topic information of a text and generates a short text that represents its central idea as a summary, which greatly facilitates text compression and dissemination and improves reading efficiency.

As a branch of natural language processing, automatic text summarization has one key difficulty: how to encode text accurately, that is, how to convert natural language into a form that the machine can "understand". This is also the subject of knowledge representation research. Many studies in knowledge representation are based on the bag-of-words model, the n-gram model, and the LDA model. Recent research on text processing has gradually shifted to vectorization methods. This learning-based approach is quite effective in practical applications compared with classical methods. Existing word vector and paragraph vector techniques represent text as dense vectors and have been applied to text classification and web page information extraction. Despite this success, researchers have not yet fully compared the advantages of vectorization methods over classical methods, nor gained an intuitive understanding of how much parameter changes affect vectorization models.

To study the technologies related to automatic text summarization, this thesis designs and implements an automatic text summarization system. The system is divided into four modules: a word vector generation module, a paragraph vector generation module, a keyword extraction module, and a topic sentence extraction module. The first module builds on the existing word2vec technique and proposes a word vector optimization technique that trains word vectors in parallel and encodes all words. The second module generates a paragraph vector for each text segment from the word vectors. The third module builds on the output of the second to obtain the keywords of the text. The fourth module maps the keywords back to sentences in the original text, then evaluates and extracts those sentences to produce the final document summary. Based on this work, the automatic text summarization system is implemented, and experiments show that the system can extract text summaries effectively.
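The abstract does not give implementation details for the four modules, so the following is only a minimal illustrative sketch of how such a pipeline could fit together, not the thesis's method. It assumes pre-trained word vectors are supplied as a plain dictionary (the thesis trains them with an optimized word2vec), approximates a paragraph vector by averaging word vectors, ranks keywords by cosine similarity to that paragraph vector, and scores sentences by how many keywords they contain. All function names and parameters here are hypothetical.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def paragraph_vector(tokens, word_vectors):
    """Assumption: paragraph vector = mean of the known word vectors."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def extract_keywords(tokens, word_vectors, k):
    """Assumption: keywords are the words closest to the paragraph vector."""
    pvec = paragraph_vector(tokens, word_vectors)
    if pvec is None:
        return []
    vocab = sorted({t for t in tokens if t in word_vectors})
    ranked = sorted(vocab, key=lambda w: cosine(word_vectors[w], pvec),
                    reverse=True)
    return ranked[:k]


def extract_topic_sentences(text, word_vectors, k_keywords=2, n_sentences=1):
    """Map keywords back to sentences and keep the highest-scoring ones."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text)
                 if s.strip()]
    keywords = set(extract_keywords(tokenize(text), word_vectors, k_keywords))
    # Score = number of keyword occurrences; ties favor earlier sentences.
    scored = [(sum(1 for t in tokenize(s) if t in keywords), -i, s)
              for i, s in enumerate(sentences)]
    top = sorted(scored, reverse=True)[:n_sentences]
    # Return the selected sentences in their original document order.
    return [s for _, _, s in sorted(top, key=lambda item: -item[1])]
```

A caller would pass the document text and a word-to-vector mapping to `extract_topic_sentences`; in a real system the averaging step and the keyword-count scoring would likely be replaced by learned paragraph vectors and a more careful sentence evaluation.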
Keywords/Search Tags: automatic text summary, word vector, paragraph vector, keyword, topic sentence