Research On The Method Of Extracting The Tag From Chinese Compositions Of Primary School Based On Text Automatic Abstract

Posted on: 2019-08-17  Degree: Master  Type: Thesis
Country: China  Candidate: Y C Wu  Full Text: PDF
GTID: 2428330548967115  Subject: Computer applications
Abstract/Summary:
Composition materials are indispensable to the informatization of Chinese composition teaching in primary schools. However, although a large number of Chinese composition materials are available on the Internet, there is no personalized recommendation model for them, which can easily cause information overload for pupils. In addition, these materials are unstructured data, which makes them difficult for computers to read and store and in turn hinders effective organization of the data. A text tag is a set of words describing the content of a text; it contains the information needed to convert unstructured text into structured data. The original corpus of primary school Chinese composition materials also contains redundancy, which interferes with tag acquisition. Therefore, to alleviate information overload in automatic composition assistance, this thesis studies automatic text summarization as a way to remove redundancy from the corpus and extract the central content of each text. After giving a detailed definition of the composition tag, the thesis proposes, as its main innovation, a method for extracting tags from primary school Chinese compositions based on automatic text summarization. The main research work is as follows:

(1) Early exploration. The thesis surveys the key technologies involved in composition tag extraction. Based on a comprehensive analysis of efficiency and feasibility, it selects extractive automatic text summarization, dictionary-based word segmentation, and supervised named entity recognition as the main technical framework for tag extraction.

(2) Similarity algorithm selection. The thesis uses TextRank as the main extraction algorithm to summarize the original text and remove redundancy. It compares several classical similarity algorithms, based on edit distance, Word2Vec, and BM25 respectively. Based on a comprehensive evaluation of ROUGE score and time efficiency, the BM25-based similarity calculation is selected to compute the weights for TextRank.

(3) Tag extraction. The thesis defines the primary school Chinese composition tag as consisting of article classification, core entities, and key descriptions. Evaluation metrics for tag extraction are defined according to this classification. The thesis obtains the relevant information through word segmentation, part-of-speech tagging, and named entity recognition, and limits the number and frequency of words in each category to ensure the accuracy of the results. Experiments show that the proposed extraction strategy has significant advantages over traditional keyword extraction algorithms.

(4) Application scenario analysis. The thesis makes a preliminary exploration of application scenarios for the primary school Chinese composition tag and proposes an RDF model for holding the tags, which can generate structured data while meeting the needs of linked data construction.
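The combination described in item (2), TextRank with BM25 as the sentence similarity, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the parameters `k1`, `b`, and `d`, the non-negative IDF variant, and the assumption that sentences arrive already segmented into word lists are all illustrative choices made here.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score `query` (a list of tokens) against every token list in `docs`.

    Assumes `docs` is non-empty and contains at least one token overall.
    Uses a non-negative IDF variant, log(1 + (n - df + 0.5) / (df + 0.5)),
    which is an assumption of this sketch, not a detail from the thesis.
    """
    n = len(docs)
    avgdl = sum(len(doc) for doc in docs) / n
    # Document frequency: in how many sentences each token appears.
    df = Counter(tok for doc in docs for tok in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for tok in query:
            if tok not in tf:
                continue
            idf = math.log(1 + (n - df[tok] + 0.5) / (df[tok] + 0.5))
            norm = tf[tok] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[tok] * (k1 + 1) / norm
        scores.append(score)
    return scores


def textrank(sentences, d=0.85, iters=50):
    """Rank sentences by weighted TextRank with BM25 edge weights."""
    n = len(sentences)
    # weights[j][i]: BM25 score of sentence j (as query) against sentence i.
    weights = [bm25_scores(s, sentences) for s in sentences]
    for i in range(n):
        weights[i][i] = 0.0  # no self-loops
    out_sum = [sum(row) for row in weights]
    rank = [1.0 / n] * n
    for _ in range(iters):
        new_rank = []
        for i in range(n):
            incoming = sum(
                weights[j][i] / out_sum[j] * rank[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_rank.append((1 - d) / n + d * incoming)
        rank = new_rank
    return rank


def summarize(sentences, top_k=2):
    """Return indices of the top_k ranked sentences, in document order."""
    rank = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: rank[i], reverse=True)
    return sorted(top[:top_k])
```

For Chinese text, each sentence would first have to be segmented into words (the thesis names dictionary-based segmentation for this step); the sketch deliberately leaves that step out and takes token lists as input. A sentence that shares no words with the others receives no incoming weight and therefore ranks lowest, which is the redundancy-removal behaviour an extractive summary relies on.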
Keywords/Search Tags: Automatic text summary, Compositional tag, TextRank, RDF