
Semantic Integrity Analysis Based On Recurrent Neural Network

Posted on: 2020-12-16
Degree: Master
Type: Thesis
Country: China
Candidate: J M Y Liu
GTID: 2392330578960901
Subject: Computer technology
Abstract/Summary:
In recent years, with the development of information technology, natural language processing has become a research hotspot in computer science and artificial intelligence. The main task of semantic integrity analysis is to judge whether a sentence is semantically complete. It is a preliminary step for natural language processing tasks such as syntactic analysis of long text, semantic analysis, and machine translation. In the automatic scoring of subjective questions, especially the evaluation of long text answers, both the student answers and the standard answers must first be divided into multiple semantically complete sentences before syntactic and semantic similarity matching is performed. Semantic integrity analysis is needed for Chinese because Chinese places no strict grammatical restrictions on the use of punctuation. Commas in particular are used loosely: a comma may separate semantically complete fragments, but it may also appear where the semantics are still incomplete. It is therefore of great significance to use the latest natural language processing technology to analyze the semantic integrity of Chinese sentences.

This paper proposes a semantic integrity analysis method based on a recurrent neural network. By judging whether a sentence is semantically complete, the method divides long text into multiple semantically complete sentences. The main innovations of this paper are as follows:

(1) For the processing of input data, this paper proposes an idea based on a circular window to convert variable-length sequences into fixed-length sequences that a recurrent neural network can accept. The circular window also avoids the loss of context feature information after random under-sampling.

(2) An improved random under-sampling method is used to deal with the class imbalance that arises after labeling. Comparative experiments show that the improved random under-sampling method proposed in this paper effectively mitigates the class imbalance and improves the accuracy of the model.

(3) A semantic integrity analysis model based on a double-layer Bi-LSTM is proposed. The Bi-LSTM captures the context characteristics of the input sequence, and stacking Bi-LSTM layers re-abstracts the output of the previous layer into new features for later learning. In addition, the Dropout strategy is used to prevent overfitting. Appropriate neural network parameters are selected through a large number of parameter comparison experiments, and the final accuracy reaches 91.61%.

In this paper, a recurrent neural network model based on a double-layer Bi-LSTM is used to automatically label long texts. The experimental results and use in a project indicate that this method solves the semantic integrity annotation problem well. When the model is applied in a production environment, the dependencies between tags can be taken into account when producing the model output, and the predictions can be further improved with certain part-of-speech rules.
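
The abstract does not define the circular window or the improved under-sampling procedure, so the following is only a minimal Python sketch under stated assumptions, not the thesis's implementation. It assumes the window is a fixed-size, odd-length context centred on each token that wraps around the ends of the sequence, and it substitutes plain random under-sampling of the majority class for the improved variant. The names `circular_windows`, `undersample`, `majority_label` and `keep_ratio` are hypothetical. Because every window already carries its own context, dropping some majority-class windows does not remove context from the windows that remain.

```python
import random


def circular_windows(tokens, size):
    """Return one fixed-length window per token, wrapping around the ends.

    Assumption: 'circular' means indices wrap modulo the sequence length.
    """
    if size < 1 or size % 2 == 0:
        raise ValueError("size must be a positive odd number")
    n = len(tokens)
    if n == 0:
        return []
    half = size // 2
    # Python's % always yields a non-negative index here, so i + k < 0 wraps
    # to the end of the sequence and i + k >= n wraps to the start.
    return [[tokens[(i + k) % n] for k in range(-half, half + 1)]
            for i in range(n)]


def undersample(windows, labels, majority_label, keep_ratio, seed=0):
    """Plain random under-sampling (a stand-in for the improved method).

    Every minority-class window is kept; each majority-class window is kept
    with probability keep_ratio.
    """
    rng = random.Random(seed)
    return [(w, y) for w, y in zip(windows, labels)
            if y != majority_label or rng.random() < keep_ratio]


if __name__ == "__main__":
    tokens = list("abcde")
    # Hypothetical labels: 1 marks a token that ends a complete sentence.
    labels = [0, 0, 1, 0, 1]
    windows = circular_windows(tokens, 3)
    for window, label in undersample(windows, labels, majority_label=0,
                                     keep_ratio=0.5):
        print(window, label)
```

Each window has the same length regardless of the input length, which is the property the abstract requires before the sequences are fed to the recurrent network.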
Keywords/Search Tags: natural language processing, recurrent neural network, semantic integrity, sequence labeling