
A Study On The Construction Of A Chinese Concurrent Structure Corpus And The Identification Of Concurrent Structures

Posted on: 2022-06-06
Degree: Master
Type: Thesis
Country: China
Candidate: W H Hou
Full Text: PDF
GTID: 2515306722488534
Subject: Computer Science and Technology
Abstract/Summary:
The concurrent structure is a commonly used verb phrase structure in modern Chinese in which a single noun is shared by a predicate-object phrase and a subject-predicate phrase within one sentence. However, because Chinese sentence patterns are complex and variable, concurrent structures often contain additional components that complicate them, and they resemble serial verb structures and subject-predicate phrases used as objects, which creates obstacles for syntactic analysis. Recognizing the concurrent structure is therefore important for semantic analysis and downstream tasks. In linguistics, however, the definition of the Chinese concurrent structure is not uniform, and few concurrent corpora exist. No concurrent corpus has yet been built for the Chinese Abstract Meaning Representation (AMR) annotation system. Machine recognition and analysis of the concurrent structure still rely on rule matching and manually designed features for statistical learning, and the results are not good enough to be applied directly. In view of this situation, this thesis carries out the following four pieces of research work.

(1) A corpus annotation specification for concurrent structures is designed for the Chinese AMR annotation system. Based on this specification, a corpus containing 4,732 concurrent sentences and 5,216 concurrent structures was constructed. We compile statistics on the word frequency distribution and on the types and numbers of concurrent verbs in the corpus, and we analyze the distribution characteristics of the concurrent structure in the corpus and the methods for its automatic recognition. The corpus provides a data basis for automatic identification.

(2) An LDPA-BiLSTM-CRF (Lexicon Dependency Parser Augmented-BiLSTM-CRF) model is proposed to automatically identify the boundary and internal components of the concurrent structure. The task is divided into two sequence labeling tasks. The model avoids the error propagation of word segmentation and part-of-speech tagging. It also adds dictionary information, which enriches the representation of the text. In addition, dependency parser information helps the model identify the dependency relations and grammatical structure of sentences, which effectively improves recognition of the boundary and internal components of the concurrent structure.

(3) A combined neural network model is proposed to automatically identify the category of the concurrent structure. Based on the characteristics of the concurrent structure, the word vector is concatenated with context information, and an attention mechanism extracts the key information of the sentence in order to classify the concurrent structure. The resulting category information supports the conversion of concurrent structures into Chinese AMR graphs.

(4) A pipeline experiment for concurrent structure recognition is designed. It automatically extracts concurrent sentences from natural text, identifies the boundaries, internal components, and categories of the concurrent structure, and evaluates the recognition results of each stage in turn. It provides a complete procedure and set of results as a reference for the automatic recognition of concurrent structures in practical applications.

This thesis constructs a concurrent structure corpus of 4,732 sentences, which provides a valuable semantic resource for concurrent structure recognition. The proposed neural network models address the recognition of the boundary, internal components, and category of the concurrent structure, which can help Chinese AMR parsing and other semantic analysis tasks. A method and process for the complete identification of the concurrent structure are proposed, and end-to-end identification of the concurrent structure is realized.
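
To make the "two sequence labeling tasks" in item (2) more concrete, the following is a minimal sketch of how the predicted tags could be turned back into text spans. It is an illustration only, not the thesis's implementation: the abstract does not state the tagging scheme, so the character-level BIO scheme, the label names `STRUCT`, `V1`, `PIVOT`, and `V2`, the helper `decode_bio`, and the example sentence are all assumptions introduced here.

```python
from typing import List, Tuple


def decode_bio(chars: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (label, text) spans from character-level BIO tags.

    Assumed scheme (hypothetical, not taken from the thesis):
    "B-X" opens a span of type X, "I-X" continues it, "O" is outside.
    """
    spans: List[Tuple[str, str]] = []
    label, buf = None, []
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            # A new span starts; close any span that is still open.
            if label is not None:
                spans.append((label, "".join(buf)))
            label, buf = tag[2:], [ch]
        elif tag.startswith("I-") and label == tag[2:]:
            # Continue the currently open span.
            buf.append(ch)
        else:
            # "O" or an inconsistent "I-" tag: close the open span, skip this char.
            if label is not None:
                spans.append((label, "".join(buf)))
            label, buf = None, []
    if label is not None:
        spans.append((label, "".join(buf)))
    return spans


# Hypothetical example: "我请他吃饭" ("I invite him to have a meal").
chars = list("我请他吃饭")

# Task 1 (assumed labels): boundary of the whole concurrent structure.
boundary_tags = ["O", "B-STRUCT", "I-STRUCT", "I-STRUCT", "I-STRUCT"]

# Task 2 (assumed labels): internal components of the structure.
component_tags = ["O", "B-V1", "B-PIVOT", "B-V2", "I-V2"]

boundary_spans = decode_bio(chars, boundary_tags)
component_spans = decode_bio(chars, component_tags)
print(boundary_spans)
print(component_spans)
```

Tagging individual characters rather than pre-segmented words is one way a model can sidestep word segmentation errors, which is consistent with the abstract's remark about avoiding error propagation, though the abstract does not confirm that this is the granularity used.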
Keywords/Search Tags: Chinese abstract meaning representation, concurrent structure recognition, automatic labeling of concurrent language categories
Related items