Chinese Sentence Semantic Annotation And Statistical Analysis Based On AMR

Posted on: 2018-10-16
Degree: Master
Type: Thesis
Country: China
Candidate: L J Bu
Full Text: PDF
GTID: 2355330518990862
Subject: Linguistics and Applied Linguistics
Abstract/Summary:
Semantic analysis has been a challenging subject for decades and is becoming increasingly important in the study of machine translation and artificial intelligence. AMR (Abstract Meaning Representation), an approach to whole-sentence meaning representation, abstracts the meaning of a sentence into a single, traversable directed acyclic graph, which can represent that meaning more clearly. AMR focuses on concepts, which are abstracted from words, and on the relations between those concepts. One of AMR's advantages is that concepts can be added and words deleted according to the meaning of a sentence. Most AMR research has focused on English corpora, and its specification cannot be applied to Chinese corpora without modification.

This study focused on the meaning representation of Chinese sentences. After introducing AMR's development, its specification and the research on AMR parsing, the study proposed CAMR (Chinese AMR), a meaning representation method for Chinese sentences. CAMR was based on AMR, with new features such as sentence relationships, and it also addressed the special linguistic phenomena and syntactic structures of Chinese sentences. The CAMR specification contained two main parts: Concepts and Relations. It specified the representation of anaphora, mood, interrogative pronouns, numeric types and proper nouns in Chinese, as well as concepts that represent sentence relationships. It also specified 5 core relations and 42 non-core relations. Each kind of relation was exemplified with several Chinese sentences and their CAMR representations.

The second part of this study was the annotation of Chinese corpora with CAMR. The Chinese edition of The Little Prince, with 1562 sentences, was annotated first, and the CAMR specification was modified in the course of that annotation. A CTB (Chinese Tree Bank) corpus of 5000 sentences was then annotated with the modified specification.

The last part of this study was the counting and analysis of the annotated corpus. The counts showed that 39.96% of the sentences in the corpus contained graph structure, which demonstrates the necessity of graph structure for representing Chinese sentences. As for the options of adding concepts and deleting words, 95.2% of the sentences had concepts added and 96.94% had words deleted when they were represented as CAMRs, which demonstrates the necessity of both options. Finally, since predicates are important in the study of syntax and semantics, the predicates and their argument structures were counted, and a dictionary of predicate argument structures was built from the counting results.
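
The two kinds of counting described in the abstract can be illustrated with a minimal sketch. It is not the thesis's own tooling: the triple format, the function names has_graph_structure and argument_frames, and the English example sentence ("The boy wants to go", a common AMR illustration) are all assumptions made here for clarity. The sketch also assumes that "graph structure" means a concept node with more than one incoming relation (a reentrancy), which is what separates a graph from a tree.

```python
from collections import Counter, defaultdict

# Hypothetical toy annotation as (source, relation, target) triples.
# ":instance" triples bind a variable to its concept.
WANT_TO_GO = [
    ("w", ":instance", "want-01"),
    ("b", ":instance", "boy"),
    ("g", ":instance", "go-01"),
    ("w", ":ARG0", "b"),
    ("w", ":ARG1", "g"),
    ("g", ":ARG0", "b"),  # "boy" is shared by both predicates
]


def has_graph_structure(triples):
    """Return True if any variable has more than one incoming relation.

    Assumption: a sentence "contains graph structure" when its
    representation is not a tree, i.e. some node is reentrant.
    """
    variables = {src for src, rel, _ in triples if rel == ":instance"}
    incoming = Counter(
        tgt for _, rel, tgt in triples
        if rel != ":instance" and tgt in variables
    )
    return any(count > 1 for count in incoming.values())


def argument_frames(triples):
    """Map each predicate concept to the sorted tuple of core roles it takes."""
    concept = {src: tgt for src, rel, tgt in triples if rel == ":instance"}
    roles = defaultdict(set)
    for src, rel, _ in triples:
        # Inverse roles such as ":ARG0-of" are skipped in this sketch.
        if rel.startswith(":ARG") and not rel.endswith("-of"):
            roles[concept[src]].add(rel)
    return {pred: tuple(sorted(r)) for pred, r in roles.items()}


def frame_dictionary(corpus):
    """Count how often each (predicate, frame) pair occurs across a corpus."""
    counts = Counter()
    for triples in corpus:
        for pred, frame in argument_frames(triples).items():
            counts[(pred, frame)] += 1
    return counts
```

Applying has_graph_structure to every annotated sentence and dividing the number of hits by the corpus size would give a percentage of the kind reported above, while frame_dictionary gives a simple tally from which a predicate argument-structure dictionary could be derived.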
Keywords/Search Tags: Sentence Meaning, Semantic Annotation, Abstract Meaning, Semantic Analysis, Chinese Information Processing