CAMR Semantic Library Construction And Statistical Analysis Based On Conceptual Relationship Alignment

Posted on: 2019-04-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wen
GTID: 2435330548980590
Subject: Linguistics and Applied Linguistics
Abstract/Summary:
Semantic analysis is both central to the study of natural language and one of its hardest problems, and it remains a bottleneck in Natural Language Processing. Precise semantic analysis requires a more suitable and more expressive semantic representation. AMR (Abstract Meaning Representation) allows graph structures, permits additions and deletions of concepts, and is single-rooted, so it can represent the meaning of a sentence more completely. However, AMR annotation neglects the alignment between the semantic structure and the sentence. This both hurts automatic semantic analysis and limits deeper study of semantics, especially of special semantic structures such as non-projective structures. In addition, China lacks semantic representation resources built on complete sentences, which makes it harder to improve automatic semantic analysis. This thesis therefore covers the following three aspects.

First, we improve the AMR annotation rules to obtain aligned CAMR (Chinese AMR) annotation rules. We add alignment information between the semantic structure and the original sentence. Specifically, most content words are aligned to concept nodes of the CAMR structure, and most function words are aligned to its relations. The result is a more complete mapping between a sentence and its semantic representation.

Second, we build an aligned CAMR semantic resource of more than ten thousand sentences. Relatively complete Chinese semantic representation resources are currently scarce in China. Our CAMR resource contains 10,149 sentences with an average length of about 22.4 words. It contributes to research on automatic semantic analysis of Chinese and, more importantly, supports the study of Chinese semantic structure.

Third, we analyze the semantic structure of Chinese based on the 10,149 CAMR sentences. Besides examining the functional distribution of function words to identify the usage rules of Chinese function words, we also discuss linguistic phenomena that go beyond tree structure, namely graph structures and non-projective structures. We reach the following three conclusions.

(1) Function words: We describe the functional distribution of Chinese function words. 13.85% of the edges in CAMR structures contain words, and 12.95% of words are located on edges. This shows that about 87% of semantic relations are determined by word order, so word order is very important to the expression of meaning in Chinese. The word with the most flexible function is "DE(?)", which appears on 58 kinds of relations. The word "?(?)" appears on 28 kinds of relations.

(2) Graph structure: Graph structures are widespread in Chinese (47% of sentences are graph structures), and most of them arise from the sharing of arguments (ARG0, ARG1, ARG2, ARG3).

(3) Non-projective structure: Non-projective structure in Chinese is a key and difficult point of this study. Based on the CAMR semantic resource, we find that non-projective structures account for a certain proportion in Chinese (3.6%). Their causes include the promotion of modal words, topic shifting, the separation of components, and general displacement. The most frequent type is the promotion of modal words (52.37%), followed by the separation of components (28.49%), topic shifting (13.34%), and general displacement (5.14%). Modal word promotion and component separation are usually associated with specific words or phrases, so compiling these words and phrases into a vocabulary for training should help improve automatic semantic analysis.

In summary, we formulate an improved CAMR semantic annotation specification that includes alignment information, and we construct a larger-scale CAMR semantic corpus. On this basis, we analyze the functional distribution and usage of Chinese function words and examine structures in Chinese that go beyond trees. We find that graph structures are common and that non-projective structures account for a certain proportion in Chinese, and we analyze the types of non-projective structure and their proportions.
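The two beyond-tree phenomena discussed above can be illustrated with a minimal sketch. It assumes a hypothetical encoding that is not taken from the thesis: a CAMR-like structure is a list of `(parent, relation, child)` triples, and `position` maps each aligned concept to the index of its word in the sentence. A structure is treated as a graph when some concept has more than one incoming edge (a shared argument), and as non-projective when two edges cross over the aligned word positions. The crossing-edge test is a common simplification and may differ from the criterion used in the thesis.

```python
from collections import Counter
from itertools import combinations


def has_shared_argument(edges):
    """Return True if any concept has more than one incoming edge.

    `edges` is a list of (parent, relation, child) triples; this encoding
    is an assumption made for illustration only.
    """
    incoming = Counter(child for _parent, _relation, child in edges)
    return any(count > 1 for count in incoming.values())


def has_crossing_edges(edges, position):
    """Return True if two edges cross when drawn over the word positions.

    `position` maps an aligned concept to its word index. Edges with an
    unaligned endpoint are skipped. Crossing edges are used here as a
    simplified stand-in for non-projectivity.
    """
    spans = []
    for parent, _relation, child in edges:
        if parent in position and child in position:
            a, b = position[parent], position[child]
            if a != b:
                spans.append((min(a, b), max(a, b)))
    return any(
        l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1
        for (l1, r1), (l2, r2) in combinations(spans, 2)
    )


# Hypothetical example: "boy" is an argument of both "want" and "go".
shared = [("want", "arg0", "boy"), ("want", "arg1", "go"), ("go", "arg0", "boy")]
shared_pos = {"boy": 2, "want": 3, "go": 5}

# Hypothetical example: edges a-c and b-d cross over positions 1..4.
crossing = [("a", "rel", "c"), ("b", "rel", "d")]
crossing_pos = {"a": 1, "b": 2, "c": 3, "d": 4}

print(has_shared_argument(shared), has_crossing_edges(shared, shared_pos))
print(has_shared_argument(crossing), has_crossing_edges(crossing, crossing_pos))
```

Both checks depend on the alignment information: without word positions for concepts, the crossing test cannot be computed at all, which is one reason an aligned annotation scheme makes this kind of statistic possible.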
Keywords/Search Tags: AMR, Semantic Annotation, Alignment, non-projective, graph