| Dialogue analysis is a fundamental task for natural language dialogue applications such as intelligent customer service and chatbots. However, dialogue contains many emotional phrases, omissions, and word order inversions, which strongly affect syntactic and semantic parsers. The accuracy of automatic dialogue analysis is lower than that achieved on written corpora. The main reason is the lack of a rigorous formal description of multi-round dialogue, which hinders subsequent analysis and computation. Therefore, after reviewing the dialogue annotation systems and corpora developed in China and abroad, this paper proposes a discourse-level multi-round dialogue annotation system based on Abstract Meaning Representation (AMR), discusses the discourse-level semantic structure annotation method, and specifies the alignment between words and concepts. The scheme adds corresponding semantic relations and concepts for appellation terms and emotional phrases, adjusts the argument structure of subjective emotional words, stipulates the treatment of some special phenomena in dialogue, and annotates and analyzes constructions in dialogue. A manual annotation platform has also been designed and built, laying the foundation for large-scale multi-round dialogue corpus annotation and computational research. Based on the semantic characteristics and distinctive phenomena of dialogue, this paper establishes a meaning representation method better suited to Chinese dialogue, DAMR (Dialogue AMR). It focuses on the many emotional phrases, ellipses, word order inversions, constructions, and other phenomena found in dialogue, and on a formal annotation system for semantic associations across multiple rounds of dialogue. The main research results and conclusions are as follows: After reviewing a large body of literature on dialogue research in linguistics and computational linguistics in China and abroad, this paper summarizes the main research methods, annotation 
systems, and constructed corpora, and finds that current dialogue annotation schemes are relatively narrow and lack semantic annotation. A discourse-level multi-round dialogue annotation system based on abstract meaning representation is then proposed. The discourse-level semantic structure annotation method is explored, and a dual-layer alignment scheme for the relationship between words and concepts is proposed. The scheme adds corresponding semantic relations and concepts for appellation terms and emotional phrases, adjusts the argument structure of subjective emotional words, stipulates the treatment of some special phenomena in dialogue, and includes a manual annotation platform suited to discourse-level semantic annotation. On the basis of the DAMR scheme, this paper annotates a dialogue corpus of 1,000 SMS text messages to verify the feasibility of the annotation method. A detailed statistical analysis is carried out of concepts that are used across sentences in the annotated corpus. The statistics show 8 kinds of cross-sentence concept use, which fall into three types: omission, reference, and compound sentences. In addition, most cross-sentence concept uses span a sentence distance of less than 4, and the semantic content shared across sentences takes three forms: child nodes, subtrees or graphs, and partial subtrees. Cross-sentence concept use is a prominent feature of discourse-level dialogue. Analyzing it helps to explore the semantic connections within a dialogue and helps computers automatically analyze the semantics of multi-round dialogue, especially for the tasks of reference resolution and omission recovery in automatic dialogue processing. Because conversational corpora contain many constructions that differ from conventional structures, which affects annotation, this paper systematically annotates and analyzes the 
constructions in Chinese. A construction, as a structure whose meaning cannot be fully derived from the meanings of its constituent components, differs considerably from regular sentences. Constructions strongly affect syntactic and semantic parsers, and analyzing them automatically is even more difficult. It is therefore necessary to study the annotation of their internal structure and the building of construction corpora. Because the semantic structure of a construction differs considerably from its syntactic structure, we use Chinese Abstract Meaning Representation (CAMR) to annotate the semantic structure of constructions directly. At present, the most comprehensive construction library is the Chinese Construction Knowledge Base of Peking University. After manually annotating and counting all 1,057 constructions in this library, we find that CAMR can express the 61.2% of constructions that basically conform to the principle of compositionality. However, the 38.8% of constructions that do not conform to this principle require concepts to be modified or added, and they raise problems such as missing concepts, components that are hard to separate, and rhetorical meaning that is hard to express. The annotation and analysis of constructions can provide a theoretical and data basis for the automatic semantic analysis of constructions. |
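
The notion of "sentence distance" for cross-sentence concept use can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `ConceptUse` record, the concept identifiers, the toy three-sentence dialogue, and the definition of distance as the gap between the reusing sentence and the sentence where the concept first appears. It is not the DAMR annotation format, nor the statistic as the paper computes it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptUse:
    """One occurrence of a concept node in a dialogue (hypothetical record)."""
    concept_id: str  # illustrative identifier, e.g. "x3" for "ticket"
    sentence: int    # 1-based index of the sentence that uses the concept


def cross_sentence_distances(uses):
    """Return (concept_id, distance) for every use outside the first sentence.

    Distance is assumed to be: index of the reusing sentence minus the index
    of the sentence in which the concept first appears.
    """
    first_seen = {}
    distances = []
    for use in sorted(uses, key=lambda u: u.sentence):
        origin = first_seen.setdefault(use.concept_id, use.sentence)
        if use.sentence != origin:
            distances.append((use.concept_id, use.sentence - origin))
    return distances


# Toy dialogue (invented for this sketch):
#   1. "Did you buy the ticket?"   introduces x1 (person) and x3 (ticket)
#   2. "Bought."                   omission: x1 and x3 are reused
#   3. "How much was it?"          reference: x3 is reused
uses = [
    ConceptUse("x1", 1),
    ConceptUse("x3", 1),
    ConceptUse("x1", 2),
    ConceptUse("x3", 2),
    ConceptUse("x3", 3),
]

distances = cross_sentence_distances(uses)
short_range = [d for _, d in distances if d < 4]
```

In this toy case every reuse falls within a distance of 4, which mirrors the kind of tally the abstract reports; counts over a real annotated corpus would be computed from the annotation files rather than from hand-written records.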