
Statistical Analysis Of Self-built Chinese Semantic Role Labeling Corpus

Posted on: 2018-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: B R He
GTID: 2335330515472036
Subject: Chinese Language and Literature
Abstract/Summary:
With the rapid development of information technology, natural language processing (NLP) increasingly affects human life. A central problem in NLP is how to make computers understand human language so that human-computer interaction becomes possible. Chinese automatic word segmentation and part-of-speech (POS) tagging rely on lower-level linguistic knowledge and statistical methods and have reached high accuracy, but some ambiguous sentences can only be resolved through syntactic and semantic analysis. For natural language understanding, syntactic analysis is only one of the means; semantic analysis is the key. Without the support of semantic analysis, automatic syntactic analysis is difficult. In the pursuit of artificial intelligence, semantic analysis has taken on unprecedented importance and urgency. For an NLP system to combine the speed of a computer with human intelligence, semantic analysis is indispensable.

On the basis of an existing syntactic treebank, we construct a semantic role labeling corpus of a certain scale. First, following the HowNet framework and The labeling specification of modern Chinese predicate's semantic role labeling corpus, we label the semantic roles in this corpus through manual annotation and manual proofreading. Second, drawing on the manual annotation, we revise and improve the annotation system. Third, we induce semantic role labeling rules and test their effectiveness. Finally, we summarize the content and results of the study. This thesis comprises seven chapters, whose main contents are as follows.

The first chapter is the introduction. It presents the theoretical background, research status, research methods and research significance of the thesis. The theoretical background mainly covers Valency Theory, Argument Theory and Semantic Role Theory. We mainly explain the types of semantic role relations, the construction of semantic role corpora and the semantic role labeling scheme. As for research methods, we adopt the corpus method, human-computer interaction, and rule-based and statistical methods as well as their combination. We aim to label sentences that differ in syntactic structure and logical semantics in a uniform way, and to build an annotated corpus of a certain scale that contributes to semantic analysis and natural language understanding.

The second chapter describes the overall structure of the semantic role labeling corpus. It introduces the source and scale of the corpus, the previously built treebank, the types of semantic roles, the HowNet case frame dictionary, the labeling platform, the labeling scheme and other infrastructure work. We collected 40,000 sentences from People's Daily and built the semantic role labeling corpus on top of the treebank. The labeling platform is adapted from the earlier treebank labeling platform, and the two interfaces can be switched easily. The types of semantic roles and the labeling scheme follow The labeling specification of modern Chinese predicate's semantic role labeling corpus. What differs in this thesis is that we additionally use the HowNet case frame dictionary as an aid to labeling, which helps secure the objectivity and accuracy of the annotation.

The third chapter discusses common problems in semantic role labeling and their solutions. It summarizes the problems that arise during manual annotation of semantic roles and proposes corresponding solutions. The annotation problems fall into three main types: missing labels, redundant labels and wrong labels. Each type is summarized and analyzed from two angles: the annotation of the predicate component and the annotation of the predicate's arguments. Finally, we propose methods for resolving these problems, such as correctly affiliating synonyms and selecting verbs according to context.

The fourth chapter discusses problems in the case frame dictionary and their solutions. Based on the manual annotation of the corpus, we summarize the problems found in the dictionary, analyze their causes and propose a solution for each. There are four main problems. The first is that the case frame of a verb's semantic category is wrong, either because the roles in the case frame are incomplete or because its key roles are wrong. The second is that the semantic category of a verb is wrong. The third is that the semantic category of a verb is incomplete, which includes the case where the same case frame does not apply to all of the verb's meanings. The last is unknown words. We trace these problems to the way the case frame dictionary was set up, the evolution of word meaning, differences between similar meanings and the emergence of new words. The solutions include checking case frames and synonym affiliation by means of sentence pattern transformation, which is applied to verbs whose case frame is wrong or does not apply to all cases; the other problems are corrected by synonym affiliation.

The fifth chapter examines the relationship between syntactic structures and sentence patterns, and the semantic role labeling rules. On the basis of the manually annotated and proofread semantic roles, we summarize, by introspection, the typical sentence patterns of various syntactic structures. These are mainly subject-predicate sentences, including verb-predicate, noun-predicate and adjective-predicate sentences. The verb-predicate sentences include general verb-predicate sentences, Ba sentences, Bei sentences, pivotal (concurrent) sentences, serial-predicate sentences, double-object sentences, comparative sentences and so on. According to the labeled components, we then summarize the typical patterns of these sentences and derive from them a set of semantic role labeling rules. Finally, we test the validity of the proposed rules on a test set and summarize the cases that fall outside rule coverage together with strategies for handling them. Provided the rules prove effective, they can be applied to future semantic role labeling. On the one hand, this exploits the high accuracy of rules and reduces the workload of manual annotation. On the other hand, the rules can be used to detect errors in manual annotation automatically, improving the accuracy of semantic role labeling.

The sixth chapter is the conclusion. It summarizes the main research contents and results of the thesis and their significance for Chinese information processing and Chinese grammar research. Lastly, we analyze the shortcomings of this study and plan the next tasks.
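The three annotation problem types named in the third chapter (missing, redundant and wrong labels) can be illustrated with a minimal sketch. It assumes, hypothetically, that each sentence's labels are stored as a mapping from a token span to a role name, and that the proofread version is treated as the reference; the function name, the data layout and the role names are illustrative assumptions, not the thesis's actual format or tag set.

```python
from collections import Counter


def classify_label_errors(annotated, proofread):
    """Count label outcomes for one sentence.

    Both arguments map a (start, end) token span to a role label.
    `proofread` is treated as the reference (an assumption of this sketch).
    """
    counts = Counter()
    for span, reference_role in proofread.items():
        if span not in annotated:
            counts["missing"] += 1      # role present after proofreading, absent before
        elif annotated[span] != reference_role:
            counts["wrong"] += 1        # same span, different role
        else:
            counts["correct"] += 1
    for span in annotated:
        if span not in proofread:
            counts["redundant"] += 1    # role removed during proofreading
    return counts


# Hypothetical role names, for illustration only.
annotated = {(0, 1): "agent", (2, 3): "patient", (4, 5): "time"}
proofread = {(0, 1): "agent", (2, 3): "content", (6, 7): "location"}
example_counts = classify_label_errors(annotated, proofread)
print(dict(example_counts))
```

Summing such counters over all sentences would give per-type error totals; matching on exact spans is a simplification, since a real comparison might also need to handle partially overlapping spans.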
Keywords/Search Tags:Semantic role, Corpus, Case frame, Sentence pattern, Labeling rules