Research On Deep Semantic Annotated Corpus Of Modern Chinese

Posted on:2018-11-02

Degree:Doctor

Type:Dissertation

Country:China

Candidate:S He

GTID:1485305480457424

Subject:Linguistics and Applied Linguistics

Abstract/Summary:

Researchers worldwide attach great importance to building knowledge resources for natural language processing (NLP) and have established many corpora with different kinds of annotation to meet the needs of in-depth language study and NLP systems. For Chinese information processing, the annotated resources most in demand are sentence-level resources: annotated corpora that describe the deep semantic relations between the words in a sentence. It is therefore urgent to develop strategies, models, techniques, and methods for building a large-scale deep semantic annotated corpus semi-automatically or even automatically.

Semantic analysis is the most important and most difficult problem in NLP. How to achieve effective, deep, and automatic semantic analysis of sentences has long been a focus of NLP researchers in China and abroad. At present, semantic research in NLP concentrates mainly on surface semantic analysis. Although surface analysis reduces the difficulty of the task, it solves only the verb-kernel problem and the assignment of semantic roles. It leaves modal elements and the internal semantic relations of roles carried by noun-kernel constructions unannotated, and therefore cannot fully reveal the semantic structure of a sentence.

Our deep semantic annotated corpus assigns semantic classes to the words in primary and middle school Chinese textbooks and annotates their semantico-syntactic categories. The semantic class of a word is the meaning category to which the word belongs. A semantico-syntactic category is the semantic category information corresponding to a semantic chunk; it includes the core category, the modifying and restricting category, the modal category, and the super-sentence category. Classifying the semantic classes of words and analyzing semantico-syntactic categories are the fundamental tasks of semantico-grammar. Semantico-grammar holds that the classification of semantic classes of words stops being blind only when it is constrained by a semantico-syntactic framework, and that the precision of semantico-syntactic category annotation can be improved only on the basis of that classification; the two complement each other.

At present, most semantic classification systems in dictionaries in China and abroad are based on the natural sciences or on common sense. Compared with these common-sense classifications, ours is driven by the needs of semantico-syntax, can solve problems that are difficult to solve by syntactic analysis alone, and forms a distinctive annotation system oriented toward computer language processing. It is therefore of great significance for automatic semantico-syntactic analysis in Chinese information processing.

Following this guiding idea, we annotate semantic roles with a processing strategy different from traditional approaches, which we call the approach based on semantic sentence patterns and semantic classes of words. The approach transforms the node classification problem in Chinese semantic role annotation into a sequence annotation problem. Because it skips the traditional syntactic analysis stage, semantic role annotation no longer depends on syntactic analysis, which avoids the time and performance limits imposed by a Chinese parser. Tests show that the new approach reaches a higher precision and greatly reduces analysis time, which favors practical application.

The dissertation conducts a series of technical studies on the construction and application of the Modern Chinese Deep Semantic Annotation Corpus. The main achievements are as follows:

1. To meet the needs of corpus construction and application, we develop a set of software tools: a tool for compiling a dictionary of semantic classes of words, a tool for automatically annotating semantic classes of words, a statistical tool for indexing semantico-syntactic categories, a statistical tool for indexing the correspondence between semantic classes of words and semantico-syntactic categories, a statistical tool for extracting sentence patterns, a statistical tool for extracting sentence modes, and a tool for supplementary annotation of semantico-syntactic categories. These tools provide solid technical support for the construction and application of the corpus.

2. We collect and compile a dictionary of over 40,000 semantic entries, annotated with part of speech, word class, and frequency, which provides linguistic knowledge support for the automatic annotation of semantic classes of words.

3. For the automatic annotation of semantic classes of words, we design an annotation algorithm based on a Hidden Markov Model combined with the Viterbi algorithm, which is based on dynamic programming. Even with a small training corpus and severe data sparsity, accuracy is 94.3% on the closed test and 89.1% on the open test.

4. For the problem of unregistered words in the annotation of semantic classes of words, we propose an approach based on HowNet concepts. The research shows that the correspondence between the system of semantic classes of words and HowNet concepts lies mainly in two pairings, referential class with substance class and indicative class with event class, and we formulate processing rules accordingly.

5. For the annotation of semantic roles within semantico-syntactic categories, after summarizing and comparing the existing mainstream algorithms, we propose an algorithm based on the framework of semantic sentence patterns and semantic classes of words. Using the IOB strategy and a CRF model with optimized feature parameters, it achieves a classification precision of over 941.8% [sic] and a system F-score of 78%.

6. From the annotated deep semantic corpus, using purpose-built tools, we establish a knowledge base of the correspondence between semantic classes of words and semantico-syntactic categories, a framework base of semantic sentence patterns, and a knowledge base of semantic sentence modes, laying a foundation for further research on and application of semantico-syntax.

Through the automatic annotation of semantic classes of words and of semantic roles within semantico-syntactic categories, this research verifies in practice the feasibility and practicality of semantico-grammar in natural language processing. The results further enrich the theory and methods of semantico-grammar, provide a new approach to deep semantic analysis of Chinese sentences, and offer new technical support for NLP application systems based on semantic analysis.
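
The abstract describes annotating semantic classes of words with a Hidden Markov Model decoded by the Viterbi algorithm. The following is a minimal sketch of that general technique, not the dissertation's implementation. The tag names `REF` and `IND` (loosely after the referential and indicative classes mentioned above), the pinyin toy sentences, the add-one smoothing, and all function names are assumptions made for illustration.

```python
import math
from collections import defaultdict

def train_hmm(tagged_sentences):
    """Collect transition and emission counts from (word, tag) sequences."""
    trans = defaultdict(lambda: defaultdict(int))  # previous tag -> tag -> count
    emit = defaultdict(lambda: defaultdict(int))   # tag -> word -> count
    tags, vocab = set(), set()
    for sent in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            tags.add(tag)
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(tags), vocab

def log_prob(counts, given, outcome, n_outcomes):
    """Add-one smoothed log P(outcome | given); an assumed smoothing choice."""
    row = counts.get(given, {})
    return math.log((row.get(outcome, 0) + 1) / (sum(row.values()) + n_outcomes))

def viterbi(words, trans, emit, tags, vocab):
    """Return the most probable tag sequence for a non-empty word list."""
    n_tags, n_words = len(tags), len(vocab) + 1  # +1 reserves mass for unseen words
    # best[i][t] = (log-prob of the best path ending in tag t at position i, backpointer)
    best = [{t: (log_prob(trans, "<s>", t, n_tags)
                 + log_prob(emit, t, words[0], n_words), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            e = log_prob(emit, t, words[i], n_words)
            column[t] = max((best[i - 1][p][0] + log_prob(trans, p, t, n_tags) + e, p)
                            for p in tags)
        best.append(column)
    # Follow backpointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]

# Hypothetical toy training data: laoshi "teacher", biaoyang "praise",
# xuesheng "student", yuedu "read", kewen "text".
TRAIN = [
    [("laoshi", "REF"), ("biaoyang", "IND"), ("xuesheng", "REF")],
    [("xuesheng", "REF"), ("yuedu", "IND"), ("kewen", "REF")],
]
model = train_hmm(TRAIN)
print(viterbi(["laoshi", "yuedu", "kewen"], *model))  # ['REF', 'IND', 'REF']
```

Working in log space keeps the products of small probabilities from underflowing, and the reserved probability mass for unseen words is one simple way to cope with the data sparsity the abstract mentions.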
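
The abstract also states that semantic role annotation is recast as sequence annotation using the IOB strategy before a CRF model is applied. The sketch below shows only the IOB encoding step, under assumptions: the role labels `AGENT`, `TIME`, and `PATIENT`, the pinyin tokens, and the function names are hypothetical, and no CRF is trained here.

```python
def spans_to_iob(n_tokens, spans):
    """Encode non-overlapping (start, end_exclusive, label) chunks as IOB tags."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the chunk
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the chunk
    return tags

def iob_to_spans(tags):
    """Decode IOB tags back into (start, end_exclusive, label) chunks."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing chunk
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans

# Hypothetical example: "xiao xuesheng zuotian yuedu le kewen"
# (roughly, "the pupil read the text yesterday").
tokens = ["xiao", "xuesheng", "zuotian", "yuedu", "le", "kewen"]
chunks = [(0, 2, "AGENT"), (2, 3, "TIME"), (5, 6, "PATIENT")]
iob = spans_to_iob(len(tokens), chunks)
print(list(zip(tokens, iob)))
```

Once every token carries one tag, a sequence labeler such as a CRF can predict the tags directly from token-level features (for example, the semantic class of each word), which is why no syntactic parse tree is needed at this stage.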

Keywords/Search Tags:

semantico-grammar, semantic classes of words, semantico-syntactic categories, unregistered word, semantic role annotation

Related items

1	On The Constraints From Semantic Roles To Syntactic Constituents And The Semantic Speculation Of The New Words
2	Estudio semantico de las interferencias ingles/espanol en 'Gatherings from Spain', de Richard Ford
3	On Situational-semiotic, Linguistic-semantic And Literary-psychological Stylistics
4	A Quantitative Study Of Tibetan Semantic Roles
5	A Syntactic And Semantic Study On Noun-Noun Constructions
6	The Study Of Semantic Syntactic Categories And Semantic Sentence Patterns Of Four-year-old Children
7	Modern Chinese Way Word Study
8	Words have grammar: Investigating semantic and syntactic aspects of word knowledge for verbs
9	"X Shi" Turning Words Which Span Two Word Classes Of Adverb And Conjunction
10	A Syntactic And Semantic Study Of Weak Willingness In Modern Chinese