Research Of Auto-identifying The Relation Markers Of Compound Sentence For Chinese Information Processing

Posted on: 2012-01-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J B Shu
Full Text: PDF
GTID: 1115330335467552
Subject: Chinese information processing
Abstract/Summary:
As an important unit of Chinese grammar, the compound sentence has received much attention from grammarians, and a large body of results and theory exists. From the perspective of Chinese information processing, however, there are far fewer results, and the information engineering of compound sentences has not yet made substantial progress. There are three reasons. First, the research has not gone deep enough, and existing studies do not yet cover all aspects and problems of compound-sentence information processing. Second, most results are written for human readers, and many of the methods are hard to operationalize in information processing. Third, the individual studies are relatively isolated and not linked to one another, so they do not form an organized whole. Current work on compound-sentence information processing mainly concerns distinguishing clauses from non-clauses and identifying the layers of a compound sentence, yet the extraction of relation markers is the premise of all these studies. On the one hand, automatic extraction of relation markers is the basis on which other studies can be carried out; on the other hand, relation markers, as a part of the compound sentence, need further study in their own right. Against this background, this dissertation takes Chinese information processing as its starting point and the compound sentence theory of Mr. Xing-fuyi as its guide. It studies methods for automatically identifying and marking up relation markers, uses automata theory and formal logic as auxiliary means to model the problems involved in identifying relation markers and to describe and store the rules, and designs a prototype of a rule-based system for the auto-identification of relation markers.
The study consists of the following four parts.
First, it comprehensively sums up the factors that influence the auto-markup of relation markers.
The factors fall into five main categories: the influence of adverbs, of prepositions, of the different usages of relation markers, of collocations, and of the occurrence or non-occurrence of markers. For each category, it analyzes the features and devises a corresponding strategy.
Second, it studies the co-occurrence of relation markers, focusing on the syntactic and semantic functions and the types of two-marker and three-marker co-occurrence. Two-marker co-occurrence has two types, the contradiction type and the constriction type. Recognizing it not only reduces unnecessary computation during processing but can also serve as a checkpoint for distinguishing the two types when analyzing compound sentences. Three-marker co-occurrence has no unified strategy; different methods are needed to identify different markers.
Third, it studies the relation between the mark-up of relation markers and sentence patterns, covering three kinds of pattern. The first is the special pattern: its form is fixed and unambiguous, but the scope governed by its markers is hard to determine because its semantic relationship is hard to identify. The second is the expanding pattern, whose relation markers ordinary algorithms cannot identify. The third is the ordinary pattern, in which a compound sentence has multiple semantic layers and multiple pairs of relation markers. For the special pattern, it uses a mapping strategy that maps a sequence of relation markers to the corresponding mark-up result. For the expanding pattern, it builds a model with automata theory, which ensures both operability and general coverage of the phenomenon.
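The idea of modeling marker sequences with an automaton can be pictured with a toy finite-state recognizer. The sketch below is only an illustration under stated assumptions: the states, the transition table, and the choice of the pinyin markers "yinwei" and "suoyi" are hypothetical and are not taken from the dissertation's rule base.

```python
# Toy deterministic finite automaton over relation-marker tokens.
# States, transitions, and the marker set are illustrative assumptions,
# not the rules described in the dissertation.
START, FRONT, DONE = "START", "FRONT", "DONE"

TRANSITIONS = {
    (START, "yinwei"): FRONT,  # a cause marker opens the sequence
    (FRONT, "yinwei"): FRONT,  # further cause markers extend it
    (FRONT, "suoyi"): DONE,    # a result marker closes it
}


def accepts(markers):
    """Return True if the marker sequence drives the automaton to DONE."""
    state = START
    for marker in markers:
        state = TRANSITIONS.get((state, marker))
        if state is None:
            # No transition is defined for this marker in this state.
            return False
    return state == DONE
```

A table-driven design keeps the rules as data, so a rule base could in principle be extended by adding entries rather than changing code.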
For the ordinary pattern, the strategy is to abstract the problems, transform them into mathematical models, and then process the sequence of relation markers by computing over the resolution space.
Fourth, it studies some problems in saturated mode and non-saturated mode. The dissertation first supplements the theory of semantic relevancy: it proposes 14 semantic relevance features, classified into three categories, draws up a preferred diagram for feature analysis, and amends the method for computing the degree of semantic relevance. In saturated mode, it mainly studies the relation markers "bushi…jiushi…" and "suiran…danshi…suoyi…". It finds that polarity analysis can handle "bushi…jiushi…", whereas for "suiran…danshi…suoyi…" no effective method exists other than building a common-sense knowledge base. In non-saturated mode, it mainly studies the relation markers of sentences with three clauses. It finds that relation markers can be marked up accurately by considering their typical and atypical attributes, drawing on collocation knowledge, and using the semantic relevance features of clauses.
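The role of collocation knowledge in pairing markers can be sketched with a small lookup table. This is a minimal sketch, not the dissertation's method: the `COLLOCATIONS` table, the `pair_markers` function, and the nearest-unmatched-front-marker heuristic are all assumptions made for illustration, and the semantic relevance features are left out.

```python
# Hypothetical collocation table: front marker -> back markers it pairs with.
COLLOCATIONS = {
    "suiran": {"danshi"},
    "yinwei": {"suoyi"},
    "bushi": {"jiushi"},
}


def pair_markers(markers):
    """Pair front and back markers using the collocation table.

    markers: list of (clause_index, marker) in sentence order.
    Returns a list of ((i, front), (j, back)) pairs. A back marker with
    no collocating front marker stays unpaired.
    """
    open_fronts = []  # front markers still waiting for a partner
    pairs = []
    for index, marker in markers:
        if marker in COLLOCATIONS:
            open_fronts.append((index, marker))
            continue
        # Search from the most recent unmatched front marker backwards.
        for k in range(len(open_fronts) - 1, -1, -1):
            if marker in COLLOCATIONS[open_fronts[k][1]]:
                pairs.append((open_fronts.pop(k), (index, marker)))
                break
    return pairs
```

For the sequence "suiran…danshi…suoyi…", this sketch pairs "suiran" with "danshi" and leaves "suoyi" unpaired, which shows why a collocation table alone cannot settle such cases.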
Keywords/Search Tags: relation marker, auto-markup, automata, resolution space, rule base, semantic relevancy