| The syntactic and semantic analysis of Tibetan is not only the premise and foundation for developing intelligent Tibetan information processing technology; it also provides theoretical support and technical solutions for inconsistent word segmentation units, inconsistent part-of-speech tags, and related bottlenecks in Tibetan lexical analysis, and so plays a connecting role in Tibetan natural language processing. Tibetan linguistics has a long history and has formed a relatively complete grammatical system. However, traditional Tibetan grammar focuses on the phonetic form and grammatical function of function words and pays little attention to the semantic relations between concepts or to the structural relations between sentence constituents. From the perspective of information processing, traditional Tibetan grammar is therefore difficult to formalize. The study of Tibetan syntax and semantics for natural language processing is in its infancy. Because traditional Tibetan grammar lacks a description of grammatical structure, some research institutions and scholars apply Chinese syntactic theory and semantic analysis methods to Tibetan. This approach raises the following questions. First, language expresses meaning through form. In how meaning depends on form, Chinese and Tibetan take completely different routes, and each shows its own characteristics. Although Chinese and Tibetan are considered to belong to the same language family, this "homology" is still a hypothesis. In sentence structure, the differences between Tibetan and Chinese outweigh the similarities. Modern Chinese sentence structure relies mainly on word order, while Tibetan relies more on case markers (case particles). The grammatical rules followed by the two languages are therefore different. Second, the semantic level
of modern Chinese sentences can only be related to the syntactic structure through subjectification. Surface syntax and deep semantics are two different linguistic levels, so a "syntax before semantics" research strategy is adopted for Chinese. The deep meaning of a Tibetan sentence, by contrast, is reflected directly in its surface structure through case markers (case particles), so Tibetan syntax and semantics should be treated as "integrated", with neither placed before the other. Finally, the part of speech of a word depends mainly on its grammatical function. Using the standard phrase-based grammar of modern Chinese to describe the structure of Tibetan sentences not only adds extra research work but also greatly reduces the sentence-generating capacity of Tibetan grammar, and it leads to inconsistent Tibetan word segmentation units and part-of-speech tags. "What kind of grammatical system can both embody the grammatical characteristics of Tibetan itself and lend itself to formalization?" is therefore a research question of considerable theoretical importance. Through a comparative study of Fillmore's case grammar and traditional Tibetan grammar, and drawing on relevant knowledge from modern linguistics, this thesis expands and deepens traditional Tibetan grammar while inheriting it. The thesis develops "case" into a new Tibetan grammatical unit, the "case structure", and then proposes and argues for the view that "the case structure is the most immediate constituent of the Tibetan sentence". It holds that the underlying semantic relations of a Tibetan sentence are reflected directly in the surface syntactic structure through case structures. The study of Tibetan syntax and semantics therefore has no fixed order: both belong to the category of "case structure", lie at the same research level, and need to be studied in an integrated way. By introducing the verb-centered theory of modern linguistics into the traditional
Tibetan grammar through the case structure, a grammatical theory that can describe the grammatical characteristics of Tibetan, namely the Tibetan case grammar framework, is constructed for Tibetan natural language processing. Methods for Tibetan lexical, syntactic, and semantic analysis based on this grammar are discussed, together with a preliminary formalization of the grammar. This thesis mainly studies the following aspects. First, the differences between Tibetan and other languages. Structural differences between individual languages are superficial and language-specific, whereas semantic relations are deep and common to all languages. The deep semantic relations expressed by a sentence are reflected in different ways in the surface structure of sentences in different languages. By analyzing the structural differences among Tibetan, Chinese, and English, this thesis further demonstrates that "Chinese is organized by word order, while Tibetan is organized by function words". That is, Chinese expresses the deep semantic relations of a sentence through word order, while Tibetan expresses them through certain function words. In Tibetan natural language processing it is therefore inadvisable to copy Chinese parsing methods, which motivates the construction of a "Tibetan case grammar". Second, inheriting and expanding traditional Tibetan grammar. Although the "lung ston pa rtsa ba sum cu ba" and the numerous commentaries written on it by scholars over the centuries have formed the system of traditional Tibetan grammar, in the strict sense of "grammar" this tradition still lacks lexical, syntactic, and semantic analysis. However, the "special function words" described in traditional Tibetan grammar have a cognitive framework very similar to that of the deep case in Fillmore's case grammar. By comparing Fillmore's case grammar with the traditional Tibetan
grammar, it is found that the case described in traditional Tibetan grammar has no unified structural form and that its content is relatively broad. It covers not only the concept of case proposed by Fillmore but also nouns and noun phrases. Compared with case in modern linguistics, case in traditional Tibetan grammar can be interpreted as several different grammatical units. For Tibetan natural language processing, the case of traditional Tibetan grammar therefore cannot serve as an independent grammatical unit as it stands. However, if "case" in traditional Tibetan grammar is redefined, formally unified, and introduced into Tibetan grammar as a new grammatical unit (i.e. the "case structure"), then every "case" in Tibetan has a unified structural form, and different cases differ only in meaning, not in form. Moreover, combining the different case structures of the same predicate yields a Tibetan sentence centered on that predicate; that is, any Tibetan sentence can be decomposed into multiple case structures. It thus becomes possible to analyze the structure of Tibetan sentences in terms of case structures, the order of the eight cases in traditional Tibetan grammar can be described appropriately in terms of the semantic relations between case structures, and some grammatical problems left open by traditional Tibetan grammar can be well explained. Third, the expansion of Tibetan grammatical units. In both Tibetan language teaching and Tibetan natural language processing, the case structure should be described as an independent grammatical unit, and the Tibetan grammatical unit should be expanded into five types: words, phrases, case structure and sentences. Based on the case grammar framework, this thesis discusses in depth some problems encountered in Tibetan information processing in the fields of word segmentation, part of speech, phrase structure and
sentence boundary. Using the concept of case structure, this thesis analyzes the difference between function words and case particles, explains the nominalization of Tibetan address forms, and proposes some solutions to problems such as nested case structures in Tibetan. Fourth, formal description of the Tibetan case grammar. About 2500 Tibetan sentences are selected from literature, news, history, natural science, engineering technology, and other fields, and each sentence is segmented, part-of-speech tagged, and syntactically and semantically annotated to form a Tibetan case grammar tree. Statistics over the grammatical units and structures in these trees are used to summarize common Tibetan lexical rules, phrase rules, and semantic classes, to describe the Tibetan case grammar formally, and to attempt automatic analysis of Tibetan sentence structure by computer. In addition, a statistical method for computing a Tibetan language probability model based on the case grammar is derived. Fifth, the problems and prospects of "Tibetan case grammar". The last chapter briefly introduces the problems and prospects of Tibetan case grammar that could not be discussed in this thesis because of limited time and energy. The problems not studied here include the structure and grammatical function of Tibetan "adverbs"; the automatic recognition of the first case structure in Tibetan; the priority selection among multiple case structures; grammatical gaps in Tibetan compound sentences; and the evaluation of the Tibetan case grammar. |
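The idea that a sentence decomposes into case structures around a predicate, and that rule statistics can be gathered from annotated trees, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the class names, the role labels (`AGENT`, `PATIENT`), the placeholder tokens, and the toy treebank are hypothetical, and the relative-frequency estimate is a generic maximum-likelihood calculation, not the probability model derived in the thesis.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple

Rule = Tuple[str, Tuple[str, ...]]


@dataclass(frozen=True)
class CaseStructure:
    """One case structure: a nominal part plus its case marker (hypothetical model)."""
    role: str    # illustrative semantic label, e.g. "AGENT"
    phrase: str  # placeholder for the nominal part
    marker: str  # placeholder for the case marker ("" if unmarked)


@dataclass(frozen=True)
class Sentence:
    """A sentence viewed as case structures centered on one predicate."""
    cases: Tuple[CaseStructure, ...]
    predicate: str

    def rule(self) -> Rule:
        # Abstract the sentence to a rewrite rule: S -> ROLE ... PRED
        return ("S", tuple(c.role for c in self.cases) + ("PRED",))


def rule_probabilities(treebank: List[Sentence]) -> Dict[Rule, float]:
    """Relative-frequency estimate: count(rule) / count(all rules with the same left side)."""
    counts = Counter(s.rule() for s in treebank)
    lhs_totals: Counter = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}


# Toy treebank with placeholder tokens (not real Tibetan data).
agent = CaseStructure("AGENT", "NP1", "m1")
patient = CaseStructure("PATIENT", "NP2", "")
TOY_TREEBANK = [
    Sentence((agent, patient), "V1"),
    Sentence((agent, patient), "V2"),
    Sentence((agent, patient), "V3"),
    Sentence((agent,), "V4"),
]

PROBS = rule_probabilities(TOY_TREEBANK)
for rule, p in sorted(PROBS.items(), key=lambda item: -item[1]):
    print(rule, p)
```

In this sketch the sentence-level rule is read directly off the sequence of case structures, which mirrors the claim that syntax and semantics are handled at the same level; a real system would need rules for phrases and nested case structures as well.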