| As a branch of computational linguistics, Chinese information processing is becoming increasingly important amid the rapid development of Internet technologies such as artificial intelligence and search engines. The scope of application of the Chinese language has gradually expanded along with China’s influence in the world. As an important unit of the Chinese language, the compound sentence has therefore become a core object of computer processing, and it is one of the difficulties in Chinese information processing. At present, the study of compound sentences mainly covers the automatic identification of relation markers, the discrimination of clauses from non-clauses, the automatic classification of sentence layers, and the identification of relations in compound sentences. Among these, the automatic identification of relation markers and the division of clauses have been studied extensively, whereas the automatic classification and identification of compound sentences has received less attention. Because the technology for automatically identifying relation markers is largely mature, and because relation markers mark both the hierarchical structure and the logical semantics between the clauses of a compound sentence, the analysis of a compound sentence's hierarchy should rely on this important formal feature. However, owing to the diversity of Chinese expression, relation markers are frequently concealed in the clauses rather than revealed, which makes it difficult to identify compound sentences from relation markers alone. 
Therefore, this paper adopts a divide-and-conquer strategy and divides the target sentences (compound sentences of three clauses) into two types, saturated and non-saturated. To solve the problem of concealed markers, it constructs a table of marker coordination types and a set of concealing-or-revealing rules for relation markers, and then automatically extracts the concealing-or-revealing mode of the relation markers in a compound sentence. In addition, on the basis of syntactic dependency analysis, it proposes a method that uses syntactic repetition to calculate the degree of correlation between clauses. Finally, it builds a sentence structure identification model based on the concealing-or-revealing rules of relation markers and on association features, in order to classify sentence structure automatically. The work is carried out in the following steps. First, a clause division method based on dependency syntax and punctuation is proposed. Second, after pseudo-clauses are eliminated, the quasi relation markers are marked and extracted to obtain the word sequence. Third, the table of marker coordination types is constructed and an algorithm is proposed to obtain the concealing-or-revealing mode of relation markers. At the same time, the semantic association between clauses is calculated from syntactic repetition. Finally, an identification model of the compound sentence's hierarchical structure is constructed from the concealing-or-revealing rules and the association features: saturated sentences are judged by the rules, and non-saturated sentences by the association features. In the experiments, the accuracy of the concealing-or-revealing mode of relation markers was 91.5%, and the accuracy of the sentence hierarchical structure reached 90.6%. 
The results show that the method proposed in this paper is effective for the analysis of compound sentence hierarchical structure. |
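
As a rough illustration of how association features might decide the layering of a three-clause sentence, the sketch below splits a sentence at punctuation and groups the two adjacent clauses that overlap most. It is not the paper's model. The delimiter set, the function names, the Jaccard token overlap (a crude stand-in for syntactic repetition, which the paper derives from dependency analysis), and the grouping rule itself are all assumptions made for this example.

```python
import re

# Hypothetical delimiter set. The paper also uses dependency syntax to
# divide clauses and to remove pseudo-clauses; that step is omitted here.
CLAUSE_DELIMS = r"[，,；;。.!?！？]"


def split_clauses(sentence):
    """Split a sentence into clauses at punctuation marks only."""
    return [c.strip() for c in re.split(CLAUSE_DELIMS, sentence) if c.strip()]


def association(tokens_a, tokens_b):
    """Jaccard overlap of two token lists (assumed stand-in measure)."""
    a, b = set(tokens_a), set(tokens_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_three_clauses(c1, c2, c3):
    """Group the more strongly associated adjacent pair at the lower layer.

    Each argument is a list of tokens for one clause. Ties favour the
    left pair, which is an arbitrary choice made for this sketch.
    """
    left = association(c1, c2)
    right = association(c2, c3)
    return "(1 2) 3" if left >= right else "1 (2 3)"


if __name__ == "__main__":
    print(split_clauses("因为下雨，所以我们没去，而且很冷。"))
    print(group_three_clauses(["we", "go", "home"],
                              ["we", "go", "out"],
                              ["it", "rains"]))
```

The clause arguments are pre-tokenized lists because Chinese text needs word segmentation before any overlap can be measured; a real system would obtain the tokens and their dependency relations from a parser.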