| Hedges indicate uncertainty and are commonly used to moderate a speaker's tone or to mitigate the speaker's responsibility for a statement. Hedged information, signaled by hedge cues, should be distinguished from certain information, so detecting hedges is important for information extraction. Research on English hedge detection has made great progress, whereas research on Chinese hedge cue identification is still in its infancy, and no related corpus is available. This paper constructs a corpus for research on Chinese hedge information detection and studies cross-domain Chinese hedge cue identification. To address the lack of a Chinese hedge corpus, we build one covering the biomedical and Wikipedia domains, containing 24,000 sentences. We study the classification of Chinese hedge cues and develop annotation rules for them. Based on the phrase structure tree of each sentence, we develop scope annotation rules according to the type and part of speech of each hedge cue. We calculate the tagging consistency rate for hedge cues and their scopes; the statistics show a high consistency rate, owing to the detailed annotation rules. Meanwhile, the relationship between hedge cue types and domains illustrates the domain specificity of Chinese hedge cues. Chinese hedge cues are widely used in biomedical literature, Wikipedia, and other domains. Because the distribution of hedge cues differs across domains, a detector built for one domain is difficult to extend to others. This paper proposes a cross-domain Chinese hedge cue identification method that combines instance-based transfer learning and feature-based transfer learning, exploiting the complementary strengths of the two methods.
Experiments on the biomedical literature and Wikipedia domains show that our combined method outperforms both instance-based transfer learning alone and feature-based transfer learning alone. Moreover, since word embeddings can capture the semantic information of words, we propose a cross-domain Chinese hedge cue identification method that combines word embeddings with transfer learning: the word embeddings of hedge cue candidates are used as features for both instance-based and feature-based transfer learning. Experimental results show that introducing word embedding features improves the performance of both transfer learning methods. Furthermore, combining the results of instance-based and feature-based transfer learning yields an F-score of 72.39%. The Chinese hedge corpus provides a resource for research on Chinese hedge information detection, and our method can be used to extend Chinese hedge cue identification to many other domains, which is important for Chinese factual information extraction. |
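The abstract does not say which measure its tagging consistency rate uses. As an illustration only, the sketch below assumes Cohen's kappa computed over token-level cue labels from two annotators; the function name `cohen_kappa` and the `CUE`/`O` labels are hypothetical, not taken from the paper.

```python
from collections import Counter

def cohen_kappa(a, b):
    """Cohen's kappa for two annotators' label sequences (assumed consistency measure)."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of tokens given the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical token labels from two annotators for one sentence.
ann1 = ["O", "CUE", "O", "O", "CUE", "O", "O", "O", "CUE", "O"]
ann2 = ["O", "CUE", "O", "CUE", "CUE", "O", "O", "O", "O", "O"]
print(round(cohen_kappa(ann1, ann2), 3))  # 0.524
```

Unlike raw agreement (0.8 in this toy example), kappa discounts the agreement two annotators would reach by chance, which matters when most tokens are not cues.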
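The abstract does not name the specific instance-based or feature-based algorithms, nor how their outputs are merged. The sketch below is a stand-in under stated assumptions: a simplified TrAdaBoost-style source reweighting for the instance-based side, Daumé III's feature augmentation for the feature-based side, and probability averaging for the combination. The toy 2-d vectors stand in for word embeddings of hedge cue candidates; all function names and data are hypothetical.

```python
import math

# Sketch only: the algorithms below are assumed stand-ins, not the paper's method.

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(X, y, weights, epochs=500, lr=0.5):
    """Instance-weighted logistic regression fitted by batch gradient descent."""
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    total = sum(weights)
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for xi, yi, ci in zip(X, y, weights):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(dim):
                gw[j] += ci * err * xi[j]
            gb += ci * err
        w = [wj - lr * g / total for wj, g in zip(w, gw)]
        b -= lr * gb / total
    return w, b

def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def augment(x, domain):
    """Feature augmentation (Daume III, 2007): a shared copy plus a domain-specific copy."""
    zeros = [0.0] * len(x)
    return list(x) + (list(x) + zeros if domain == "source" else zeros + list(x))

def instance_transfer(src, tgt, rounds=5, beta=0.5):
    """Simplified TrAdaBoost-style reweighting: source instances the current
    model misclassifies have their weight multiplied by beta each round."""
    X = [x for x, _ in src] + [x for x, _ in tgt]
    y = [label for _, label in src] + [label for _, label in tgt]
    weights = [1.0] * len(X)
    model = train_logreg(X, y, weights)
    for _ in range(rounds):
        for i in range(len(src)):
            pred = 1 if predict_proba(model, X[i]) >= 0.5 else 0
            if pred != y[i]:
                weights[i] *= beta
        model = train_logreg(X, y, weights)
    return model

def feature_transfer(src, tgt):
    """Train one classifier on domain-augmented features from both domains."""
    X = [augment(x, "source") for x, _ in src] + [augment(x, "target") for x, _ in tgt]
    y = [label for _, label in src] + [label for _, label in tgt]
    return train_logreg(X, y, [1.0] * len(X))

def combined_predict(inst_model, feat_model, x):
    """Average the two models' cue probabilities for a target-domain candidate."""
    p = (predict_proba(inst_model, x) + predict_proba(feat_model, augment(x, "target"))) / 2
    return 1 if p >= 0.5 else 0

# Toy 2-d "embeddings" of cue candidates (hypothetical values); 1 = hedge cue.
src = [([1.0, 0.2], 1), ([0.9, -0.1], 1), ([-1.0, 0.3], 0), ([-0.8, -0.2], 0)]
tgt = [([0.7, 0.5], 1), ([-0.6, 0.4], 0)]
inst = instance_transfer(src, tgt)
feat = feature_transfer(src, tgt)
print(combined_predict(inst, feat, [0.8, 0.0]), combined_predict(inst, feat, [-0.9, 0.1]))
```

The two sides address domain shift differently: reweighting suppresses source examples that disagree with the target data, while augmentation lets the classifier learn shared and target-specific weights separately. Averaging their probabilities is one simple way to exploit both; voting or a learned combiner would be equally plausible readings of "combining the results".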