| Automatic recognition of separable words has important applications in many fields, such as Chinese information processing and Chinese-English translation. In the course of continued research on separable words, scholars have found that, because these words are numerous in type and varied in form, they must be handled separately to meet practical processing needs. Manual annotation achieves high precision and recall, but its efficiency cannot meet the requirements of machine processing. Improving the efficiency with which the extended forms of separable words are recognized has therefore become necessary for modern Chinese processing. The main contents are as follows: (1) Based on the large modern Chinese corpus established by Beijing Language and Culture University, we compile statistics on and analyze the usage of the extended forms of separable words. Taking into account the different extension types and their frequencies of use, we select verb-object separable words with a high separation frequency as the main object of study. (2) By analyzing the characteristics of different separable words in actual corpus data and studying how extended components are realized in the large-scale corpus, we summarize and classify the extended components in detail, and extend the rule knowledge of separable words from the perspective of computer recognition. (3) Drawing on the rule knowledge of extended forms obtained from this research, together with their own characteristics, we design a new automatic identification method for separable words based on word segmentation and the structure type of the extended component. In an open test, the method achieves high recall and precision, and the experimental results show that it is effective. |
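
The general idea behind item (3), matching a separable word's two morphemes in segmented text and checking the structure type of what lies between them, can be illustrated with a minimal sketch. This is not the method designed in this work: the lexicon `SEPARABLE_WORDS`, the component classes, the labels, and the `max_gap` window are all hypothetical choices made only for illustration, and the input is assumed to be already segmented into words.

```python
# Illustrative sketch only; every name and word list below is an assumption.

# Hypothetical lexicon of verb-object separable words: (verb, object) morphemes.
SEPARABLE_WORDS = {("洗", "澡"), ("吃", "饭"), ("睡", "觉"), ("帮", "忙")}

# Hypothetical coarse classes of extended (inserted) components.
COMPONENT_CLASSES = {
    "ASP": {"了", "着", "过"},        # aspect markers
    "NUM": {"一", "两", "三", "几"},  # numerals
    "CL": {"个", "次", "顿"},         # classifiers
    "DE": {"的"},                     # structural particle
}


def classify_component(tokens):
    """Return a structure label such as 'ASP+NUM+CL', or None if any token is unknown."""
    labels = []
    for tok in tokens:
        for label, members in COMPONENT_CLASSES.items():
            if tok in members:
                labels.append(label)
                break
        else:
            return None  # token belongs to no known component class
    return "+".join(labels)


def find_extended_forms(tokens, max_gap=4):
    """Find separated verb-object words in a segmented sentence.

    Returns tuples (word, verb_index, object_index, structure_label).
    Only gaps of 1..max_gap tokens with a recognized structure are reported.
    """
    results = []
    for i, verb in enumerate(tokens):
        last = min(i + max_gap + 2, len(tokens))
        for j in range(i + 2, last):
            if (verb, tokens[j]) in SEPARABLE_WORDS:
                label = classify_component(tokens[i + 1:j])
                if label is not None:
                    results.append((verb + tokens[j], i, j, label))
                    break  # take the nearest matching object morpheme
    return results


if __name__ == "__main__":
    # "他洗了一个澡", assumed to be segmented as below.
    print(find_extended_forms(["他", "洗", "了", "一", "个", "澡"]))
```

Restricting matches to gaps whose tokens all fall into known component classes is one simple way to trade recall for precision; a fuller rule set would need the detailed classification of extended components summarized in item (2).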