New Chinese Words Assisted Identification System Developed

Posted on: 2004-08-31
Degree: Master
Type: Thesis
Country: China
Candidate: B Luo
GTID: 2205360122971963
Subject: Linguistics and Applied Linguistics
Abstract/Summary:
There are two basic approaches to the automatic recognition of unregistered words: statistics-based and linguistic rule-based. Linguists have traditionally described the rules of word formation impressionistically, which makes it hard to reach formalized conclusions and therefore hard to apply those rules in computer programs. This paper tries to describe the rules of word-building in a relatively quantitative way, so that the conclusions are easier to apply computationally.

This paper introduces the development of the "Computer-aided Unregistered Words Identification System in Contemporary Chinese" and describes the system in detail, including its structure, algorithm, and processing flow. It also analyses the recall and precision of the test results.

In developing the system, we combine the statistics-based and linguistic rule-based approaches so that a computer can automatically extract possible unregistered words from large running texts. The output is a list of candidate unregistered words, which editors of modern Chinese dictionaries then review manually to support their work on a new edition of the dictionary. The system can also be used to identify unregistered words in Chinese information processing.

Another characteristic of this system is that its running text is drawn from the People's Daily (electronic edition), which contains about 70,000,000 Chinese characters, and the test results are reasonable.
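The abstract does not say which statistic or which word-formation rules the system uses, so the following is only a minimal sketch of the general idea of combining a statistical filter with a rule filter. It assumes pointwise mutual information over adjacent character pairs as the statistic and a hypothetical list of function characters as the rule; the names `candidate_words`, `precision_recall`, and `stop_chars`, and the thresholds, are illustrative and not taken from the thesis. It handles two-character candidates only.

```python
from collections import Counter
import math


def candidate_words(text, lexicon, min_count=2, min_pmi=1.0,
                    stop_chars=frozenset("的了是在和")):
    """Propose two-character strings that may be unregistered words.

    Statistical filter (assumed): frequency and pointwise mutual information.
    Rule filter (assumed): reject pairs containing a function character.
    """
    # Count single characters and adjacent pairs, skipping punctuation,
    # so that no pair spans a sentence boundary.
    unigrams = Counter(c for c in text if c.isalpha())
    bigrams = Counter(
        a + b for a, b in zip(text, text[1:]) if a.isalpha() and b.isalpha()
    )
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    candidates = []
    for gram, count in bigrams.items():
        # Already-registered words and rare pairs are not candidates.
        if count < min_count or gram in lexicon:
            continue
        # Rule filter: pairs with a function character are unlikely words.
        if gram[0] in stop_chars or gram[1] in stop_chars:
            continue
        p_ab = count / n_bi
        p_a = unigrams[gram[0]] / n_uni
        p_b = unigrams[gram[1]] / n_uni
        pmi = math.log2(p_ab / (p_a * p_b))
        if pmi >= min_pmi:
            candidates.append((gram, count, pmi))
    # Strongest association first; ties broken alphabetically.
    return sorted(candidates, key=lambda t: (-t[2], t[0]))


def precision_recall(proposed, gold):
    """Precision and recall of a proposed word list against a gold list."""
    proposed, gold = set(proposed), set(gold)
    hits = len(proposed & gold)
    precision = hits / len(proposed) if proposed else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    text = "网民上网。网民聊天。我的网民朋友。"
    lexicon = {"上网", "聊天", "朋友"}
    for gram, count, pmi in candidate_words(text, lexicon):
        print(gram, count, round(pmi, 2))
```

In a workflow like the one described, the resulting list would be handed to dictionary editors for manual review, and `precision_recall` would compare the proposed list with the words the editors accept.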
Keywords/Search Tags: word-building, unregistered words, Chinese information processing