Web-based Study On New Words And Expressions

Posted on: 2011-10-15  Degree: Master  Type: Thesis
Country: China  Candidate: Q D Sheng  Full Text: PDF
GTID: 2155360305973016  Subject: Circuits and Systems
Abstract/Summary:
The Internet reaches almost every corner of people's lives. It brings great convenience, and it increasingly influences our language, especially its vocabulary. More and more new Chinese words and expressions appear online and shape people's daily communication. They enrich the language, but identifying them also poses new challenges for lexicon updating, dictionary compilation and natural language processing. Finding new words quickly and accurately is difficult because there is as yet no clear-cut, commonly accepted definition of a new word. Based on definitions from linguistics and lexical analysis, this thesis divides new words into three categories: named entities, existing words or expressions with a new meaning or usage, and words or expressions with a new morphology. This thesis focuses on the automatic search for newly coined words.
Existing research on identifying newly coined words and expressions is limited, and it is restricted by word length or by domain. The current study proposes a new method: searching web pages gathered from the Internet for words and expressions that appeared after any given date, with no restriction on length or domain.
Our implementation is composed of three steps: webpage gathering, webpage analysis, and extraction of new words and expressions. Webpage gathering refers to downloading specified web pages with a web crawler. Webpage analysis includes extracting the date and the content from each page, word segmentation, searching for repeated strings, and saving the repeated strings with their dates into the original information database.
The extraction of new words and expressions covers dividing the original information database into a backup database and a filter database according to the given date, extracting new word-string candidates from the filter database, and automatically filtering the candidate set to obtain the final results.
This study describes the search for repeated strings, the gathering of webpage content and time, and the extraction of new words, and it focuses on proposing an algorithm for repeated-string search built on existing methods. As the examples show, the proposed algorithm can achieve fairly good linear time complexity and space complexity.
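The abstract does not give the thesis's repeated-string algorithm, so the following Python sketch is only an illustration of the two steps it names: finding repeated strings in segmented text, and splitting dated records around a given date to obtain candidates. It uses brute-force n-gram counting, which is not linear-time and caps the string length, unlike the method the thesis reports. The function names, the `max_len` and `min_count` parameters, and the rule "a candidate is a string seen on or after the date but never before it" are assumptions made for this example.

```python
from collections import defaultdict
from datetime import date


def repeated_strings(tokens, max_len=4, min_count=2):
    """Return token n-grams (length 1..max_len) that occur at least min_count times.

    Brute-force counting, used here only for illustration; it is not the
    thesis's algorithm and does not run in linear time.
    """
    counts = defaultdict(int)
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c >= min_count}


def new_word_candidates(records, cutoff):
    """Split (string, date) records around cutoff and return candidate strings.

    Assumption: records dated before cutoff form the backup set, the rest form
    the filter set, and a candidate is any string found only in the filter set.
    """
    backup = {s for s, d in records if d < cutoff}
    filter_set = {s for s, d in records if d >= cutoff}
    return filter_set - backup


# Hypothetical usage with made-up records.
records = [
    ("网络", date(2009, 1, 1)),
    ("网络", date(2010, 12, 1)),
    ("给力", date(2010, 11, 1)),
]
candidates = new_word_candidates(records, date(2010, 6, 1))
```

In a fuller pipeline the candidate set would still pass through the automatic filtering step the abstract mentions, which this sketch leaves out.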
Keywords/Search Tags:Chinese New Words And Expressions, Automatic Searching, Repeated String Searching