Research On Media Catchwords Extraction

Posted on: 2010-05-15
Degree: Master
Type: Thesis
Country: China
Candidate: B Z Wu
Full Text: PDF
GTID: 2155360275479543
Subject: Computer application technology
Abstract/Summary:
Media catchwords are words that spread quickly through newspapers, television, radio, the Internet and other media. Annual media catchwords reflect and summarize a year's domestic policy initiatives, major social events, public concerns, and changes in the international situation. The shift in media catchwords from year to year reflects social change and changes in mass psychology. Extracting media catchwords is therefore an important part of monitoring living language use, with both academic value and social and cultural significance.

This thesis investigates methods of media catchword extraction and covers the following aspects:

First, it investigates the characteristics of media catchwords and builds a model that captures their popularity features.

Second, it proposes a grading model for deciding which words are media catchwords. The model introduces word attributes such as the common-use attribute, the time attribute and the vicissitude attribute, and quantifies them to obtain a media catchword grading formula. With this model, media catchwords can be screened out of the candidate collection according to their grading scores and their development curves.

Third, it designs and implements a system for extracting annual media catchwords. The system uses web pages from 2007 (approximately 10,642 MB), downloaded from five popular portal web sites (NetEase, Sohu, Sina, QQ, Tom), as its research corpus. After preprocessing, Omni-segmentation is applied to obtain all word strings, and the system then filters these strings to produce the candidate collection. Finally, the popularity score model is used to select the annual media catchwords from the candidate collection.

Fourth, for the step that builds the candidate collection, it proposes a new media catchword extraction method that combines statistics and rules. All possible word strings are generated by Omni-segmentation and then filtered with rules based on linguistic knowledge, rules for garbage strings, and TF-IDF weight rules to obtain the candidate collection.

An automatic extraction system for annual media catchwords, built on the model proposed in this thesis, has been used by the Internet Media Language Branch of the National Language Resources Monitoring and Research Center to publish the media catchwords of 2007. The results support the soundness of the characteristic model of media catchwords and the feasibility of the automatic extraction system. The system provides objective, high-quality candidate catchwords and saves substantial manpower and resources.
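The candidate-generation step described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it assumes that Omni-segmentation can be approximated by enumerating every contiguous character string up to a maximum length, it uses one common TF-IDF variant (corpus frequency times log of inverse document frequency), and the names `all_substrings`, `tfidf_candidates`, `max_len`, `min_score` and `stop_strings` are hypothetical. The linguistic filtering rules and the grading formula are not given in the abstract and are not reproduced here.

```python
import math
from collections import Counter


def all_substrings(text, max_len=4):
    """Enumerate every contiguous substring of length 1..max_len.

    Assumption: this stands in for Omni-segmentation, which the
    abstract describes only as producing all possible word strings.
    """
    return [
        text[i:i + n]
        for n in range(1, max_len + 1)
        for i in range(len(text) - n + 1)
    ]


def tfidf_candidates(docs, max_len=4, min_score=0.0, stop_strings=frozenset()):
    """Return {string: score} for strings that survive filtering.

    Score = corpus frequency * log(N / document frequency), one common
    TF-IDF variant chosen for illustration.
    """
    tf = Counter()  # occurrences of each string across the corpus
    df = Counter()  # number of documents containing each string
    for doc in docs:
        counts = Counter(all_substrings(doc, max_len))
        tf.update(counts)
        df.update(counts.keys())

    n_docs = len(docs)
    scores = {}
    for s, freq in tf.items():
        # Hypothetical garbage-string rule: drop listed strings and
        # anything containing whitespace.
        if s in stop_strings or any(ch.isspace() for ch in s):
            continue
        score = freq * math.log(n_docs / df[s])
        # Strings that occur in every document score 0 and are dropped.
        if score > min_score:
            scores[s] = score
    return scores
```

Calling `tfidf_candidates(pages)` on a list of page texts returns a dictionary that can be sorted by score to inspect the strongest candidates. A real system would apply the remaining filters and the grading model to this collection.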
Keywords/Search Tags: Catchwords, Media Catchwords, Annual Media Catchwords, Term Extraction