Research On Text Classification Based On WeChat Subscription

Posted on: 2017-08-21; Degree: Master; Type: Thesis
Country: China; Candidate: X L Tan
GTID: 2348330488485685; Subject: Computer application technology
Abstract/Summary:
With the continuous development of information technology, science and technology have come to affect people's lives in many fields. Since the Internet became popular, everyday communication has been producing enormous amounts of data, and new data floods in all the time. How to combine and exploit these data effectively is a hot topic in information science. As an important branch of Natural Language Processing, text classification has proven its value in managing, using, and locating massive amounts of text, and it has been applied to many aspects of people's lives. WeChat is an important communication tool with a huge user base, and the geometric growth in the number of WeChat Subscription accounts has attracted widespread attention. At this stage, research on WeChat Subscription focuses mostly on developing WeChat Subscription systems or on its impact on the news media and other areas, with only a small amount of work exploring the characteristics and business models of the accounts themselves. For an enterprise, classifying WeChat Subscription accounts can greatly enrich the user portraits that Tencent builds in the business field, which is of great commercial value. At the same time, the WeChat Subscription corpus has its own characteristics, which gives it academic value for the study of text classification.
In this paper, we explore the classification of WeChat Subscription accounts from two aspects. (1) Each WeChat Subscription account usually corresponds to a specific category, such as automotive or clothing, and these broad categories in turn cover many subcategories, such as SUVs, luxury cars, and travel cars, or children's, women's, and men's clothing. The difficulty in building user interest portraits lies in classifying WeChat Subscription accounts into reasonable categories. To address this, we build a two-tier classifier based on the logistic regression algorithm, using the account and description text as features.
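The two-tier idea in (1) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis implementation: a first logistic regression model picks the broad category, and a second model trained only on that category's accounts picks the subcategory. The names `SoftmaxLR` and `TwoTierClassifier`, the whitespace tokenizer, the training settings, and the four sample accounts are all hypothetical. Real WeChat text is Chinese and would need a word segmenter.

```python
import math


def tokenize(text):
    # Assumption: whitespace tokens; Chinese text would need a word segmenter.
    return text.lower().split()


class SoftmaxLR:
    """Tiny multinomial logistic regression over bag-of-words counts."""

    def __init__(self, epochs=200, lr=0.1):
        self.epochs = epochs
        self.lr = lr

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.w = {c: {} for c in self.classes}   # per-class token weights
        self.b = {c: 0.0 for c in self.classes}  # per-class bias
        docs = [tokenize(t) for t in texts]
        for _ in range(self.epochs):
            for toks, y in zip(docs, labels):
                p = self.proba(toks)
                for c in self.classes:
                    # Gradient of the cross-entropy loss w.r.t. the class score.
                    g = p[c] - (1.0 if c == y else 0.0)
                    self.b[c] -= self.lr * g
                    for t in toks:
                        self.w[c][t] = self.w[c].get(t, 0.0) - self.lr * g
        return self

    def proba(self, toks):
        scores = {c: self.b[c] + sum(self.w[c].get(t, 0.0) for t in toks)
                  for c in self.classes}
        m = max(scores.values())  # subtract the max for numerical stability
        exps = {c: math.exp(s - m) for c, s in scores.items()}
        z = sum(exps.values())
        return {c: e / z for c, e in exps.items()}

    def predict(self, text):
        p = self.proba(tokenize(text))
        return max(p, key=p.get)


class TwoTierClassifier:
    """Tier 1 predicts the broad category, tier 2 the subcategory within it."""

    def fit(self, texts, coarse, fine):
        self.top = SoftmaxLR().fit(texts, coarse)
        self.sub = {}
        for c in set(coarse):
            idx = [i for i, label in enumerate(coarse) if label == c]
            self.sub[c] = SoftmaxLR().fit([texts[i] for i in idx],
                                          [fine[i] for i in idx])
        return self

    def predict(self, text):
        c = self.top.predict(text)
        return c, self.sub[c].predict(text)


# Hypothetical accounts: account name and description joined into one string.
texts = [
    "suv world offroad suv reviews and test drives",
    "luxury garage luxury sedan news and prices",
    "kids closet children clothing for school and play",
    "her style women dresses and fashion trends",
]
coarse = ["automotive", "automotive", "clothing", "clothing"]
fine = ["suv", "luxury car", "children's clothing", "women's clothing"]

clf = TwoTierClassifier().fit(texts, coarse, fine)
print(clf.predict("weekly suv test drives"))
print(clf.predict("women fashion dresses"))
```

Training one second-tier model per broad category keeps each subcategory decision among siblings only. The cost is that a first-tier mistake cannot be corrected further down.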
The experimental results show that, under appropriate parameters, the two-tier classifier achieves high accuracy, recall, and F1 values.
(2) In search of a way to improve classification efficiency, this paper proposes an enhanced feature-weighting method based on LDA. The document-word probability is computed from the corresponding probability formula. Experiments are then run under different topics, using TF-IDF, the document-word probability, and the semantically enhanced LDA-TF-IDF as feature weights in turn. The experimental results show that the weight combined with semantic information has a positive effect and outperforms both TF-IDF and the document-word probability under every threshold tested.
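The weighting scheme in (2) can be illustrated with a small sketch. It assumes the document-word probability is the usual LDA quantity p(w|d) = sum over topics k of p(w|k) * p(k|d). The abstract does not say how this probability is combined with TF-IDF, so the product used here is purely an assumption. The function names, the `theta` and `phi` values (stand-ins for the output of a trained LDA model), and the toy documents are all hypothetical.

```python
import math


def tf_idf(docs):
    """Return one {word: tf-idf} dict per tokenized document."""
    n = len(docs)
    df = {}
    for doc in docs:
        for w in set(doc):
            df[w] = df.get(w, 0) + 1
    return [
        {w: (doc.count(w) / len(doc)) * math.log(n / df[w]) for w in set(doc)}
        for doc in docs
    ]


def doc_word_prob(theta_d, phi, word):
    """p(w|d) = sum_k p(k|d) * p(w|k), given LDA's theta (doc-topic) and phi (topic-word)."""
    return sum(theta_d[k] * phi[k].get(word, 0.0) for k in range(len(theta_d)))


def lda_tf_idf(docs, theta, phi):
    # Assumption: combine the two weights by multiplication; the source
    # abstract does not state the actual combination rule.
    base = tf_idf(docs)
    return [
        {w: v * doc_word_prob(theta[i], phi, w) for w, v in base[i].items()}
        for i in range(len(docs))
    ]


# Hypothetical toy corpus and hand-set LDA parameters (two topics).
docs = [["suv", "engine", "suv", "price"], ["dress", "price", "fashion"]]
theta = [[0.9, 0.1], [0.2, 0.8]]  # p(topic | document)
phi = [
    {"suv": 0.5, "engine": 0.3, "price": 0.2},    # p(word | topic 0)
    {"dress": 0.4, "fashion": 0.4, "price": 0.2}, # p(word | topic 1)
]

weights = lda_tf_idf(docs, theta, phi)
for i, row in enumerate(weights):
    print(i, sorted(row.items(), key=lambda kv: -kv[1]))
```

Under this assumed rule, a word that is rare across documents and also probable under a document's dominant topics receives the largest weight, while a word that appears in every document keeps a weight of zero.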
Keywords/Search Tags: Text classification, WeChat Subscription, Logistic Regression, LDA