Research And Implementation Of Web Page Topic Classification Method Based On LSTM And Transfer Learning

Posted on: 2020-01-21
Degree: Master
Type: Thesis
Country: China
Candidate: E B M M T Ku
GTID: 2428330590454695
Subject: Computer application technology
Abstract/Summary:
Text classification has been studied for a long time and its techniques are relatively mature, so research on web page classification is mainly built on text classification. At present, most web page classification methods are shallow learning methods. Because of the particular grammar of natural language, its semantic ambiguity, and its implicit expressions, shallow learning methods have limited text representation ability and rely on manually extracted features, which makes it difficult to achieve high web page classification accuracy. This thesis therefore studies web page topic classification using deep learning.

In natural language processing, deep learning models are highly task-specific, and a separate model must be trained for each task. The training and performance of a deep learning model depend on the scale of the training data. For tasks with little training data this is a thorny problem, and it limits the application of deep learning to small-sample text processing. In addition, because deep learning models have complex structures, retraining a model for a specific task is costly even when sufficient training data is available.

To address these problems, this thesis focuses on the classification of web page text and studies topic classification techniques for it in depth. Combining deep learning and transfer learning, it proposes a method of fine-tuning a general language model that can be applied to Chinese and Uyghur text classification. Experiments show that fine-tuning based on a general language model can effectively solve the topic classification task for Chinese and Uyghur web pages. The work consists of three main parts:
(1) Construction of language modeling and web page topic classification datasets. Chinese and Uyghur texts were collected with a web crawler from news websites such as People's Daily and Tianshan.com, and a language modeling dataset and a web page text topic dataset were built from them.
(2) Use of layer-specific parameter optimization. In a deep neural network, different layers represent different information, so different learning rates are set for different layers, which helps prevent catastrophic forgetting and speeds up model convergence.
(3) A web page topic classification method based on deep learning and transfer learning. This method addresses the lack of data and the long training time of deep learning models. Compared with training on the target task data only, it improves classification accuracy on Chinese and Uyghur web pages by 5.62% and 5.87%, respectively.
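The layer-specific learning rates described in point (2) can be illustrated with a minimal sketch. The abstract does not say how the rates are chosen, so everything here is an assumption for illustration only: the geometric decay schedule, the decay factor of 2.6, the top-layer rate of 0.01, the layer names, and the helper name `layerwise_learning_rates` are all hypothetical, not the thesis's settings.

```python
from typing import Dict, List


def layerwise_learning_rates(
    layer_names: List[str],
    top_lr: float = 0.01,   # assumed rate for the last (task-specific) layer
    decay: float = 2.6,     # assumed divisor applied once per layer going down
) -> Dict[str, float]:
    """Assign a learning rate to each layer, listed from bottom to top.

    The top layer gets `top_lr`; each layer below it gets the rate of the
    layer above divided by `decay`. Lower layers, which hold more general
    language knowledge, therefore change more slowly during fine-tuning.
    """
    if not layer_names:
        raise ValueError("layer_names must not be empty")
    if top_lr <= 0 or decay < 1:
        raise ValueError("top_lr must be positive and decay must be >= 1")

    top = len(layer_names) - 1
    return {
        name: top_lr / (decay ** (top - depth))
        for depth, name in enumerate(layer_names)
    }


if __name__ == "__main__":
    # Hypothetical layer stack for an LSTM language model with a classifier head.
    layers = ["embedding", "lstm_1", "lstm_2", "classifier"]
    for name, lr in layerwise_learning_rates(layers).items():
        print(f"{name}: {lr:.6f}")
```

In a real training setup, each rate would be passed to the optimizer as a separate parameter group for the matching layer. Keeping the lower layers' rates small is one common way to limit catastrophic forgetting of the pretrained language model.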
Keywords/Search Tags: deep learning, transfer learning, web page topic classification, language model