Research On Text Representation And Classification Algorithm Based On Model Integration

Posted on:2023-09-03

Degree:Master

Type:Thesis

Country:China

Candidate:D Y Du

Full Text:PDF

GTID:2557306833487134

Subject:Applied Statistics

Abstract/Summary:

With the rapid development of education in China, online reports of educational information have become both numerous and long. This poses a challenge for text classification technology, and such reports can hardly be classified and sorted by traditional methods. To enable readers concerned with education to browse education news at a specific stage according to their needs, this paper annotates the text of educational news reports with labels. A text classification method suitable for education news is then designed, based on an analysis of existing text classification algorithms. The paper addresses two main aspects, followed by an experimental comparison.

(1) To address the problems of information loss and high dimensionality in long-text classification, this paper proposes the LDA-D2V text representation method. First, the topic distribution obtained through LDA training is mapped to the Doc2vec model to obtain a new topic vector. Then, the Doc2vec model is trained on the documents to obtain the document vectors. Finally, cosine similarity is used to measure the distance between the new topic vector and the document vector, and this similarity is used for text representation. While retaining the advantages of the LDA model, the algorithm adds semantic information from the text, so that the resulting vector represents the text more completely.

(2) This paper studies a text classification algorithm that combines a CNN-BiLSTM network with an attention model. It targets two problems: a convolutional neural network (CNN) cannot use the context information of a text in classification, and a recurrent neural network (RNN) cannot solve the problem of long-term dependence. The classification model combines the advantages of the two models. First, the CNN extracts the local features of the text. Then, the BiLSTM extracts the contextual information of the text, and thereby its global features. Finally, an attention layer is added at the end of the model to extract effective features. The fusion model not only accounts for the fact that different words in the text affect the classification result differently, but also improves the efficiency of classification.

(3) The LDA-D2V text representation method and the CNN-BiLSTM-ATT classification model are compared with commonly used traditional models on a collection of online education news texts. The experimental results show that both models studied in this paper classify the educational news text sets better than the traditional models.
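The final step of the LDA-D2V representation in (1), measuring cosine similarity between each topic vector and a document vector, can be illustrated with a minimal sketch. This is not the thesis's implementation. The abstract does not say how the LDA topic distribution is mapped into the Doc2vec space, so the `topic_vector` helper below assumes a probability-weighted average of the topic's top-word vectors. All function names and the toy two-dimensional vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def topic_vector(top_words, word_vectors):
    """ASSUMPTION (not from the abstract): place one LDA topic in the embedding
    space as the probability-weighted average of its top words' vectors."""
    dim = len(next(iter(word_vectors.values())))
    acc = [0.0] * dim
    total = 0.0
    for word, prob in top_words:
        vec = word_vectors.get(word)
        if vec is None:  # skip words with no embedding
            continue
        total += prob
        for i, x in enumerate(vec):
            acc[i] += prob * x
    return [x / total for x in acc] if total > 0 else acc


def lda_d2v_features(doc_vector, topic_vectors):
    """One cosine similarity per topic: a K-dimensional text representation."""
    return [cosine(doc_vector, t) for t in topic_vectors]


# Toy stand-ins for trained models (hypothetical values).
word_vectors = {"exam": [1.0, 0.0], "school": [0.0, 1.0], "policy": [1.0, 1.0]}
topics = [
    [("exam", 0.6), ("school", 0.4)],  # (word, probability) pairs for topic 0
    [("policy", 1.0)],                 # topic 1
]
topic_vecs = [topic_vector(t, word_vectors) for t in topics]
doc_vec = [1.0, 1.0]                   # stand-in for a Doc2vec document vector
features = lda_d2v_features(doc_vec, topic_vecs)
print(features)
```

In practice the word and document vectors would come from a trained Doc2vec model and the `(word, probability)` pairs from a trained LDA model. The resulting feature list has one entry per topic, so its dimension is the number of topics instead of the vocabulary size.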

Keywords/Search Tags:

Education news classification, LDA model, Doc2vec, CNN, BiLSTM

Related items

1	Research On Confused State Recognition Based On BiLSTM-SVM Model
2	The Revision About The Categories Of Moral Education
3	Research On The Construction Of High School Mathematics Knowledge Graph Based On Deep Learning
4	A Study Of English Vocabulary Teaching In The Junior Middle School From The Perspective Of Category Levels
5	Public Opinion Analysis Algorithm Based On CNN-BILSTM Network And BERT
6	Theoretical Research On The Representation System Of Category Map
7	Use Of Chinese Mobile News Applications Among University Students In Shanghai:An Integrated Model Of U&G Theory And TAM
8	Mentally Retarded Children's Study On The Concept Of Contact Type Classification Ability
9	Based On BERT-BiLSTM-CNN Multi-feature Fusion Research And Application In Public Opinion Analysis Of Three-child Policy
10	Research On The Influence Mechanism Of Virtual Sports Group Participation Based On Latent Category Analysis Model