
Research On Text Multi-Feature Classification Algorithm Based On BERT-LSTM

Posted on: 2023-03-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Gao
Full Text: PDF
GTID: 2558306905486854
Subject: Computer Science and Technology
Abstract/Summary:
Text classification is a key technology for processing massive amounts of information, covering text preprocessing, text representation, feature selection, and classification algorithms. Traditional algorithms suffer from problems such as result bias caused by sample imbalance, missing data, and neglected data relevance. Deep learning models capture feature information automatically, have strong representational power, and remove the manual feature engineering that traditional text classification algorithms require, which has made them a hot research topic in text classification. The BERT model compensates for the lack of semantic feature information in word vectors and enriches vector features. Deep learning methods nevertheless face two problems in text classification: (1) the model is sensitive to the dataset, which can lead to underfitting or overfitting; (2) many factors influence classification results, chiefly text preprocessing, the text representation model, the feature dimensionality reduction algorithm, and the text classifier. This thesis studies the text classification task with deep learning methods. The main research contents are:

(1) To address the dependence of traditional classification tasks on word segmentation algorithms, this thesis proposes a Whole Word Masking model based on BERT (BERT-WWM). The model introduces an attention mechanism that extracts the dependency information before and after each part of a sentence and assigns weights according to the importance of that information, enriching the text features. Text is preprocessed through BERT-WWM and attention, features are extracted from different aspects to generate sentence vectors, and context information is integrated to improve the ability to capture contextual features (see the first sketch below).

(2) A text's overall information, including word meaning, latent meaning, and language polarity, must be expressed from the surrounding context; word vectors such as Word2Vec cannot capture this overall information and lack complete feature dependencies. This thesis therefore proposes an algorithm that combines dictionary matching of latent words with Tree-LSTM to encode semantic and part-of-speech information, and introduces Bi-LSTM to fully capture context-dependent features, express text features accurately, and extract keyword multi-feature information (see the second sketch below).

(3) The BERT language model is combined with different network structures, trained and tested on multiple datasets, and compared against baseline models on evaluation metrics such as accuracy, recall, and F1 score (see the third sketch below). This thesis proposes the BERT-LSTM model, and experimental results show that both the correctness and the performance of the model are significantly improved.
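As an illustration of contribution (1), the following is a minimal PyTorch/transformers sketch, not the thesis code: token vectors from a whole-word-masking BERT are pooled by a learned attention layer into one sentence vector. The checkpoint name hfl/chinese-bert-wwm and the pooling design are assumptions.

```python
# Sketch of (1): BERT-WWM encoding + attention pooling into a sentence vector.
# Checkpoint name and hidden size are assumptions, not the thesis configuration.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class AttentionPooling(nn.Module):
    """Weight each token vector by a learned importance score, then sum."""
    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.scorer = nn.Linear(hidden_size, 1)

    def forward(self, token_states, attention_mask):
        # token_states: (batch, seq_len, hidden); attention_mask: (batch, seq_len)
        scores = self.scorer(token_states).squeeze(-1)          # (batch, seq_len)
        scores = scores.masked_fill(attention_mask == 0, -1e9)  # ignore padding
        weights = torch.softmax(scores, dim=-1).unsqueeze(-1)   # importance weights
        return (weights * token_states).sum(dim=1)              # (batch, hidden)

tokenizer = BertTokenizer.from_pretrained("hfl/chinese-bert-wwm")  # assumed checkpoint
encoder = BertModel.from_pretrained("hfl/chinese-bert-wwm")
pool = AttentionPooling(encoder.config.hidden_size)

batch = tokenizer(["文本分类是海量信息处理的关键技术。"], return_tensors="pt", padding=True)
with torch.no_grad():
    states = encoder(**batch).last_hidden_state
sentence_vec = pool(states, batch["attention_mask"])  # one vector per sentence
```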
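For the Bi-LSTM component of contribution (2) (the dictionary matching and Tree-LSTM parts are omitted here), this is a minimal sketch of how bidirectional context features can sit on top of BERT, yielding the kind of BERT-LSTM classifier named in (3). The base checkpoint, the LSTM hidden size, and the mean-over-time summary are assumptions, not the thesis design.

```python
# Sketch of a BERT-LSTM classifier: BERT token states feed a bidirectional
# LSTM whose masked mean drives a linear softmax classifier.
import torch
import torch.nn as nn
from transformers import BertModel

class BertLSTMClassifier(nn.Module):
    def __init__(self, num_classes: int, lstm_hidden: int = 256):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-chinese")  # assumed base model
        self.lstm = nn.LSTM(self.bert.config.hidden_size, lstm_hidden,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * lstm_hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        states = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        lstm_out, _ = self.lstm(states)              # (batch, seq_len, 2*hidden)
        mask = attention_mask.unsqueeze(-1).float()  # zero out padding positions
        summary = (lstm_out * mask).sum(1) / mask.sum(1)  # masked mean over time
        return self.classifier(summary)              # raw logits per class

# usage: model = BertLSTMClassifier(num_classes=10)
#        logits = model(batch["input_ids"], batch["attention_mask"])
```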
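Contribution (3) compares models by accuracy, recall, and F1 score; a small scikit-learn example with placeholder labels shows how those metrics are typically computed.

```python
# Evaluation metrics used in (3), computed with scikit-learn on toy labels.
from sklearn.metrics import accuracy_score, recall_score, f1_score

y_true = [0, 1, 2, 1, 0]   # placeholder gold labels
y_pred = [0, 1, 1, 1, 0]   # placeholder model predictions

print("accuracy:", accuracy_score(y_true, y_pred))
print("macro recall:", recall_score(y_true, y_pred, average="macro"))
print("macro F1:", f1_score(y_true, y_pred, average="macro"))
```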
Keywords/Search Tags:Text classification, Deep learning, Attention, BERT, Multi-feature extraction