Research On Classification Method Of Fraud Phone Text Based On Deep Learning

Posted on: 2024-02-06
Degree: Master
Type: Thesis
Country: China
Candidate: J J Zhou
Full Text: PDF
GTID: 2556307076976789
Subject: Control Science and Engineering
Abstract/Summary:
The rapid development of the Internet economy and communication technology has promoted social progress and brought great convenience to people’s lives, but the same progress has also given telecom fraud more tools to work with. Telephone fraud, the most representative type of telecom fraud, has intensified in recent years. These cases are usually committed by teams that use deception to win the victim’s trust in a short time, which makes them hard to guard against. Fraud has become a hidden danger to society: it undermines social harmony and stability and endangers people’s lives and property. Governing fraudulent phone calls is therefore urgent.

In essence, a fraud phone text is a collection of sentences containing fraud terminology, which can be recorded as text through voice conversion technology. The fraud semantic information in this text is carried by the sequence structure of the sentences, the local correlation between words, the contextual correlation of the text, and the keywords in the text. In deep-learning-based text classification research at home and abroad, many classical algorithms have been developed for identifying aggressive text and for sentiment analysis on social media. These algorithms are also applicable to recognizing fraud phone text, but compared with those tasks, fraud texts are semantically more complex and harder to distinguish. Most text classification studies use Recurrent Neural Networks (RNN), variants of RNN, Convolutional Neural Networks (CNN), and hybrid neural networks. However, a single network or a simple combination of networks cannot capture the full range of features in fraud phone text. In addition, choosing an appropriate text vector representation enriches the fraud semantic information available to the model and strongly influences the classification results.

To address these issues, this thesis builds on deep-learning text classification methods and integrates several kinds of text knowledge to construct fraud phone text classification models. The main research contents of this thesis are as follows:

(1) We built a fraud phone text data set containing tens of thousands of samples. The data were collected from various Internet sites, and part of the fraudulent text was written and modified manually. The data set covers various types of fraud, such as education, brushing, impersonating public officials, and chatting to make friends.

(2) To fully extract the features of fraud phone text and enable the model to learn fraud semantic knowledge, this thesis proposes four fraud phone text classification models based on text classification methods and the integration of various text knowledge. Model one is PEAGCNN (Position Embedding and Attention are introduced into BiGRU and CNN), which uses sine and cosine functions of different frequencies to encode text position information and integrates it into the word vectors. BiGRU (Bidirectional Gated Recurrent Unit) and CNN (Convolutional Neural Network) are then used separately to extract contextual information, sentence sequence information, and local correlation. The attention mechanism reassigns weights to the extracted information to highlight the key information, and the two kinds of information are finally fused. The second model is TE-BiLSTM (TransformerEncoder-Bidirectional Long Short-Term Memory), a hybrid neural network that combines an improved Transformer with BiLSTM. The multi-head attention mechanism extracts deep semantic information from the text in different subspaces, and BiLSTM captures the long-distance dependencies of the text. The third model is LMHACL (BiLSTM-Multi-Head Attention Mechanism Module with Convolution-BiLSTM), which combines BiLSTM or BiGRU with a multi-head attention mechanism module with convolution. BiLSTM or BiGRU is used to build the encoding and decoding layers, and MHAC (Multi-Head Attention Mechanism Module with Convolution) enhances the model’s ability to learn global interaction information and multi-granularity local interaction information in fraud phone text. The fourth model is BERT_BiLCNN (Bidirectional Encoder Representation from Transformers-BiLSTM and CNN), in which BERT represents the fraud text in the word embedding part while the hybrid neural network BiLCNN (BiLSTM and CNN) learns the time-sequence knowledge and local interaction knowledge of the text.

(3) To further improve model performance, while developing Model 4 this thesis experimentally compared Word2Vec and BERT and selected the word embedding with the richest fraud semantic features. The experimental results show that BERT performs best as the word embedding model.

(4) A large number of experiments were carried out on the fraud phone text data set, with classic text classification models selected as baselines. The results show that all four fraud phone text classification models proposed in this thesis outperform the baseline models, with accuracy, F1 value, and the other evaluation indexes all above 90%. Among them, BERT_BiLCNN achieves the best results.
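The position encoding described for model one (PEAGCNN), which uses sine and cosine functions of different frequencies and integrates them into the word vectors, can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the code below assumes the standard Transformer formulation with base 10000 and element-wise addition to the word vectors; the function names `sinusoidal_position_encoding` and `add_position_information` are hypothetical and not taken from the thesis.

```python
import math


def sinusoidal_position_encoding(seq_len, dim, base=10000.0):
    """Return a seq_len x dim table of sinusoidal position encodings.

    Assumption: the standard Transformer formulation, where even columns
    hold sin(pos / base**(i / dim)) and odd columns hold the matching cos.
    The thesis abstract only says "sine and cosine functions of different
    frequencies", so base=10000 is an illustrative choice.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    table = []
    for pos in range(seq_len):
        row = [0.0] * dim
        for i in range(0, dim, 2):
            # Each pair of columns (i, i + 1) shares one frequency.
            angle = pos / (base ** (i / dim))
            row[i] = math.sin(angle)
            row[i + 1] = math.cos(angle)
        table.append(row)
    return table


def add_position_information(word_vectors):
    """Add the position encoding element-wise to a list of word vectors.

    Assumption: "integrate it into word vectors" is taken to mean addition;
    concatenation would be another plausible reading.
    """
    seq_len = len(word_vectors)
    dim = len(word_vectors[0])
    encoding = sinusoidal_position_encoding(seq_len, dim)
    return [
        [w + p for w, p in zip(vector, position)]
        for vector, position in zip(word_vectors, encoding)
    ]
```

For example, `add_position_information([[0.0] * 4, [0.0] * 4])` returns the raw encodings for positions 0 and 1, which makes it easy to inspect how each position receives a distinct pattern before the vectors are passed on to the BiGRU and CNN branches.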
Keywords/Search Tags:fraud phone, text classification, BiLSTM, BiGRU, CNN, transformer, multi-head attention mechanism