Research On Classification Method Of Fraud Phone Text Based On Deep Learning

Posted on:2024-02-06

Degree:Master

Type:Thesis

Country:China

Candidate:J J Zhou

GTID:2556307076976789

Subject:Control Science and Engineering

Abstract/Summary:

The rapid development of the Internet economy and communication technology has promoted social progress and brought great convenience to people’s lives, but technological progress has also diversified the means of telecom fraud. Telephone fraud, the most representative type of telecom fraud, has intensified in recent years. Such cases are usually committed by teams that use deception to win the victim’s trust in a short time, which makes them very hard to guard against. Fraud cases have become a hidden danger to society, seriously affecting social harmony and stability and endangering people’s lives and property. The governance of fraudulent phone calls is urgent.

In essence, a fraud phone text is a collection of sentences containing fraud terminology, which can be recorded as text through speech-to-text conversion technology. The fraud semantic information in such a text is carried by the sequence structure of its sentences, the local correlation between words, the contextual correlation of the text, and its keywords. Research on deep-learning-based text classification at home and abroad has produced many classical algorithms for identifying aggressive text and for sentiment analysis on social media. These algorithms are also applicable to recognizing fraud phone text, but compared with those tasks, fraud texts are semantically more complex and harder to distinguish. Most text classification studies use Recurrent Neural Networks (RNN), RNN variants, Convolutional Neural Networks (CNN), and hybrid neural networks. However, a single network or a simple combination of networks cannot capture the rich textual characteristics of fraud phone text. Choosing an appropriate text vector representation also helps enrich the fraud semantic information of the text, and it greatly influences the classification results of the model.

To address these issues, this thesis builds on deep-learning text classification methods and integrates various kinds of text knowledge to build fraud phone text classification models. The main research contents are as follows:

(1) We built fraud phone text data sets containing tens of thousands of texts. The data were collected from various Internet sites, and part of the fraudulent text was written and modified manually. The data sets cover various types of fraud, such as education, brushing, impersonating public officials, and chatting to make friends.

(2) To fully extract the features of fraud phone text and let the model fully learn fraud semantic knowledge, this thesis proposes four fraud phone text classification models based on text classification methods and the integration of various text knowledge.

Model one is PEAGCNN (Position Embedding and Attention introduced into BiGRU and CNN), which uses sine and cosine functions of different frequencies to encode text position information and integrates it into the word vectors. BiGRU (Bidirectional Gated Recurrent Unit) and CNN (Convolutional Neural Network) are then used to extract the text’s contextual information, sentence sequence, and local correlation. The attention mechanism reassigns weights to the extracted information to highlight key information, and the two kinds of information are finally fused.

The second model is TE-BiLSTM (TransformerEncoder-Bidirectional Long Short-Term Memory), a hybrid neural network that combines an improved Transformer with BiLSTM. Multi-head attention extracts deep semantic information from the text in different subspaces, and BiLSTM exploits the long-distance dependencies of the text.

The third model is LMHACL (BiLSTM-Multi-Head Attention Mechanism Module with Convolution-BiLSTM), which combines BiLSTM or BiGRU with a multi-head attention mechanism module with convolution. BiLSTM or BiGRU is used to build the encoding and decoding layers. MHAC (Multi-Head Attention Mechanism Module with Convolution) enhances the model’s ability to learn global interaction information and multi-granularity local interaction information in fraud phone text.

The fourth model is BERT_BiLCNN (Bidirectional Encoder Representations from Transformers-BiLSTM and CNN), in which BERT represents the fraud text in the word embedding part, while the hybrid neural network BiLCNN (BiLSTM and CNN) learns the temporal sequence knowledge and local interaction knowledge of the text.

(3) To further improve performance, while developing Model 4 this thesis experimentally compared Word2Vec and BERT and selected the word embedding with the richest fraud semantic features. The results show that BERT performs best as the word embedding model.

(4) Extensive experiments were carried out on the fraud phone text data set, with classic text classification models selected as baselines. The results show that all four proposed models outperform the baseline models, with accuracy, F1 score, and other evaluation metrics all above 90%; BERT_BiLCNN achieves the best results.
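
The abstract says PEAGCNN encodes position with sine and cosine functions of different frequencies and integrates the result into the word vectors, but it gives no formula. The sketch below is one plausible reading, assuming the standard Transformer-style formulation: the base of 10000, the even/odd interleaving, and element-wise addition to the word vectors are assumptions, and the function names are hypothetical rather than taken from the thesis.

```python
import math


def sinusoidal_position_encoding(seq_len, d_model):
    """Return a seq_len x d_model table of sine/cosine position codes.

    Assumption: Transformer-style frequencies with base 10000;
    the thesis abstract does not state the exact formula.
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even in this sketch")
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(0, d_model, 2):
            # Each pair of dimensions shares one frequency;
            # the frequency decreases geometrically as i grows.
            angle = pos / (10000 ** (i / d_model))
            row.append(math.sin(angle))  # even dimension
            row.append(math.cos(angle))  # odd dimension
        table.append(row)
    return table


def add_position_information(word_vectors):
    """Add the position code element-wise to each word vector.

    Assumption: "integrate" means element-wise addition.
    """
    seq_len = len(word_vectors)
    d_model = len(word_vectors[0])
    codes = sinusoidal_position_encoding(seq_len, d_model)
    return [
        [w + p for w, p in zip(vector, code)]
        for vector, code in zip(word_vectors, codes)
    ]
```

Under these assumptions, the position-aware vectors returned by `add_position_information` would be what the BiGRU and CNN branches receive as input. Because the codes are fixed functions of position, this step adds no trainable parameters.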

Keywords/Search Tags:

fraud phone, text classification, BiLSTM, BiGRU, CNN, transformer, multi-head attention mechanism

Related items

1	Research And Application Of Fraud Phone Recognition Method Based On NLP
2	Research On Text Classification Of Government-people Interactive Messages Based On Deep Learning
3	Application Of BiGRU Model Based On Attention In Judicial Trials
4	Research On Multi-Label Judicial Text Classification Algorithm Based On Attention Mechanism
5	Research On Multi Label Charge Prediction Method Based On BERT-BiGRU
6	Research Of Entity Relationship Extraction Of Legal Text Based On Deep Learning
7	Research And Implementation Of Multi-label Classification In Judicial Field
8	Research On Precision Classification Technology Of Appeal Short Text
9	Research On Multi-class Standard Text Classification Algorithm For Identifying Key Legal Factors Of Judicial Judgment Documents
10	Research And Implementation Of Case Analysis Method Based On Multi Task Learning