
Research On Short Text Classification Algorithm Based On Fuzzy Logic

Posted on: 2023-07-19
Degree: Master
Type: Thesis
Country: China
Candidate: L B Cao
GTID: 2568306836464224
Subject: Cyberspace security
Abstract/Summary:
A large amount of short text data is continuously generated on the Internet, and classification is an important step in processing it. Improving the accuracy of classification results while keeping the classification process interpretable is therefore an active research topic. However, most current short text classification algorithms have two problems. First, short text contains a lot of uncertain information, yet existing classification algorithms assume that the extracted features are independent of each other, which ignores the interrelationships among features. Second, short texts have a small vocabulary, sparse features, and irregular content, all of which are difficult for existing algorithms to handle. To address these problems, this thesis proposes a new fuzzy aggregation operator that combines the Power Average operator and the Maclaurin symmetric mean operator under the framework of Dempster-Shafer Theory (DST). On this basis, a short text classification model is designed that incorporates a fuzzy feature extraction method based on similarity scores.

The main contents of this thesis are as follows:

(1) A multi-criteria decision-making method based on a DST operator is designed. First, the operation rules of hesitant fuzzy sets under Dempster-Shafer Theory are introduced. Then, according to these operation rules, the Power Average operator and the Maclaurin symmetric mean operator are combined, and the hesitant fuzzy power Maclaurin symmetric mean operator under DST and its weighted form are proposed. The proposed operator is mathematically well-defined and captures the correlation between features while reducing the influence of extreme feature values. Finally, a multi-criteria decision-making (MCDM) method based on this operator is presented, and its feasibility and advantages are verified by six groups of experimental examples, covering both qualitative and quantitative experiments.

(2) A method for short text feature extraction based on similarity scores is proposed. The method is modeled on human natural language and thinking. It calculates a similarity score from the probability that a word occurs in the short text training set and from the word's degree of relevance to the category it belongs to. The similarity score is then used to calculate four corresponding features, which together form a multi-dimensional feature matrix. The feature matrix is a set of fuzzy features extracted from short texts. It does not rely on content expansion or feature expansion: each extracted feature is calculated from the short text itself, so the feature extraction method is interpretable.

(3) A short text classification algorithm based on fuzzy logic is designed. In this part, fuzzy logic is introduced into short text entities, and a new short text classification scheme is proposed. The scheme proceeds as follows: first, the training set is processed to form a keyword matrix; then the relevant features of the test set are extracted according to the keyword matrix and the similarity-score-based feature extraction method, and a multi-dimensional feature matrix is constructed; finally, the fuzzy operator is used to aggregate the feature matrix into a final score for each category. The effectiveness, accuracy, and engineering usability of the proposed algorithm are verified through comparative experiments (classification of large-scale Chinese news datasets) and practical application cases (classification of rare disaster events).
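
For readers unfamiliar with the two classical operators that the proposed aggregation operator builds on, the following is a minimal Python sketch of the standard Power Average and Maclaurin symmetric mean on ordinary (crisp) numbers in [0, 1]. It is not the thesis's hesitant fuzzy operator under Dempster-Shafer Theory, whose definition is not given in this abstract. The function names and the support function Sup(a, b) = 1 - |a - b| are assumptions chosen for illustration.

```python
from itertools import combinations
from math import comb, prod


def power_average(values):
    """Classical Power Average on crisp values in [0, 1].

    Each value is weighted by 1 + T(a_i), where T(a_i) is the total support
    it receives from the other values. The support function
    Sup(a, b) = 1 - |a - b| is an assumed, commonly used choice.
    """
    weights = []
    for i, a in enumerate(values):
        support = sum(1 - abs(a - b) for j, b in enumerate(values) if j != i)
        weights.append(1 + support)
    return sum(w * a for w, a in zip(weights, values)) / sum(weights)


def maclaurin_symmetric_mean(values, k):
    """Classical Maclaurin symmetric mean of order k on non-negative values.

    Averages the products of all k-element combinations, then takes the
    k-th root, so interactions among any k inputs contribute to the result.
    """
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(values)")
    total = sum(prod(combo) for combo in combinations(values, k))
    return (total / comb(n, k)) ** (1 / k)


if __name__ == "__main__":
    scores = [0.1, 0.1, 0.1, 0.9]  # hypothetical feature scores, one outlier
    print(power_average(scores))
    print(maclaurin_symmetric_mean(scores, 2))
```

In this sketch the Power Average gives an outlying value less weight because it receives little support from the others, while the Maclaurin symmetric mean of order k > 1 reflects interactions among inputs. These are the two properties the abstract attributes to the combined operator: reducing the influence of extreme values and capturing correlation between features.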
Keywords/Search Tags:fuzzy set, multi-criteria decision-making, fuzzy operator, feature extraction, short text classification