Short Text Analysis Based On Multi-Granularity Sequential Attention Mechanism

Posted on: 2020-01-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y Li
GTID: 2428330590971691
Subject: Computer Science and Technology

Abstract:
With the rapid development of the Internet, massive amounts of data of many different types are constantly being produced and updated. Text is one of the most direct and common forms in which information is organized. Mining useful information from text makes it possible to understand people's opinions and emotional tendencies more quickly and accurately, which in turn helps in understanding market dynamics, public opinion, and similar situations. As the technology matures, people expect machines to think like human beings: to filter out the redundant information in large volumes of text and to organize and present the core content sensibly. As the pace of life accelerates, more and more "fast food" information, such as news briefs and comments, is pouring into daily life, so how to extract valuable information from large amounts of short text is a question worth studying. This thesis therefore proposes a short text analysis method based on a multi-granularity sequential attention mechanism, which simulates the human cognitive process and incorporates the idea of multi-granularity. The main work is as follows:

1. A convolutional neural network based on a sequential attention mechanism is proposed for detecting abnormal URLs. This model mainly targets the detection of abnormal network traffic. A uniform resource locator (URL) is an identifier consisting of a sequence of characters that requests a resource from a server. A URL carries some semantic information and consists of relatively few characters, so it is a kind of short text. For this task, a five-layer detection network is proposed. First, the URL is encoded by a word embedding layer using word2vec. Next, features are learned automatically by a convolutional layer. At the same time, an external language model is added to help the network assign higher attention values to regions containing malicious code. Finally, the detection result is produced by a max-pooling layer and a fully connected layer. Experiments on real URL datasets show that the model can not only effectively detect whether a URL is abnormal but also locate the malicious code regions within it.

2. A method for sentiment analysis and sentiment word detection based on an attention mechanism is proposed. The model analyses the sentiment tendency of comments at different granularities. Combined with the attention mechanism, the model uses a convolutional neural network to extract features and learn fine-grained information, and then feeds the output of the convolutional network into a recurrent neural network, whose strength in modelling sequences lets it learn valuable information in the text at a coarse granularity. The attention mechanism adaptively computes the weight of each word in its context and focuses on the sentiment words. As a result, the model can not only determine the sentiment tendency of a comment but also compute the sentiment polarity of individual words adaptively from their context, thereby locating the segments of the comment that contain sentiment words; the thesis reports higher accuracy and better performance for this approach. The model avoids the problem that predefined sentiment dictionaries cannot adapt to different contexts, and it is not restricted to a particular language.
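The five-layer URL pipeline described in item 1 (embedding, convolution, attention guided by an external language model, max pooling, fully connected output) can be sketched as a forward pass. This is a minimal, untrained toy in pure Python under assumptions the abstract does not state: the tokenizer (splitting on common URL delimiters), random vectors standing in for pretrained word2vec embeddings, a dot-product attention score, and a hypothetical `lm_surprise` table standing in for the external language model, whose highest score within a convolution window is simply added to that window's attention logit. The names `tokenize`, `URLAttentionCNN`, and `lm_surprise` are illustrative only, not the thesis's code.

```python
import math
import random

URL_DELIMITERS = "/?&=.:-_"  # assumed delimiter set for tokenization


def tokenize(url):
    """Split a URL into word tokens and delimiter tokens (assumed tokenizer)."""
    tokens, cur = [], ""
    for ch in url:
        if ch in URL_DELIMITERS:
            if cur:
                tokens.append(cur)
            tokens.append(ch)
            cur = ""
        else:
            cur += ch
    if cur:
        tokens.append(cur)
    return tokens


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class URLAttentionCNN:
    """Untrained toy: embedding -> conv -> attention -> max-pool -> dense."""

    def __init__(self, dim=8, filters=4, width=3, seed=0):
        self.rng = random.Random(seed)
        self.dim, self.filters, self.width = dim, filters, width
        self.emb = {}  # token -> vector; random stand-in for word2vec
        self.conv_w = [[[self._u() for _ in range(dim)] for _ in range(width)]
                       for _ in range(filters)]
        self.conv_b = [self._u() for _ in range(filters)]
        self.attn_v = [self._u() for _ in range(filters)]
        self.out_w = [self._u() for _ in range(filters)]
        self.out_b = self._u()

    def _u(self):
        return self.rng.uniform(-0.5, 0.5)

    def embed(self, token):
        if token not in self.emb:
            self.emb[token] = [self._u() for _ in range(self.dim)]
        return self.emb[token]

    def forward(self, url, lm_surprise=None):
        # lm_surprise (hypothetical): token -> score from an external language
        # model; higher means "less expected in benign URLs".
        lm_surprise = lm_surprise or {}
        tokens = tokenize(url)
        vecs = [self.embed(t) for t in tokens]
        while len(vecs) < self.width:  # zero-pad so at least one window exists
            vecs.append([0.0] * self.dim)
            tokens.append("<pad>")
        windows, feats, logits = [], [], []
        for i in range(len(vecs) - self.width + 1):
            # One convolution window per position, followed by ReLU.
            h = [max(0.0, b + sum(dot(w[j], vecs[i + j]) for j in range(self.width)))
                 for w, b in zip(self.conv_w, self.conv_b)]
            window = tokens[i:i + self.width]
            prior = max(lm_surprise.get(t, 0.0) for t in window)
            logits.append(dot(self.attn_v, h) + prior)  # learned score + LM prior
            windows.append(window)
            feats.append(h)
        attn = softmax(logits)
        # Scale each window's features by its attention weight, then max-pool.
        pooled = [max(a * h[f] for a, h in zip(attn, feats)) for f in range(self.filters)]
        z = self.out_b + dot(self.out_w, pooled)
        prob = 1.0 / (1.0 + math.exp(-z))  # P(abnormal); meaningless until trained
        return prob, attn, windows


if __name__ == "__main__":
    model = URLAttentionCNN()
    prob, attn, windows = model.forward("http://example.com/a.php?cmd=eval",
                                        lm_surprise={"eval": 100.0})
    print(round(prob, 3), windows[attn.index(max(attn))])
```

With random weights the probability carries no meaning; the sketch only shows the data flow. The window with the largest attention weight is what such a model would report as the suspicious region, which is one way to read the abstract's claim that the model can locate malicious code.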
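The fine-to-coarse pipeline in item 2 (CNN features fed into an RNN, with attention over the RNN states) can likewise be sketched. Again this is an untrained, pure-Python toy built on assumptions: whitespace tokenization, random word embeddings, a zero-padded ("same") convolution so each word keeps one fine-grained feature vector, a plain tanh (Elman) RNN in place of whatever recurrent cell the thesis uses, and dot-product attention. Because the sentence logit is linear in the attention-weighted context vector, it splits exactly into per-word terms a_t (w · h_t); reading the sign of each term as a context-dependent word polarity is one plausible interpretation, not necessarily the thesis's definition. `CNNRNNAttention` is a hypothetical name.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class CNNRNNAttention:
    """Untrained toy: embedding -> conv (fine) -> RNN (coarse) -> attention -> sigmoid."""

    def __init__(self, dim=6, conv_dim=5, hidden=4, width=3, seed=1):
        self.rng = random.Random(seed)
        self.dim, self.width = dim, width
        self.emb = {}  # word -> vector; random stand-in for pretrained embeddings
        self.conv_w = [[[self._u() for _ in range(dim)] for _ in range(width)]
                       for _ in range(conv_dim)]
        self.conv_b = [self._u() for _ in range(conv_dim)]
        self.rnn_wx = [[self._u() for _ in range(conv_dim)] for _ in range(hidden)]
        self.rnn_wh = [[self._u() for _ in range(hidden)] for _ in range(hidden)]
        self.rnn_b = [self._u() for _ in range(hidden)]
        self.attn_v = [self._u() for _ in range(hidden)]
        self.out_w = [self._u() for _ in range(hidden)]
        self.out_b = self._u()

    def _u(self):
        return self.rng.uniform(-0.5, 0.5)

    def embed(self, word):
        if word not in self.emb:
            self.emb[word] = [self._u() for _ in range(self.dim)]
        return self.emb[word]

    def conv(self, vecs):
        # "Same" zero padding (odd width assumed): one local feature vector per word.
        pad = self.width // 2
        zero = [0.0] * self.dim
        padded = [zero] * pad + vecs + [zero] * pad
        return [[max(0.0, b + sum(dot(w[j], padded[i + j]) for j in range(self.width)))
                 for w, b in zip(self.conv_w, self.conv_b)]
                for i in range(len(vecs))]

    def rnn(self, xs):
        # Elman RNN: h_t = tanh(W_x x_t + W_h h_{t-1} + b)
        h = [0.0] * len(self.rnn_b)
        states = []
        for x in xs:
            h = [math.tanh(dot(wx, x) + dot(wh, h) + b)
                 for wx, wh, b in zip(self.rnn_wx, self.rnn_wh, self.rnn_b)]
            states.append(h)
        return states

    def forward(self, sentence):
        words = sentence.split()
        if not words:
            raise ValueError("empty sentence")
        states = self.rnn(self.conv([self.embed(w) for w in words]))
        attn = softmax([dot(self.attn_v, h) for h in states])
        # logit = b + w . sum_t a_t h_t = b + sum_t a_t (w . h_t): exact per-word split
        contrib = [a * dot(self.out_w, h) for a, h in zip(attn, states)]
        logit = self.out_b + sum(contrib)
        prob = 1.0 / (1.0 + math.exp(-logit))  # P(positive); meaningless until trained
        return prob, logit, list(zip(words, attn, contrib))


if __name__ == "__main__":
    model = CNNRNNAttention()
    prob, _, per_word = model.forward("the battery life is great but the screen is dim")
    for word, weight, c in sorted(per_word, key=lambda t: -t[1])[:3]:
        print(f"{word:>8}  attn={weight:.3f}  contribution={c:+.3f}")
```

After training, the highest-attention words would be the candidate sentiment words, and because each word's contribution depends on its RNN state, the same word can score differently in different contexts, which is the property the abstract contrasts with fixed sentiment dictionaries.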
Keywords: Text Classification, Deep Learning, Attention Mechanism, Multi-Granularity