Research On Context Feature Recognition And Representation Learning For Aspect Extraction

Posted on:2023-11-27

Degree:Master

Type:Thesis

Country:China

Candidate:Y C Pan

Full Text:PDF

GTID:2558306629474634

Subject:Computer technology

Abstract/Summary:


Aspect extraction is an essential task in natural language processing. It aims to automatically extract the expressions that denote aspects in text, and it has significant application value in sentiment analysis and opinion mining. In recent years, research on neural network-based aspect extraction has made some progress. However, the task still faces three challenges: low-frequency aspects are difficult to recognize, aspects are confused with common words, and observable (labeled) samples are insufficient. To address these problems, this thesis makes the following three contributions.

First, according to their distribution frequency, this thesis divides aspects into high-frequency and low-frequency aspects. High-frequency aspects have strong domain representativeness and are easily perceived by supervised learning models. In contrast, low-frequency aspects occur rarely, so fewer training samples are available. This makes it difficult for neural network models to sufficiently learn the corresponding context features and, in turn, to recognize these aspects. This thesis observes that low-frequency aspects often appear alongside high-frequency aspects, that is, the two co-occur in local text fragments. Under such co-occurrence, the high-frequency aspects can serve as key clues for characterizing the semantics of the low-frequency aspects. Accordingly, this thesis proposes an aspect extraction method that incorporates high-frequency aspect information. The method uses statistical information to mine the high-frequency aspects in a sentence and then incorporates them into the representation learning process as significant context features, thereby enhancing the distributed semantic representations of other words and assisting the recognition of low-frequency aspects in similar contexts.

Second, existing neural network-based aspect extraction models can perform deep semantic representation and perception of aspects and their contexts. However, current techniques still have difficulty recognizing highly distinguishable context features. As a result, such models easily confuse aspects with other common words, leading to a high rate of misclassified and missed aspects. Therefore, this thesis proposes an aspect extraction method that combines data self-augmentation with contrastive learning. The method uses Regularized Dropout (R-Drop) to implement data self-augmentation, thereby expanding the learnable samples of positive and negative examples. On this basis, it optimizes the feature representations of positive and negative examples through contrastive learning, which gives both kinds of examples a more perceptibly differentiated representation pattern. In this way, the model is guided to automatically recognize highly distinguishable context features and to better distinguish aspects from common words.

Finally, aspect extraction is strongly domain-specific: texts from different domains differ in pragmatics and expression. Consequently, neural aspect extraction models trained on a single domain often perform poorly on data from other domains, and in practical applications the time and resource cost of labeling a large amount of observable data in every domain for supervised learning is too high. Accordingly, this thesis proposes an aspect extraction method based on a target paradigm (i.e., a prompt). The method constructs, for the original text, a target prompt containing a mask, where the prompt is a paradigm that characterizes whether a certain aspect exists. On this basis, the decoder of a masked language model predicts the masked content of the target prompt from the context feature representation, thus indirectly helping to determine whether an aspect exists.

In summary, to address the three problems of difficulty in recognizing low-frequency aspects, confusion between aspects and common words, and insufficient observable samples, this thesis proposes three methods: aspect extraction that integrates high-frequency aspect information, aspect extraction that combines data self-augmentation with contrastive learning, and aspect extraction based on target-paradigm mask prediction. Experiments are conducted on four aspect extraction datasets provided by the International Workshop on Semantic Evaluation (SemEval). The results show that the proposed methods achieve an F1 score of 83.94% on the 2014 laptop-domain dataset, and F1 scores of 88.72%, 73.61%, and 78.10% on the 2014, 2015, and 2016 restaurant-domain datasets, respectively.
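The statistical mining step of the first method can be pictured with a small sketch. It assumes the training data is a list of (tokens, gold aspect terms) pairs and that "high-frequency" simply means a gold count of at least `min_count`. The input format, the threshold, and the helper names `mine_high_frequency_aspects`, `high_frequency_flags`, and `append_flag_feature` are all hypothetical. The abstract does not say which statistic the thesis uses or how the feature is fused into the encoder. Here the mined aspects become a per-token 0/1 flag that is concatenated to each word vector as one extra context-feature dimension.

```python
from collections import Counter


def mine_high_frequency_aspects(labeled_sentences, min_count=2):
    """Return the set of gold aspect terms seen at least `min_count` times.

    labeled_sentences: list of (tokens, aspect_terms) pairs (hypothetical format).
    min_count: hypothetical frequency threshold.
    """
    counts = Counter()
    for _tokens, aspects in labeled_sentences:
        counts.update(aspect.lower() for aspect in aspects)
    return {aspect for aspect, count in counts.items() if count >= min_count}


def high_frequency_flags(tokens, high_freq_aspects):
    """Mark with 1 every token covered by a (possibly multi-word) high-frequency aspect."""
    lowered = [t.lower() for t in tokens]
    flags = [0] * len(tokens)
    for aspect in high_freq_aspects:
        parts = aspect.split()
        for start in range(len(lowered) - len(parts) + 1):
            if lowered[start:start + len(parts)] == parts:
                for i in range(start, start + len(parts)):
                    flags[i] = 1
    return flags


def append_flag_feature(embeddings, flags):
    """Concatenate the flag to each token vector as one extra feature dimension (assumed fusion)."""
    return [list(vec) + [float(flag)] for vec, flag in zip(embeddings, flags)]


# Toy training data (invented for illustration only).
train = [
    ("The battery life is great".split(), ["battery life"]),
    ("Battery life and screen are fine".split(), ["Battery life", "screen"]),
    ("The keyboard is bad".split(), ["keyboard"]),
]
high_freq = mine_high_frequency_aspects(train, min_count=2)
sentence = "The battery life is good but the trackpad lags".split()
print(high_freq)                                  # {'battery life'}
print(high_frequency_flags(sentence, high_freq))  # [0, 1, 1, 0, 0, 0, 0, 0, 0]
```

In this toy sentence "trackpad" plays the role of a rare aspect. The flagged neighbours "battery life" are the kind of co-occurring high-frequency clue the method is described as exploiting.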

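The two training signals named in the second method can be sketched in plain Python for a single token. R-Drop runs the same input through the model twice with different dropout masks. It adds a symmetric KL term between the two output distributions to the negative log-likelihood of both passes, so the second pass acts as a self-augmented view. The contrastive part is shown as a generic InfoNCE loss. It pulls an anchor representation toward a positive, for example the other dropout view of the same aspect token, and pushes it away from negatives, for example common-word tokens. This pairing, the weight `alpha`, the `temperature`, and every function name are assumptions for illustration. The abstract does not give the thesis's exact loss.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def r_drop_loss(logits_a, logits_b, label, alpha=1.0):
    """R-Drop-style objective: NLL of both dropout passes plus alpha * symmetric KL."""
    p, q = softmax(logits_a), softmax(logits_b)
    nll = -math.log(p[label]) - math.log(q[label])
    sym_kl = 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
    return nll + alpha * sym_kl


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE: small when the anchor is closer to its positive than to the negatives."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Hypothetical tag logits (e.g., B/I/O) for one token from two dropout passes.
view_a = [2.0, 0.5, -1.0]
view_b = [1.8, 0.7, -0.9]
print(r_drop_loss(view_a, view_b, label=0, alpha=1.0))
print(info_nce([1.0, 0.0], [0.9, 0.1], [[-1.0, 0.0], [0.0, 1.0]]))
```

Setting `alpha=0` reduces the first loss to plain two-pass cross-entropy. The KL term is what penalizes disagreement between the two dropout views.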
Keywords/Search Tags:

Aspect Extraction, Context Feature Recognition, Data Self-augmentation, Contrastive Learning, Few-shot Learning


Related items

1	Generalized Zero-shot Learning Based On Contrastive Learning And Semantic Augmentation
2	Research And Application Of Knowledge Extraction Based On Few-shot Learning
3	Research On Few-Shot Learning For Image Recognition
4	Research And Implementation Of Few-shot Recognition Algorithm Based On Metric Learning And Data Augmentation
5	Research Of Object Detection Based On Few-Shot Learning
6	Research On Compositional Zero-Shot Recognition Method Based On Visual And Semantic Embedding
7	Research On Hypernymy Recognition Based On Graph Contrastive Learning
8	Delving Into Contrastive Learning For Unsupervised Visual Representations
9	Research On Few-shot Learning Based Via Data Augmentation
10	Research On Sound Signal Recognition Algorithm Based On Few Shot Learning