Text similarity measures the degree of semantic similarity between texts; it is used to evaluate whether different texts express the same semantic information. Text similarity calculation is widely applied in intelligent customer service, search engines, and recommendation systems. Research on text similarity has a long history. The earliest methods were based on statistical information about the text, but they could not accurately capture its semantics. Later, deep learning methods were proposed for similarity calculation and achieved good performance. At present, most work that applies deep learning to text similarity targets English text; studies on Chinese text similarity remain relatively scarce.

This paper designs a BiGRU model with an attention mechanism to calculate the similarity of Chinese short texts. The main framework of the model is the Encoder-Decoder architecture. Using BiGRU as the basic model at both ends of the framework alleviates the long-sequence dependence problem and captures bidirectional semantic information well. An attention mechanism is added to the model to improve accuracy on text similarity tasks. Batch Normalization and Dropout layers are added where appropriate to speed up convergence and prevent overfitting, and the attention mechanism is further enhanced to boost the inference ability of the Decoder.

In this paper, four data sets are used and four experiments are conducted. The first three experiments are based on the ATEC competition data set. Because the positive and negative samples are unevenly distributed, the F1 score is used to compare the generalization ability of different models, and the weights of positive and negative samples are adjusted in the loss function to reduce the effect of sample imbalance as much as possible. The experimental results show that after the sample weights are adjusted, the generalization ability of the model improves greatly, and the results at the word level and character level differ little. The fourth experiment is carried out on the remaining three data sets, comparing the designed model with models commonly used in industry and listing the time required to train each. The results show that the model in this paper converges faster than the other models.
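The attention step in the Encoder-Decoder framework can be illustrated with a minimal dot-product attention sketch in plain Python. This is not the paper's implementation, only an assumed illustration of the general mechanism; the names `softmax` and `attend` are hypothetical:

```python
import math

def softmax(scores):
    """Normalize a list of scores into attention weights that sum to 1."""
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, encoder_states):
    """Dot-product attention: score each encoder hidden state by its
    similarity to the decoder query, then return the weighted sum
    (the context vector) together with the attention weights."""
    scores = [sum(q * h for q, h in zip(query, state))
              for state in encoder_states]
    weights = softmax(scores)
    dim = len(query)
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states))
               for i in range(dim)]
    return context, weights
```

At each decoding step the Decoder's query attends over all (here, hypothetical bidirectional) encoder states, so the context vector emphasizes the input positions most relevant to that step.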
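The class-weight adjustment in the loss function and the F1 evaluation described above can be sketched as follows. This is a minimal illustration under assumed conventions, not the paper's code; `pos_weight` is a hypothetical parameter scaling the positive-class term:

```python
import math

def weighted_bce(y_true, y_pred, pos_weight=1.0, eps=1e-7):
    """Binary cross-entropy where the positive-class term is scaled by
    pos_weight, so errors on the minority class cost more."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)     # clip to avoid log(0)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def f1_score(y_true, y_pred_labels):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for y, p in zip(y_true, y_pred_labels) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred_labels) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred_labels) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)
```

With `pos_weight > 1`, a model that under-predicts the minority positive class incurs a larger loss, which is one common way to counter the sample imbalance that motivates using F1 rather than accuracy.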