
Research On Text Similarity And Text Association Analysis Based On Multi-granularity Information

Posted on: 2023-11-04
Degree: Master
Type: Thesis
Country: China
Candidate: J W Shi
Full Text: PDF
GTID: 2558307070983319
Subject: Signal and Information Processing
Abstract/Summary:
Text similarity and text association analysis are two important branches of natural language processing (NLP) and play core supporting roles in many practical applications. With the successful application of deep learning in NLP, deep neural network models have been applied to text similarity and text association analysis to reduce the cost of manual feature engineering. On the one hand, most previous models for text similarity analysis were designed for English text. The common approach to applying them to Chinese is either to take Chinese characters directly as input or to segment each sentence into words with an existing Chinese word segmentation (CWS) system. However, CWS systems suffer from segmentation errors and ambiguous semantic segmentation, which makes effective feature construction and semantic understanding difficult. On the other hand, in text association analysis, the associated clauses must be labeled in the dataset in advance, which limits the application of such models in real-world scenarios. Based on these two points, this thesis studies two directions: designing effective Chinese text representations to overcome the limitations of CWS systems, and optimizing text association analysis algorithms to remove the dependency on dataset annotation. The innovations and contributions of this work are summarized as follows:

(1) This thesis proposes a Chinese text representation method based on multi-granularity information (HyperLexicon), which can extract the complete vocabulary information in the text. In addition, three fusion methods are designed to integrate the multi-granularity information of the text for model training. Based on the HyperLexicon, a character-word two-stream network (CL2N) is designed; the network extracts single-sentence features and interactive features to improve the performance of text similarity analysis.

(2) This thesis studies the emotion cause analysis task, the main task of text association analysis. The task is divided into two sub-tasks: 1) extraction of associated elements (emotion clauses and cause clauses); 2) combination and filtering of the associated elements. For the first sub-task, a mutual-assistance single-task model (MASTM) based on multi-granularity information is proposed to extract the associated elements. For the second sub-task, the Cartesian product is used to combine the extracted elements, the relative position information of the associated clauses is added to assist filtering, and three filters of different granularity are designed to select the correct groups of associated clauses.

Furthermore, this study conducts comparative experiments on several public datasets against frontier models for both text similarity analysis and text association analysis, and analyzes the experimental results from multiple dimensions to verify the effectiveness of the proposed methods. The results show that CL2N outperforms existing short text matching models on the text similarity analysis task. The method alleviates the error propagation caused by Chinese word segmentation, and the combination of single-sentence features and interactive features allows the network to capture contextual semantic information as well as the vocabulary information that both sentences attend to, which helps the model obtain its best results.
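The character-word fusion idea underlying this kind of multi-granularity representation can be illustrated with a minimal sketch. The toy lexicon, embedding sizes, and mean-pooling fusion below are illustrative assumptions only; they do not reproduce the thesis's actual HyperLexicon construction, its three fusion methods, or the CL2N architecture.

```python
# Minimal sketch: fusing character- and word-granularity features for one sentence.
# The lexicon, dimensions, and pooling are hypothetical choices for illustration.
import torch
import torch.nn as nn

LEXICON = {"深度", "深度学习", "学习"}   # toy word lexicon
chars = list("深度学习很有趣")           # character-granularity tokens

def matched_words(chars, lexicon, max_len=4):
    """For each character position, collect the lexicon words that cover it."""
    spans = [[] for _ in chars]
    for i in range(len(chars)):
        for j in range(i + 1, min(i + max_len, len(chars)) + 1):
            word = "".join(chars[i:j])
            if word in lexicon:
                for k in range(i, j):
                    spans[k].append(word)
    return spans

char_vocab = {c: i for i, c in enumerate(sorted(set(chars)))}
word_vocab = {w: i for i, w in enumerate(sorted(LEXICON))}
char_emb = nn.Embedding(len(char_vocab), 32)   # character-granularity embeddings
word_emb = nn.Embedding(len(word_vocab), 32)   # word-granularity embeddings

spans = matched_words(chars, LEXICON)
fused = []
for ch, words in zip(chars, spans):
    c_vec = char_emb(torch.tensor([char_vocab[ch]])).squeeze(0)
    if words:
        w_ids = torch.tensor([word_vocab[w] for w in words])
        w_vec = word_emb(w_ids).mean(dim=0)    # pool all words matched at this position
    else:
        w_vec = torch.zeros(32)                # no lexicon word covers this character
    fused.append(torch.cat([c_vec, w_vec]))    # concatenate the two granularities
sentence_repr = torch.stack(fused)             # shape: (num_chars, 64)
print(sentence_repr.shape)
```

In such a scheme the character stream supplies context that segmentation errors cannot corrupt, while the matched lexicon words contribute word-level semantics; a matching network such as CL2N would consume representations of this kind for both sentences.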
The results also show that, on the text association analysis task, the proposed method can simultaneously extract the related emotion clause and emotion cause clause, removes the need for dataset pre-annotation, and achieves better accuracy in recognition and extraction. Compared with the multi-task learning model, the F1 score of the MASTM is improved by 5.3%.
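The combination-and-filtering sub-task described above can be sketched as follows. The clause texts, the relative-position feature, and the scoring rule are illustrative stand-ins; the thesis's three trained filters of different granularity are not reproduced here.

```python
# Minimal sketch: pair candidate emotion and cause clauses via a Cartesian product,
# attach a relative-position feature, and keep plausible pairs with a stand-in filter.
from itertools import product

# candidate clauses produced by the extraction sub-task: (clause index, text)
emotion_clauses = [(3, "I was so angry")]
cause_clauses = [(1, "the train was delayed again"), (6, "the weather was lovely")]

def filter_score(emotion_text, cause_text, rel_pos):
    """Stand-in for the learned pair filters: favour clauses that are close together."""
    return 1.0 / (1.0 + abs(rel_pos))

candidate_pairs = []
for (e_idx, e_text), (c_idx, c_text) in product(emotion_clauses, cause_clauses):
    rel_pos = c_idx - e_idx                    # relative position of the cause clause
    score = filter_score(e_text, c_text, rel_pos)
    candidate_pairs.append(((e_text, c_text), rel_pos, score))

# keep pairs whose score exceeds a threshold as the final emotion-cause groups
kept = [p for p in candidate_pairs if p[2] >= 0.3]
for (e_text, c_text), rel_pos, score in kept:
    print(f"emotion={e_text!r}  cause={c_text!r}  rel_pos={rel_pos:+d}  score={score:.2f}")
```

With these toy inputs only the nearby "train was delayed" clause survives the threshold, which mirrors the role of relative position information in filtering out implausible emotion-cause combinations.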
Keywords/Search Tags: Text Similarity Analysis, Text Association Analysis, Multi-granularity Information, Word Segmentation, Dependency of Dataset Annotation