| In recent years, China’s investment in scientific research has grown steadily, and both the number and the scale of research projects have increased significantly. Duplication checking and novelty search for science and technology projects have therefore become pressing problems in project management. To address the poor entity recognition and the limited grasp of deep semantics of traditional text similarity methods, this thesis proposes a joint entity and relation extraction model that applies multi-head graph attention to syntactic dependency graphs, proposes a document similarity matching method that fuses entity hierarchical types, and designs and implements a document similarity matching system. The main work of this thesis is as follows: (1) Joint entity and relation extraction based on syntactic dependency graphs and multi-head graph attention. To strengthen semantic understanding, the semantic information that a text describes is extracted from it. First, the sentences are preprocessed and represented as a word vector matrix, and a syntactic dependency graph is obtained through syntactic parsing. Next, to address insufficient semantic understanding, the model uses a BiLSTM network and a graph convolutional network for feature extraction, combining the sequential and structural features of sentences so that it captures their semantic information more fully. Finally, a multi-head attention layer is built to fuse the features corresponding to different relations, which improves the model's ability to learn relations. Experimental results show that the proposed model extracts entity relations better than a graph convolutional network and a graph attention network. (2) Document similarity matching that fuses hierarchical types. Document similarity is judged from the entity and relation information extracted from the documents. Hierarchical type information can often make entity representations more refined and accurate. Building on the idea of Word Mover’s Distance, a similarity calculation method that fuses hierarchical types is proposed: entities are represented with their hierarchical types fused in, and the distance between two documents is computed as the cost of moving entities from one document to the other, which yields the document similarity. In addition, the entity relations in a document are treated as a graph structure, and a calculation model based on graph similarity is proposed: a graph neural network extracts features from the document graph, and the resulting graph-level features are used for similarity calculation. Experiments verify the effectiveness of this similarity calculation method on document similarity calculation, graph similarity calculation, and graph classification tasks. (3) Design and implementation of a document similarity matching system. To meet the need for text duplication checking of science and technology projects, a document similarity matching system is designed and implemented, including requirements analysis and functional design. The system supports similarity-based matching and retrieval of text. By marking text according to its degree of similarity, it lets users see at a glance how similar documents are, which improves the user experience. |
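
The type-fused, Word Mover's Distance-style comparison described in part (2) can be illustrated with a minimal sketch. It is not the thesis's implementation: it swaps in the relaxed Word Mover's Distance (a lower bound on the exact distance that avoids an optimal-transport solver), assumes uniform entity weights, and uses a hypothetical fusion rule (`fuse`, with mixing weight `alpha`) that blends an entity vector with the mean of its type-hierarchy vectors. All function names, parameters, and toy vectors are assumptions made for illustration.

```python
import math

def fuse(entity_vec, type_path_vecs, alpha=0.5):
    """Hypothetical fusion: blend an entity vector with the mean of the
    vectors of the types on its hierarchy path (alpha is an assumed weight)."""
    if not type_path_vecs:
        return list(entity_vec)
    dim = len(entity_vec)
    n = len(type_path_vecs)
    type_mean = [sum(v[i] for v in type_path_vecs) / n for i in range(dim)]
    return [(1 - alpha) * e + alpha * t for e, t in zip(entity_vec, type_mean)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def one_sided_cost(src, dst):
    """Each source entity has mass 1/len(src) (assumed uniform) and moves
    all of it to its nearest entity in the other document."""
    return sum(min(euclidean(s, d) for d in dst) for s in src) / len(src)

def relaxed_wmd(doc_a, doc_b):
    """Relaxed Word Mover's Distance: the larger of the two one-sided costs."""
    return max(one_sided_cost(doc_a, doc_b), one_sided_cost(doc_b, doc_a))

def similarity(doc_a, doc_b):
    """Map a distance in [0, inf) to a similarity in (0, 1]."""
    return 1.0 / (1.0 + relaxed_wmd(doc_a, doc_b))

# Toy 2-d vectors standing in for learned embeddings (assumed values).
person_type = [1.0, 0.0]
org_type = [0.0, 1.0]
doc_a = [fuse([0.9, 0.1], [person_type]), fuse([0.1, 0.9], [org_type])]
doc_b = [fuse([0.8, 0.2], [person_type]), fuse([0.2, 0.8], [org_type])]
doc_c = [fuse([0.1, 0.9], [org_type])]

print(similarity(doc_a, doc_b), similarity(doc_a, doc_c))
```

In this toy setup, `doc_a` and `doc_b` contain entities of matching types and end up closer to each other than `doc_a` and `doc_c`, which lacks a person-typed entity. An exact Word Mover's Distance would replace `relaxed_wmd` with a transport problem solved over the same pairwise distances.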