
Research And Application Of Technology Information Similarity Detection Technology For Oil And Gas Pipelines Field

Posted on: 2022-03-07    Degree: Master    Type: Thesis
Country: China    Candidate: Z Chen    Full Text: PDF
GTID: 2531307109469314    Subject: Computer technology
Abstract/Summary:
Managing scientific and technological information is an important task for enterprises and institutions. Efficient and accurate collection, screening, processing, and use of this information supports development decision-making and planning. Repeated approval of similar scientific and technological projects is a prominent problem in information management for the oil and gas pipeline field. To address it, this thesis studies and analyzes the characteristic indexes and elements of information similarity among pipeline science and technology projects, and implements similarity detection that combines quantitative and qualitative analysis, in order to support high-quality project approval.

Based on an in-depth analysis of the textual characteristics of pipeline science and technology project documents, the thesis focuses on applying natural language processing to project text similarity. It proposes a similarity calculation scheme for pipeline science and technology information whose objective is accurate text representation.

First, professional phrase recognition and domain-specific anaphora resolution are addressed. For phrase recognition, a method is proposed that combines the ratio of mutual information between words (KI) with custom part-of-speech collocation rules, to avoid splitting semantic units in the text of science and technology projects. For anaphora resolution, Stanford CoreNLP is used to analyze phrase structure; noun phrases are resolved on a shortest-distance basis, and important demonstrative pronouns such as "it" are replaced with the entities they refer to.

Then, the dependency syntactic structure is analyzed to obtain semantic role annotation information for triplet extraction, and a weighted triplet text representation is proposed in which the weights are calculated with the TextRank algorithm. The experimental results show that the weighted triplet representation effectively carries the key structural semantic information and the weight information of the text.

Finally, the text representations are fused based on word-level semantic similarity. This mainly includes: (1) constructing a domain synonym lexical forest, which draws on the specialized nature of the domain to supplement and extend the existing lexical forest; (2) calculating word-level semantic similarity by integrating the domain lexical forest with a general knowledge base; and (3) exploring a word similarity fusion method based on the KM algorithm. Text similarity is then calculated by maximum matching of triples.

Experiments on lexical semantic similarity show that including the domain lexical forest improves the accuracy and reliability of similarity calculation for specialized phrases. Multi-group comparison experiments on general-domain text and on pipeline science and technology project text show good accuracy and practicability for the pipeline project text. A prototype system for duplicate checking of scientific and technological information was also developed. This study is a useful exploration of the duplicate-checking problem in pipeline science and technology information management.
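The final step of the abstract, text similarity by maximum matching of weighted triples, can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not the thesis's implementation: the function names `word_sim`, `triple_sim`, and `text_sim`, the toy synonym table standing in for the domain lexical forest, the fixed synonym score of 0.8, the averaging of pair weights, and the brute-force search over matchings (a stand-in for an assignment solver such as the Kuhn-Munkres algorithm, which the "KM algorithm" in the abstract presumably refers to).

```python
from itertools import permutations


def word_sim(a, b, synonyms):
    """Toy word similarity: 1.0 if identical, 0.8 if listed as synonyms, else 0.0."""
    if a == b:
        return 1.0
    if frozenset((a, b)) in synonyms:
        return 0.8  # assumed score; a real system would query a lexical resource
    return 0.0


def triple_sim(t1, t2, synonyms):
    """Mean element-wise similarity of two (subject, predicate, object) triples."""
    return sum(word_sim(x, y, synonyms) for x, y in zip(t1, t2)) / 3.0


def text_sim(doc_a, doc_b, synonyms):
    """Best one-to-one matching score between two lists of (triple, weight) pairs.

    Brute force over all matchings, so only suitable for a handful of triples.
    The score is normalized by the mean total weight of the two documents.
    """
    if not doc_a or not doc_b:
        return 0.0
    if len(doc_a) > len(doc_b):
        doc_a, doc_b = doc_b, doc_a  # iterate over the shorter document
    best = 0.0
    for perm in permutations(range(len(doc_b)), len(doc_a)):
        score = sum(
            (wa + doc_b[j][1]) / 2.0 * triple_sim(ta, doc_b[j][0], synonyms)
            for (ta, wa), j in zip(doc_a, perm)
        )
        best = max(best, score)
    total = (sum(w for _, w in doc_a) + sum(w for _, w in doc_b)) / 2.0
    return best / total if total > 0 else 0.0


# Hypothetical example data: weights play the role of TextRank scores.
synonyms = {frozenset(("detect", "inspect"))}
doc_a = [(("pipeline", "detect", "corrosion"), 0.6),
         (("sensor", "monitor", "pressure"), 0.4)]
doc_b = [(("sensor", "monitor", "pressure"), 0.5),
         (("pipeline", "inspect", "corrosion"), 0.5)]

print(round(text_sim(doc_a, doc_b, synonyms), 4))
```

In this toy setup, two texts that state the same facts in a different order and with one synonym substitution still score close to 1, which is the behavior a duplicate-checking system would want from a triple-level comparison.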
Keywords/Search Tags: mutual information, dependency analysis, text representation, semantic similarity