Research On Patent Technology Topic Recognition Based On Sentence-BERT

Posted on: 2024-05-08
Degree: Master
Type: Thesis
Country: China
Candidate: M W Zhou
GTID: 2568307067991339
Subject: Business analysis
Abstract/Summary:
With the development of science and technology and the advent of the information age, technological innovation has become an important measure of a country's scientific and technological strength. As the largest open source of technological innovation, patent documents hold great economic and technological value. Topic identification and analysis of large-scale patent data can reveal the topic distribution and hot technologies in a specific patent field, which makes it possible to locate frontier hotspots and predict the future direction of technology. How to mine and identify topics in a specific patent field efficiently and with high quality, and how to extract high-value technical information from them, is therefore an important open problem in information science. Solving it is also an urgent need for researchers and for government decision-making departments that formulate science policy and make strategic decisions.

To address this problem, this paper draws on deep learning and on the characteristics of patent documents to propose a patent technology topic recognition method based on Sentence-BERT. The method uses the Sentence-BERT model to produce vector representations of patent documents and feeds these representations into downstream tasks. It then combines data dimensionality reduction, cluster analysis, and topic word extraction and interpretation to uncover patent technology topics in a specific field.

To verify the effectiveness of the proposed method, a series of experiments is carried out. The experiments use patent data in the field of artificial intelligence from the Yangtze River Delta region, covering 2015 to 2022, and compare eight patent technology topic recognition methods. The results show that the Sentence-BERT model takes the semantic connections between contexts into account more fully during text vectorization, so that documents are better represented in the vector space. As a result, finer-grained, higher-quality, and deeper topics can be extracted, which helps in analyzing the subdivided features within topics and significantly improves the diversity and interpretability of the topics.

The main research content and innovations of this paper are as follows:

(1) At the theoretical level, a patent technology topic recognition method that matches the characteristics of patent documents is proposed. Patent documents lack keywords, and applicants often describe their technology with unique or uncommon words and phrases in order to maintain novelty and avoid patent minefields. Both factors make it increasingly difficult to cluster and identify patent technology topics with text mining techniques. This paper therefore proposes a patent technology topic recognition method based on Sentence-BERT. The Sentence-BERT model represents text as vectors with rich semantic features, which yields sentence embeddings of the patent text. Combined with model fine-tuning, data dimensionality reduction, text clustering, and topic word extraction, this effectively addresses the sparse semantic features and insufficient semantic description of patent document vectors.

(2) At the application level, the proposed method is applied to the field of artificial intelligence. The pre-trained Sentence-BERT model is fine-tuned on artificial intelligence patent data so that it adapts well to the specific field and produces high-quality text vector representations. With Sentence-BERT as the text feature extractor, the model is fine-tuned on the domain-specific corpus to generate vector representations of patent abstracts in the field of artificial intelligence. By mining the unstructured information in the patent text, the study presents the hot topics and technical hotspots of artificial intelligence in the Yangtze River Delta region comprehensively and systematically, thereby enriching the research results on technology forecasting in the field of artificial intelligence.
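The pipeline described in the abstract (sentence embedding, clustering, topic word extraction) can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and several parts are assumptions made for illustration: the abstract does not name a clustering algorithm, so plain k-means is swapped in; the topic word score is a simple class-based TF-IDF style weighting; the dimensionality reduction and fine-tuning steps are omitted; and `toy_embed` is a hypothetical stand-in for a Sentence-BERT encoder so that the example runs with the standard library only. The names `kmeans`, `topic_words`, and `recognize_topics` are likewise hypothetical.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def kmeans(vectors: List[Vector], k: int, iters: int = 20) -> List[int]:
    """Plain k-means with deterministic farthest-point initialisation (assumed algorithm)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Next centroid: the point farthest from its nearest existing centroid.
        far = max(vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(v, centroids[j])) for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def topic_words(docs: List[str], labels: List[int], top_n: int = 3) -> Dict[int, List[str]]:
    """Rank the words of each cluster with a class-based TF-IDF style score."""
    per_cluster: Dict[int, Counter] = {}
    for doc, lab in zip(docs, labels):
        per_cluster.setdefault(lab, Counter()).update(re.findall(r"[a-z]+", doc.lower()))
    n_clusters = len(per_cluster)
    # Number of clusters in which each word occurs at least once.
    cluster_freq = Counter(w for counts in per_cluster.values() for w in counts)
    topics: Dict[int, List[str]] = {}
    for lab, counts in per_cluster.items():
        total = sum(counts.values())
        score = {
            w: (c / total) * math.log(1 + n_clusters / cluster_freq[w])
            for w, c in counts.items()
        }
        # Highest score first; ties are broken alphabetically.
        topics[lab] = sorted(score, key=lambda w: (-score[w], w))[:top_n]
    return topics


def recognize_topics(
    docs: List[str],
    embed: Callable[[List[str]], Sequence[Vector]],
    k: int,
    top_n: int = 3,
) -> Tuple[List[int], Dict[int, List[str]]]:
    """Embed the documents, cluster the vectors, then extract topic words per cluster."""
    vectors = [list(v) for v in embed(docs)]
    labels = kmeans(vectors, k)
    return labels, topic_words(docs, labels, top_n)


def toy_embed(texts: List[str]) -> List[List[float]]:
    """Hypothetical stand-in for a Sentence-BERT encoder: two keyword counts per text."""
    return [[float(t.lower().count("image")), float(t.lower().count("speech"))] for t in texts]


docs = [
    "image recognition network",
    "image segmentation network",
    "speech recognition model",
    "speech synthesis model",
]
labels, topics = recognize_topics(docs, toy_embed, k=2, top_n=2)
print(labels)  # [0, 0, 1, 1]
print(topics)  # {0: ['image', 'network'], 1: ['model', 'speech']}
```

In practice, `embed` would be a real sentence encoder, for example the `encode` method of a `SentenceTransformer` model from the sentence-transformers library, and the clustering and reduction steps would be those chosen in the thesis, which the abstract does not specify.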
Keywords/Search Tags: technical topic recognition, Sentence-BERT, patent text, topic clustering