| Patent similarity detection evaluates the association between two or more patent texts by calculating their similarity. It helps people better understand the relationships between patents and supports work in patent application and patent infringement detection. In the patent field, the most common type of single-modal data is text, so many patent similarity detection methods are based on patent text. Single-modal methods are simple and easy to implement, but they have limitations: they cannot handle multimodal data, do not consider the semantic information in patent text, and struggle with the complexity of patent text, so they cannot comprehensively evaluate the similarity between patents. The challenge now is how to modify traditional similarity detection algorithms to adapt to the complex characteristics of patent data. Multimodal technology refers to methods that use data from multiple modalities to improve the accuracy and generalization of models. With data support from the National Intellectual Property Administration’s "Research and Platform Key Function Design for Patent Technology Application Transformation" project and the "Construction of Patent Information Public Service System" project declared to the Chongqing Science and Technology Bureau, this thesis aims to provide users with more dimensions of patent similarity detection and carries out the following research: (1) Building a patent multimodal information structure. First, the patent data is preprocessed and classified by patent type, high-similarity patents are obtained through methods such as web crawlers, and entity disambiguation is designed to solve the problem of identical patents. Second, based on the structure and multimodal characteristics of patent information, feature extraction is performed on patent abstracts, citation relationships, semantic
specifications, additional images, and other data to determine the correspondence between each modal feature and its extraction method, as well as the feature storage method, which completes the construction of the patent multimodal information structure. Finally, based on the constructed patent multimodal information structure, the frontend framework Vue.js and the backend framework Spring Data are used to visualize the multimodal information. Users can obtain recommendations for other similar patents through the detailed display of multimodal information. (2) Proposing a patent similarity detection method that integrates multimodal information. Based on the extracted multimodal patent information, a similarity detection method that integrates multimodal information is proposed as one of the core functions of the system. The method employs the idea of multi-task learning, treating similarity detection and weight allocation as two relatively independent modules. For the model input, the method uses the vector space model to extract patent text vectors, employs the SimNet model to obtain semantic information of the subject, and constructs image feature vectors using the SURF algorithm. Finally, it takes the various modal features and their similarity detection results, along with the citation relationship, as components of the multimodal similarity for the final multimodal similarity fusion. (3) Implementing and optimizing the multimodal similarity fusion strategy. After obtaining the multimodal similarity detection results of patents, the system uses a linear weighted feature fusion method to fuse them, and experiments are designed to compare the efficiency and accuracy of different fusion strategies. Finally, based on the actual structural features of patents and system load requirements, an appropriate multimodal similarity fusion strategy is selected and optimized. In terms of system implementation, we provide a user interface for similarity
detection results of patents based on the backend algorithm model, which can be accessed through user inputs. (4) Designing and implementing a patent similarity detection system with multiple functions. The system adopts the traditional B/S architecture with a separated front end and back end. The relational database MySQL is used to store detailed patent data, including patent texts, inventors, and IPC classification numbers, as well as multimodal patent information. Based on these data, basic functions such as user login and follow, as well as major functions including patent information retrieval, enterprise search, and patent similarity detection, are implemented. Finally, functional and performance testing is carried out for the relevant modules of the system. In summary, this thesis combines related technologies such as multimodal technology and similarity detection to design and construct a multimodal information structure for patents. Furthermore, a patent similarity detection method that integrates multimodal features is proposed, and a suitable multimodal similarity fusion strategy is chosen and optimized for the detection method. Finally, a patent multimodal similarity detection system is implemented, which provides functions such as patent information retrieval, visualization, patent similarity detection, and similar patent retrieval. The system has been successfully deployed on an Nginx server and can provide detection services for relevant users. |
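
The linear weighted fusion of per-modality similarity results described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `fuse_similarities`, the modality keys, and all score and weight values are hypothetical. It assumes each modality has already produced a similarity score in [0, 1].

```python
from typing import Dict


def fuse_similarities(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Linear weighted fusion: sum(w_m * s_m) / sum(w_m) over modalities m."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same modalities")
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Dividing by the weight total keeps the fused score in [0, 1]
    # even when the weights are not pre-normalized.
    return sum(weights[m] * scores[m] for m in scores) / total


# Hypothetical per-modality similarities for one pair of patents.
scores = {"text": 0.82, "semantic": 0.74, "image": 0.40, "citation": 1.0}
# Hypothetical weights; in the thesis these come from a separate weight allocation module.
weights = {"text": 0.4, "semantic": 0.3, "image": 0.2, "citation": 0.1}

fused = fuse_similarities(scores, weights)
print(f"fused similarity: {fused:.2f}")
```

Because the weights are normalized inside the function, different fusion strategies can be compared by swapping only the `weights` dictionary.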