| In recent years, all walks of life have accelerated, broadened, and deepened their development, supported by the huge data volumes of the big-data era. China ranks first in the world every year in the number of new patent applications, which account for 45.29% of all applications worldwide. In 2021 its number of valid patents surpassed that of the United States to become the largest in the world, at 21.82% of the global total. To keep pace with these developments, strengthen the connection between domestic and foreign patent data, make patent data easier for researchers to use, and improve the efficiency with which it is used, more and more scholars are studying how to build high-quality mappings between domestic and foreign patent data efficiently. The International Patent Classification (IPC) is an international standard for classifying and retrieving patent documents, and the Chinese Library Classification (CLC) is China's large-scale comprehensive classification scheme for books. Establishing the mapping between IPC categories and CLC categories automatically and accurately is therefore important for cross-database retrieval and cross-browsing between patent documents and books. Current research builds this mapping using only the manual Chinese translations of the IPC category descriptions and ignores the original English descriptions entirely. This paper therefore proposes a neural-network-based method for automatically mapping IPC categories to CLC categories that uses source-side information (the English descriptions). First, word representations of the IPC and CLC categories are generated with the pre-trained language models BERT and XLM-R. Next, a multi-head attention mechanism fuses the BERT and XLM-R word representations of the IPC categories with those of the CLC categories. Finally, two feed-forward layers establish the mapping between IPC and CLC categories. Experiments on public datasets show that the proposed method is significantly better than the current state-of-the-art method, with more stable performance and better generalization. On the supplemented public dataset, the proposed method reaches an accuracy of 96.5%, raising the average accuracy of automatic IPC-CLC mapping by 2.5%, significantly reducing the variance across cross-validation folds, and thereby improving the robustness of the system. Further experiments analyze the effectiveness of the source-side information and the rationale for using the cross-entropy loss and a classification framework, which meets the real-world requirement of handling one-to-many mappings. |
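The abstract does not give layer sizes or the exact wiring of the fusion step, so the following is only a minimal sketch of one plausible reading of the pipeline. It is written in plain Python so it runs without a deep-learning framework. Several choices here are assumptions, not the authors' implementation: random vectors stand in for encoder outputs, with BERT assumed for the Chinese descriptions and XLM-R for the English IPC descriptions; multi-head attention lets the Chinese IPC tokens attend to the English ones; and two feed-forward layers score an (IPC, CLC) pair with a two-class cross-entropy loss. All of the names (`PairScorer`, `multi_head_attention`, `mean_pool`), the dimensions, and the pairwise framing are hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rng, rows, cols):
    """Random Gaussian weights: a stand-in for trained parameters (assumption)."""
    return [[rng.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def mean_pool(vectors):
    """Average a list of token vectors into one sentence vector."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def multi_head_attention(queries, keys_values, Wq, Wk, Wv, Wo, n_heads):
    """Scaled dot-product attention split into n_heads heads (d % n_heads == 0).

    queries: tokens that attend (here: Chinese IPC tokens, BERT stand-ins)
    keys_values: tokens attended to (here: English IPC tokens, XLM-R stand-ins)
    Returns one d-dimensional output vector per query token.
    """
    d = len(Wq)
    d_head = d // n_heads
    Q = [matvec(Wq, q) for q in queries]
    K = [matvec(Wk, k) for k in keys_values]
    V = [matvec(Wv, v) for v in keys_values]
    outputs = []
    for q in Q:
        concat = []
        for h in range(n_heads):
            lo, hi = h * d_head, (h + 1) * d_head
            scores = [sum(a * b for a, b in zip(q[lo:hi], k[lo:hi])) / math.sqrt(d_head)
                      for k in K]
            weights = softmax(scores)
            # Weighted sum of this head's slice of the value vectors.
            concat.extend(sum(w * v[i] for w, v in zip(weights, V)) for i in range(lo, hi))
        outputs.append(matvec(Wo, concat))  # output projection over all heads
    return outputs


class PairScorer:
    """Hypothetical scorer: fuse IPC zh/en views, add the CLC view, then 2 FFN layers."""

    def __init__(self, d=8, n_heads=2, hidden=16, seed=0):
        rng = random.Random(seed)
        self.n_heads = n_heads
        self.Wq, self.Wk, self.Wv, self.Wo = (rand_matrix(rng, d, d) for _ in range(4))
        self.W1, self.b1 = rand_matrix(rng, hidden, 3 * d), [0.0] * hidden
        self.W2, self.b2 = rand_matrix(rng, 2, hidden), [0.0, 0.0]

    def logits(self, ipc_zh, ipc_en, clc):
        attended = multi_head_attention(ipc_zh, ipc_en, self.Wq, self.Wk,
                                        self.Wv, self.Wo, self.n_heads)
        # Concatenate pooled Chinese IPC, attention-fused IPC, and pooled CLC vectors.
        x = mean_pool(ipc_zh) + mean_pool(attended) + mean_pool(clc)
        h = [max(0.0, a + b) for a, b in zip(matvec(self.W1, x), self.b1)]  # FFN 1 + ReLU
        return [a + b for a, b in zip(matvec(self.W2, h), self.b2)]  # FFN 2: [no-match, match]


def cross_entropy(logits, label):
    """Negative log-probability of the gold label under a softmax."""
    return -math.log(softmax(logits)[label])


if __name__ == "__main__":
    rng = random.Random(42)

    def toy_tokens(n, d=8):
        return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]

    model = PairScorer()
    z = model.logits(toy_tokens(5), toy_tokens(7), toy_tokens(4))
    print("P(match) =", softmax(z)[1], "loss if label is match:", cross_entropy(z, 1))
```

Scoring (IPC, CLC) pairs, rather than choosing a single CLC label per IPC category, is one way a single IPC category could be mapped to several CLC categories. Whether this matches the paper's "classification framework" is an assumption.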