Research On Methods Of Cross-language Patent Analysis

Posted on: 2016-06-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S Q Liu
GTID: 1225330503453403
Subject: Management Science and Engineering
Abstract/Summary:
Patent analysis is widely used in management and remains an active research topic. Because monolingual patent analysis handles patents in only one language, it performs poorly on cross-language patent collections. Multi-language patents are difficult to process because of four characteristics of patented technology: new terms, cross-language content, interdisciplinarity, and globalization. Drawing on patent analysis, natural language processing, and visualisation, this thesis studies cross-language patent analysis methods that address these four characteristics, and offers new ideas and methods for solving the problem of multi-language patent analysis. The main innovations of this thesis are summarized as follows:
(1) Automatic term extraction from cross-language patent documents based on term value (TValue) is proposed. The method has three stages. Stage 1 extracts strings according to rules on the first and last parts of speech. Stage 2 derives the TValue of each extracted string from the correspondence degree of the first and last parts of speech, the degree of number of words, the degree of independence, the degree of stopping usage, and the degree of importance. Stage 3 selects as candidate terms the strings whose TValue exceeds a predefined confidence threshold. The simulation results show that TValue-based term extraction can extract many low-frequency terms as well as some non-noun terms, with better quality than the baseline methods.
(2) Automatic sentence alignment of cross-language patent documents based on hybrid strategies integration (HSI) is proposed.
Several techniques are applied to improve alignment quality: ① design four attributes of a sentence pair, namely the degree of sentence-length correspondence, the degree of semantic correspondence, the degree of symbol similarity, and the degree of label similarity; ② derive an SValue for each pair of source-language and target-language sentences by combining the four attributes as probabilities; ③ extract 1-to-1, 1-to-0 and 0-to-1 alignments by selecting, for every source sentence, the candidate pair whose SValue is higher than that of the other pairs in the same document; ④ discover many-to-1, 1-to-many, many-to-many, many-to-0 and 0-to-many alignments with higher SValue by merging 1-to-1, 1-to-0 and 0-to-1 correspondences and deleting completed sentences from the candidate sentence group. The simulation results show that HSI-based sentence alignment can extract all kinds of correspondences from noisy cross-phylum sentences, with higher quality than the baseline methods.
(3) Automatic term alignment of cross-language patent documents is proposed, based on integrating advanced multi-strategies with GIZA++ (AGiza) or with LLR (ALLR).
Several techniques are applied to improve quality: ① recognize candidate term pairs whose first-part and last-part degrees are greater than zero; ② construct the attributes of a term pair: the degree of semantic correspondence, the alignment correspondence degree of the first and last parts, the similarity degree of the parts of speech, the degree of independence correspondence, the degree of stopping correspondence, the value of g based on GIZA++, the degree of co-occurrence correspondence, and the degree of length correspondence; ③ compute the degree of AGiza/ALLR term alignment from these attributes; ④ select the candidate term pairs whose degree of AGiza/ALLR term alignment exceeds the term-alignment threshold, which is set so that the tolerance of recall is less than 1%. The simulation results show that AGiza/ALLR-based term alignment outperforms GIZA++, the Dice coefficient, the Φ² coefficient, the log-likelihood ratio, K-VEC and DKVEC at all recall values.
(4) Similarity analysis of cross-language patent documents based on the concept of cross-language term (CCT) is proposed. The method proceeds as follows: ① construct a cross-language synonymous term bank through cross-language synonymous term recognition; ② discover the co-occurrence relation between term concepts and objects, using concepts to replace synonymous terms; ③ compute CCT similarity from the co-occurrence relation between concepts and objects; ④ construct a cross-language network based on CCT similarity; ⑤ draw cross-language global maps that overlay the cross-language network on the global geographic map; ⑥ build term overlay maps based on betweenness centrality and the cross-language network.
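Steps ① to ③ of the CCT method can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the term bank entries, the concept identifiers, the choice of patent documents as the "objects", and the use of cosine similarity are all assumptions made for illustration, because the abstract does not state the similarity formula.

```python
from collections import Counter
from math import sqrt

# Hypothetical cross-language synonymous term bank (step 1): every surface
# term, Chinese or English, maps to a shared concept identifier.
TERM_BANK = {
    "lithium battery": "C_LI_BATTERY",
    "锂电池": "C_LI_BATTERY",
    "electrolyte": "C_ELECTROLYTE",
    "电解质": "C_ELECTROLYTE",
    "separator": "C_SEPARATOR",
    "隔膜": "C_SEPARATOR",
}

def concept_vector(terms):
    """Step 2: replace terms with concepts and count them per object."""
    return Counter(TERM_BANK[t] for t in terms if t in TERM_BANK)

def cosine_similarity(a, b):
    """Step 3 (assumed measure): cosine of two sparse concept-count vectors."""
    dot = sum(a[c] * b[c] for c in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Two toy "objects": the extracted terms of one English and one Chinese patent.
patent_en = ["lithium battery", "electrolyte", "separator"]
patent_zh = ["锂电池", "电解质"]
similarity = cosine_similarity(concept_vector(patent_en), concept_vector(patent_zh))
```

Because both patents are reduced to the same concept identifiers before comparison, they share two concepts even though they share no surface strings, which is the point of replacing synonymous terms with concepts.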
The simulation results show that CCT-based similarity analysis can analyse multi-language patents effectively.
The four methods proposed in this thesis have been verified and validated on Chinese-English battery patent data. They can serve as a reference for science and technology management, strategic patent planning, text mining, machine translation, big data management, and related applications.
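For readers unfamiliar with the association-based baselines named under (3), the following Python sketch computes the Dice coefficient, the Φ² coefficient, and the log-likelihood ratio for one candidate term pair. It uses the standard textbook definitions over a 2×2 co-occurrence table; the counting unit (aligned sentence pairs) and the example counts are assumptions, since the abstract does not describe how the thesis configured these baselines.

```python
from math import log

# For a source term s and a target term t, counted over aligned sentence pairs:
#   a = pairs containing both s and t     b = pairs containing s only
#   c = pairs containing t only           d = pairs containing neither

def dice(a, b, c, d):
    """Dice coefficient: 2a / (freq(s) + freq(t))."""
    denom = 2 * a + b + c
    return 2 * a / denom if denom else 0.0

def phi_squared(a, b, c, d):
    """Phi-squared coefficient: (ad - bc)^2 over the product of the marginals."""
    denom = (a + b) * (a + c) * (b + d) * (c + d)
    return (a * d - b * c) ** 2 / denom if denom else 0.0

def log_likelihood_ratio(a, b, c, d):
    """G^2 statistic: 2 * sum(observed * ln(observed / expected))."""
    n = a + b + c + d
    if n == 0:
        return 0.0
    observed = [a, b, c, d]
    expected = [
        (a + b) * (a + c) / n,
        (a + b) * (b + d) / n,
        (c + d) * (a + c) / n,
        (c + d) * (b + d) / n,
    ]
    # Cells with a zero count contribute nothing to the sum.
    return 2 * sum(o * log(o / e) for o, e in zip(observed, expected) if o > 0)

# Hypothetical counts for one candidate term pair in 100 aligned sentence pairs.
counts = (8, 2, 1, 89)
scores = {
    "dice": dice(*counts),
    "phi2": phi_squared(*counts),
    "llr": log_likelihood_ratio(*counts),
}
```

All three scores grow as the two terms co-occur more often than chance would predict, so a baseline aligner can rank candidate pairs by any of them and keep those above a threshold.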
Keywords/Search Tags: Cross-language, Patent Analysis, Term Extraction, Sentence Alignment, Term Alignment, Similarity Analysis