| Patent big data has become an important basic resource, in China and abroad, for scientific research, business intelligence analysis, and targeted innovation and entrepreneurship. High-quality cleaning of patent data is therefore essential to using this resource efficiently. However, the characteristics of Chinese characters and their input methods make cleaning the entries in China's patent data uniquely difficult, and inventor name disambiguation is the most urgent problem to solve. When patent information is entered into a patent database with a Chinese character input method, inventor name ambiguity (homophone and near-word ambiguity) can arise, which reduces the quality of the patent data. This ambiguity affects the identification, discovery, and cultivation of outstanding inventors, hinders the study of inventors' roles and their cooperative relationships, and, to some extent, undermines the rational allocation of enterprise resources. Any application that uses inventor information from patent data must therefore first check and correct inventor name ambiguity. Because this ambiguity is an obstacle to high-quality data cleaning, it also affects an enterprise's ability to make accurate decisions, keep input costs low, and choose R&D directions. Research abroad on inventor name ambiguity in patent information is relatively mature and has produced a variety of inventor disambiguation algorithms, but algorithms for disambiguating Chinese inventor names have rarely been studied. Because Chinese and English differ considerably in logical structure, word distribution, and usage habits, patent data cleaning algorithms developed abroad cannot meet the needs of cleaning inventor information in Chinese patent data. Based on a review of Chinese and English inventor disambiguation algorithms, and starting from the particular characteristics of Chinese characters, this paper designs an effective algorithm to resolve inventor name ambiguity in Chinese patent data. The main contributions are an inventor name disambiguation algorithm based on the similarity of patent entries and an inventor name disambiguation method based on the Hall for Workshop of Metasynthetic Engineering. The patent information of the top 100 domestic pharmaceutical enterprises in 2015 is used as an example to verify the validity and feasibility of the disambiguation algorithm and to show how it improves the efficiency of patent information cleaning. The disambiguation algorithm offers a new approach to the technical exploration of data cleaning in patent databases and supports the use of patent data in research on network innovation, intelligence analysis, and related work. |
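
As a rough illustration of disambiguation by patent-entry similarity, the following minimal Python sketch flags two differently written names as candidate variants of one inventor when the names differ by at most one character and their patent entries share an applicant and most co-inventors. The `Record` fields, the one-character rule, the Jaccard measure, the thresholds, and the sample data are all assumptions made for illustration; they are not taken from the paper's algorithm.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Record:
    """Hypothetical patent entry; the field names are illustrative assumptions."""
    patent_id: str
    inventor: str
    applicant: str
    co_inventors: frozenset


def char_diff(a: str, b: str) -> int:
    """Count differing characters; names of unequal length count as fully different."""
    if len(a) != len(b):
        return max(len(a), len(b))
    return sum(x != y for x, y in zip(a, b))


def entry_similarity(r1: Record, r2: Record) -> float:
    """Jaccard overlap of co-inventors, counted only when the applicant matches."""
    if r1.applicant != r2.applicant:
        return 0.0
    union = r1.co_inventors | r2.co_inventors
    if not union:
        return 0.0
    return len(r1.co_inventors & r2.co_inventors) / len(union)


def candidate_variants(records, max_char_diff=1, min_similarity=0.5):
    """Return sorted pairs of distinct names that may refer to the same inventor."""
    pairs = set()
    for r1, r2 in combinations(records, 2):
        if r1.inventor == r2.inventor:
            continue  # identical spelling: nothing to reconcile in this sketch
        close_names = char_diff(r1.inventor, r2.inventor) <= max_char_diff
        if close_names and entry_similarity(r1, r2) >= min_similarity:
            pairs.add(tuple(sorted((r1.inventor, r2.inventor))))
    return sorted(pairs)


# Made-up sample data: "张伟" and "张炜" are homophones that an input method can confuse.
records = [
    Record("CN001", "张伟", "Pharma A", frozenset({"李娜", "王强"})),
    Record("CN002", "张炜", "Pharma A", frozenset({"李娜", "王强"})),
    Record("CN003", "张伟", "Pharma B", frozenset({"刘洋"})),
    Record("CN004", "陈静", "Pharma A", frozenset({"李娜"})),
]

print(candidate_variants(records))
```

A flagged pair is only a candidate for review: the sketch does not decide which spelling is correct, and it does not separate different people who happen to share the same name.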