
Research On Similarity Calculation And Clustering For Science And Technology Project

Posted on: 2016-04-14
Degree: Master
Type: Thesis
Country: China
Candidate: S J Zhao
GTID: 2348330488998817
Subject: Computer application technology
Abstract/Summary:
As national funding for science and technology grows, R&D institutions submit more and more technology project applications, and determining whether two projects are similar has become an important part of project management. Manual checking alone is clearly insufficient, and existing duplicate-checking systems do not meet the requirements for accuracy and speed, so studying the key technologies of a duplicate-checking system for technology projects is both necessary and meaningful. This thesis focuses on the representation model of technology projects, similarity calculation, and clustering. The major work includes the following aspects:

1. Given the complex information in a technology project, this thesis proposes a knowledge representation model and a relational model for technology projects by combining the matter-element knowledge representation model with the vector space model (VSM), so that projects are easier to represent and process in later steps.

2. To meet the duplicate-checking demand for technology projects, the thesis analyzes and summarizes similarity calculation methods based on the vector space model and on semantic understanding, and on this basis presents a VSM similarity calculation method based on semantic understanding. A project name is short but carries much useful information and many specialized terms; based on this characteristic, an improved sentence similarity computation method based on edit distance is proposed. Finally, the two methods are applied to the main content and the name of a technology project respectively, and the results are combined with adjusted weights to give the overall similarity of two projects.

3. Duplicate checking requires comparing an unknown project with all existing projects, which is inefficient. To address this weakness, the thesis clusters the projects first and then checks for duplicates. Existing clustering algorithms require input parameters to be set in advance and have high time complexity, so they cannot be applied to large collections of projects. The thesis therefore presents a nearest-neighbor project clustering algorithm based on bi-threshold and applies it to the project duplicate-checking system, which improves the speed of duplicate checking without affecting its accuracy.

4. Based on the above research on similarity and clustering algorithms, the resulting similarity detection system has been applied to the technology project management system of Zhejiang Province. It effectively implements project duplicate checking with good accuracy and running speed, which validates the feasibility of this study.
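
The similarity and clustering steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and several parts are assumptions: plain term-frequency cosine similarity stands in for the semantic VSM method, classic Levenshtein distance stands in for the improved edit-distance method, and the `name_weight`, `high`, and `low` values are hypothetical. The abstract does not say how the two thresholds are used, so the rule below (join the nearest cluster at or above `low`, flag the project as borderline when below `high`) is only one possible reading.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine between two term-frequency vectors (plain VSM, no semantics)."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    """Edit distance normalised to [0, 1]; 1.0 means identical names."""
    longest = max(len(a), len(b))
    return 1.0 - edit_distance(a, b) / longest if longest else 1.0


def project_similarity(p: dict, q: dict, name_weight: float = 0.4) -> float:
    """Weighted sum of name and content similarity (name_weight is assumed)."""
    return (name_weight * name_similarity(p["name"], q["name"])
            + (1.0 - name_weight) * cosine_similarity(p["content"], q["content"]))


def cluster_projects(projects: list, high: float = 0.6, low: float = 0.3):
    """Single-pass nearest-neighbour clustering with two assumed thresholds.

    Each project is compared with the clusters built so far. Below `low` it
    starts a new cluster; otherwise it joins the nearest cluster, and it is
    also recorded as borderline when its similarity is below `high`.
    """
    clusters = []    # each cluster is a list of project indices
    borderline = []  # indices that joined with similarity in [low, high)
    for i, p in enumerate(projects):
        best_sim, best_cluster = 0.0, None
        for cluster in clusters:
            sim = max(project_similarity(p, projects[j]) for j in cluster)
            if sim > best_sim:
                best_sim, best_cluster = sim, cluster
        if best_cluster is None or best_sim < low:
            clusters.append([i])
        else:
            best_cluster.append(i)
            if best_sim < high:
                borderline.append(i)
    return clusters, borderline


projects = [
    {"name": "image retrieval system", "content": "deep learning image retrieval"},
    {"name": "image retrieval systems", "content": "deep learning image retrieval method"},
    {"name": "soil erosion survey", "content": "field survey of soil erosion"},
]
clusters, borderline = cluster_projects(projects)
```

With clusters in place, a new application only needs to be compared against the projects in its nearest cluster rather than against every stored project, which is the speed-up the abstract describes.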
Keywords/Search Tags:VSM, semantic understanding, similarity calculation, clustering