The Research On Method Optimization Of Data Cleaning In The Construction Of Agricultural Domain Knowledge Base

Posted on: 2017-01-04
Degree: Master
Type: Thesis
Country: China
Candidate: D D Sheng
GTID: 2308330485987247
Subject: Information Science
Abstract/Summary:
In the era of Big Data, the value of data lies in processing it professionally. Data quality has become the key to the success of data mining, expert decision-making, business intelligence, and other activities. The rapid growth of data volumes hampers the efficiency of data cleaning, making manual cleaning increasingly impractical, and the trend is toward more automated and efficient data cleaning methods. At the same time, scientific research has entered the fourth, data-intensive paradigm. To keep pace with this paradigm, data must be managed scientifically and made easy to interoperate. Building domain knowledge bases is an approach being actively explored across many sectors. With policy guidance and support, agricultural information service platforms and resources of all kinds have proliferated. Constructing an agricultural domain knowledge base enables the systematic collection and collation of information and knowledge, so that the mass of agricultural domain knowledge can be effectively organized, retrieved, used, and shared. To achieve the functions and goals of an agricultural domain knowledge base, data must be cleaned automatically to improve both data quality and processing efficiency.

Currently, data cleaning methods for domain knowledge mostly rely on rules written by experts, which the computer then executes automatically. Such methods are accurate, but they require the participation of domain experts and repeated revision and updating of the rule set. When the data volume is large and the rules are not obvious, considerable manual labor is needed.

Meanwhile, existing data cleaning frameworks and processes were designed around the requirements of building a data warehouse. They are mostly rule-based and are, to varying degrees, ill-suited to the construction of an agricultural domain knowledge base.
Moreover, the various data cleaning methods are isolated from one another, and there is no framework or process to guide the building of a domain knowledge base. Faced with so many methods and tools, practitioners do not know which to choose, and many tools cannot completely solve the data cleaning problems that arise in building a knowledge base.

Therefore, this paper explores these issues. It compares and analyzes data cleaning tools and data matching algorithms, the core of data cleaning, to provide a reference for algorithm optimization; designs a general framework and data cleaning process to guide data cleaning in the construction of an agricultural domain knowledge base; and then conducts an empirical analysis using literature data from the rice domain, designing an optimization algorithm to solve the deduplication problem for authors and affiliations. Finally, it discusses the impact of threshold values to provide a reference for future work. The overall aim is to reduce manual involvement, thereby improving the degree of automation and the efficiency of data cleaning.
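The abstract does not describe the thesis's own matching or optimization algorithm. As a rough, hypothetical illustration of what threshold-based data matching for author/affiliation deduplication can look like, the Python sketch below normalizes strings, scores pairs with the standard library's `difflib.SequenceMatcher` ratio (a stand-in similarity measure, not necessarily the one used in the thesis), and greedily clusters records whose author and affiliation both clear a threshold. The functions `normalize`, `similarity` and `dedupe`, the sample records, and the threshold values are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical sample records: (author, affiliation) pairs as they might be
# extracted from bibliographic metadata. Not data from the thesis.
records = [
    ("Zhang Wei", "National Rice Research Institute"),
    ("Zhang, Wei", "National Rice Research Institute, Hangzhou"),
    ("Zhang Wei", "Agricultural University"),
    ("Wang Fang", "National Rice Research Institute"),
]


def normalize(text: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def similarity(a: str, b: str) -> float:
    """Score in [0, 1] for two strings after normalization (difflib ratio)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def dedupe(pairs, threshold=0.85):
    """Greedily cluster (author, affiliation) pairs (hypothetical helper).

    A pair joins the first cluster whose representative (its first member)
    matches it with author similarity >= threshold AND affiliation
    similarity >= threshold; otherwise it starts a new cluster.
    """
    clusters = []
    for author, affiliation in pairs:
        for cluster in clusters:
            rep_author, rep_affiliation = cluster[0]
            if (similarity(author, rep_author) >= threshold
                    and similarity(affiliation, rep_affiliation) >= threshold):
                cluster.append((author, affiliation))
                break
        else:  # no existing cluster matched
            clusters.append([(author, affiliation)])
    return clusters


if __name__ == "__main__":
    # Compare how the threshold changes the grouping: 3 clusters at 0.85,
    # 4 at 0.95 (the two affiliation spellings score about 0.88).
    for t in (0.85, 0.95):
        groups = dedupe(records, threshold=t)
        print(f"threshold={t}: {len(groups)} clusters")
        for g in groups:
            print("   ", g)
```

With these sample records, raising the threshold from 0.85 to 0.95 stops the two spellings of the same affiliation from merging, leaving a duplicate. This is the usual trade-off behind threshold tuning: a threshold set too low risks merging distinct people, while one set too high leaves duplicates in place.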
Keywords/Search Tags: Agricultural Domain Knowledge Base, Data Cleaning, Data Matching, Framework, Process