Study On Knowledge Discovery Based On National Crop Germplasm Resources Database

Posted on: 2008-04-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H W Tang
Full Text: PDF
GTID: 1103360215978175
Subject: Crop Science
Abstract/Summary:
The National Crop Germplasm Resources Database (NCGRDB) holds over 40 gigabytes of data on 390,000 germplasm accessions from 180 kinds of crops, including food crops, fibre crops, oil crops, vegetables, fruit trees, tea, mulberry, tobacco, sugar crops, green manure crops, and tropical crops. Uncovering hidden knowledge in NCGRDB with the principles, methods and techniques of knowledge discovery in databases (KDD) is increasingly important in crop informatics. It helps make fuller use of NCGRDB and supports better preservation and utilization of crop germplasm resources.

After analyzing the characteristics of different methods for handling missing values, and taking the traits of the data in NCGRDB into account, the dissertation proposed two approaches: one based on normal distribution simulation for missing values in continuous numeric data, and one based on random numbers for missing values in discrete data. Using these approaches together with discretization based on semantic distance, the dissertation estimated the missing values and discretized the continuous numeric data in NCGRDB.

The dissertation analyzed KDD techniques such as statistics, decision trees, association rules, neural networks, genetic algorithms, fuzzy sets and rough sets. It then proposed a method based on association rules for mining knowledge from NCGRDB. After a comprehensive analysis of association rule mining algorithms, especially the Apriori algorithm and other Apriori-based algorithms, it improved the Apriori algorithm and proposed a new algorithm that can mine multi-dimensional association rules from NCGRDB.

The dissertation also investigated some representative KDD systems, designed an NCGRDB KDD system, and developed a prototype of it. With this system, the dissertation mined a number of association rules for soybean relating its agronomic characteristics, protein content, fat content, fatty acid content, drought resistance, and resistance to pests and diseases.
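
For readers unfamiliar with association rule mining, the following is a minimal sketch of the textbook Apriori algorithm applied to multi-dimensional items, where each item is an (attribute, value) pair. It is not the improved algorithm proposed in the dissertation, whose details are not given in this abstract. The function names `apriori` and `rules`, the sample records, and the thresholds are all hypothetical and chosen only for illustration; the attribute values are assumed to be already discretized.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset (textbook Apriori)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every single item seen in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def rules(frequent, min_confidence):
    """Derive rules (lhs, rhs, support, confidence) from frequent itemsets."""
    found = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is itself in `frequent`.
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    found.append((lhs, itemset - lhs, support, confidence))
    return found


# Hypothetical, already-discretized soybean records (not real NCGRDB data).
records = [
    {"protein": "high", "fat": "low", "drought": "resistant"},
    {"protein": "high", "fat": "low", "drought": "susceptible"},
    {"protein": "low", "fat": "high", "drought": "resistant"},
    {"protein": "high", "fat": "low", "drought": "resistant"},
]
# Each record becomes a set of (attribute, value) items.
transactions = [set(r.items()) for r in records]

frequent = apriori(transactions, min_support=0.5)
found = rules(frequent, min_confidence=0.9)
```

Because every record carries exactly one value per attribute, itemsets that combine two values of the same attribute never occur and are discarded by the support threshold, so no extra check is needed for the multi-dimensional case in this sketch.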
Keywords/Search Tags: Crop germplasm resource, Association rule, KDD, Data mining