Research On The Data Mining In Chemistry And Chemical Engineering

Posted on: 2006-09-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z H Shu
GTID: 1101360152971731
Subject: Chemical Engineering and Technology
Abstract/Summary:
The volume of data in chemistry and chemical engineering is growing steadily. Data mining is a powerful tool for extracting "hidden" information from large amounts of data, but the mining methods must suit the characteristics of the data in each field. Data in chemistry and chemical engineering are typically high-dimensional, noisy, and multicollinear. Using neural networks, rough sets, fuzzy sets, and statistics, this work focuses on feature selection, discretization, rule generation, chemical pattern modeling, and chemical process modeling. The main contributions of this dissertation are as follows:

(1) A feature selection method based on regularization networks and a genetic algorithm is presented. Bayesian regularization is used to obtain a neural network that generalizes well, and a heuristic genetic algorithm prunes the regularized network through sensitivity analysis, so that the minimal, optimal attribute set that captures the classification characteristics can be selected from high-dimensional patterns. The method is validated on attribute selection and pattern classification of spearmint essence, and the results show that it is clearly superior to the other methods.

(2) Discretization based on the chi-square statistic always requires a suitable significance level or inconsistency rate to be set manually. Rough set data analysis does not use any prior knowledge about the data, and the information entropy of rough sets measures the uncertainty of knowledge well and also reveals the classification characteristics of the data. The information entropy is therefore used as the evaluation function for discretization; it is determined by the inherent characteristics of the data rather than by any external knowledge about the data.
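The idea of scoring a discretization by information entropy can be sketched as follows. This is only a minimal illustration, assuming the evaluation function is the conditional entropy of the decision attribute given the discretized condition attributes; the abstract does not state the exact formula, and the function names `discretize` and `conditional_entropy` are hypothetical.

```python
from bisect import bisect_right
from collections import Counter, defaultdict
from math import log2


def discretize(values, cuts):
    """Map each continuous value to the index of its interval.

    `cuts` must be a sorted list of cut points; an empty list puts
    every value into interval 0.
    """
    return [bisect_right(cuts, v) for v in values]


def conditional_entropy(rows, decisions):
    """H(D | C) in bits (assumed form of the evaluation function).

    Objects with identical discretized condition values form one
    equivalence class; the entropy of the decision inside each class
    is weighted by the relative size of that class.
    """
    blocks = defaultdict(list)
    for row, decision in zip(rows, decisions):
        blocks[tuple(row)].append(decision)

    n = len(decisions)
    h = 0.0
    for labels in blocks.values():
        weight = len(labels) / n
        for count in Counter(labels).values():
            p = count / len(labels)
            h -= weight * p * log2(p)
    return h
```

For four samples `[1.0, 1.2, 2.9, 3.1]` with decisions `a, a, b, b`, a single cut at 2.0 separates the two classes and gives an entropy of 0, while using no cut at all gives 1 bit. A lower value means the discretization preserves more of the classification information, and no significance level has to be chosen by hand.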
Moreover, when there are multiple attributes, the order in which the attributes are discretized affects the result, so the attributes are ordered by the value of a feature merit measure. Finally, an algorithm based on information entropy, RSE-Chi2, is presented, which requires no manually set parameters. Applications of the algorithm show that it overcomes the disadvantages of the Chi2 algorithm, and that RSE-Chi2 can also be used to generate attribute reductions.

(3) To obtain rules that generalize well and a rule-based classifier with good predictive ability, three steps are taken. First, attribute reduction is integrated into discretization based on RSE-Chi2, which eliminates redundant cut points and yields an attribute reduction that generalizes well. Second, a greedy algorithm that selects the attribute values with the best quality of classification generates a satisfactory value reduction. Third, prediction is based on each rule's statistical parameters and matching degree. The methods are applied to generating chemical pattern classification rules and to classifier modeling. Compared with statistical methods and neural networks, the resulting model is readily interpretable in the chemical domain and also predicts well.

(4) When a continuous attribute is discretized into intervals, each interval can be regarded as a fuzzy region, and every attribute value after discretization is a linguistic value in fuzzy theory. Rough set methods can therefore be integrated with fuzzy set methods: a fuzzy inference system can be built from the rules generated by rough sets, with its parameters trained by the BP (backpropagation) algorithm. This system is called a fuzzy-neuro network system.
For classification, a fuzzy-neuro network is presented whose structure is determined by the fuzzy rules generated by rough set methods, and whose parameters are initialized from the rules' statistical parameters and the result of discretization. When rough sets are used for regression, the decision attribute is discretized so that the regression problem becomes a classification problem; rough sets then generate Sugeno fuzzy rules by postprocessing the pseudo-class rules, and...
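Treating each discretization interval as a fuzzy region, and initializing the network from the discretization result, might look like the sketch below. It is an illustration under stated assumptions, not the dissertation's method: the abstract does not specify the membership function, so a Gaussian shape centered on each interval is assumed here, and the names `init_memberships` and `membership` are hypothetical.

```python
from math import exp


def init_memberships(cuts, lower, upper):
    """Derive one (center, sigma) pair per interval from the cut points.

    Assumption: the center is the interval midpoint and sigma is half
    the interval width. `lower` and `upper` are the attribute's range
    and must lie strictly outside the cut points so every width is > 0.
    """
    edges = [lower] + sorted(cuts) + [upper]
    params = []
    for left, right in zip(edges, edges[1:]):
        center = (left + right) / 2
        sigma = (right - left) / 2
        params.append((center, sigma))
    return params


def membership(x, center, sigma):
    """Gaussian membership degree of x in one fuzzy region (assumed shape)."""
    return exp(-((x - center) ** 2) / (2 * sigma ** 2))
```

With one cut at 2.0 on the range 0.0 to 4.0, this yields two regions centered at 1.0 and 3.0. A value of 1.0 belongs fully to the first region and only weakly to the second. These initial centers and widths would then be the parameters that backpropagation adjusts during training.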
Keywords/Search Tags: Data mining, Rough sets, Feature selection, Discretization, Reduction, Chemical Pattern Classifier Modeling, Chemical Process Modeling