Microporous inorganic crystalline materials such as zeolites have widespread applications in catalysis, adsorption, separation, and ion exchange. A common synthesis route is crystallization in the presence of an organic amine template under hydrothermal or solvothermal conditions. Many factors affect nucleation and crystallization, including the type and properties of the template, the type and ratio of the reactants, the type of solvent, and the crystallization temperature and time. Because the crystallization mechanism of porous crystalline compounds is poorly understood, their synthesis remains difficult. Theoretical calculation methods for studying the role of templates in the crystallization of inorganic porous compounds, mainly quantum chemical calculation and molecular simulation, have gradually gained acceptance. A number of new computer-aided approaches to chemical research have only just started, such as pattern recognition, artificial intelligence, and computational software for microporous inorganic crystals. Based on a zeolite synthesis database, we discuss the application of statistical analysis techniques and Bayesian networks to zeolite synthesis.

Statistics is the study of the collection, generation, description, analysis, and interpretation of data in order to gain new knowledge or information and make new inferences. In the early 20th century, statistics was adapted to the characteristics of other scientific disciplines, producing new branches such as biostatistics and economic statistics. With the aid of computer technology and modern mathematical methods, another new branch, chemical statistics, is taking the stage. In this paper we describe statistical probability as the foundation of statistics, together with several analysis techniques. With the help of SPSS, we carried out statistical analyses from both qualitative and quantitative perspectives and obtained several regression models that can be used for classification. Traditional statistical methods require the data to follow certain distributions, so the first step is to test the goodness of fit of the data distribution; if the data deviate seriously from the normal distribution, non-parametric methods should be adopted. Correlation analysis and scatter plots help to discover linear relations between variables, preparing the ground for multiple linear regression analysis. Multiple linear regression places strict constraints on the data, and the dependent variable must be quantitative. Logistic regression and multinomial logit regression are less restricted: the dependent variable can be categorical, yielding a probabilistic classification model.

Data reduction is very useful in this analysis. Two common methods are principal component analysis (PCA) and factor analysis. Both start from the covariance matrix of the raw data and, after a series of mathematical transformations, arrive at a principal component (factor) loading matrix. PCA (factor analysis) reduces the number of variables entering the analysis while retaining most of the information in the original variables. Moreover, the principal components (factors) are mutually orthogonal, which eliminates multicollinearity. As our experiments show, PCA (factor analysis) can improve the quality of a logistic regression model.
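The thesis carried out this analysis in SPSS. As an illustration of the same PCA-then-logistic-regression idea, here is a minimal sketch in Python with scikit-learn on synthetic stand-in data; the feature matrix and class labels are hypothetical, not drawn from the zeolite database.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                    # stand-in for synthesis descriptors
X[:, 5] = X[:, 0] + 0.1 * rng.normal(size=200)    # deliberately collinear column
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)     # stand-in for a two-class outcome

# Standardize, project onto orthogonal principal components, then classify.
# Because the PCs are uncorrelated, the multicollinearity among the raw
# descriptors no longer destabilizes the logistic regression coefficients.
model = make_pipeline(StandardScaler(), PCA(n_components=4), LogisticRegression())
print(cross_val_score(model, X, y, cv=5).mean())  # cross-validated accuracy
```

Choosing the number of components retained (here 4) trades off information loss against model stability; in practice it would be tuned, for example against the cumulative explained variance or the cross-validated score.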
Uncertainty of knowledge is inevitable in research on zeolite synthesis. Its two basic manifestations are randomness and ambiguity; higher-level uncertainty also includes incompleteness, inconsistency, and non-constancy. Data mining and knowledge discovery introduce uncertain factors at every step. From the perspective of knowledge management (the measurement of uncertainty), methods for handling uncertainty can be divided into probabilistic and fuzzy approaches. Probability was among the first to be tried, being the most natural method of uncertainty reasoning: it characterizes the problem with a set of random variables, represents knowledge as a joint probability distribution, and reasons and computes according to the principles of probability theory. Independence relations between variables can be used to decompose the joint distribution into a number of simpler distributions, improving reasoning efficiency. Pearl (1986) constructed a directed acyclic graph (DAG) to express these dependence and independence relations: the Bayesian network. Bayesian probability provides great convenience for reasoning. At the qualitative level, a Bayesian network uses a DAG to describe the dependence and independence relationships between variables; at the quantitative level, it describes each node's dependency on its parents with a conditional probability table. Semantically, the Bayesian network represents a factorization of the joint probability distribution, P(X_1, …, X_n) = ∏_i P(X_i | Pa(X_i)), where Pa(X_i) denotes the parents of X_i in the DAG.

A Bayesian network structure can be specified manually by experts or learned from data. Learning a Bayesian network comprises structure learning and parameter learning. Structure learning algorithms fall into two categories: score-and-search algorithms and constraint-based algorithms. A scoring algorithm first defines a scoring function and then searches for the structure with the highest score (a minimal sketch of one such score is given at the end of this section). Scoring criteria include the log-likelihood under optimal parameters, the CH (Cooper-Herskovits) score, the Bayesian Information Criterion (BIC) score, the Minimum Description Length (MDL) score, the AIC score, test-data likelihood, and cross-validation. Constraint-based methods start from the data: they first apply hypothesis tests to uncover independence relationships between variables, and then find the network structure most compatible with these constraints. Parameter learning likewise has two categories, one based on classical statistics and the other on Bayesian statistics. Maximum likelihood estimation treats the unknown parameters as fixed quantities and uses no prior knowledge, whereas Bayesian estimation treats the parameters as random variables, so prior knowledge can be exploited.

Algorithms for learning Bayesian networks from complete data are relatively mature; learning a network effectively from incomplete data remains an open problem, and current studies mainly combine the structural EM algorithm with constraints. In this paper we discuss the mechanisms, types, and treatment of missing data, and study missing-value replacement methods for the zeolite data from a statistical point of view. Experiments show that mixed EM estimation achieves the best results. Based on the repaired data set, we used principal components, rather than the original variables, as the nodes of the Bayesian network, discretized the continuous values with an entropy-based method, and finally learned the network with a scoring method. The resulting model accords with general knowledge and expert experience. Data mining applications in zeolite synthesis are still at a preliminary stage: experts' prior knowledge is urgently needed for further study, and much more work remains to be done on improving learning efficiency on incomplete data sets.
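To make the score-and-search idea referenced above concrete, below is a minimal sketch of the BIC score for a candidate structure over discrete variables, written in Python. It is an illustration under stated assumptions, not the implementation used in this study: the data are assumed to sit in a pandas DataFrame of categorical columns, and the column names (template, gel_ratio, phase) are hypothetical stand-ins rather than variables from the zeolite database.

```python
import numpy as np
import pandas as pd

def bic_score(data: pd.DataFrame, parents: dict) -> float:
    """BIC = log-likelihood at the ML parameters minus 0.5*log(N) per free parameter."""
    n = len(data)
    score = 0.0
    for node, pa in parents.items():
        r = data[node].nunique()                      # number of states of this node
        if pa:
            joint = data.groupby(pa + [node]).size()  # counts N_ijk per (parents, node) combo
            parent = data.groupby(pa).size()          # counts N_ij per parent configuration
            ll = 0.0
            for idx, n_ijk in joint.items():
                key = idx[:-1]                        # drop the node value, keep parent values
                key = key[0] if len(key) == 1 else key
                ll += n_ijk * np.log(n_ijk / parent.loc[key])
            q = len(parent)                           # observed parent configurations
        else:
            counts = data[node].value_counts()
            ll = float((counts * np.log(counts / n)).sum())
            q = 1
        score += ll - 0.5 * np.log(n) * q * (r - 1)   # q*(r-1) free parameters at this node
    return score

# Toy comparison of two candidate structures over three discrete columns.
df = pd.DataFrame({"template":  list("aabbabab") * 25,
                   "gel_ratio": list("llhhlhlh") * 25,
                   "phase":     list("xyxyxxyy") * 25})
print(bic_score(df, {"template": [], "gel_ratio": [], "phase": []}))  # empty graph
print(bic_score(df, {"template": [],
                     "gel_ratio": ["template"],
                     "phase": ["template", "gel_ratio"]}))            # chain-like DAG
```

A structure learner would evaluate bic_score on many candidate parent assignments, for example via greedy hill climbing over edge additions, deletions, and reversals, and keep the acyclic structure with the highest score; the penalty term is what keeps the search from always preferring denser graphs.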