
Intelligent Modelling Based on the Analysis and Processing of Redundancy Problems and Its Application

Posted on: 2016-07-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Chen
Full Text: PDF
GTID: 1221330467976656
Subject: Control Science and Engineering
Abstract/Summary:
With the rapid development of modern science and technology, the petrochemical industry continues to improve product quality and diversify its product range. For new, complex processes and systems, prior knowledge is often scarce or unreliable, so mechanism-based modelling cannot meet industrial requirements. Data-driven modelling is therefore attracting increasing attention. Although data-driven modelling does not require an understanding of the process mechanism, it depends heavily on the quality of the data and the structure of the model. Both are affected by redundant information. (A) Redundancy in the inputs: a selected input variable may be irrelevant to the dependent variable, and the selected inputs may be redundant with one another. In industrial processes, a large number of candidate variables is typically chosen from basic prior knowledge, and these variables are usually strongly related. Using all of them as model inputs directly increases the complexity of the input structure, and the redundant information may seriously degrade model performance. (B) Redundancy in the model structure: model performance is closely tied to model structure, and computational efficiency relies heavily on structural complexity. To address these two redundancy problems in intelligent modelling, this dissertation first combines multivariate statistical methods, namely principal component analysis (PCA) and mutual information (MI) analysis, to eliminate redundant information and to detect irrelevant input variables. It then proposes partial mutual information (PMI) based selection and a novel PMI-based clustering technique, which optimize the structure of a neural network by eliminating redundant information among the hidden-layer outputs.
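As a minimal sketch of the MI-based input screening idea described above, the following Python estimates the mutual information between each candidate input and the output and keeps only the inputs that score above a cut-off. Everything here is an illustrative assumption rather than the dissertation's procedure: the equal-width histogram estimator, the toy data, the names `binned` and `mutual_information`, and the fixed `THRESHOLD` value are all hypothetical.

```python
import math
import random
from collections import Counter


def binned(values, n_bins=8):
    """Map each value to an equal-width bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # a constant column falls into one bin
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y, n_bins=8):
    """Plug-in (histogram) estimate of I(X; Y) in nats."""
    bx, by, n = binned(x, n_bins), binned(y, n_bins), len(x)
    px, py, pxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    # sum over occupied cells of p(i, j) * log(p(i, j) / (p(i) * p(j)))
    return sum(c / n * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in pxy.items())


# Toy data (assumption): y depends nonlinearly on x_relevant only.
rng = random.Random(0)
n = 2000
x_relevant = [rng.uniform(-1.0, 1.0) for _ in range(n)]
x_noise = [rng.uniform(-1.0, 1.0) for _ in range(n)]
y = [v * v + rng.gauss(0.0, 0.05) for v in x_relevant]

candidates = {"x_relevant": x_relevant, "x_noise": x_noise}
scores = {name: mutual_information(col, y) for name, col in candidates.items()}

THRESHOLD = 0.1  # hypothetical cut-off in nats, not a value from the dissertation
selected = [name for name, score in scores.items() if score > THRESHOLD]
print(scores, selected)
```

The toy relationship is quadratic on a symmetric interval, so the linear correlation between `x_relevant` and `y` is close to zero while the MI score is not. This is one reason an MI-based criterion may be preferred to correlation for screening inputs.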
The main contributions of this dissertation are as follows:

(1) For complex redundant information among input variables, an improved radial basis function neural network (RBFNN) combining PCA and MI is proposed (PCA-MI-RBFNN). First, PCA extracts variance-based principal components (PCs) from the original variables; these PCs are mutually uncorrelated. Because a model is built on the relationship between inputs and outputs, taking the PCs directly as model inputs may ignore the information in the output. MI is therefore used to estimate the dependence between each PC and the output, and the most relevant PCs are selected as model inputs. Applied to classical benchmark modelling data and to a soft sensor for the concentration of the by-product 4-carboxybenzaldehyde (4-CBA) in the industrial p-xylene (PX) oxidation reaction, the PCA-MI-RBFNN model shows good predictive performance and robustness by eliminating the redundancy.

(2) For irrelevant input variables and redundant information among input variables, a relevance vector machine (RVM) based on MI-PCA-MI is proposed (MI-PCA-MI-RVM). If input variables that are irrelevant to the model output are used, the model will be inappropriate. Likewise, if input variables that are relevant to the output but redundant with one another are all used as model inputs, model performance is strongly affected. A first selection on the original data is therefore critical. The probability density distribution of the MI between all input variables and the output is used to determine a threshold that separates irrelevant from relevant input variables. After the irrelevant inputs are eliminated, PCA-MI selects the optimal PCs as the RVM model inputs. Applied to the 4-CBA concentration soft sensor for the industrial PX oxidation reaction, the MI-PCA-MI-RVM model is shown to have good predictive performance and robustness by eliminating both the irrelevant input variables and the redundancy.

(3) For structure optimization of the RBFNN, a method based on partial mutual information and least square regression (LSR) is presented (PMI-LSR-RBFNN). PMI selects the optimal hidden-layer nodes, namely those with minimal redundancy with respect to the other hidden nodes and maximal relevance to the output variable. The weights and bias between the hidden layer and the output layer are then updated through LSR. The burning side-reaction model of the INVISTA oxidation process developed with PMI-LSR-RBFNN outperforms RBFNN variants improved by K-means, Fuzzy C-Means, K-medoids and Subtractive Clustering. Sammon nonlinear mapping shows that the hidden-layer nodes selected by PMI perform better than nodes distributed uniformly in terms of spatial distance. A sensitivity analysis based on this model shows that the identified influence patterns are consistent with prior knowledge and provide operational guidance for the INVISTA oxidation process.

(4) For structure optimization of the multilayer feedforward neural network (MLFNN), a novel clustering technique (minimal Redundancy Maximal Relevance-Partial Mutual Information Clustering, mPMIc) combined with LSR is developed (mPMIc-LSR-MLFN). As the input dimension grows rapidly, PMI takes too long to compute and its estimation accuracy deteriorates, which motivates the mRMR-PMI clustering (mPMIc) technique. First, appropriate hidden-layer nodes are selected as the initial clustering centers. All hidden-layer nodes are then clustered into groups, and the clustering centers are updated through PMI. The optimal hidden-layer nodes are obtained once the clustering centers stop changing. The weights and bias between the selected hidden nodes and the output layer are then updated through LSR. Applied to soft sensor modelling of the naphtha dry point, the mPMIc-LSR-MLFN, with a simple network size, performs better than other improved MLFN variants based on K-means and Subtractive Clustering, and better than three existing improved extreme learning machines (OP-, OS-, B-ELM).
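The hidden-node pruning and LSR refit described in contributions (3) and (4) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the dissertation's algorithm: a plain histogram MI estimate stands in for PMI, a greedy relevance-minus-redundancy score stands in for both the PMI selection and the mPMIc clustering, and the data are a toy noisy sine curve. The names `binned`, `mi`, `select_nodes` and `fit_output_layer` are hypothetical.

```python
import math
import random
from collections import Counter


def binned(values, n_bins=8):
    """Equal-width bin index for each value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # a constant column falls into one bin
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mi(a, b):
    """Histogram estimate of I(A; B) in nats; a simple stand-in for PMI."""
    ba, bb, n = binned(a), binned(b), len(a)
    pa, pb, pab = Counter(ba), Counter(bb), Counter(zip(ba, bb))
    return sum(c / n * math.log(c * n / (pa[i] * pb[j]))
               for (i, j), c in pab.items())


def select_nodes(hidden, y, k):
    """Greedily keep k nodes: high relevance to y, low redundancy with kept nodes."""
    relevance = [mi(h, y) for h in hidden]
    kept = [max(range(len(hidden)), key=lambda j: relevance[j])]
    while len(kept) < k:
        def score(j):
            redundancy = sum(mi(hidden[j], hidden[s]) for s in kept) / len(kept)
            return relevance[j] - redundancy
        candidates = [j for j in range(len(hidden)) if j not in kept]
        kept.append(max(candidates, key=score))
    return kept


def fit_output_layer(hidden, y):
    """Least-squares output weights plus bias, via the normal equations."""
    cols = hidden + [[1.0] * len(y)]  # last column carries the bias
    m = len(cols)
    # Augmented matrix [A^T A | A^T y]
    aug = [[sum(p * q for p, q in zip(cols[i], cols[j])) for j in range(m)]
           + [sum(p * t for p, t in zip(cols[i], y))] for i in range(m)]
    for i in range(m):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(i, m), key=lambda r: abs(aug[r][i]))
        aug[i], aug[pivot] = aug[pivot], aug[i]
        for r in range(m):
            if r != i:
                factor = aug[r][i] / aug[i][i]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[i])]
    return [aug[i][m] / aug[i][i] for i in range(m)]


# Toy data (assumption): a noisy sine curve.
rng = random.Random(1)
xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(400)]
ys = [math.sin(x) + rng.gauss(0.0, 0.05) for x in xs]

# Candidate hidden layer: nine Gaussian nodes of unit width, stored column-wise.
centers = [i * math.pi / 4.0 for i in range(9)]
hidden = [[math.exp(-((x - c) ** 2) / 2.0) for x in xs] for c in centers]

kept = select_nodes(hidden, ys, k=4)
weights = fit_output_layer([hidden[j] for j in kept], ys)

pred = [sum(w * hidden[j][t] for w, j in zip(weights, kept)) + weights[-1]
        for t in range(len(xs))]
mse = sum((p - t) ** 2 for p, t in zip(pred, ys)) / len(ys)
mean_y = sum(ys) / len(ys)
var_y = sum((t - mean_y) ** 2 for t in ys) / len(ys)
print(f"kept nodes: {sorted(kept)}, MSE: {mse:.4f}, Var(y): {var_y:.4f}")
```

Solving the normal equations directly keeps the sketch dependency-free, but it can be ill-conditioned when the kept nodes overlap strongly. A QR- or SVD-based least-squares solver would be the more robust choice in practice.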
Keywords/Search Tags:redundancy, mutual information, principal component analysis, neural networks, least square regression