| Motivation: Empirical likelihood (EL) is a nonparametric method of statistical inference that has developed rapidly in recent years. Because it places few constraints on model parameters, it is widely used across many fields of statistics. However, it is not clear how efficiently EL fits parametric distributions to data, and the computational methods for EL are still far from mature; we investigate both questions. In addition, in our research on kernel-smoothed semiparametric models for repeated-measurement data, we found that the estimated matrix is not invertible and the parameter estimator is not unique when the dimension of the fixed design matrix exceeds 2. Fortunately, EL's adaptability to non-constant error variance suggests introducing EL into semiparametric models. As Bayesian computational methods have improved, Bayesian analysis has been applied increasingly widely. Because EL inherits many properties of parametric likelihoods, we ask whether EL can serve as the likelihood function in Bayesian analysis, and we explore applications of EL within the Bayesian framework. In bioinformatics research, many models describe gene expression regulatory networks, and the structural equation model (SEM) of a regulatory network provides a new platform for interpreting parameters and their estimators. 
However, traditional SEM imposes strict constraints on the error terms and lacks the flexibility to use known biological information; we therefore integrate EL into SEM to address these problems. Method: The research consists of three parts. ① We apply a genetic algorithm (GA) + EL method to Weibull distributions with different parameters and sample sizes, and compare the EL estimator with the maximum likelihood estimator (MLE) and the percentile estimator. We also apply ridge + EL to the kernel-smoothed semiparametric model, and compare the least squares estimator (LSE) with EL under additional constraints. ② First, we introduce the concept of a "proper likelihood" in Bayesian analysis and assess, under different conditions, the degree of confidence with which EL qualifies as a "proper likelihood". Second, we use a random walk Metropolis algorithm to simulate the EL posterior distribution and explore its properties, in particular its relation to the maximum EL estimator. Finally, we apply EL-based Bayesian analysis to linear regression models, and use a Gibbs + Metropolis algorithm to estimate the parameters from simulated homoscedastic and heteroscedastic data, respectively. ③ We construct an SEM on a microarray expression dataset of 10,080 genes measured at 7 time points during human fetal central nervous system (CNS) development. First, we select the gene set carrying the most information from the brain cortex, map it to the Gene Ontology (GO) database, and obtain candidate genes related to development. Second, we use EL to estimate the parameters and obtain the model structure by GA, using an EL-based AIC. In addition, we divide all genes into smooth and non-smooth groups according to their expression profiles, and describe their regulatory models with a Lotka-Volterra equation and an impulse function, respectively. Result: ① For large Weibull samples, GA + EL gives results similar to those of MLE and is less affected by the starting points than sequential quadratic programming (SQP). For small samples, none of the methods performs well. 
For the semiparametric model of repeated-measurement data, the ridge + EL method solves the non-invertibility problem, yields a smaller residual sum of squares than the other methods, and estimates the nonparametric part better than the other methods. ② Whether EL can be used as the likelihood in Bayesian analysis depends on the sample size and on the parameters being estimated; when the parameter is the population mean, the degree of confidence increases with the sample size. Based on this result, the random walk Metropolis algorithm can simulate the EL posterior distribution: the simulated samples are normally distributed, their autocorrelation coefficients are low, and the posterior sample mean is close to the maximum EL estimator. For homoscedastic simulated data, EL-based Bayesian linear regression performs similarly to LSE, but it performs better on heteroscedastic data. ③ Following this procedure, we obtain 30 candidate genes from the GO database. After 500 GA runs, we obtain an SEM with 9 genes, in which ACTG1, the only exogenous variable, suppresses the expression of the other 8 genes. The Lotka-Volterra model fitted to the same candidate gene set indicates suppressive effects between WASF1 and NOS2 and between DCX and NOS2, and stimulative effects between DCX and FHL1 and between PRKCB1 and FHL1. Finally, using the impulse function, we screen 4 genes, KIAA0332, PEG10, MYH11 and FRAP1, which show the most "sudden expression" among the 10,080 genes. Conclusion: From these results we draw the following conclusions. ① For large samples from a parametric distribution, the fit of EL + GA is similar to that of MLE, and EL + GA avoids the starting-point problem. Ridge + EL solves the non-invertibility of the estimated matrix and is an effective method for kernel-smoothed semiparametric models. ② Under some conditions, EL can be used as the likelihood in Bayesian analysis, and the maximum EL estimator can be obtained with the random walk Metropolis algorithm. ③ The SEM + EL model can incorporate prior information and relaxes the normality constraint on the errors. 
From the three models, we select 5 genes (ACTG1, KIAA0332, PEG10, MYH11, FRAP1) and 4 gene pairs (WASF1-NOS2, DCX-NOS2, DCX-FHL1 and PRKCB1-FHL1) that deserve further biological research. |
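
All three parts of the study build on the basic EL computation. The following minimal sketch computes the empirical log-likelihood ratio for a population mean in Owen's formulation, solving for the Lagrange multiplier by bisection. It is an illustration, not the author's code. The function name `el_log_ratio` and the bisection solver are illustrative choices. The abstract also does not say which estimating equations were used for the Weibull or regression models.

```python
import numpy as np


def el_log_ratio(x, mu, tol=1e-12, max_iter=200):
    """Empirical log-likelihood ratio log R(mu) for a population mean.

    Illustrative helper (hypothetical name), not from the original work.
    Weights are w_i = 1 / (n * (1 + lam * z_i)) with z_i = x_i - mu, where lam
    solves sum_i z_i / (1 + lam * z_i) = 0, so log R = -sum_i log(1 + lam * z_i).
    Returns -inf if mu is not strictly inside (min(x), max(x)).
    """
    z = np.asarray(x, dtype=float) - mu
    n = z.size
    if z.min() >= 0 or z.max() <= 0:
        return -np.inf  # mu outside the convex hull: no valid weights exist
    # Requiring every weight w_i <= 1 confines lam to this interval.
    lo = (1.0 / n - 1.0) / z.max()
    hi = (1.0 / n - 1.0) / z.min()
    # The estimating equation is strictly decreasing in lam, so bisect on its sign.
    for _ in range(max_iter):
        lam = 0.5 * (lo + hi)
        if np.sum(z / (1.0 + lam * z)) > 0:
            lo = lam
        else:
            hi = lam
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return -np.sum(np.log1p(lam * z))


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(-2 * el_log_ratio(x, 3.5))  # EL test statistic for H0: mean = 3.5
```

Under standard conditions, -2 log R(mu0) is approximately chi-square with 1 degree of freedom. Values of `-2 * el_log_ratio(x, mu0)` above about 3.84 would therefore reject mu0 at the 5% level.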
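
The abstract does not explain how ridge regularization is combined with EL. The sketch below shows only the ridge part, on a hypothetical collinear design. When one column of X is a linear combination of the others, X'X is singular and least squares has no unique solution. By contrast, X'X + kI with k > 0 is always invertible. The data and the penalty k are illustrative assumptions, not the kernel-smoothed model from the study.

```python
import numpy as np


def ridge_estimate(X, y, k):
    """Ridge estimator (X'X + kI)^{-1} X'y; any k > 0 makes the system invertible."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)


# Hypothetical design: the third column equals the sum of the first two.
rng = np.random.default_rng(0)
base = rng.normal(size=(20, 2))
X = np.column_stack([base, base[:, 0] + base[:, 1]])
y = X @ np.array([1.0, 2.0, 0.0])  # noise-free response for illustration

XtX = X.T @ X
print(np.linalg.matrix_rank(X))  # rank below 3 means X'X is singular
beta = ridge_estimate(X, y, k=1e-6)  # small penalty chosen arbitrarily
print(beta, np.abs(X @ beta - y).max())
```

The penalty trades a small bias for a unique, stable solution. In practice k would be chosen by a criterion such as cross-validation; the abstract does not say how it was chosen.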
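
The abstract uses a random walk Metropolis algorithm to simulate the EL posterior. The following minimal sketch covers the simplest case, the population mean with an assumed flat prior. The step size, the chain length and the function names are hypothetical. The EL helper is repeated so the block runs on its own.

```python
import numpy as np


def el_log_ratio(x, mu, tol=1e-10, max_iter=200):
    """Empirical log-likelihood ratio for a mean; -inf outside (min(x), max(x))."""
    z = np.asarray(x, dtype=float) - mu
    n = z.size
    if z.min() >= 0 or z.max() <= 0:
        return -np.inf
    lo, hi = (1.0 / n - 1.0) / z.max(), (1.0 / n - 1.0) / z.min()
    for _ in range(max_iter):  # bisection on the decreasing estimating equation
        lam = 0.5 * (lo + hi)
        if np.sum(z / (1.0 + lam * z)) > 0:
            lo = lam
        else:
            hi = lam
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return -np.sum(np.log1p(lam * z))


def rw_metropolis_el(x, n_iter=4000, step=0.3, seed=0):
    """Random walk Metropolis draws from p(mu | x), proportional to prior(mu) * EL(mu).

    Assumption: a flat prior on mu, so the acceptance ratio is just the EL ratio.
    """
    rng = np.random.default_rng(seed)
    mu = float(np.mean(x))  # start at the maximum EL estimator (the sample mean)
    cur = el_log_ratio(x, mu)
    draws = np.empty(n_iter)
    n_accept = 0
    for t in range(n_iter):
        prop = mu + step * rng.standard_normal()  # symmetric Gaussian proposal
        new = el_log_ratio(x, prop)
        # Proposals outside the convex hull have log EL = -inf and are always rejected.
        if np.log(rng.random()) < new - cur:
            mu, cur = prop, new
            n_accept += 1
        draws[t] = mu
    return draws, n_accept / n_iter


rng = np.random.default_rng(1)
x = rng.normal(2.0, 1.0, size=50)  # hypothetical simulated data
draws, acc = rw_metropolis_el(x)
print(draws[500:].mean(), x.mean(), acc)  # compare posterior mean with sample mean
```

Under a flat prior, the posterior is proportional to the EL itself. Its draws should therefore concentrate around the sample mean, which is the maximum EL estimator for a mean. This is the relation described in result ②.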
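
The abstract does not give the exact Lotka-Volterra form used for the smooth genes. A common generalized form is dx_i/dt = x_i (r_i + sum_j a_ij x_j). Under this convention, a negative a_ij is read as gene j suppressing gene i and a positive a_ij as stimulation. That sign convention, the forward-Euler integrator and all parameter values below are assumptions for illustration.

```python
import numpy as np


def simulate_glv(x0, r, A, t_end=50.0, dt=0.01):
    """Forward-Euler integration of dx_i/dt = x_i * (r_i + sum_j A[i, j] * x_j)."""
    x = np.asarray(x0, dtype=float).copy()
    n_steps = int(round(t_end / dt))
    traj = np.empty((n_steps + 1, x.size))
    traj[0] = x
    for s in range(n_steps):
        x = x + dt * x * (r + A @ x)
        traj[s + 1] = x
    return traj


r = np.array([0.5, 0.5])            # hypothetical basal growth rates
A_free = np.array([[-1.0, 0.0],     # diagonal entries: self-limitation
                   [0.0, -1.0]])
A_supp = np.array([[-1.0, -0.5],    # assumed convention: A[0, 1] < 0 means gene 2 suppresses gene 1
                   [0.0, -1.0]])
free = simulate_glv([0.1, 0.1], r, A_free)
supp = simulate_glv([0.1, 0.1], r, A_supp)
print(free[-1], supp[-1])  # gene 1 settles near 0.5 without suppression, near 0.25 with it
```

Fitting such a model to data would mean estimating r and A from the observed profiles. The abstract does not describe that fitting step, so it is not sketched here.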
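
The impulse function used for the non-smooth genes is not specified either. One commonly used form is a product of two sigmoids with slope beta. It moves from a baseline h0 to a peak level h1 around time t1, then to a final level h2 around t2. The sketch below assumes that form, with hypothetical parameter values over 7 time points. The abstract does not state the criterion used to rank genes by "sudden expression", so that step is not reproduced here.

```python
import numpy as np


def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))


def impulse(t, h0, h1, h2, t1, t2, beta):
    """Assumed two-sigmoid impulse: about h0 before t1, h1 between t1 and t2, h2 after t2."""
    t = np.asarray(t, dtype=float)
    rise = h0 + (h1 - h0) * sigmoid(beta * (t - t1))   # onset transition near t1
    fall = h2 + (h1 - h2) * sigmoid(-beta * (t - t2))  # offset transition near t2
    return rise * fall / h1


t = np.arange(7.0)  # 7 hypothetical, equally spaced time points
profile = impulse(t, h0=1.0, h1=4.0, h2=1.5, t1=1.5, t2=4.5, beta=5.0)
print(np.round(profile, 3))
```

Large values of |h1 - h0| together with a steep beta describe an abrupt change in expression. A score built from these parameters is one possible way to rank genes, but it is an assumption rather than the study's method.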