
The Comparison Of Background Datasets With Different GC Content To Predict Cis-regulatory Module

Posted on: 2008-05-27
Degree: Master
Type: Thesis
Country: China
Candidate: S J Du
GTID: 2120360218455917
Subject: Genetics
Abstract/Summary:
BACKGROUND
Deciphering the mechanisms that regulate gene expression remains a major challenge for the molecular biology community. Computational methods offer great promise for discovering novel regulatory elements. However, a high false-positive rate is the main problem with these methods, and parameter optimization is one way to lower it. All of these methods require a background data set to serve as the negative set during training, and we hypothesized that the choice of background set strongly affects their sensitivity and specificity.

OBJECTIVES
(1) To test whether the selection of the background data set can improve the prediction of transcriptional regulatory elements.
(2) To find the optimal background data set among those used in the present research.
(3) To explain why the selection of the background data set can improve the prediction.

SUBJECTS AND METHODS
The Position Weight Matrix (PWM) was used to describe the binding pattern of transcription factors on DNA sequences. The Logistic Regression Model was used to model the combinatorial binding of transcription factors to DNA sequences. The Receiver Operating Characteristic (ROC) curve was used to evaluate the predictive ability of each model, and the area under the ROC curve was used to compare predictive ability between models.

RESULTS
In liver-specific data sets:
(1) The area under the ROC curve of the prediction model with the HGP background data set is significantly lower than that of the prediction model with the promoter background data set.
(2) The binding patterns of most of the selected liver-specific transcription factors are AT-rich.
(3) The PWM scores of the HGP data set are significantly higher than the PWM scores of the two promoter background data sets.
In skeletal muscle-specific data sets:
(4) The area under the ROC curve of the prediction model with the HGP background data set is not significantly different from that of the prediction model with the promoter background data set.
(5) The number of skeletal muscle-specific transcription factors with AT-rich binding patterns equals the number with GC-rich binding patterns.
(6) The PWM scores of the HGP data set are not significantly different from the PWM scores of the promoter background data set.

CONCLUSIONS
(1) The selection of the background data set does influence the prediction of transcriptional regulatory elements.
(2) Among the most commonly used background data sets, the most suitable one is determined by the GC content of the transcription factors' binding patterns.
(3) If most transcription factors prefer to bind GC-rich sequences, randomly chosen genomic sequence should be used as the background data set to improve tools that predict transcriptional regulatory elements. If most transcription factors prefer to bind AT-rich sequences, promoter sequence should be used as the background data set instead.
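
The effect described above can be illustrated with a minimal Python sketch. It is not the thesis's code: the motif counts, the sequences, and all function names are invented for illustration, and a single best-window PWM score stands in for the logistic regression model that the thesis fits over several transcription factors. Under those assumptions, it scores toy "module" sequences with a log-odds PWM and computes the area under the ROC curve against an AT-rich and a GC-rich background.

```python
import math

BASES = "ACGT"

def background_freqs(gc_content):
    """Base frequencies for a background with the given GC fraction."""
    at = (1.0 - gc_content) / 2.0
    gc = gc_content / 2.0
    return {"A": at, "C": gc, "G": gc, "T": at}

def log_odds_pwm(counts, gc_content, pseudocount=1.0):
    """Turn per-position base counts into log2-odds weights against a background."""
    bg = background_freqs(gc_content)
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2(((column[b] + pseudocount) / total) / bg[b])
                    for b in BASES})
    return pwm

def best_window_score(pwm, sequence):
    """Highest PWM score over all windows of the sequence (forward strand only)."""
    width = len(pwm)
    return max(
        sum(pwm[i][sequence[start + i]] for i in range(width))
        for start in range(len(sequence) - width + 1)
    )

def auc(positive_scores, negative_scores):
    """Area under the ROC curve as the Mann-Whitney probability; ties count 0.5."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Hypothetical AT-rich motif (TATA-like), given as per-position base counts.
toy_counts = [
    {"A": 1, "C": 0, "G": 0, "T": 9},
    {"A": 9, "C": 0, "G": 0, "T": 1},
    {"A": 1, "C": 0, "G": 0, "T": 9},
    {"A": 9, "C": 0, "G": 0, "T": 1},
]
pwm = log_odds_pwm(toy_counts, gc_content=0.5)

# Invented sequences: positives contain the motif; the two backgrounds differ in GC content.
modules = ["GGCTATACGG", "CCTATAGGCC", "GCGTATAGCG"]
at_rich_background = ["ATTAATTAAT", "TTATAATTTA", "AATATTTAAA"]
gc_rich_background = ["GCGCGGCCGC", "CCGGGCGCGG", "GGCCGCGGCC"]

module_scores = [best_window_score(pwm, s) for s in modules]
auc_vs_at = auc(module_scores, [best_window_score(pwm, s) for s in at_rich_background])
auc_vs_gc = auc(module_scores, [best_window_score(pwm, s) for s in gc_rich_background])

print("AUC against AT-rich background:", round(auc_vs_at, 3))
print("AUC against GC-rich background:", round(auc_vs_gc, 3))
```

In this toy setting the AT-rich motif scores well on AT-rich background sequences by chance, so the AUC against the AT-rich background is lower than against the GC-rich one. That is the same direction as liver-specific results (1) and (3), although real data sets would of course be needed to draw any conclusion.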
Objective
To compare the ability of anthropometric indices (AI: waist circumference, BMI, waist-to-hip ratio) to predict cardiovascular risk factors among employees of Qingdao Port.

Methods
As part of the Qingdao Port Health Study, a cross-sectional sample of 11359 employees of Qingdao Port (8758 male, 2601 female) aged 18 to 54 years was studied. Blood pressure, height, weight, waist circumference, hip circumference, blood glucose, triglycerides, total cholesterol and high-density lipoprotein cholesterol were measured. Logistic regression analysis was used to explore the relationship between the anthropometric indices and cardiovascular risk. Receiver operating characteristic (ROC) analysis was used to compare the sensitivity and specificity of the various AI in predicting clustering of cardiovascular disease (CVD) risk factors, and to determine the optimal waist circumference (WC) cut-off values, by comparing the area under the curve (AUC) for each AI.

Results
① The main anthropometric measurements (height, weight, waist circumference, hip circumference and waist-to-hip ratio) were significantly higher in males than in females (P<0.05).
② The prevalence of obesity, hypertension, diabetes, metabolic syndrome and other cardiovascular risk factors was also significantly higher in males than in females (P<0.05).
③ In men, the odds ratios for acquiring the various CVD risk factors increased significantly with increasing WC, waist-to-hip ratio (WHR) and BMI; the odds ratios rose faster with WC than with BMI or WHR. The same was true in women.
④ The areas under the ROC curve of WC for predicting hypertension, diabetes, dyslipidemia and clustering of CVD risk factors were significantly higher than those of BMI and WHR (P<0.05). The optimal WC cut-off values from the ROC analysis for predicting type 2 diabetes, hypertension, dyslipidemia and metabolic syndrome (MS) were 85, 85, 83 and 85 cm in men and 76, ?6, ?2 and 79 cm in women, respectively.

Conclusions
WC predicts CVD risk factors better than BMI and WHR, and the best WC cut-off for identifying high risk of cardiovascular disease is 85 cm in men and 79 cm in women.
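
How an "optimal" cut-off can be read off a ROC analysis is sketched below in Python. The abstract does not state which criterion was used, so the Youden index (sensitivity + specificity - 1) is an assumption here, and the waist measurements, outcome labels, and function names are invented for illustration only.

```python
def roc_points(values, has_risk):
    """Sensitivity and specificity per candidate cut-off (value >= cut-off means 'high risk')."""
    positives = sum(has_risk)
    negatives = len(has_risk) - positives
    points = []
    for cutoff in sorted(set(values)):
        true_pos = sum(1 for v, r in zip(values, has_risk) if r and v >= cutoff)
        true_neg = sum(1 for v, r in zip(values, has_risk) if not r and v < cutoff)
        points.append((cutoff, true_pos / positives, true_neg / negatives))
    return points

def youden_cutoff(values, has_risk):
    """Cut-off that maximizes sensitivity + specificity - 1 (assumed criterion)."""
    best = max(roc_points(values, has_risk), key=lambda p: p[1] + p[2] - 1.0)
    return best[0]

# Invented toy data: waist circumference in cm and a 0/1 risk-factor label.
waist_cm = [72, 78, 80, 83, 85, 86, 88, 90, 92, 95]
hypertension = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]

cutoff = youden_cutoff(waist_cm, hypertension)
print("Cut-off with the highest Youden index:", cutoff, "cm")
```

Other criteria, such as the point closest to the top-left corner of the ROC plot, can give a different cut-off on the same data, which is why the criterion matters when cut-off values from different studies are compared.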
Keywords/Search Tags: Cis-regulatory module, Background data set, GC content, Position Weight Matrix, Receiver Operator Characteristic curve, Body mass index, Waist circumference, Cardiovascular disease, Risk factor