
Study On The Influencing Factors Of Gastroesophageal Reflux Disease Using Multilevel Models And Association Rules

Posted on: 2010-09-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Q Ma
GTID: 1114360275475779
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Background: Gastroesophageal reflux disease (GERD) is a relatively common disorder in Western populations; its main symptoms include heartburn and acid regurgitation. Limited studies indicate a lower prevalence of GERD in Asian populations, but the prevalence is reported to be increasing in both Western and Asian populations. Patients with GERD are at increased risk of esophageal complications and esophageal adenocarcinoma. In addition, GERD has a significant impact on quality of life and is associated with substantial economic costs. Western researchers have therefore conducted many epidemiological surveys of the prevalence of GERD and its influencing factors. However, awareness of GERD in China is low, and the relationship between GERD and extra-esophageal symptoms in Chinese populations is poorly recognized; there are few high-quality population-based surveys on the prevalence of GERD in China, and studies using standardized, well-validated international questionnaires are limited. To survey the prevalence of GERD in the general population of China and study its influencing factors, we performed a large-scale epidemiological survey of GERD in mainland China, using a randomized, stratified, multistage sampling method. Questionnaires were self-administered. In total, 16,078 valid questionnaires were completed in Shanghai, Beijing, Wuhan, Xi'an and Guangzhou. The information gathered was abundant, and the data had an obvious hierarchical structure and some missing values. Traditional statistical methods have clear limitations for such data: for example, they require that observations be independent of one another and that there be no missing values.

Aim: To avoid the limitations of these traditional methods, we combined association rules and multilevel models to analyze the data from the GERD epidemiological survey in mainland China.
The aim of this study was to evaluate the influencing factors of GERD more rigorously, raise public awareness of GERD, and provide useful information for preventing and controlling GERD in China.

Methods: Association rule mining is a classical data mining technique. It copes well with incomplete data and can discover patterns that are unknown and novel to researchers, giving them a reference for understanding the data fully and analyzing them further. Using an association rule algorithm, we can reduce the influence of missing values and find latent influencing factors and their joint effects on GERD. The rules also indicate which explanatory variables should be selected when building the multilevel models.

The multilevel model is a multivariate analysis method designed for multilevel data and has been widely applied in many domains. Multilevel data show between-group heterogeneity, that is, some within-group homogeneity, and therefore violate the assumption that observations are independent of one another, which underlies some traditional statistical methods such as multiple linear regression. The multilevel model avoids this limitation and reduces estimation bias. Therefore, based on the results of the association rule analysis, we used a multilevel model to take the hierarchical structure of the data into account and further reveal the influencing factors of GERD.

Results: The study summarized the basic theory and main algorithms of association rules, as well as methods for measuring the interestingness of rules. We then used the classical Apriori algorithm in SAS/EM to generate the association rules. After the rules were produced, we first screened them with the template matching method and then eliminated redundant multi-item rules by requiring a minimum increase in confidence of 0.05.
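The rule measures and the redundancy screening mentioned above can be illustrated with a minimal Python sketch. It is not the SAS/EM procedure used in the study and does not implement Apriori's candidate generation; it only scores rules that are already given. The toy records, the function names, and the reading of the 0.05 threshold as an absolute gain in confidence over every simpler rule are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy records: each set lists the attributes reported by one respondent.
RECORDS = [
    {"dyspepsia", "gastritis", "GERD"},
    {"dyspepsia", "GERD"},
    {"dyspepsia", "gastritis", "GERD"},
    {"dyspepsia", "gastritis"},
    {"dyspepsia"},
    {"smoker", "GERD"},
    {"smoker"},
    {"dyspepsia", "gastritis", "smoker", "GERD"},
]


def support(itemset, data):
    """Fraction of records that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= record for record in data) / len(data)


def rule_measures(antecedent, consequent, data):
    """Support, confidence, lift and PS (Piatetsky-Shapiro) value of antecedent -> consequent."""
    s_a = support(antecedent, data)
    s_c = support(consequent, data)
    s_ac = support(set(antecedent) | set(consequent), data)
    confidence = s_ac / s_a
    return {
        "support": s_ac,
        "confidence": confidence,
        "lift": confidence / s_c,
        "ps": s_ac - s_a * s_c,
    }


def is_redundant(antecedent, consequent, data, min_improvement=0.05):
    """True if some simpler rule (a proper, non-empty subset of the antecedent)
    already reaches a confidence within min_improvement of the full rule.
    Assumed interpretation of the confidence-improvement threshold."""
    full_conf = rule_measures(antecedent, consequent, data)["confidence"]
    items = sorted(antecedent)
    for size in range(1, len(items)):
        for subset in combinations(items, size):
            if support(subset, data) == 0:
                continue
            sub_conf = rule_measures(subset, consequent, data)["confidence"]
            if full_conf - sub_conf < min_improvement:
                return True
    return False


if __name__ == "__main__":
    print(rule_measures({"gastritis"}, {"GERD"}, RECORDS))
    print(is_redundant({"dyspepsia", "gastritis"}, {"GERD"}, RECORDS))
```

In this toy data the rule {dyspepsia, gastritis} -> GERD is flagged as redundant because {gastritis} -> GERD alone reaches the same confidence, so the extra item adds nothing.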
Finally, based on a correlation analysis of the objective measures, we selected interesting rules using the following six indexes: lift, PS value, Interest value, fitness function, contingency coefficient and Fisher's exact probability.

According to the selected rules, some basic information (such as survey site, region, sex, age, smoking status, drinking status, marital status, monthly family income, occupation, health status, education level, mental status, physical activity or exercise, and family history of gastrointestinal (GI) tumors or other GI diseases), some items of medical history (for example dyspepsia, gastritis, rheumatic arthritis, chronic faucitis and history of abdominal surgery), and some diseases diagnosed by the Rome II criteria (such as irritable bowel syndrome, aerophagia and unspecified functional bowel disorder) were associated with GERD symptoms. Thus, through association rules we not only gained a preliminary understanding of the influencing factors of GERD but also learned which explanatory variables should be considered in the multilevel model analysis.

We then summarized the basic theory and modelling steps of the multilevel model, and the residual bootstrap. We took the sub-district (town) as the level-2 unit and the resident as the level-1 unit, and fitted a two-level model. The final model was established in five steps: fitting the empty model -> adding the level-2 explanatory variable "survey site" to the empty model -> adding the level-1 explanatory variables to the resulting model by forward selection -> testing the random slopes of the level-1 explanatory variables -> testing the cross-level interactions. In addition, the number of level-2 groups was relatively small and the level-1 residuals were not normally distributed, which violated the assumptions of the maximum likelihood method.
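The five modelling steps can be written out for a generic two-level model as follows. This is a sketch in standard notation, not the fitted model from the dissertation: y_ij is the GERD score of resident i in sub-district j, w_j stands for a single level-2 indicator (for example one survey-site dummy), x_ij for a single level-1 explanatory variable, and all coefficient symbols are assumptions introduced here.

```latex
\begin{align*}
% Step 1: empty model with a random intercept for sub-district j
y_{ij} &= \beta_{0} + u_{0j} + e_{ij},
  \qquad u_{0j} \sim N(0, \sigma_{u0}^{2}), \quad e_{ij} \sim N(0, \sigma_{e}^{2}) \\
% Share of total variance lying between sub-districts (intraclass correlation)
\rho &= \frac{\sigma_{u0}^{2}}{\sigma_{u0}^{2} + \sigma_{e}^{2}} \\
% Step 2: add the level-2 explanatory variable w_j
y_{ij} &= \beta_{0} + \beta_{1} w_{j} + u_{0j} + e_{ij} \\
% Step 3: add a level-1 explanatory variable x_ij (forward selection)
y_{ij} &= \beta_{0} + \beta_{1} w_{j} + \beta_{2} x_{ij} + u_{0j} + e_{ij} \\
% Step 4: test a random slope u_2j for x_ij
y_{ij} &= \beta_{0} + \beta_{1} w_{j} + (\beta_{2} + u_{2j})\, x_{ij} + u_{0j} + e_{ij} \\
% Step 5: test the cross-level interaction between w_j and x_ij
y_{ij} &= \beta_{0} + \beta_{1} w_{j} + (\beta_{2} + u_{2j})\, x_{ij}
  + \beta_{3} w_{j} x_{ij} + u_{0j} + e_{ij}
\end{align*}
```

Maximum likelihood estimation relies on the normality assumptions in the first line, which is why a small number of level-2 groups and non-normal e_ij motivate a bootstrap check.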
Therefore, based on the final model, we used the nonparametric residual bootstrap and the parametric residual bootstrap to refit the model and reduce estimation bias. The estimates from both the nonparametric and the parametric method were quite close to those from the original sample. However, the standard errors from the parametric method were slightly higher than those from the nonparametric method; in particular, the standard error of the level-1 residual variance σ² was larger than that from the nonparametric method and from the original sample. These results were consistent with our expectations: the nonparametric method takes into account that the level-1 residuals e_ij of the original sample were not normally distributed, whereas the parametric method assumes that e_ij are normally distributed.

In conclusion, the multilevel model analysis showed that the GERD score of residents in Wuhan was relatively high and differed significantly from that in Guangzhou, whereas the scores in Shanghai, Beijing and Xi'an did not differ significantly from that in Guangzhou. That is, the prevalence of GERD in Wuhan was higher than in the other four survey sites. The worse a resident's health status, the higher the GERD score, and the influence of health status on the GERD score was also affected by the regional difference between Wuhan and Guangzhou. Residents living in rural regions and those with a lower education level or worse mental status were more likely to suffer from GERD. Moreover, residents with a family history of GI tumors or other GI diseases, certain items of medical history (such as dyspepsia, gastritis and rheumatic arthritis) or certain diseases (such as irritable bowel syndrome and aerophagia) were also more likely to report GERD. However, no significant association was found in this survey between GERD and some other factors, such as sex and age.
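The difference between the two residual bootstraps can be sketched in Python for the simplest case, an empty random-intercept model. This is a deliberately simplified illustration, not the procedure used in the study: the model is fitted by moments (group means) rather than maximum likelihood, the residuals are not shrunken or reflated as a full residual bootstrap would require, and the toy data, function names and parameters are all hypothetical.

```python
import random
import statistics


def fit_empty_model(groups):
    """Moment-based fit of y_ij = b0 + u_j + e_ij.
    Returns the intercept, the level-2 residuals and the pooled level-1 residuals."""
    group_means = [statistics.fmean(g) for g in groups]
    b0 = statistics.fmean(group_means)
    u = [m - b0 for m in group_means]
    e = [y - m for g, m in zip(groups, group_means) for y in g]
    return b0, u, e


def bootstrap_se(groups, parametric, reps=500, seed=0):
    """Bootstrap standard error of the intercept b0.
    parametric=False resamples the estimated residuals with replacement;
    parametric=True draws new residuals from normal distributions with the
    estimated standard deviations."""
    rng = random.Random(seed)
    b0, u, e = fit_empty_model(groups)
    sd_u, sd_e = statistics.pstdev(u), statistics.pstdev(e)
    estimates = []
    for _ in range(reps):
        resampled = []
        for g in groups:
            if parametric:
                u_star = rng.gauss(0.0, sd_u)
                e_star = [rng.gauss(0.0, sd_e) for _ in g]
            else:
                u_star = rng.choice(u)
                e_star = [rng.choice(e) for _ in g]
            # Rebuild a pseudo-sample with the original group sizes.
            resampled.append([b0 + u_star + e_ij for e_ij in e_star])
        estimates.append(fit_empty_model(resampled)[0])
    return statistics.stdev(estimates)


def make_toy_groups(n_groups=10, n_per_group=20, seed=42):
    """Hypothetical data: few groups and skewed (non-normal) level-1 errors."""
    rng = random.Random(seed)
    groups = []
    for _ in range(n_groups):
        u_j = rng.gauss(0.0, 1.0)
        groups.append(
            [5.0 + u_j + (rng.expovariate(1.0) - 1.0) for _ in range(n_per_group)]
        )
    return groups


if __name__ == "__main__":
    toy = make_toy_groups()
    print("nonparametric SE of b0:", bootstrap_se(toy, parametric=False))
    print("parametric SE of b0:   ", bootstrap_se(toy, parametric=True))
```

The only difference between the two branches is where the new residuals come from, which mirrors the explanation above: the nonparametric version keeps the empirical, possibly skewed residual distribution, while the parametric version imposes normality.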
The results of the nonparametric and parametric bootstrap methods were quite close to those of the original sample, except that no significant difference in GERD score was found between urban and rural residents.

Conclusions: Using standardized, well-validated international questionnaires, we carried out the largest-scale epidemiological survey of GERD in mainland China and comprehensively studied the influencing factors of GERD. The results indicated that the prevalence of GERD in Wuhan was the highest among the five cities; residents living in rural regions and those with a lower education level, a family history of GI tumors or other GI diseases, worse health status or worse mental status were more likely to suffer from GERD. Moreover, residents with certain items of medical history (such as dyspepsia, gastritis and rheumatic arthritis) or certain diseases (such as irritable bowel syndrome and aerophagia) were also more likely to report GERD. There may also be associations between GERD and other factors and comorbidities, such as sex, age, smoking status, drinking status, family income, marital status, occupation, frequency of physical activity or exercise, chronic faucitis, unspecified functional bowel disorder and history of abdominal surgery.

In conclusion, we discussed the combined application of the multilevel model and the association rule algorithm in the analysis of survey data. These methods avoided the limitations of traditional statistical methods and revealed the influencing factors of GERD more rigorously, which helps to raise public awareness of GERD and provides useful information for preventing and controlling it. The methods adopted in this research may also serve as a reference for other epidemiological surveys.
Keywords/Search Tags: gastroesophageal reflux disease, association rules, Apriori algorithm, multilevel models, residual bootstrap