
Data Mining Of Hospital Information And Exploration Of Its Practical Implementation

Posted on: 2008-05-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Yi
GTID: 1104360218959078
Subject: Clinical Laboratory Science
Abstract/Summary:
Objective
To establish practical, easy-to-operate, web-based data mining software for hospital information, built on SPSS Clementine and integrated with the hospital information system; to explore applications of data mining to variable forecasting, disease diagnosis, and association rules among diseases; and, from a methodological standpoint, to select the optimal data mining algorithm for each task: analyzing the prevalence of tuberculosis and its future trend, identifying the risk factors for metastasis to axillary level III lymph nodes in breast cancer and building a classification model for it, and mining association rules between diabetes and diabetic complications. Online data mining of the hospital information system not only saves money and allows resources to be shared, but also gives clinical managers, doctors, nurses, and other technical staff an efficient tool for comprehensive analysis and decision-making, helping them manage scientifically, improve diagnostic accuracy and treatment outcomes, and carry out medical research. Methodologically, choosing the optimal data mining algorithm on a scientific basis is the key step in obtaining accurate knowledge. With advances in computer technology and biomedical engineering, and the wide application of information technology in medicine, large volumes of accurate medical records containing much important knowledge have been stored. Mining hidden, deep-seated, valuable knowledge from these records is increasingly important, because it is an urgent "knowledge discovery" problem in medical informatics whose solution can raise the management level of hospitals and improve the quality of medical services.
To date, some studies in the United States have applied web-based data mining to medical services, but none have been reported in China; moreover, no existing study has scientifically selected the optimal data mining algorithm for each practical task according to its objective.

Method and Data
Online data mining of the hospital information system was implemented on SPSS Clementine using the Java network programming language. Using data from the Anti-tuberculosis Institute of Chongqing, an autoregressive integrated moving average model (ARIMA), a back-propagation artificial neural network (BPANN), and the grey model GM(1,1) were used to forecast the prevalence of tuberculosis, and the accuracy of the three algorithms was compared. Using data from the First Affiliated Hospital of Chongqing University of Medical Sciences, a logistic regression model (Logistic), a CHAID decision tree (CHAID), a radial basis function network (RBFN), a combined RBFN and Logistic model, and a combined RBFN and CHAID model were used to classify the status of axillary level III lymph nodes in breast cancer, and the accuracy and reliability of the five algorithms were compared.
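The abstract does not spell out how GM(1,1) or the average relative error were computed. As an illustration only, the following is a minimal pure-Python sketch of the textbook GM(1,1) grey model (accumulated generating operation, least-squares estimation of the development coefficient a and grey input b, time response function, inverse accumulation), together with an average relative error assumed to be the mean of |actual - fitted| / actual. The function names, the positivity check, and the sample series are hypothetical; this is not the author's Clementine implementation, and ARIMA and BPANN are not shown.

```python
import math
from typing import List, Tuple


def gm11_fit(x0: List[float]) -> Tuple[float, float]:
    """Estimate the development coefficient a and grey input b of GM(1,1)."""
    if len(x0) < 4 or any(v <= 0 for v in x0):
        raise ValueError("GM(1,1) expects at least 4 positive observations")
    # Accumulated generating operation (AGO): x1(k) = x0(1) + ... + x0(k)
    x1: List[float] = []
    total = 0.0
    for v in x0:
        total += v
        x1.append(total)
    # Regressor u(k) = -z1(k), with background value z1(k) = (x1(k) + x1(k-1)) / 2
    u = [-(x1[k] + x1[k - 1]) / 2.0 for k in range(1, len(x1))]
    y = x0[1:]
    m = len(y)
    # Least squares for the grey equation x0(k) + a*z1(k) = b, i.e. x0(k) = a*u(k) + b,
    # solved in closed form from the 2x2 normal equations
    s_uu = sum(ui * ui for ui in u)
    s_u = sum(u)
    s_uy = sum(ui * yi for ui, yi in zip(u, y))
    s_y = sum(y)
    det = s_uu * m - s_u * s_u
    a = (m * s_uy - s_u * s_y) / det
    b = (s_uu * s_y - s_u * s_uy) / det
    if abs(a) < 1e-12:
        raise ValueError("development coefficient is ~0; GM(1,1) is degenerate")
    return a, b


def gm11_predict(x0: List[float], a: float, b: float, steps: int = 0) -> List[float]:
    """Fitted values for the observed periods followed by `steps` forecasts."""
    c = x0[0] - b / a
    # Time response function: x1_hat(k+1) = (x0(1) - b/a) * exp(-a*k) + b/a
    x1_hat = [c * math.exp(-a * k) + b / a for k in range(len(x0) + steps)]
    # Inverse AGO restores the original (non-accumulated) series
    return [x1_hat[0]] + [x1_hat[k] - x1_hat[k - 1] for k in range(1, len(x1_hat))]


def average_relative_error(actual: List[float], fitted: List[float]) -> float:
    """Mean of |actual - fitted| / |actual| over paired observations (assumed definition)."""
    return sum(abs(obs - est) / abs(obs) for obs, est in zip(actual, fitted)) / len(actual)


if __name__ == "__main__":
    # Hypothetical quarterly counts, NOT the Chongqing data used in the study
    series = [412.0, 398.0, 430.0, 445.0, 421.0, 456.0]
    a, b = gm11_fit(series)
    fitted = gm11_predict(series, a, b, steps=2)
    print("a =", a, "b =", b)
    print("average relative error:", average_relative_error(series, fitted[: len(series)]))
    print("next two periods:", fitted[len(series):])
```

This plain form of GM(1,1) has no explicit seasonal component; comparing models on the same error measure, as the study does, is what makes the three forecasters' results comparable.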
The Apriori algorithm was used to describe association rules between diabetes and diabetic complications, based on data from the Second Affiliated Hospital of Chongqing University of Medical Sciences.

Study content
① Using the Java network programming language to explore the implementation of online data mining of the hospital information system based on SPSS Clementine.
② Analyzing the prevalence of tuberculosis in Chongqing, the risk factors for metastasis to axillary level III lymph nodes in breast cancer, and the association rules between diabetes and diabetic complications.
③ Applying three data mining algorithms, ARIMA, BPANN, and GM(1,1), to predict the prevalence of tuberculosis and comparing their accuracy.
④ Building combined models by combining the RBFN with the Logistic model and the RBFN with the CHAID model.
⑤ Using the Logistic, CHAID, and RBFN models, the combined RBFN and Logistic model, and the combined RBFN and CHAID model to classify the status of axillary level III lymph nodes in breast cancer, and comparing the accuracy and reliability of the five algorithms.

Results
① Web-based data mining software for the hospital information system was preliminarily set up on SPSS Clementine, implementing data collection, engine execution, result storage, and result retrieval.
② The prevalence of tuberculosis shows a clear seasonal pattern that fluctuates in waves over the year: incidence generally falls in the first and third quarters and rises in the other two. There is a correlation between a given quarter of the current year and six quarters of the year before last.
Predictions are accurate when the seasonal factor and the cyclical random factors of tuberculosis are taken into account.
③ The average relative errors of the ARIMA, BPANN2, and GM(1,1) predictive models are 0.05872, 0.06999, and 0.01210, respectively, indicating that GM(1,1) is the best of the three for predicting the prevalence of tuberculosis.
④ The status of axillary level III lymph nodes in breast cancer is significantly correlated with the status of axillary level I and II lymph nodes and with tumor size.
⑤ Some representations of the diagnostic knowledge are hard for users to understand: the RBFN expresses its diagnostic knowledge as a weight matrix, and the Logistic model and the combined RBFN and Logistic model express it as logistic regression coefficients. By contrast, the CHAID model and the combined RBFN and CHAID model express it as a tree diagram in natural language, which is easy to understand.
⑥ The average predictive accuracies of the Logistic, CHAID, RBFN, combined RBFN and Logistic, and combined RBFN and CHAID models are 83.34%, 83.79%, 85.61%, 83.77%, and 79.74%, respectively, and the absolute values of their reliabilities minus 1 are 0.0720, 0.0625, 0.0549, 0.0766, and 0.0948, respectively. The RBFN has the highest accuracy and the reliability closest to 1 of the five methods, indicating that it is the best algorithm for classifying the status of axillary level III lymph nodes in breast cancer.
⑦ The order of influence of the diagnostic indicators can be read from the CHAID diagnostic knowledge, which is presented as a tree diagram: the status of axillary level I and II lymph nodes and the tumor size are very important for classifying the status of axillary level III lymph nodes in breast cancer. CHAID is a simple, practical, computer-based diagnostic method that can automatically extract diagnostic knowledge from records.
It can therefore be widely applied in research on breast cancer and other diseases.
⑧ Eight conditions are significantly associated with diabetes: urinary tract infection, diabetic nephropathy, diabetic ophthalmopathy, diabetic neuropathy, hyperlipidemia, hypertension, diabetic heart disease, and coronary heart disease.

Conclusions
① Online data mining of hospital data based on SPSS Clementine, a very important part of the hospital information system, has been preliminarily implemented. Combining the hospital information system with data mining will make fuller use of information technology, raising the management level of hospitals, improving the quality of medical services, and reducing hospital operating costs.
② GM(1,1) was confirmed to be the best model for predicting the prevalence of tuberculosis. Based on how each method expresses its diagnostic knowledge and on the accuracy and reliability of the five methods, the RBFN and CHAID are the two best algorithms for classifying the status of axillary level III lymph nodes in breast cancer. As an auxiliary tool, Apriori can help doctors investigate the true association between diabetes and urinary tract infection, which has seldom been reported in the medical literature.
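The diabetes findings rest on Apriori association rules, but the abstract gives no support or confidence thresholds. The sketch below is a minimal, hypothetical pure-Python version of the standard Apriori algorithm (level-wise candidate generation with subset pruning), followed by rule generation scored by support, confidence, and lift. The thresholds, function names, and diagnosis sets are illustrative assumptions, not the study's settings, its data, or Clementine's Apriori node.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset, float, float, float]


def apriori(transactions: List[Set[str]], min_support: float) -> Dict[Itemset, float]:
    """Return every itemset whose support (share of transactions containing it) >= min_support."""
    n = len(transactions)
    txs = [frozenset(t) for t in transactions]
    # Level-1 candidates: every individual item that occurs at least once
    candidates = {frozenset([item]) for t in txs for item in t}
    frequent: Dict[Itemset, float] = {}
    k = 1
    while candidates:
        # Count pass: support of each candidate k-itemset
        level: Dict[Itemset, float] = {}
        for c in candidates:
            support = sum(1 for t in txs if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        # Join step: unions of two frequent k-itemsets that form a (k+1)-itemset
        keys = list(level)
        joined = {p | q for i, p in enumerate(keys) for q in keys[i + 1:] if len(p | q) == k + 1}
        # Prune step: drop candidates with any infrequent k-subset (downward closure)
        candidates = {c for c in joined if all(frozenset(s) in level for s in combinations(c, k))}
        k += 1
    return frequent


def association_rules(frequent: Dict[Itemset, float], min_confidence: float) -> List[Rule]:
    """Derive rules X -> Y with confidence = supp(X u Y) / supp(X) >= min_confidence."""
    rules: List[Rule] = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                antecedent = frozenset(lhs)
                consequent = itemset - antecedent
                # Every subset of a frequent itemset is frequent, so both lookups succeed
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    lift = confidence / frequent[consequent]
                    rules.append((antecedent, consequent, support, confidence, lift))
    return rules


if __name__ == "__main__":
    # Hypothetical diagnosis sets per patient, NOT records from the study hospital
    records = [
        {"diabetes", "hypertension"},
        {"diabetes", "hypertension", "hyperlipidemia"},
        {"diabetes", "urinary tract infection"},
        {"diabetes", "hypertension"},
        {"hyperlipidemia"},
    ]
    freq = apriori(records, min_support=0.4)
    for lhs, rhs, sup, conf, lift in association_rules(freq, min_confidence=0.7):
        print(sorted(lhs), "->", sorted(rhs), f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

With these hypothetical records and thresholds, only the diabetes and hypertension pair yields rules; urinary tract infection appears in one record of five and falls below the 0.4 support threshold, which illustrates how threshold choice decides whether a rarely co-recorded condition surfaces at all.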
Keywords: hospital information system, data mining, model, data mining software