Data Mining Applications And Research In The Classification Of Hypothyroidism

Posted on: 2011-12-05
Degree: Master
Type: Thesis
Country: China
Candidate: T Long
Full Text: PDF
GTID: 2204360305959485
Subject: Computer application technology
Abstract/Summary:
As medical informatics develops and the volume of diagnostic data grows, data mining is needed to extract latent, significant knowledge through in-depth analysis. Existing research on hypothyroidism classification does not adequately establish the strengths and weaknesses of competing classification models, because it approaches the problem from medical analysis, statistical theory, or a single data mining model alone. It neither combines statistical methods with data mining nor compares a range of data mining models comprehensively. To address this gap, this thesis studies hypothyroidism data from the standpoint of both statistical method and practical application, and compares several classification models. It analyzes the performance of three models in terms of variable requirements, data robustness, time consumption, interpretability of results, classification accuracy, scalability, and other factors, and offers reference and guidance for the clinical diagnosis of hypothyroidism.

This thesis covers the following aspects:

1) It introduces the concepts and major applications of data mining, and analyzes in depth each stage of the CRISP-DM process and its corresponding results. It develops a deeper business understanding of hypothyroidism classification by combining research with application. In the data understanding stage, it explores the properties of the hypothyroidism data in depth so that the training and test sets are more general and representative. In the data preparation stage, it analyzes and preprocesses fields with missing values, outliers, and useless or redundant attributes.

2) It studies the main methods, mathematical principles, and applications of discriminant analysis, logistic regression, and the CHAID decision tree. On the basis of statistical theory and data models, it explores the similarities, differences, and relationships between statistical methods and data mining. It determines the main indicators for hypothyroidism classification by building appropriate data mining models. It then analyzes the data mining algorithms statistically, looks for the best combination of statistical methods and data mining models, and measures and compares the three models across the factors listed above.

3) It applies the CRISP-DM standard process systematically to hypothyroidism research and development in the Clementine 12.0 environment, covering the six stages of the process both as a whole and in detail. This makes the data mining work structured, systematic, standardized, and visual. It uses the scripting language to implement the whole data mining process, which reduces manual, repetitive, time-consuming tasks and supports automated and batch processing from the user interface.
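To make the CHAID idea mentioned in item 2) concrete, the following is a minimal sketch of its core step: scoring each categorical predictor against the target with Pearson's chi-square statistic and picking the strongest one as the split. It is not the thesis's Clementine implementation. The field names (`tsh_high`, `on_medication`, `hypothyroid`) and the eight records are invented for illustration, and the sketch simplifies real CHAID, which merges categories and compares adjusted p-values instead of raw statistics. Comparing raw statistics is only reasonable here because both toy predictors are binary.

```python
from collections import Counter


def chi_square(xs, ys):
    """Pearson chi-square statistic for two categorical sequences."""
    n = len(xs)
    x_counts = Counter(xs)
    y_counts = Counter(ys)
    observed = Counter(zip(xs, ys))
    stat = 0.0
    for x, nx in x_counts.items():
        for y, ny in y_counts.items():
            expected = nx * ny / n  # count expected under independence
            stat += (observed[(x, y)] - expected) ** 2 / expected
    return stat


def best_split(rows, predictors, target):
    """Return the predictor with the largest chi-square against the target."""
    ys = [r[target] for r in rows]
    scores = {p: chi_square([r[p] for r in rows], ys) for p in predictors}
    return max(scores, key=scores.get), scores


# Hypothetical toy records; not data from the thesis.
tsh = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]
med = ["yes", "no", "yes", "no", "yes", "no", "yes", "no"]
label = ["yes", "yes", "yes", "no", "no", "no", "no", "no"]
rows = [
    {"tsh_high": t, "on_medication": m, "hypothyroid": y}
    for t, m, y in zip(tsh, med, label)
]

best, scores = best_split(rows, ["tsh_high", "on_medication"], "hypothyroid")
print(best, scores)
```

A full tree would apply the same selection recursively to each resulting subset and stop when no predictor is significantly associated with the target.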
Keywords/Search Tags: hypothyroidism, data mining, CRISP-DM, discriminant analysis, logistic regression, CHAID tree