A comparative study of data mining and statistical learning techniques for prediction of cancer survivability

Posted on:2013-06-29

Degree:Ph.D

Type:Dissertation

University:Capella University

Candidate:Edeki, Charles A

GTID:1458390008963710

Subject:Biology

Abstract/Summary:

Computer scientists and statisticians are devoting considerable effort to designing and implementing algorithms and techniques for the efficient storage, management, processing, and analysis of biological databases. Data mining is an emerging area of computational intelligence that offers new theories, techniques, and tools for processing large volumes of data (Sriraam, Natasha & Kaur, Data mining approaches for kidney dialysis treatment, 2006). Data mining and statistical learning techniques are used to discover consistent and useful patterns in large datasets, and they are applied in the fields of computational biology and bioinformatics. Computational biology and bioinformatics seek to solve biological problems by combining aspects of biology, computer science, mathematics, and other disciplines (Adams, Matheson & Pruim, BLASTED: Integrating biology and computation, 2008). The main focus of this study is to expand understanding of how biologists, medical practitioners, and scientists can benefit from data mining and statistical learning techniques in predicting breast cancer survivability and prognosis, using the R statistical computing tool and the Weka machine learning tool. In this dissertation, data mining and statistical learning techniques were applied to breast cancer datasets for survival analysis. Breast cancer datasets from the University of California, Irvine (UCI) machine learning database system and the National Cancer Institute (NCI) biological database system were used for prediction and for a comparative study of the techniques. The results were mixed. Based on accuracy, logistic regression outperformed the decision tree, SVM, AdaBoost, Bagging, and naive Bayes algorithms. However, the artificial neural network showed a slight improvement over logistic regression, and the decision tree achieved slightly higher classification accuracy than the AdaBoost, Bagging, and naive Bayes models. No single algorithm emerged as the best model; the performance of each algorithm appeared to depend on the size, dimensionality, and cleanliness of the dataset.
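
The abstract compares classifiers by accuracy, the fraction of cases a model labels correctly. The following is a minimal Python sketch of that comparison, not the dissertation's method: the study used R and Weka, and the labels, the model names (model_a, model_b, model_c), and the helper functions below are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on an empty label set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def rank_by_accuracy(
    y_true: Sequence[int], predictions: Dict[str, Sequence[int]]
) -> List[Tuple[str, float]]:
    """Score every model against the same labels, best accuracy first."""
    scores = {name: accuracy(y_true, y_pred) for name, y_pred in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy labels (1 = survived, 0 = did not survive).
# These are NOT data or results from the study.
y_true = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
toy_predictions = {
    "model_a": [1, 0, 1, 1, 0, 1, 0, 1, 0, 0],  # 9 of 10 correct
    "model_b": [1, 0, 1, 0, 0, 1, 1, 1, 0, 0],  # 7 of 10 correct
    "model_c": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],  # always predicts 1: 6 of 10 correct
}

ranking = rank_by_accuracy(y_true, toy_predictions)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this toy data, model_c never predicts the negative class yet still scores 0.60, which illustrates why a ranking by accuracy alone can depend heavily on the class balance of the dataset.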

Keywords/Search Tags:

Data, Statistical learning techniques, Cancer, Prediction

Related items

1	Response Prediction For Cancer Treatment Based On Deep Learning
2	Establishment Of Prediction System Of Lung Cancer By Data Mining Technique Based On Plasma Micro Rnas
3	Development of Statistical Learning Techniques for INS and GPS Data Fusion
4	Tactical terrorism analysis: A comparative study of statistical learning techniques to predict culpability for terrorist bombings in two regional low-intensity conflicts
5	The Establishment Of China Cancer Database For Prevention And Control
6	Research On Methods Of Learning Statistical Relational Model
7	Statistical analysis and modeling: cancer, clinical trials, environment and epidemiology
8	Quantification of structural information in atom probe tomography using statistical learning techniques
9	Model Construction, Regional Division And Optimization Regulation Based On Statistical Learnin
10	Process control utilizing data-based models: Applications of statistical techniques and neural networks