
Data Mining Problems In Automatic Computer Diagnosis

Posted on: 2007-04-28
Degree: Master
Type: Thesis
Country: China
Candidate: N Lao
GTID: 2178360212485367
Subject: Computer Science and Technology
Abstract/Summary:
As the functionality and complexity of computer systems keep growing, conflicts and unexpected behavior become more common. These problems pose a serious challenge to both users and support professionals. Traditional troubleshooting methods rely heavily on human intervention, which makes the process inefficient and the results inaccurate even for problems that have already been solved; this contributes significantly to user dissatisfaction. Researchers in both the systems and the data mining communities have therefore begun to explore data mining methods for solving system management problems, and the new concept of Recovery Oriented Computing (ROC) is becoming an increasingly active research area. One of its important goals is to automatically identify the root cause of a problem that is already known, which in turn leads to its resolution. This thesis explores using various types of static and dynamic system information to build correlations with previously solved problems.

State-of-the-art diagnosis techniques for personal computers fall roughly into two types: (1) those that use a high-level, natural-language description of the problem symptoms together with an Information Retrieval (IR) system to assist diagnosis, and (2) those that use low-level static system information to achieve automatic diagnosis. However, the first approach still requires a lot of human intervention, and the second often has unsatisfactory accuracy. This thesis is the first to combine high-level symptom descriptions with low-level static system information to achieve automatic diagnosis on personal computers, and experiments on many real-life problems show that this approach achieves good accuracy. The thesis also proposes four probabilistic models that combine these two types of information to help retrieve troubleshooting documents from knowledge bases. Experimental results strongly support their effectiveness in improving retrieval accuracy.
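The abstract does not give the form of the four probabilistic models. As one hypothetical illustration of combining the two kinds of evidence, the sketch below scores each troubleshooting document with a Dirichlet-smoothed query-likelihood language model twice, once for the symptom description and once for terms drawn from the machine's static state (for example Windows registry settings, since the Windows registry is among the thesis keywords), and linearly interpolates the two scores. The interpolation weight `lam`, the smoothing parameter `mu`, the add-0.5 background smoothing, the toy documents, and all function names are assumptions for illustration, not the thesis's models.

```python
import math
from collections import Counter


def query_log_likelihood(query_terms, doc_terms, coll_counts, coll_len, mu=2000.0):
    """Dirichlet-smoothed log P(query | document) under a unigram language model."""
    doc_counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        # Background (collection) probability with add-0.5 smoothing so that
        # terms absent from every document still get a nonzero probability.
        p_coll = (coll_counts.get(term, 0) + 0.5) / (coll_len + 1.0)
        p_doc = (doc_counts[term] + mu * p_coll) / (doc_len + mu)
        score += math.log(p_doc)
    return score


def contextual_score(symptom_terms, context_terms, doc_terms, coll_counts, coll_len, lam=0.7):
    """Hypothetical linear interpolation of symptom evidence and system-context evidence."""
    s = query_log_likelihood(symptom_terms, doc_terms, coll_counts, coll_len)
    c = query_log_likelihood(context_terms, doc_terms, coll_counts, coll_len)
    # Average per query term so a long context list does not swamp the symptom text.
    s /= max(len(symptom_terms), 1)
    c /= max(len(context_terms), 1)
    return lam * s + (1.0 - lam) * c


def rank_documents(symptom_terms, context_terms, docs, lam=0.7):
    """Return document ids sorted from best to worst contextual score."""
    coll_counts = Counter(t for terms in docs.values() for t in terms)
    coll_len = sum(coll_counts.values())
    scores = {
        doc_id: contextual_score(symptom_terms, context_terms, terms, coll_counts, coll_len, lam)
        for doc_id, terms in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Toy knowledge base: both documents match the symptom text equally well.
    docs = {
        "kb1": ["printer", "not", "working", "spooler", "service", "disabled"],
        "kb2": ["printer", "not", "working", "driver", "missing"],
    }
    symptom = ["printer", "not", "working"]
    context = ["spooler", "disabled"]  # hypothetical terms derived from system state
    print(rank_documents(symptom, context, docs, lam=1.0))  # ['kb2', 'kb1']
    print(rank_documents(symptom, context, docs, lam=0.7))  # ['kb1', 'kb2']
```

With `lam=1.0` the system context is ignored and the shorter document wins on symptom text alone; lowering `lam` lets the matching context terms decide between documents that describe the same symptom.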
This work is the first in the IR domain to build comprehensive models for contextual retrieval and to demonstrate their usefulness on a practical problem.

Compared with static system information, dynamic information has several advantages: it is easier to collect, more widely applicable, and less noisy. However, because it is complex to analyze, it has not been used in diagnosis techniques for personal computers. Most related work concentrates on analyzing complex systems such as computer clusters and networks, where data mining methods such as clustering, classification, and association rule mining use dynamic information to help diagnose hard problems but cannot achieve automated diagnosis. This thesis proposes a two-level classifier to achieve fully automated diagnosis. The first level performs automated symptom detection; the second level consists of classifiers built on the system call sequences of known problems. In our experiments, the method achieves good accuracy on five common problems on Windows systems. To solve the resulting sequence classification problem, a new feature extraction method based on string alignment is proposed. Combined with a Support Vector Machine (SVM) classifier, this method yields much better results than traditional n-gram sequence feature extraction, and it also outperforms canonical sequence classification methods such as the Hidden Markov Model (HMM).
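The abstract does not detail the alignment-based feature extraction. As a minimal sketch, assuming that each feature is the Smith-Waterman local alignment score between a new system-call trace and one reference trace recorded for a known problem, the code below turns a variable-length trace into a fixed-length vector that could then be passed to any SVM implementation (for example scikit-learn's `sklearn.svm.SVC`). The scoring parameters (`match`, `mismatch`, `gap`), the reference traces, and the function names are illustrative assumptions; the system call names are Windows native API names used only as example symbols. An n-gram counter is included for contrast with the baseline the abstract mentions.

```python
from collections import Counter


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment score (linear gap penalty) between two sequences."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment start anywhere (local, not global).
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def alignment_features(trace, references):
    """One feature per reference trace: how well some stretch of `trace` aligns to it."""
    return [local_alignment_score(trace, ref) for ref in references]


def ngram_counts(trace, n=2):
    """Baseline for comparison: bag of contiguous n-grams of system calls."""
    return Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))


if __name__ == "__main__":
    # Illustrative reference traces for two hypothetical known problems.
    references = [
        ["NtOpenKey", "NtQueryValueKey", "NtClose"],
        ["NtCreateFile", "NtReadFile", "NtClose"],
    ]
    trace = ["NtCreateFile", "NtOpenKey", "NtQueryValueKey", "NtClose"]
    print(alignment_features(trace, references))  # [6, 2]
    print(ngram_counts(trace))
```

One plausible reason such features can help: n-gram counts credit only exact contiguous runs of length `n`, whereas a local alignment score still rewards a reference pattern that appears with an unrelated call interleaved or substituted, at the cost of a gap or mismatch penalty.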
Keywords: automatic diagnosis, Windows registry, contextual retrieval, ranking algorithm, string alignment