Data Mining Problems In Automatic Computer Diagnosis

Posted on:2007-04-28

Degree:Master

Type:Thesis

Country:China

Candidate:N Lao

Full Text:PDF

GTID:2178360212485367

Subject:Computer Science and Technology

Abstract/Summary:

As the functionality and complexity of computer systems keep growing, conflicts and unexpected behavior become more common. These problems pose a serious challenge to both users and support professionals. Traditional troubleshooting relies heavily on human intervention, which makes the process inefficient and the results inaccurate even for problems that have already been solved, and this contributes significantly to user dissatisfaction. Researchers from both the systems and the data mining communities have therefore begun to apply data mining methods to system management problems. Recovery Oriented Computing (ROC), a relatively new concept, has become an active research area. One of its important goals is to identify the root cause of a problem automatically when the problem is a known one, which can then lead to its resolution. This thesis explores using various types of static and dynamic system information to build correlations with solved problems.

State-of-the-art diagnosis techniques for personal computers fall roughly into two types: (1) those that use high-level, natural-language descriptions of problem symptoms together with an Information Retrieval (IR) system to assist diagnosis, and (2) those that use low-level static system information to achieve automatic diagnosis. However, the first approach still requires substantial human intervention, and the second often has unsatisfactory accuracy. This thesis is the first to combine high-level symptom descriptions with low-level static system information for automatic diagnosis on personal computers, and experiments on many real-life problems demonstrate its good accuracy. The thesis also proposes four probabilistic models that combine these two types of information to retrieve troubleshooting documents from knowledge databases; experimental results strongly support their effectiveness in improving retrieval accuracy.
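The abstract does not spell out the four probabilistic models, so the following is only a minimal sketch of the general idea, assuming a simple linear interpolation that is not one of the thesis's models. Each troubleshooting document is scored by mixing a Dirichlet-smoothed query likelihood of the user's symptom description with a query likelihood of the machine's static context. Encoding that context as tokens (for example, installed product names derived from the registry), the toy knowledge base, and the names `rank_documents`, `lam`, and `mu` are all hypothetical.

```python
import math
from collections import Counter

def query_likelihood(query_terms, doc_terms, coll_counts, coll_len, mu=10.0):
    """Dirichlet-smoothed log P(query | document) under a unigram language model."""
    doc_counts = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        # Background (collection) probability, floored so unseen terms avoid log(0).
        p_coll = max(coll_counts.get(term, 0) / max(coll_len, 1), 1e-6)
        p = (doc_counts.get(term, 0) + mu * p_coll) / (len(doc_terms) + mu)
        score += math.log(p)
    return score

def rank_documents(symptom_terms, context_terms, docs, lam=0.7, mu=10.0):
    """Rank documents by lam * text score + (1 - lam) * context score.

    docs maps doc_id -> (text_terms, context_terms). Context terms are a
    hypothetical token encoding of static system information.
    """
    all_text = [t for text, _ in docs.values() for t in text]
    all_ctx = [c for _, ctx in docs.values() for c in ctx]
    text_counts, ctx_counts = Counter(all_text), Counter(all_ctx)
    scores = {}
    for doc_id, (text, ctx) in docs.items():
        s_text = query_likelihood(symptom_terms, text, text_counts, len(all_text), mu)
        s_ctx = query_likelihood(context_terms, ctx, ctx_counts, len(all_ctx), mu)
        scores[doc_id] = lam * s_text + (1 - lam) * s_ctx
    return sorted(scores, key=scores.get, reverse=True)

# Toy knowledge base (hypothetical): (symptom-text tokens, context tokens) per article.
docs = {
    "KB1": (["outlook", "crash", "startup"], ["office2003", "winxp_sp2"]),
    "KB2": (["outlook", "crash", "send"], ["office2007", "vista"]),
    "KB3": (["printer", "offline"], ["winxp_sp2"]),
}
print(rank_documents(["outlook", "crash"], ["office2003", "winxp_sp2"], docs))
# → ['KB1', 'KB2', 'KB3']
print(rank_documents(["outlook", "crash"], ["office2007", "vista"], docs))
# → ['KB2', 'KB1', 'KB3']
```

Only the context tokens differ between the two calls, yet KB1 and KB2 swap places even though their symptom text scores are identical; this is the kind of effect contextual retrieval aims for, and `lam` sets how much weight the system context receives.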
This thesis is the first work in the IR domain to build comprehensive models for Contextual Retrieval and to demonstrate their usefulness on a practical problem.

Compared with static system information, dynamic information has several advantages: it is easier to collect, applies more widely, and contains less noise. Because it is complex to analyze, however, it has not previously been used in diagnosis techniques for personal computers; most related work concentrates on complex systems such as computer clusters and networks. In that setting, data mining methods such as clustering, classification, and association rule mining use dynamic information to help diagnose hard problems, but they do not achieve automated diagnosis. This thesis proposes a two-level classifier that achieves fully automated diagnosis. The first level performs automated symptom detection; the second level consists of classifiers built on the system call sequences of known problems. In our experiments, the method achieves good accuracy on five common problems on Windows systems. To solve the underlying sequence classification problem, the thesis proposes a new feature extraction method based on string alignment. Combined with a Support Vector Machine (SVM) classifier, this method yields much better results than traditional n-gram sequence features, and also outperforms canonical sequence classification methods such as the Hidden Markov Model (HMM).
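The abstract does not give the alignment scheme, so what follows is a minimal sketch under stated assumptions rather than the thesis's method. Each system call trace becomes a vector of Needleman-Wunsch global alignment scores against prototype traces of known problems (match +1, mismatch -1, gap -1 are hypothetical choices), shown next to a bag-of-bigrams baseline. In the described pipeline these vectors would then be passed to an SVM, for instance scikit-learn's `SVC`; that step is omitted here to keep the block dependency-free. The prototype traces, vocabulary, and call names are illustrative only.

```python
from collections import Counter

def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score of two token sequences."""
    # prev[j]: best score aligning the first i-1 tokens of a with the first j of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]

def alignment_features(trace, prototypes, **scoring):
    """One feature per prototype: the trace's alignment score against it."""
    return [global_alignment_score(trace, p, **scoring) for p in prototypes]

def ngram_features(trace, vocabulary, n=2):
    """Baseline: count of each n-gram from a fixed vocabulary."""
    grams = Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))
    return [grams[g] for g in vocabulary]

# Hypothetical prototype traces for two known problems (illustrative call names).
prototypes = [
    ["NtOpenKey", "NtQueryValueKey", "NtClose"],
    ["NtCreateFile", "NtReadFile", "NtClose"],
]
trace = ["NtOpenKey", "NtQueryValueKey", "NtQueryValueKey", "NtClose"]
vocabulary = [("NtOpenKey", "NtQueryValueKey"), ("NtCreateFile", "NtReadFile")]

print(alignment_features(trace, prototypes))  # → [2, -2]
print(ngram_features(trace, vocabulary))      # → [1, 0]
```

One plausible reading of the design choice: an alignment score summarizes how well the whole trace lines up with a known-problem trace, so the repeated `NtQueryValueKey` costs a single gap, whereas n-gram counts only capture local adjacency.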

Keywords/Search Tags:

automatic diagnosis, Windows registry, contextual retrieval, ranking algorithm, string alignment

Related items

1	The Research And Implementation Of Automatic Scoring System Of VB Programming Problem Based On Windows API
2	Research On String Retrieval Algorithm Based On Trie Tree
3	Research On The Method Of Alignment Between Face And Person Names
4	Optimization Of String Matching Algorithm Based On Computer Architecture
5	Spatial Contextual Information Based Image Retrieval
6	Stem Extraction And Related Ranking Optimization For Lightweight Retrieval Services
7	Task Oriented Tools for Information Retrieval
8	Research On Windows Registry Forensic Analysis Technology
9	Research On Q-gram Filters For Approximate String Matching
10	Image Tag Ranking Based On Sparse Coding Algorithm