Research And Implementation Of Some Main Techniques In Data Preprocessing System

Posted on:2013-04-16

Degree:Master

Type:Thesis

Country:China

Candidate:F W Bai

Full Text:PDF

GTID:2248330371977796

Subject:Computer Science and Technology

Abstract/Summary:

With the development of database and information technology, enterprises have accumulated large volumes of data for transaction management and data analysis. How to use these data effectively has become a major concern for enterprises. Data mining emerged from the need to extract a small amount of extremely valuable information from these large volumes of data. However, because each database has its own architectural design, and because of data collection errors, random data-entry errors, inadequate maintenance, and similar causes, the data inevitably contain problems. In addition, the sharp increase in data volume makes data mining tasks considerably more difficult. These problems largely affect the success of data mining tasks. It is therefore necessary to improve data quality before data mining is carried out, which is the purpose of data preprocessing.

This paper first introduces the basic concepts and main tasks of data preprocessing. It then describes the data preprocessing system in detail, including the parts the system had already implemented and the parts implemented in this work. Next, it describes data represented in XML format, proposes a data format based on XML Schema definition and a batch processing method for handling large data collections, and analyzes XML parsing methods. It then describes and compares similarity measure algorithms, points out their problems and ways to improve them, and proposes a distance measure algorithm that takes the data distribution into account, together with an improved cosine similarity measure algorithm. Finally, it analyzes discretization algorithms and proposes a discretization algorithm based on similarity measures.

Keywords/Search Tags:

Data Mining, Data Preprocessing, Preprocessing System, XML Format, Similarity Measure, Discretization

Related items

1	Design And Implementation Of Data Preprocessing System Oriented To Data Mining
2	Research And Application On Data Preprocessing System Of Mobile Internet Data
3	Based On The Web Server Log Mining Data Preprocessing Technology Research
4	Data Quality Control: Research, Design, And Implementation In Data Preprocessing
5	Research And Evaluation System Of Data Preprocessing System Design And Implementation
6	Research And Application On Data Preprocessing Algorithms
7	Web Log Mining Research And Data Preprocessing Algorithm
8	Research And Application Of Internet Web Log Preprocessing
9	Research And Application Of Data Mining Algorithms Based On Data Preprocessing And Regression Analysis Techniques
10	Data Collection And Preprocessing For Multi-Website Web Log Mining