| In recent years, big data has become one of the most popular IT technologies, and the amount of data has reached new heights in every field. Data volumes first reached saturation in the IT industry, which has therefore become the most advanced field in big data technology: the major Internet companies and database vendors have developed or launched their own big data products to solve the problems they encounter in daily operation. This paper describes the IPTVQos log analysis project, which requires a suitable solution to meet its data mining demands and to solve the proposed customer clustering problem. To find the right solution, the paper first describes the background of the project. It then introduces the project's early progress and the measures taken to cope with the problems encountered, and points out the features expected of the current project: 1, removing abnormal data; 2, finding similarities among abnormal records in massive data. In response to these requirements, the paper lists a number of big data solutions, compares them, and ultimately chooses Hadoop as the big data platform for the data analysis. It then describes the components of the Hadoop framework, the characteristics of each, how they are used, and the situations in which they apply. In the next section, the paper lists the technical difficulties encountered during development: 1, scanning a large amount of data; 2, the excessive time cost of computation on big data; 3, dependencies between attributes; 4, the lack of an attribute to sort by; 5, intelligent analysis of the ranking results; 6, removing abnormal data. For each of these difficulties, the author proposes a solution and analyzes its technical feasibility as well as its time and space complexity. The author then verifies the proposed theory by experiment. In the final section, the paper lists the key code used and explains it. |