| Hadoop, a distributed computing framework, is one of the main tools for network log analysis. A number of solutions have been proposed for shortening Hadoop processing time, targeting I/O performance, task scheduling, and the MapReduce model, but they cannot meet the changing demands of network log analysis. Internet companies and research institutions have limited computing resources, while their log analysis needs keep growing and changing; as a result, Hadoop-based network log analysis systems often slow down or even fail to run as the computational load increases. This paper presents four performance optimization schemes for Hadoop network log analysis systems, derived by combining the content characteristics of web logs, the analysis methods applied to them, and Hadoop's computing framework. (1) Merging operations to share I/O combines similar operations and saves the time spent reading the same data set multiple times. (2) Data prefetching for small job groups saves the time spent reading additional data sets. (3) Reduce load balancing shortens the processing time of the Reduce stage. (4) Joint tuning of multiple complex modules coordinates the first three schemes so that the time savings accumulate across stages. Each scheme saves time in some stages but introduces additional processing time elsewhere. Theoretical calculations show that, under agreed conditions, all four schemes reduce the overall processing time. Experimental results show that the proposed strategies effectively improve the performance of Hadoop in network log analysis. Depending on data distribution and dynamic changes in the data, the performance gain fluctuates between 20% and 5 times, and the schemes improve on comparable optimization algorithms and frameworks. |
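
The abstract does not say how Reduce load balancing is carried out. As an illustration of the general idea only, the sketch below assigns intermediate keys to reducers with a greedy longest-processing-time heuristic, which is a stand-in technique and not necessarily the paper's method. The function name `balance_reduce_load`, the per-key record counts, and the URL keys are all hypothetical.

```python
import heapq


def balance_reduce_load(key_counts, num_reducers):
    """Assign keys to reducers with a greedy longest-processing-time heuristic.

    key_counts: dict mapping an intermediate key to its estimated record count
                (assumed to come from sampling the log; not specified in the paper).
    Returns (assignment, loads): key -> reducer index, and per-reducer totals.
    """
    if num_reducers < 1:
        raise ValueError("num_reducers must be at least 1")
    # Min-heap of (current load, reducer index): the lightest reducer is on top.
    heap = [(0, r) for r in range(num_reducers)]
    heapq.heapify(heap)
    assignment = {}
    # Place the heaviest keys first so the small keys can even out the remainder.
    for key, count in sorted(key_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        load, reducer = heapq.heappop(heap)
        assignment[key] = reducer
        heapq.heappush(heap, (load + count, reducer))
    # Recompute the per-reducer totals from the final assignment.
    loads = [0] * num_reducers
    for key, reducer in assignment.items():
        loads[reducer] += key_counts[key]
    return assignment, loads


# Hypothetical per-URL hit counts (illustrative values only, not from the paper).
sample_counts = {"/index": 900, "/search": 500, "/login": 300, "/cart": 200, "/help": 100}
assignment, loads = balance_reduce_load(sample_counts, 2)
print(loads)
```

In a real Hadoop job, an assignment like this would typically be applied through a custom partitioner. Estimating the key counts is itself extra work, which is consistent with the abstract's remark that each scheme adds processing time elsewhere.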