We are now in the era of big data. Faced with large and rapidly growing amounts of diverse data, traditional data storage technology can no longer meet the demands of big data storage. The advent of the Hadoop Distributed File System (HDFS) has largely addressed the problem of big data storage. However, because HDFS uses a one-master, multi-slave architecture, the NameNode is a single point of failure. In addition, storing massive numbers of small files seriously degrades the storage performance of the NameNode and creates a memory bottleneck on it. Research on NameNode performance optimization therefore has considerable exploratory value and practical significance for solving big data processing and storage problems.

This paper presents an in-depth analysis of NameNode performance optimization. To eliminate the NameNode single point of failure, this paper adopts the MN-BH distributed file system structure and further optimizes the original cloud storage platform. If the active NameNode server goes down, a standby NameNode server can be started promptly, so the Hadoop cluster continues to provide normal service. To improve the storage performance of the NameNode and relieve its single-point memory bottleneck, this paper proposes a small-file storage optimization algorithm based on HSFM. In the processing layer, uploaded files are preprocessed: large numbers of small files are merged into one large file, which is then stored persistently on the DataNodes. This resolves the memory bottleneck caused by small files. The algorithm effectively reduces the memory burden on the NameNode server and greatly improves read and write performance.

After analyzing these NameNode performance optimizations, this paper gives the detailed design and implementation.
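The small-file merging step described above can be illustrated with a minimal sketch. The abstract does not specify HSFM's container layout or index format. This example therefore assumes plain concatenation with a per-file (offset, length) index, which is similar in spirit to Hadoop Archives, and is not the HSFM algorithm itself. `IndexEntry`, `merge_small_files`, `read_small_file` and `namenode_objects` are hypothetical names. Two figures are also assumptions: the 128 MiB block size (the HDFS default since Hadoop 2.x) and the roughly 150 bytes of NameNode heap per file or block object, which is a commonly quoted rule of thumb.

```python
from dataclasses import dataclass

# Assumptions for illustration only (not taken from the HSFM design):
BLOCK_SIZE = 128 * 1024 * 1024  # HDFS default block size in Hadoop 2.x and later
BYTES_PER_OBJECT = 150          # rule-of-thumb NameNode heap cost per file/block object


@dataclass
class IndexEntry:
    """Location of one small file inside the merged container (hypothetical layout)."""
    offset: int
    length: int


def merge_small_files(files: dict[str, bytes]) -> tuple[bytes, dict[str, IndexEntry]]:
    """Concatenate small files into one large file and record each file's offset and length."""
    index: dict[str, IndexEntry] = {}
    chunks: list[bytes] = []
    offset = 0
    for name, data in files.items():
        index[name] = IndexEntry(offset, len(data))
        chunks.append(data)
        offset += len(data)
    return b"".join(chunks), index


def read_small_file(container: bytes, index: dict[str, IndexEntry], name: str) -> bytes:
    """Recover one small file by slicing the merged file with its index entry."""
    entry = index[name]
    return container[entry.offset:entry.offset + entry.length]


def namenode_objects(file_sizes: list[int]) -> int:
    """Estimate NameNode namespace objects: one inode per file plus one per block."""
    total = 0
    for size in file_sizes:
        blocks = -(-size // BLOCK_SIZE)  # ceiling division; an empty file has no blocks
        total += 1 + blocks
    return total


if __name__ == "__main__":
    before = namenode_objects([100 * 1024] * 10_000)  # 10,000 separate 100 KiB files
    after = namenode_objects([100 * 1024 * 10_000])   # the same bytes in one merged file
    print(before, after)                               # 20000 9
    print(before * BYTES_PER_OBJECT, after * BYTES_PER_OBJECT)  # 3000000 1350
```

The estimate shows why merging helps. Under these assumptions, 10,000 separate small files cost the NameNode 20,000 objects, while the merged file costs only 9. The per-file index still has to be stored somewhere, such as a separate index file or an external database. The sketch assumes that index is kept outside NameNode memory, and the real HSFM design may differ.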
Finally, the optimized Hadoop distributed file system is tested. A failure of the master NameNode server is simulated and service is switched to the standby NameNode server. No files in HDFS are lost, and the whole Hadoop cluster continues to work correctly and reliably, so the test achieves the expected result. To evaluate the performance of the optimized NameNode, three sets of experiments are designed: a NameNode memory footprint test, a small-file storage performance test, and a small-file read performance test. The experimental results show that the optimized design greatly reduces the NameNode memory footprint, and that read and write speeds are three times faster than before. Analysis of the experimental data confirms that the desired results have been achieved.
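The failover test described above can be modelled with a small simulation. The abstract does not describe how MN-BH detects the failure or synchronizes the standby. This sketch therefore assumes a generic scheme: a heartbeat timeout on the active NameNode, plus a shared edit log that the standby replays before taking over. The scheme resembles the idea behind Hadoop's own HA mode but does not reproduce MN-BH. `NameNode`, `SharedEditLog`, `FailoverController`, the timeout value and the logical clock are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class NameNode:
    name: str
    namespace: dict[str, list[str]] = field(default_factory=dict)  # path -> block ids
    alive: bool = True


class SharedEditLog:
    """Hypothetical durable log of namespace changes readable by both NameNodes."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, list[str]]] = []

    def append(self, path: str, blocks: list[str]) -> None:
        self.entries.append((path, list(blocks)))


class FailoverController:
    """Hypothetical controller: promotes the standby when heartbeats stop for `timeout` ticks."""

    def __init__(self, active: NameNode, standby: NameNode, log: SharedEditLog, timeout: int = 3) -> None:
        self.active, self.standby, self.log = active, standby, log
        self.timeout = timeout
        self.last_heartbeat = 0
        self.failovers = 0

    def create_file(self, path: str, blocks: list[str]) -> None:
        self.log.append(path, blocks)           # record the change durably first
        self.active.namespace[path] = list(blocks)

    def heartbeat(self, now: int) -> None:
        if self.active.alive:                   # a crashed NameNode sends nothing
            self.last_heartbeat = now

    def tick(self, now: int) -> None:
        if now - self.last_heartbeat > self.timeout:
            self.failover(now)

    def failover(self, now: int) -> None:
        # The standby rebuilds its namespace from the shared log, so no file metadata is lost.
        self.standby.namespace = {path: list(blocks) for path, blocks in self.log.entries}
        self.active, self.standby = self.standby, self.active
        self.last_heartbeat = now
        self.failovers += 1


def simulate() -> FailoverController:
    ctl = FailoverController(NameNode("nn1"), NameNode("nn2"), SharedEditLog(), timeout=3)
    for t in range(3):
        ctl.create_file(f"/data/f{t}", [f"blk_{t}"])
        ctl.heartbeat(t)
        ctl.tick(t)
    ctl.active.alive = False                    # simulate the master NameNode crashing
    for t in range(3, 10):
        ctl.heartbeat(t)
        ctl.tick(t)
    return ctl


if __name__ == "__main__":
    ctl = simulate()
    print(ctl.active.name, sorted(ctl.active.namespace))
```

The key design choice is to write each change to the shared log before applying it in memory. That ordering is what lets the promoted standby end up with the same namespace, which mirrors the "no files are lost" result of the test.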