| With the development of networks, more and more people use the Internet, and the question of how to store the resulting data has come to constrain the development of many industries. Building highly available storage networks has therefore become an active area of work, and distributed file systems (DFS) have emerged in response. HDFS is well suited to application scenarios involving massive data, but its architecture also has shortcomings. This paper therefore proposes a scheme to optimize HDFS and applies it in practice. The main work of this paper is as follows. First, I propose a scheme to address the bottleneck of the single NameNode. The core of the scheme is to group the DataNodes into clusters, called datanode-clusters, each with its own NameNode to manage its DataNodes. The scheme also caches file metadata in the datanode-clusters, which reduces the memory pressure on the NameNode. In the improved architecture, HDFS keeps working even if one NameNode stops, because it has multiple NameNodes. I also build an index service to manage the mapping between files and NameNodes. Second, after building the improved HDFS, I apply it in practice, in an integrated information management platform for universities based on cloud computing. The platform hosts many applications. To handle storage management for these applications, I developed an HDFS storage service in Java, together with a Java Remote Method Invocation (RMI) service for the applications that is built on the HDFS cluster. I also developed a File Management System (FMS) for the platform using Java and the ZSSH (ZK+Spring+Struts+Hibernate) architecture. Finally, I use JMeter, a testing tool, to test the performance of HDFS and to compare the original HDFS with the improved HDFS. The test compares the two systems by observing file upload response times under the same level of concurrency. The results show that the improved HDFS has a shorter response time. |
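The abstract does not say how the index service is implemented, so the following is only a minimal sketch of the idea of mapping files to NameNodes. The class name `FileIndexService`, its methods, the NameNode identifiers, and the hash-based placement of new files are all hypothetical choices made for illustration; they are not taken from the paper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of an index service that records which NameNode
// (and therefore which datanode-cluster) is responsible for each file.
// All names and the placement policy are assumptions, not the paper's design.
class FileIndexService {
    // Identifiers of all known NameNodes, one per datanode-cluster.
    private final List<String> nameNodes;
    // NameNodes currently marked as stopped.
    private final Set<String> down = ConcurrentHashMap.newKeySet();
    // The index itself: file path -> NameNode identifier.
    private final Map<String, String> fileToNameNode = new ConcurrentHashMap<>();

    FileIndexService(List<String> nameNodes) {
        if (nameNodes.isEmpty()) {
            throw new IllegalArgumentException("at least one NameNode is required");
        }
        this.nameNodes = List.copyOf(nameNodes);
    }

    // Returns the NameNode for a path, assigning one if the path is new.
    // Assumed policy: hash the path over the NameNodes that are still running.
    String assign(String path) {
        return fileToNameNode.computeIfAbsent(path, p -> {
            List<String> live = new ArrayList<>();
            for (String nn : nameNodes) {
                if (!down.contains(nn)) {
                    live.add(nn);
                }
            }
            if (live.isEmpty()) {
                throw new IllegalStateException("no running NameNode available");
            }
            return live.get(Math.floorMod(p.hashCode(), live.size()));
        });
    }

    // Looks up the NameNode recorded for a path, if any.
    Optional<String> lookup(String path) {
        return Optional.ofNullable(fileToNameNode.get(path));
    }

    // Marks a NameNode as stopped; new files are no longer assigned to it.
    void markDown(String nameNode) {
        down.add(nameNode);
    }

    // Marks a NameNode as running again.
    void markUp(String nameNode) {
        down.remove(nameNode);
    }
}
```

In this sketch a client would call `assign` before uploading a file and `lookup` before reading one. Stopping one NameNode only removes it from the candidates for new files, which mirrors the claim that the system as a whole keeps working; what should happen to files already indexed under the stopped NameNode is left open here, since the abstract does not describe it.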