Research On HDFS Optimization And Its Application In Cloud Storage Platform

Posted on:2015-12-19

Degree:Master

Type:Thesis

Country:China

Candidate:W F Wang

Full Text:PDF

GTID:2298330452994403

Subject:Computer application technology

Abstract/Summary:


With the development of networks, more and more people use the Internet, and the question of how to store the resulting data has come to restrict the development of many industries. Building highly available storage networks has therefore become an active research area, and distributed file systems (DFS) emerged in response. Among them, HDFS is well suited to massive-data scenarios, but its architecture also has shortcomings. This thesis proposes a scheme for optimizing HDFS and applies it in practice. The main work is as follows:

Firstly, I propose a scheme to relieve the bottleneck of the single namenode. The core of the scheme is to group datanodes into clusters, called datanode-clusters, each managed by its own namenode. The scheme also caches file metadata in the datanode-cluster, which reduces memory pressure on the namenode. In the improved HDFS architecture, HDFS keeps working even if one namenode stops, because there are multiple namenodes. I also build an index service to manage the mapping between files and namenodes.

Secondly, I apply the improved HDFS to a real system: an integrated information management platform for universities based on cloud computing. The platform hosts many applications. To handle storage management for these applications, I developed an HDFS storage service in Java and a Java Remote Method Invocation (RMI) service, built on the HDFS cluster, for the applications to call. I also developed a File Management System (FMS) for the platform using Java and the ZSSH (ZK + Spring + Struts + Hibernate) architecture.

Finally, I use the testing tool JMeter to test HDFS performance and compare the original HDFS with the improved HDFS by observing file upload response times under the same level of concurrency. The results show that the improved HDFS has a shorter response time.
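The index service mentioned in the abstract, which maps files to namenodes, can be pictured with a minimal Java sketch. The abstract does not describe the thesis's actual design, so everything here is an assumption made for illustration: the class name NamenodeIndex, its methods, the hash-based placement of new files, and the in-memory map. The sketch only shows how new files can be routed around a namenode that has stopped; it does not model the metadata cache in the datanode-cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical index service: records which namenode owns each file path. */
class NamenodeIndex {
    private final List<String> namenodes;
    private final Map<String, String> fileToNamenode = new ConcurrentHashMap<>();
    private final Set<String> down = ConcurrentHashMap.newKeySet();

    NamenodeIndex(List<String> namenodes) {
        if (namenodes.isEmpty()) {
            throw new IllegalArgumentException("need at least one namenode");
        }
        this.namenodes = new ArrayList<>(namenodes);
    }

    /** Assigns a new file to a live namenode and records the mapping.
     *  Hash-based placement is an assumption, not the thesis's method. */
    String register(String path) {
        return fileToNamenode.computeIfAbsent(path, this::pickLive);
    }

    /** Returns the owning namenode, or empty if unknown or currently down. */
    Optional<String> lookup(String path) {
        String owner = fileToNamenode.get(path);
        if (owner == null || down.contains(owner)) {
            return Optional.empty();
        }
        return Optional.of(owner);
    }

    void markDown(String namenode) { down.add(namenode); }

    void markUp(String namenode) { down.remove(namenode); }

    /** Starts at the hashed slot and probes forward until a live namenode is found. */
    private String pickLive(String path) {
        int start = Math.floorMod(path.hashCode(), namenodes.size());
        for (int i = 0; i < namenodes.size(); i++) {
            String candidate = namenodes.get((start + i) % namenodes.size());
            if (!down.contains(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException("no live namenode available");
    }
}
```

A client would call register when uploading a file and lookup before reading it. In this sketch, files owned by a stopped namenode are reported as unavailable until markUp is called; how the thesis recovers such files is not stated in the abstract.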

Keywords/Search Tags:

DFS, HDFS, namenode bottleneck, FMS


Related items

1	Research On Performance Optimization Technology Of Namenode Based On HDFS
2	Optimization Of A Network Retrograde Analysis System Implemented On HDFS
3	Research And Optimization On Distributed Storage Based On HDFS
4	Research On The Metadata Management Of Multi Namenodes Based On HDFS
5	Research And Design Of High Resilience Solution In HDFS
6	Research And Optimization Of Storage Mechanism In Hadoop Distributed File System
7	The Research On IP Network End-to-End Performance Bottleneck Based On Active Measurement
8	Design And Implementation Of Big Data Storage System Based On Hadoop
9	Research And Application Of Distributed Storage System Based On Cloud Computing
10	The Research Of Data Security Of The Cloud Storage System Based On Hadoop Distributed File System