| Metadata is data about data; it describes attributes of data such as storage location, history, resource location, and file content. There is no single required format for describing it. Early metadata used field designations (such as MARC) or database records (such as ROADS). As the number of metadata formats grew and interoperability became a requirement, people began to use standardized description languages such as SGML and XML, of which XML is the most promising. HMS, the metadata management system described in this paper, uses XML to describe metadata. With the development of databases and networks, XML has become a standard for data interchange and representation on the World Wide Web. Major applications include data exchange, Web services, content management, and Web integration. The XML standard was defined by the W3C XML Working Group. XML is semi-structured, so querying it differs from querying a relational database with SQL; it has its own query languages, of which XQuery and XPath are the most representative. XML is also self-describing, an important feature that makes it well suited to describing metadata. From the perspective of system and data management, centralized metadata management has several problems: system bottlenecks, a single point of failure, poor fault tolerance, and difficulty scaling. Thanks to research and development in cloud computing, many highly available distributed platforms and infrastructures have emerged. HMS is a metadata management system built on top of HBase/Hadoop. It aims to provide a metadata management service on a distributed system while guaranteeing effectiveness and high availability. 
HMS not only provides the basic CRUD interface of a metadata management system but also supports entity query as an extended function. This article explains the architecture of HMS and the implementation of its storage and query modules. (1) Overall, HMS uses the NoSQL database HBase for persistent storage and accesses it through a Thrift gateway interface. On top of this sit the metadata parsing and encoding module, the query transforming module, the query processing module, and the Web UI. (2) The storage module stores metadata in HBase; it covers XML file parsing, XML element encoding, HBase row key selection, and table structure design. (3) The query processing module consists of two main parts. The first is query parsing: once the supported query syntax is determined, a query string is parsed into a twig pattern in which exactly one node is tagged as the result. The second is the query algorithm, which is based on the TJFast algorithm; we also explain why a merge-join step is necessary. |
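
To make the storage step in (2) concrete, the following is a minimal sketch of parsing an XML metadata record, giving each element a position-based label, and deriving a row key from it. It is an illustration under stated assumptions, not the HMS implementation: it uses plain Dewey labels rather than the extended Dewey encoding that TJFast relies on, and the function names `dewey_rows` and `is_ancestor`, the sample record, and the `doc_id:label` row key layout are all hypothetical.

```python
import xml.etree.ElementTree as ET


def dewey_rows(xml_text, doc_id):
    """Return (row_key, tag, text) for every element, in document order."""
    root = ET.fromstring(xml_text)
    rows = []

    def visit(elem, label):
        # Hypothetical row key layout: document id, then the Dewey label.
        row_key = doc_id + ":" + ".".join(str(n) for n in label)
        rows.append((row_key, elem.tag, (elem.text or "").strip()))
        # The i-th child of a node labeled L receives the label L.i
        for position, child in enumerate(elem, start=1):
            visit(child, label + [position])

    visit(root, [1])
    return rows


def is_ancestor(key_a, key_b):
    """True if the element at key_a is a proper ancestor of the one at key_b."""
    return key_b.startswith(key_a + ".")


# Hypothetical metadata record, used only for illustration.
SAMPLE = (
    "<record><title>Survey</title>"
    "<location><host>node1</host></location></record>"
)

if __name__ == "__main__":
    for row in dewey_rows(SAMPLE, "doc1"):
        print(row)
```

Because a label encodes the path from the root, an ancestor-descendant test reduces to a prefix comparison on row keys, which is the kind of structural check that twig pattern matching depends on.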