| With the rapid development of computer networks and their applications, the number of platforms and applications on the network keeps growing, so a user may hold virtual identities on a large number of different platforms and applications. Both static data, such as registered accounts, and user interaction data, such as recorded messages, have reached the terabyte (TB) or even petabyte (PB) scale. In Web 2.0, Internet applications must handle large volumes of data that users create or share, such as pictures, videos, and blog posts, which vary in type, format, and size. This volume and variety pose a severe test for mass data storage and management. This paper is based on the project *** virtual identity management, a sub-topic of the 863 major project *** the technology of network identity management and application. The project's main feature is to manage virtual identities across a variety of platforms uniformly and to provide interfaces for the applications and platforms in the network, making operations such as lookup and traceability easy. The paper studies large-capacity data storage technology, focusing on the design and implementation of the storage data model, data partitioning and replication in a distributed environment, a multidimensional query index, and a cache used to improve query efficiency. Finally, these techniques are applied in the virtual identity traceability system. The main work of this paper includes: (1) In terms of storage, to address the large data volume and fuzzy-query requirements of virtual identities, this paper presents a data model that combines the MySQL and Cassandra databases. 
In a distributed environment, to handle data partitioning and data backup, a data partitioning method based on a weighted, improved hash algorithm and a data replication method that combines consistency, scale change, and data hotness are designed and implemented. (2) In terms of querying, to support the non-specified queries found in virtual identity requests and to locate machine nodes quickly and accurately, a combination of the Cassandra index, an inverted index, and a node index is designed and implemented. Based on the locality principle of access requests, a semantic cache tailored to the characteristics of virtual identities is designed and implemented. (3) In terms of system implementation, the data model, data partitioning, and data replication strategies for storage, together with the multidimensional index and the semantic cache for queries, are tested on the virtual identity traceability system; the results show that these methods perform well in improving system efficiency. |
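
The abstract does not spell out the weighted, improved hash algorithm, so the following is only a minimal sketch of one common way to realize weighted hash partitioning: a consistent hash ring in which each node receives virtual tokens in proportion to its weight. The class name `WeightedHashRing`, the parameter `vnodes_per_weight`, and the node names are hypothetical and are not taken from the paper or from Cassandra's own partitioner.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit integer using MD5 (stable across runs)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class WeightedHashRing:
    """Hypothetical weighted consistent hash ring (illustrative only)."""

    def __init__(self, node_weights, vnodes_per_weight=64):
        if not node_weights or any(w <= 0 for w in node_weights.values()):
            raise ValueError("need at least one node, all with positive integer weights")
        ring = []
        for node, weight in node_weights.items():
            # A node with weight w owns w * vnodes_per_weight tokens on the ring,
            # so its expected share of keys is proportional to w.
            for i in range(weight * vnodes_per_weight):
                ring.append((_hash(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring
        self._tokens = [token for token, _ in ring]
        self._node_count = len(node_weights)

    def replicas(self, key, n=1):
        """Return up to n distinct nodes, walking clockwise from the key's token."""
        start = bisect.bisect_right(self._tokens, _hash(key))
        wanted = min(n, self._node_count)
        result = []
        i = start
        while len(result) < wanted:
            node = self._ring[i % len(self._ring)][1]
            if node not in result:
                result.append(node)
            i += 1
        return result

    def node_for(self, key):
        """Return the primary node responsible for the key."""
        return self.replicas(key, 1)[0]


# Example usage: node-b is assumed to have three times the capacity of node-a.
ring = WeightedHashRing({"node-a": 1, "node-b": 3, "node-c": 2})
primary = ring.node_for("user:alice@forum.example")
copies = ring.replicas("user:alice@forum.example", n=2)
```

Walking clockwise to collect distinct nodes gives a simple replica placement; a strategy that also accounts for data hotness or cluster scale changes, as the abstract describes, would need additional logic that is not shown here.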