| In recent years, wireless sensor networks are being used more and more widely in fields such as military, intelligence building, wildlife protection and research. It is irreplaceable in collecting massive amount of data. Wireless sensor networks are constituted by a great amount of low price and power consumption sensors. In wireless sensor networks, the scheme of data storage and routing scheme is used to provide data storage and query, which is a mandatory part. The requirements of energy consumption and querying time cost have posed high demand for efficient storage schemes. Wireless sensor network is the combination of microelectronics, embedding computing, modern network topology, wireless communication and distributed information processing. It is able to collaborate to monitor, sense and collect the information of the surrounding environment and objects within its sensing range, and processing these information. The post-processed information will be transmitted to the sink via wireless connections. Currently there are three categories of storage schemes in wireless sensor networks, which are centralized storage, local storage and distributed storage. And the mainstream is the latter two categories.In centralized storage, all data collected by the sensors are stored in the sink outside the network. All query and storage requests need to be routed to the sink. When the frequency of request becomes high, sink becomes the bottleneck. Therefore, this kind of schemes is seldom used nowadays, and is constrained to low speed data transmission.In the realm of local storage, the recently proposed methods include Directed Diffusion, TAG, GRAB, TTDD, etc. In those schemes, every sensor stores the sensed data within itself, so the cost of storage is minimized. However, querying the data needs to flood the request throughout the network, which incurs huge communication cost. In this category, Directed diffusion is based on subscribe-publish scheme. When a node wants to be notified by certain events, it will flood its interest to the network, and the node with the interested information will return the information to the subscriber. ARI proposed an index based data dissemination scheme, in which the sensed data is stored in the source node, but the coordinates of the data are sent to several indexers. GRAB sends data constantly from the sender to the receiver. It also let the sensor control the trustworthiness of the information sent. TAG provides SQL like query elements, and provides operators in order to aggregate the querying results. TTDD utilizes two-tier structure to reduce the communication cost when nodes are queried. Distributed storage is the main direction of recent protocols. Their similarity is that they are based on GPSR routing algorithm. They obtain the coordinate by hashing the data, and store data to the node that is the nearest to the hashed coordinate. DIFS and DIM uses distributed multiple-layer index. Dimensions provides multi-dimension data indexing and provides high query precision at low cost. GLS enables the coordinate servers communication periodically to obtain updated coordinates. KDDCS proposed to utilize K-D tree to realize balanced data storage and avoid bottlenecks.However, these approaches have not considered the optimization of multiple complex querying. For example, the queries based space and time, and the queries based on similarity. Spatial-temporal querying can provide up any data within a certain time and space period. And to the querying of multi-attribute data, the communication cost cannot be reduced only by enumerating every data value. Similarity search is one approach to solve such problem, and it is essentially providing a kind of range search. Therefore, this paper is mainly focused on design and implement an efficient protocol based on spatial-temporal and similarity search. We first using locality preserving hash function to map data to different zone of the network, and uses space and time mapping to decide where to store the data. We have the following considerations.1. In order to make queries efficient, locality sensitive hashing is necessary, and need proper mapping scheme to arrange data. Only few protocols have been proposed on this.2. Utilize constrained geographical knowledge to avoid the use of GPS. Therefore, we can reduce the energy cost and the price of sensors.3. Routing algorithm should avoid redundant paths.4. A head node is needed to maintain a specific zone, it is necessary to design the responsibility of such a node.In this paper, we consider a network containing a large amount of nodes. The shape of the network is rectangular. The network is divided into many small zones. Every zone contains a head node that is responsible for the communication within the zone. To each node, when it senses some data, it will use LSH hash to convert the data to a series IDs of zones, and send the data to those zones to store. In order that data happened in adjacent places are similar data, we utilize locality-sensitive hashing function to convert data. Inside each zone, data is further distributed according to space and time of happening. By this mapping scheme, a head node can quickly decide the responsible node for a specific data, so as to make query and routing convenient.In the experiment part, we implemented two recent proposed and popular methods Directed Diffusion and GHT, and compared with them. In order to show the performance of our scheme with GPS assistance, we also test our method with GPS. We tested our approach from communication cost, latency, hops, similarity search, discovery rate, success rate. The results show that our method has certain advantage over Directed Diffusion and GHT, and can achieve high discovers rate and accuracy in similarity search. |