| With the ease of image capturing and the improvement of computational ability, the number of images increases dramatically in the past years, which induces a lot of near-duplicate images at the same time. Near-duplicate images refer to the images that are cap-tured from the same scene or object, but under different conditions. So they are varied in several aspects such as illumination, resolution or viewpoint. In addition, near-duplicate images can be obtained by editing the original images such as inserting text or cropping. Near-duplicate image matching is of great significance in copyright protection and mali-cious image attack detection. Furthermore, it is helpful in improving user experience in the context of image retrieval. Near-duplicate image matching is a very important issue in computer vision.Document image is characterized by a dominant part of text in the image. Near-duplicate document image matching plays an important role in many applications such as postal automation and digital library. In our work, we focus on two key issues involved in near-duplicate document image matching, viz. image representation and similarity mea-surement. An effective image representation can greatly facilitate the matching process. In general, different image representations entail different similarity measurements. We explore different ways to represent an image based on the characteristics of the document images. Our contributions are as follows:(1) A near-duplicate document image matching approach based on graph representa-tion of multi-granularity objects is proposed. We represent an image by a graph, with the nodes in the graph corresponding to the objects in the image and the edges modeling the relations among them. Consequently, the problem of image matching is transformed to graph matching. However, in order to cope with the instability of object segmentation, we propose a multi-granularity object segmentation method. Each image is then associated with multiple graphs. The objects from a graph may be of different granularities. For the two images to be matched, the two graphs with the maximum similarity are found from their respective graph representations. We thus regard the maximum similarity as the similarity between the two images. To compute the similarity between two graphs, the maximum weight clique in their association graph is employed. The experimental results demonstrate that the proposed method can cope with both handwritten and printed docu-ment images. In addition, good robustness to image resolution and illumination variations has been observed.(2) A near-duplicate document image matching approach based on pairwise Markov Random Field is proposed. To describe the parent-child relationships between the objects in an image, we represent the image by a tree and hence transform the image matching problem to tree matching. A pairwise Markov Random Field is defined on the tree, in which sense the nodes in the tree are considered as random variables and the edges en-code the relations among these variables in the probability domain. Then the tree match-ing problem can be solved by Maximum a Posterior (MAP) inference over the pairwise Markov Random Field. We can see from the experimental results that the proposed ap-proach is robust to the instability of object segmentation, thanks to the properly defined node and edge potential functions.(3) A near-duplicate document image matching approach based on variable-length signature is proposed. We first extract objects from an image using a clustering approach. A new visual descriptor, i.e. Probabilistic Center-symmetric Local Binary Pattern (PC-SLBP), is presented to describe the object appearance, which is an improvement of the commonly used Center-symmetric Local Binary Pattern (CSLBP). The experimental re-sults demonstrate substantial improvement in performance for PCSLBP compared with CSLBP. Beyond each individual object, the spatial relationships among the objects in the image are modeled. Taking into account both the object appearance and object spatial relationships, we represent an image by a variable-length signature, the length of which is varying with respect to the number of objects in the image. We utilize the Earth Mover’s Distance to compute the similarity between two images. It is good at handling variable-length signatures on one hand; on the other hand, it is able to cope with the issue of object segmentation instability naturally by allowing many-to-many object correspondence. The experimental results demonstrate that the proposed approach is robust to image rotation, illumination and resolution variations. It has achieved the best performance overall.(4) A near-duplicate document image matching approach based on local word spa-tial configurations is proposed, which mainly aims at machine-printed English document images. Each word in the image is encoded based on its shape characteristics. More importantly, we characterize local word spatial configurations. The bag-of-visual-words model is employed for image representation in our work. A lexicon is first learnt based on a training set. Then each word in an arbitrary document image can be represented by the words in the lexicon. For robustness concern, we adopt soft-assignment in our work, which means that each word is represented in terms of its K nearest neighbors in the lexicon. In order to achieve efficient retrieval, an inverted file index is constructed, transforming image retrieval as a voting problem. Each word in the query image votes on the images in the database, with the magnitude of the vote determined by the weight of the word and the consistency between the corresponding words with respect to local spatial configurations. The greater number of votes an image in the database receives, the more similar it is to the query image. From the experimental results, one can observe remarkable performance boost when the local word spatial configurations are considered. |