| Image feature extraction and matching is a fundamental and difficult problem in computer vision. It has been an active research topic for decades, and researchers have proposed many methods and theories, of which SIFT and SURF are the most representative recent examples. Although the many variants of these algorithms are faster than the originals, experiments show that the variants are less accurate. GPU acceleration offers a way to resolve this trade-off, because the GPU architecture is designed for applications that involve a large amount of computation. In 2006, NVIDIA introduced the Compute Unified Device Architecture (CUDA), which greatly reduces the difficulty of using the GPU for image processing. The main work of this paper is a study of GPU-based optimization of the SIFT and SURF algorithms. The research platform is a GTX 550 Ti. The paper focuses on how to fully exploit the inherent parallelism of the algorithms and how to use the advantages of the CUDA architecture in parallel computing. The algorithms are optimized with respect to six aspects: the program's degree of parallelism, the ratio of per-pixel floating-point workload to memory-access workload, the per-pixel floating-point workload, the per-pixel memory-access workload, branch divergence, and task dependency. The CUDA architecture is analyzed from two aspects, hardware and software. Designing the thread grid and thread block structure according to the hardware parameters helps to schedule and fully utilize GPU hardware resources. Analyzing the software side in terms of the thread model and the memory model shows that CUDA provides as many as seven kinds of memory, each with a specific use. 
Understanding the function of each kind of memory and making good use of the limited memory resources can often yield unexpectedly large speed-ups. Distributing work sensibly between the GPU and the CPU can substantially improve performance and simplify the parallel logic of the algorithms, which makes them well suited to applications with strict real-time requirements. Experimental results show that the CUDA-based SIFT and SURF optimizations proposed in this paper achieve speed-ups ranging from several times to tens of times. |
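
To make two of the quantities named above concrete, the following is a minimal host-side sketch in C++, not the paper's implementation. It assumes one thread per pixel and a fixed block size; the names `Dim2`, `ceil_div`, `grid_for_image`, and `compute_to_memory_ratio` are hypothetical and chosen only for illustration. It shows how a thread grid might be sized to cover an image, and how the ratio of per-pixel floating-point work to per-pixel memory traffic could be computed.

```cpp
#include <cassert>

// Hypothetical 2D extent, standing in for a CUDA dim3 on the host side.
struct Dim2 {
    unsigned x;
    unsigned y;
};

// Ceiling division: the smallest number of blocks of size `block`
// needed to cover `n` items.
inline unsigned ceil_div(unsigned n, unsigned block) {
    assert(block > 0);
    return (n + block - 1) / block;
}

// Grid size (in blocks) that covers a width x height image when each
// thread handles one pixel. Threads in the last row or column of blocks
// may fall outside the image, so a kernel launched with this grid would
// need a bounds check such as (x < width && y < height).
inline Dim2 grid_for_image(unsigned width, unsigned height, Dim2 block) {
    return Dim2{ceil_div(width, block.x), ceil_div(height, block.y)};
}

// Ratio of per-pixel floating-point operations to per-pixel bytes moved.
// A higher value means the work is less bound by memory access.
inline double compute_to_memory_ratio(double flops_per_pixel,
                                      double bytes_per_pixel) {
    assert(bytes_per_pixel > 0.0);
    return flops_per_pixel / bytes_per_pixel;
}
```

For example, under these assumptions a 1920 x 1080 image with 16 x 16 blocks needs a 120 x 68 grid, and a hypothetical kernel doing 18 floating-point operations per pixel while moving 8 bytes per pixel has a ratio of 2.25. The block size that suits a given GPU depends on its hardware limits, which this sketch does not model.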