In recent years, unmanned aerial vehicles (UAVs), especially micro UAVs, have been used more and more widely in both military and civilian applications. Traditional technologies such as the Global Positioning System (GPS) and inertial navigation systems have been developed for decades and can accomplish tasks such as pose estimation and route planning. However, as task conditions and the tasks themselves become more complicated, vision-based scene perception becomes a prerequisite for safe and reliable UAV missions and attracts growing attention from researchers. This thesis focuses on the key computer vision technologies for UAV scene perception, including 3D scene reconstruction, real-time depth estimation, and object detection. Visible-light cameras, depth cameras, and event-based cameras are used, and novel algorithms are proposed to exploit the favorable properties of these sensors. The main contributions of this thesis are as follows:

1. For 3D scene reconstruction, this thesis analyzes the connections and differences among traditional Structure from Motion (SfM), UAV 3D reconstruction, and simultaneous localization and mapping (SLAM). Motivated by the practical needs of 3D reconstruction from UAV images, and by the drawbacks of traditional keyframe decimation methods (low computational efficiency, heuristically biased decimation criteria, and poor robustness), we propose a parallel hierarchical keyframe decimation algorithm for UAV 3D reconstruction. The algorithm considers image acquisition, camera pose estimation, and sparse reconstruction in parallel. It was evaluated on outdoor real-scene data, and the experiments show that, compared with previous keyframe decimation methods for 3D reconstruction, our algorithm selects sufficient keyframes in real time and improves the accuracy and robustness of UAV aerial 3D reconstruction in actual scenes.

2. For real-time depth
estimation, in addition to the traditional stereo camera, this thesis uses novel event-based cameras that mimic the biological retina. This sensor asynchronously generates events in response to relative changes in light intensity rather than absolute image intensity, so it offers high temporal resolution, low redundancy, and robustness to complex lighting conditions. These properties suit UAV application scenarios, which have limited computing resources and high real-time demands. However, owing to the data characteristics of event cameras, traditional frame-based algorithms are not suitable for processing event streams, so algorithms must be designed and implemented specifically for event-based cameras. A novel framework for binocular depth estimation with event cameras is proposed. The framework consists of four parts: event preprocessing, matching cost calculation, cost optimization, and depth output. Cost optimization is the core part; it models the mutual constraints between events using belief propagation (BP) and semi-global matching (SGM). Classical BP and SGM do not work on event-accumulated frames, so we construct a modified factor graph to handle the event-driven input, formulate a dynamic updating mechanism to handle the temporal correlation of the event stream, and redefine the cost function and cost optimization process for event cameras. To evaluate the proposed algorithm, a binocular event camera system was built and datasets were recorded in different scenarios. Compared with several state-of-the-art event-based stereo matching methods on our datasets, the results demonstrate that our method achieves a higher estimation rate and higher estimation accuracy. Compared with a traditional stereo camera system, the results of our method show good depth consistency, even under low light and motion blur.

3. For object detection and localization, the traditional 2D color information and
object edge information cannot handle complex scenes well. Depth cameras and depth recovery algorithms have matured in recent years, making 3D depth information easy to acquire. This thesis uses 3D information to overcome the shortcomings of object detection frameworks that rely only on 2D color information. With 3D depth information, three new local multimodal cues are constructed: multimodal multi-scale saliency (MMS), multimodal over-segment straddling (MOS), and ring density of depth (RDD). A global cue based on Gist descriptors is then added to obtain the number and locations of objects. Finally, object detection, localization, counting, and scale estimation are formulated in a unified Bayesian model. To evaluate the performance of the proposed method, we compare it with recent algorithms on public datasets such as VOC 2007 and the TUM RGB-D dataset. Our method achieves higher precision and recall with fewer sampled candidate regions.
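To make the keyframe decimation idea in contribution 1 concrete, the sketch below shows a generic greedy keyframe selector: a frame becomes a keyframe when it either adds sufficient baseline or shares too little feature overlap with the previous keyframe. This is a minimal illustration of the general technique, not the thesis's parallel hierarchical algorithm; the function name, thresholds, and data layout are all assumptions.

```python
import numpy as np

def select_keyframes(positions, features, min_baseline=0.5, max_overlap=0.7):
    """Greedy keyframe selection (illustrative sketch).

    positions: (N, 3) array of camera positions, one per frame.
    features:  list of N sets of feature-track IDs visible in each frame.
    A frame is kept if it moved at least `min_baseline` from the last
    keyframe, or if its feature overlap with it dropped to `max_overlap`
    or below.
    """
    keyframes = [0]  # always keep the first frame
    for i in range(1, len(positions)):
        last = keyframes[-1]
        baseline = float(np.linalg.norm(positions[i] - positions[last]))
        shared = features[i] & features[last]
        overlap = len(shared) / max(len(features[i]), 1)
        if baseline >= min_baseline or overlap <= max_overlap:
            keyframes.append(i)
    return keyframes
```

A real pipeline would run this concurrently with pose estimation and sparse mapping, whereas the sketch assumes poses and feature tracks are already available.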
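For contribution 2, the front half of an event-based stereo pipeline can be illustrated as follows: events are first accumulated over a short time window into a signed count frame (the "event preprocessing" step), and a window-based matching cost is then computed over candidate disparities (the "matching cost calculation" step). This is a simplified sketch under assumed data layouts, not the thesis's modified factor-graph optimization; a sum-of-absolute-differences (SAD) cost stands in for the redefined cost function.

```python
import numpy as np

def accumulate_events(events, shape, t0, t1):
    """Accumulate events (t, x, y, polarity) with t in [t0, t1) into a
    signed count image -- a simple event frame."""
    frame = np.zeros(shape, dtype=np.float32)
    for t, x, y, p in events:
        if t0 <= t < t1:
            frame[y, x] += 1.0 if p > 0 else -1.0
    return frame

def matching_cost(left, right, y, x, disparities, win=2):
    """SAD cost between a (2*win+1)^2 window at (x, y) in the left event
    frame and horizontally shifted windows in the right frame."""
    patch_l = left[y - win:y + win + 1, x - win:x + win + 1]
    costs = []
    for d in disparities:
        patch_r = right[y - win:y + win + 1, x - d - win:x - d + win + 1]
        costs.append(float(np.abs(patch_l - patch_r).sum()))
    return costs
```

In the full framework these raw costs would then go through BP/SGM-style cost optimization with the dynamic updating mechanism before producing a depth map.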
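The thesis does not spell out the unified Bayesian model of contribution 3 here, but the general idea of fusing several independent cues (such as MMS, MOS, and RDD) into a posterior objectness score can be sketched with a naive-Bayes log-odds update. Everything below -- the function name, the naive independence assumption, and the likelihood inputs -- is illustrative, not the thesis's actual formulation.

```python
import math

def objectness_score(cues, prior_object=0.1):
    """Fuse per-cue likelihood pairs (P(cue | object), P(cue | background))
    into P(object | cues), assuming the cues are conditionally independent.

    Each cue contributes its log likelihood ratio to the prior log-odds;
    a logistic transform maps the result back to a probability.
    """
    log_odds = math.log(prior_object / (1.0 - prior_object))
    for p_obj, p_bg in cues:
        log_odds += math.log(max(p_obj, 1e-9)) - math.log(max(p_bg, 1e-9))
    return 1.0 / (1.0 + math.exp(-log_odds))
```

Candidate regions can then be ranked by this score, which is one way a method could keep precision and recall high while sampling fewer candidate regions.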