Moving Target Detection In Satellite Videos Based On Wavelet Multi-Scale Learning Representation

Posted on: 2024-07-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z L Pi
GTID: 1522307340973729
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
In recent years, with the development of remote-sensing satellites, satellite videos have become readily obtainable. From these videos, static and dynamic information about the ground can be extracted simultaneously, which has great potential for numerous applications in both the civilian and military domains, including land and resources mapping, environmental monitoring, urban planning, driving behavior analysis, traffic density monitoring, traffic management, disaster assessment, and military attack. Compared with videos shot by ground-based cameras, satellite video can capture moving objects in large-scale scenes. However, because of the extremely long shooting distance, relevant moving objects such as vehicles, aircraft, and ships appear very small against the complex ground background. Most moving targets occupy only a few pixels and lack useful shape or color features. At the same time, because of the top-down shooting angle, there are many pseudo-motion targets with similar appearances. Identifying small moving targets quickly and efficiently has therefore become an urgent problem.

This dissertation addresses the difficulties of moving object detection in satellite videos. Considering practical problems, it proposes a series of moving object detection methods for satellite video, with particular attention to moving vehicles of extremely small size. The specific research content and contributions of this dissertation are as follows:

1. Considering the lack of labeled remote sensing image and video datasets in the detection field, this dissertation proposes an efficient semi-automatic annotation system for remote sensing images. The intelligent sample annotation technology combines human and machine effort, which can greatly reduce the time and labor cost of annotation and provide data support for developing deep learning models for detection tasks. The system supports both rectangular-box and pixel-level target annotation. By introducing transfer learning and incremental learning as training strategies, the labeling process can be started with only a small amount of data. The proposed networks can then be optimized cyclically, so that labeling efficiency improves gradually. During labeling, post-processing rules that combine the detection results with expert knowledge reduce the difficulty of correcting rectangular boxes. In addition, an interactive unsupervised algorithm with reference points improves the efficiency of pixel-level labeling.

2. To improve the precision and recall of moving target detection in satellite videos, this dissertation introduces target tracking and combines it with the detection process through a convolutional network. To improve recognition accuracy, the network model is built on a multi-scale attention mechanism and an information fusion strategy for feature maps of different semantic levels. It can perform multi-scale detection and improves the expressive ability of the network features. At the same time, bidirectional tracking is performed between key frames, and a matching and discrimination strategy is adopted to improve the reliability of the tracking trajectories. Many missed targets with weak features are recalled during tracking.

3. This dissertation proposes a practical end-to-end neural network framework to detect tiny moving vehicles in satellite videos, which are captured with low imaging quality and contain considerable motion blur and many distractors of similar shape. The framework addresses these issues by integrating motion information from adjacent frames to facilitate the extraction of semantic features, and by incorporating the Transformer to refine the features for key point estimation and scale prediction, which enhances the model's ability to recognize blurred targets. The proposed model identifies the actual moving targets well and suppresses interference from stationary targets and background noise. Experiments and evaluations show superior performance compared with many state-of-the-art algorithms.

4. To further optimize the performance of moving object detection in satellite videos, we propose a novel neural network framework based on the Discrete Wavelet Transform and combined with a selective attention mechanism to distinguish moving vehicles directly and efficiently. It can be trained end to end without any post-processing. To acquire motion information, it adopts a multilevel differential module between adjacent frames and introduces the Discrete Wavelet Transform to decompose the multilevel semantic motion features. The candidate regions of moving vehicles are then obtained from the frequency domain characteristics. Furthermore, selective Transformers refine the selected features of moving targets based on the attention mechanism. Using the prior information from the wavelet transform, this filtering method avoids most of the interference from background noise and improves detection accuracy. Experiments show that this method performs very well on small moving target detection in satellite videos.

5. To reduce the dependence on labeled samples and overcome the difficulties of moving object detection in satellite videos, this dissertation proposes a novel unsupervised moving object detection algorithm based on the discrete wavelet transform and consensus learning. Considering the missing features of the targets, the low contrast, and the heavy interference in the complex background, the proposed method uses the motion information between frames as the temporal pseudo label and the frequency domain information obtained by wavelet decomposition as the spatial pseudo label. The two are combined to fine-tune the network, compensating for the fact that effective semantic features cannot be extracted when surface features are lacking.
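
As a rough illustration of the idea behind contributions 4 and 5 (differencing adjacent frames, then using wavelet high-frequency content to pick candidate regions), the following is a minimal sketch on raw pixel intensities. It is not the dissertation's network: the three-frame difference, the single-level Haar wavelet, the energy threshold, and the names `haar_dwt2`, `motion_candidates`, `make_frame`, and `k` are all assumptions made for this example.

```python
import numpy as np


def haar_dwt2(x):
    """One level of an orthonormal 2-D Haar DWT on non-overlapping 2x2 blocks.

    Returns the approximation band LL and the detail bands (LH, HL, HH).
    """
    x = np.asarray(x, dtype=float)
    h, w = x.shape
    x = x[: h - h % 2, : w - w % 2]  # crop to even height and width
    a, b = x[0::2, 0::2], x[0::2, 1::2]  # top-left, top-right of each block
    c, d = x[1::2, 0::2], x[1::2, 1::2]  # bottom-left, bottom-right
    ll = (a + b + c + d) / 2
    lh = (a + b - c - d) / 2  # top row minus bottom row
    hl = (a - b + c - d) / 2  # left column minus right column
    hh = (a - b - c + d) / 2  # diagonal difference
    return ll, (lh, hl, hh)


def motion_candidates(prev, curr, nxt, k=3.0):
    """Boolean mask (half resolution) of blocks likely to contain motion.

    Assumed heuristic: a pixel is moving only if it differs from BOTH
    neighbouring frames; blocks whose wavelet detail energy exceeds
    mean + k * std are kept as candidates.
    """
    prev, curr, nxt = (np.asarray(f, dtype=float) for f in (prev, curr, nxt))
    diff = np.minimum(np.abs(curr - prev), np.abs(nxt - curr))
    _, (lh, hl, hh) = haar_dwt2(diff)
    energy = lh**2 + hl**2 + hh**2  # high-frequency energy per 2x2 block
    return energy > energy.mean() + k * energy.std()


def make_frame(col):
    """Toy 16x16 frame: one stationary bright object, one 1-pixel target."""
    frame = np.zeros((16, 16))
    frame[10, 10] = 5.0  # stationary object, present in every frame
    frame[5, col] = 1.0  # tiny target moving along row 5
    return frame


frames = [make_frame(col) for col in (5, 6, 7)]
mask = motion_candidates(*frames)
print(np.argwhere(mask))  # block coordinates flagged as moving
```

In this toy case the stationary object cancels in the frame difference and only the block containing the moving pixel is flagged. The sketch also shows a limitation of hand-crafted rules: a uniform blob that exactly fills a 2x2 block produces zero Haar detail coefficients, which is one reason a learned, multilevel approach such as the one the abstract describes would be preferred in practice.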
Keywords/Search Tags: satellite video, moving object detection, low resolution, annotation system, selective Transformers, wavelet transform, unsupervised learning