| Infrared imaging is widely used in infrared search and track (IRST) systems because of its long imaging range, good concealment, and strong anti-interference ability, and the detection of small and dim infrared targets is the key technology in such systems. Because of the long imaging distance, a target appears in the field of view as a bright spot with weak local intensity, a small pixel footprint, and no distinct spatial features. Combined with the background clutter introduced by varied scenes, this makes it very challenging to design a detection algorithm that is accurate, robust, and fast enough for real-time use. To address this problem, this thesis, guided by the theory of spatial-temporal detection, progresses from a spatial-temporal feature model to an attention-guided local spatial-temporal semantic feature fusion network, which transforms small target detection into pixel-wise probability estimation. The main work is as follows: (1) The basic theory of spatial-temporal detection of small and dim infrared targets is studied. This covers the characteristics of infrared image sequences, the concept of spatial-temporal semantics, the classic spatial-temporal detection models for small and dim targets, and spatial-temporal semantic fusion methods, which together provide the theoretical basis for the algorithm design in this thesis. (2) An infrared image sequence dataset of small and dim targets is built. Because the available infrared sequence data are incomplete, real and semi-synthetic multi-scene infrared image sequences are captured and integrated. The target centroid coordinates are then annotated, pixel-level labels are produced, and the data are divided into training and test sets. The resulting dataset supports algorithm implementation and performance verification. (3) Saliency enhancement of small and dim targets based on a spatial-temporal feature fusion model is studied. First, target region proposals are extracted with a spatial-domain grayscale enhancement model. Then, the spatial-temporal features of the targets are extracted simultaneously by constructing a joint spatial and temporal dynamic local sliding window, and the local variance difference measure is extended from the spatial domain to the three-dimensional spatial-temporal domain. This model-driven method accurately estimates and enhances the real target. Qualitative and quantitative analysis shows that, compared with single-frame and multi-frame detection models, the algorithm achieves higher detection accuracy and faster processing speed. (4) A small and dim target detection network based on local spatial-temporal feature fusion is studied. Building on the physical model and the feature learning ability of neural networks, the detection task is transformed into a classification task based on pixel-wise probability estimation. First, a lightweight, shallow global feature extractor is designed for small and dim targets. Then, local feature blocks are obtained using the spatial-temporal physical model and a local mask segmentation mapping strategy. Finally, a spatial-temporal attention fusion module extracts deeper spatial-temporal semantics from the local feature blocks, which yields an accurate estimate of the real targets. The network thus performs coarse-to-fine feature extraction, from shallow global features to high-level spatial-temporal semantics. Experiments show that the designed network achieves better detection performance than both model-driven and data-driven detection algorithms and can accurately distinguish real targets from false alarms. |
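
To make the idea in item (3) concrete, the following is a minimal sketch of a local variance difference computed over a three-dimensional spatial-temporal window. It is an illustration only, not the formulation used in the thesis: the function name `st_variance_difference`, the window parameters `inner`, `outer`, and `dt`, and the choice of subtracting the variance of the surrounding shell from the variance of the inner cube are all assumptions made for this example.

```python
import numpy as np


def st_variance_difference(frames, t, y, x, inner=1, outer=3, dt=1):
    """Hypothetical spatial-temporal local variance difference at one pixel.

    frames : array of shape (T, H, W), a stack of consecutive infrared frames.
    inner  : half-width of the inner (target) window, in pixels.
    outer  : half-width of the outer (background) window; assumed > inner.
    dt     : half-length of the temporal window, in frames.
    """
    frames = np.asarray(frames, dtype=np.float64)
    n_t, h, w = frames.shape

    # Clip the 3-D window to the bounds of the frame stack.
    t0, t1 = max(t - dt, 0), min(t + dt + 1, n_t)
    y0, y1 = max(y - outer, 0), min(y + outer + 1, h)
    x0, x1 = max(x - outer, 0), min(x + outer + 1, w)
    cube = frames[t0:t1, y0:y1, x0:x1]

    # Spatial mask that marks the inner window inside the clipped cube.
    yy = np.arange(y0, y1)[:, None]
    xx = np.arange(x0, x1)[None, :]
    inner_mask = (np.abs(yy - y) <= inner) & (np.abs(xx - x) <= inner)

    inner_vals = cube[:, inner_mask]   # inner window across all frames
    outer_vals = cube[:, ~inner_mask]  # surrounding shell across all frames

    # Assumed measure: inner variance minus shell variance, clamped at zero.
    return max(inner_vals.var() - outer_vals.var(), 0.0)


# Synthetic example: flat background with a one-pixel target in the middle frame.
frames = np.full((3, 16, 16), 10.0)
frames[1, 8, 8] = 50.0

target_score = st_variance_difference(frames, 1, 8, 8)
background_score = st_variance_difference(frames, 1, 2, 2)
```

In this toy setting the score is positive at the target location and zero over the flat background, which is the behavior a saliency enhancement step would be expected to have. A practical implementation would evaluate the measure only at the region proposals and would vectorize the computation.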