
Research On Human-Object Interaction Detection By Fusing Multi-Scale Features

Posted on: 2024-09-09
Degree: Master
Type: Thesis
Country: China
Candidate: M P Yu
GTID: 2568307079970739
Subject: Electronic information
Abstract/Summary:
Human-object interaction detection is an important task in computer vision that aims to identify human-object interaction triplets in images. Current methods are mainly based on the Transformer architecture: they encode an image as a sequence, feed it into an encoder-decoder structure, and use query vectors to predict human-object interaction pairs. This thesis improves one such algorithm, QPIC, to address outstanding problems in the field. The main contents and contributions are as follows:

1. A multi-scale deformable human-object interaction detection algorithm based on QPIC is proposed to address the lack of multi-scale features and the high computational complexity of current Transformer-based algorithms. The backbone network is replaced with Swin Transformer to strengthen feature extraction, and feature maps from three stages of the Swin Transformer are extracted to obtain multi-scale features. To contain the computational cost that multi-scale features introduce, the thesis adopts a deformable attention mechanism that attends only to selected feature points around a reference point, and designs a dual-stream human-object entity attention to improve the deformable attention and optimize the reference point selection strategy. In experiments on the HICO-DET dataset, training efficiency improved significantly: the number of training epochs was reduced by 66% while mAP reached 28.96%. Detection of interaction pairs involving smaller-scale humans and objects improved by 1.23% and 0.89%, respectively, and detection of close interaction pairs improved by 0.78%.

2. To address the imprecise reference point selection and the loss of contextual semantic information caused by the deformable attention mechanism, a human-object interaction detection algorithm incorporating spatial contextual semantic features is proposed. Inspired by two-stage approaches, the thesis introduces a priori spatial and contextual semantic features that give the query vectors additional information and serve as the basis for initial reference point selection. A contextual attention mechanism is also added to the decoder to introduce contextual feature information and compensate for the context lost to deformable attention. In quantitative experiments on HICO-DET, increases of 1.69%, 1.76%, and 0.79% are achieved on the Full, Rare, and Non-Rare metrics, respectively. Detection of interaction pairs involving smaller-scale humans and objects improved by 2.67% and 2.28%, respectively, and detection of distant interaction pairs improved by 1.79%.

3. A monitoring system is implemented using the improved human-object interaction detection algorithm. The system comprises a data management module, a back-end image processing module, and a front-end display module. The data management module stores and manages the surveillance video data and the interaction detection results. The back-end image processing module detects human-object interactions in surveillance video or images. The front-end display module uses Web pages to show the detection results on the surveillance screen in real time. System tests demonstrate the practical value of the monitoring system.
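
The abstract describes deformable attention only in words, so the following is a minimal sketch of the general idea rather than the thesis's implementation. For one query it samples a few points around a reference point on each scale of a feature pyramid and combines them with softmax weights, so the cost depends on the number of sampled points and not on the size of the feature maps. The function names, the nested-list data layout, and the choice to pass the offsets and attention logits in as arguments are all assumptions made for illustration; in a trained model these would be predicted from the query, and the dual-stream human-object entity attention is not modeled here.

```python
import math


def bilinear_sample(feat, x, y):
    """Sample feat[H][W][C] at normalized (x, y) in [0, 1] with bilinear interpolation."""
    h, w = len(feat), len(feat[0])
    # Clamp to the map, then convert normalized coordinates to pixel coordinates.
    px = min(max(x, 0.0), 1.0) * (w - 1)
    py = min(max(y, 0.0), 1.0) * (h - 1)
    x0, y0 = int(px), int(py)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = px - x0, py - y0
    return [
        (1 - fy) * ((1 - fx) * feat[y0][x0][c] + fx * feat[y0][x1][c])
        + fy * ((1 - fx) * feat[y1][x0][c] + fx * feat[y1][x1][c])
        for c in range(len(feat[0][0]))
    ]


def deformable_attention(feature_maps, reference, offsets, logits):
    """Attend to a few sampled points per scale instead of every feature point.

    feature_maps: list of L maps, each feat[H_l][W_l][C] (hypothetical layout)
    reference:    (x, y) reference point in normalized [0, 1] coordinates
    offsets:      offsets[l][k] = (dx, dy), normalized, K points per scale
    logits:       logits[l][k], unnormalized attention scores
    """
    # Softmax over all L * K sampled points (numerically stabilized).
    flat = [v for row in logits for v in row]
    peak = max(flat)
    total = sum(math.exp(v - peak) for v in flat)

    out = [0.0] * len(feature_maps[0][0][0])
    for level, feat in enumerate(feature_maps):
        for k, (dx, dy) in enumerate(offsets[level]):
            weight = math.exp(logits[level][k] - peak) / total
            sample = bilinear_sample(feat, reference[0] + dx, reference[1] + dy)
            out = [o + weight * s for o, s in zip(out, sample)]
    return out
```

With three scales and two sampled points per scale, each query reads six interpolated feature vectors regardless of image resolution, which is the source of the efficiency gain the abstract refers to.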
Keywords: human-object interaction detection, multi-scale features, deformable attention mechanism, spatial contextual semantics