| In recent years, computer vision has attracted growing attention and made great progress. To understand the visual world more deeply, computers must not only complete visual perception tasks such as object detection, but also perform more complex scene analysis to complete visual understanding tasks. This thesis focuses on human-object interaction detection in images, which aims to identify the interactions between humans and objects. Because human behavior is complex, a person may interact with multiple objects of the same or different types at the same time, which makes human-object interaction detection more difficult. Human-object interaction detection frameworks usually adopt a parallel multi-stream structure, in which the appearance of the human and the object and the spatial pattern between them are used to judge the interaction. Most of these methods rely on instance-level appearance and bounding boxes and lack interaction details and context information, so it is difficult for them to achieve good detection results. This thesis therefore proposes a human-object interaction detection method based on human pose, which uses detailed information about human body parts to obtain effective context and improve detection accuracy. First, because the instance-level spatial information used in existing methods lacks detail, this thesis proposes a human-object interaction detection method based on human pose information, which introduces the skeleton keypoints of each joint in the human pose. The method uses absolute pose and relative pose to express the spatial information between the body parts and the human and object centers. The relative pose represents the relative spatial relationship between human body parts and objects. The absolute pose represents the human's own action posture in the interaction. This part-level spatial information about the human and the object supplements the details missing from the original instance-level spatial information. Word embedding vectors are used to further refine the interaction representation. In addition, this thesis uses the correlation among multiple human-object pairs in the scene to help judge the interaction. Second, to use scene information effectively, this thesis designs a human-object interaction detection method based on pose-based scene information. From the perspective of the interactors, this method makes better use of the scene information related to the interaction, which helps improve recognition accuracy. The pose-based scene module uses the human pose to guide attention to the regions of the scene that need focus, which enhances the features of the interaction area and avoids interference from irrelevant background in the full image. This provides effective context information that complements the use of human appearance in interaction detection. In addition, to address sample imbalance in training, this thesis uses the Focal Loss function to guide network training, which enables the network to better distinguish hard samples and improves its detection performance. In summary, this thesis studies human-object interaction detection based on human pose. The part-level human pose information is fused with instance-level spatial information, word embeddings, human and object appearance features, and full-image scene information. This thesis designs a corresponding interaction detection network to verify that the method improves the network's feature learning ability, and experiments confirm the method's effectiveness. The human pose provides richer information for interaction detection and improves the accuracy of human-object interaction detection. |
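
The abstract names Focal Loss as the remedy for sample imbalance but does not give the formulation used in the thesis. As a rough illustration only, the sketch below implements the commonly cited binary form, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The function names, the per-label binary treatment of interaction classes, and the defaults `gamma=2.0` and `alpha=0.25` are assumptions for this example, not details taken from the thesis.

```python
import math


def cross_entropy(prob, label, eps=1e-7):
    """Plain binary cross-entropy for one predicted probability."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    p_t = prob if label == 1 else 1.0 - prob
    return -math.log(p_t)


def binary_focal_loss(prob, label, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss for one interaction label (hypothetical helper).

    prob  -- predicted probability that the interaction is present
    label -- 1 if the interaction is present, 0 otherwise
    gamma -- focusing parameter; larger values down-weight easy samples more
    alpha -- class-balance weight for positives (assumed default)
    """
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    p_t = prob if label == 1 else 1.0 - prob       # probability of the true class
    alpha_t = alpha if label == 1 else 1.0 - alpha  # weight of the true class
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A well-classified positive versus a badly misclassified positive.
easy = binary_focal_loss(0.95, 1)
hard = binary_focal_loss(0.10, 1)
print(f"easy sample loss: {easy:.6f}")
print(f"hard sample loss: {hard:.6f}")
```

With `gamma=0` and `alpha=0.5` the expression reduces to half the ordinary cross-entropy, so the modulating factor `(1 - p_t) ** gamma` is what shifts the training signal toward hard samples.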