
A Research On Multiple Context Based Scene Graph Generation

Posted on: 2021-04-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y N Chen
Full Text: PDF
GTID: 2428330647451039
Subject: Computer Science and Technology
Abstract/Summary:
In the Scene Graph Generation task, we try to understand the interactions of the different objects in an image as a whole, i.e. by generating a scene graph. A scene graph takes objects as graph vertices and the relations between pairs of objects as edges, giving a structured representation of an image; it is made up of relationship triplets such as ⟨person, ride, horse⟩. A scene graph carries object information together with a detailed description of an image. It therefore contains much more semantic information than the objects produced by the object detection task, but lower-level information than the abstract description of an image produced by image captioning. As a mid-level semantic representation, a scene graph is often applied to other computer vision tasks, such as object detection, image captioning, image retrieval, text-to-image generation, and image paragraph generation. Currently, using deep learning to generate scene graphs is a common approach. Two sub-problems need to be solved: object detection, and relation classification between pairs of objects. Some existing works can only identify a few types of relationships, and others model the context between different relationship triplets while ignoring the association between the predicate features of an object pair.

In the third chapter of this thesis, we propose a two-stage model, the predicate feature association network, which utilizes multiple contexts. In the first stage, a common object detector is adopted to obtain object proposals, and we then extract object-level and scene-level contexts to improve object classification performance. In the second stage, we first use multi-modal feature alignment to obtain the alignment context between an image region and a relation predicate. The alignment context and the object-level context are then combined and fed into a recurrent neural network to obtain predicate feature association information. Finally, an attention mechanism computes a weighted sum of the predicate feature association information for predicate classification. Experiments are conducted on the public Visual Genome dataset, and recall is computed on the top K (K = 20, 50, 100) predicted relationships with the highest scores. The experimental results show that the proposed method improves performance.

On the basis of the predicate feature association network, two further problems in Scene Graph Generation are studied. First, in the step that obtains the object-level context, we study methods for fusing multiple features, including visual, category, and spatial features. Specifically, we use a difference-computation-based linear fusion technique and an improved Dense Multi-modal Fusion (DMF), which considers the fusion of multi-modal features and performs multi-level fusion. The second study addresses the problem that the number of candidate object pairs increases quadratically with the number of objects in an image. Based on the idea of multi-level feature fusion, a relationship pair filtering network is therefore proposed. Because candidate object pairs are selected effectively, our model uses computational resources better in the test phase, and the number of useless object pairs is greatly reduced.
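The abstract describes an attention mechanism that takes a weighted sum of the predicate feature association information (the recurrent network's outputs) before predicate classification. The following is a minimal sketch of that kind of attention pooling, not the thesis's actual code; the module name, dimensions, and the simple linear scoring function are illustrative assumptions.

```python
# Illustrative sketch: attention-weighted pooling over predicate feature
# association states, followed by predicate classification.
# hidden_dim / num_predicates and the scoring layer are assumed, not from the thesis.
import torch
import torch.nn as nn


class PredicateAttentionPooling(nn.Module):
    def __init__(self, hidden_dim: int, num_predicates: int):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)            # scalar attention score per step
        self.classifier = nn.Linear(hidden_dim, num_predicates)

    def forward(self, assoc_states: torch.Tensor) -> torch.Tensor:
        # assoc_states: (num_pairs, seq_len, hidden_dim) RNN outputs over the
        # predicate feature association sequence for each candidate object pair.
        weights = torch.softmax(self.score(assoc_states), dim=1)   # (num_pairs, seq_len, 1)
        pooled = (weights * assoc_states).sum(dim=1)               # weighted sum -> (num_pairs, hidden_dim)
        return self.classifier(pooled)                             # predicate logits


# Example: 8 candidate pairs, sequence length 5, 512-d features, 50 predicate classes
logits = PredicateAttentionPooling(512, 50)(torch.randn(8, 5, 512))
print(logits.shape)  # torch.Size([8, 50])
```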
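The evaluation described above is the standard Recall@K protocol: the fraction of ground-truth relationship triplets recovered among the top-K highest-scoring predictions (K = 20, 50, 100). Below is a minimal sketch of that metric under the usual definition; the data types and function name are illustrative.

```python
# Illustrative sketch of Recall@K for scene graph triplets.
from typing import List, Set, Tuple

Triplet = Tuple[str, str, str]  # (subject, predicate, object)


def recall_at_k(predictions: List[Tuple[Triplet, float]],
                ground_truth: Set[Triplet],
                k: int) -> float:
    # Sort predicted triplets by score (descending) and keep the top K.
    top_k = {t for t, _ in sorted(predictions, key=lambda p: p[1], reverse=True)[:k]}
    # Fraction of ground-truth triplets that appear among the top-K predictions.
    return len(top_k & ground_truth) / max(len(ground_truth), 1)


# Example usage with a single ground-truth triplet.
preds = [(("person", "ride", "horse"), 0.9), (("person", "near", "horse"), 0.4)]
print(recall_at_k(preds, {("person", "ride", "horse")}, k=20))  # 1.0
```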
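The second study is motivated by the quadratic growth of candidate object pairs: n detected objects yield n(n-1) ordered (subject, object) pairs. The sketch below only illustrates that growth and the idea of keeping the top-scoring pairs before relation classification; the placeholder scoring function stands in for the thesis's relationship pair filtering network and is purely hypothetical.

```python
# Illustrative sketch: quadratic growth of candidate pairs and top-K filtering.
from itertools import permutations
from typing import Callable, List, Tuple


def candidate_pairs(object_ids: List[int]) -> List[Tuple[int, int]]:
    # Every ordered (subject, object) pair: n objects -> n * (n - 1) candidates.
    return list(permutations(object_ids, 2))


def filter_pairs(pairs: List[Tuple[int, int]],
                 score_fn: Callable[[Tuple[int, int]], float],
                 keep: int) -> List[Tuple[int, int]]:
    # Keep only the top-`keep` pairs by a relatedness score, so the relation
    # classifier sees far fewer useless pairs at test time.
    return sorted(pairs, key=score_fn, reverse=True)[:keep]


objs = list(range(30))                  # 30 detected objects
pairs = candidate_pairs(objs)
print(len(pairs))                       # 870 = 30 * 29 candidate pairs
kept = filter_pairs(pairs, lambda p: -abs(p[0] - p[1]), keep=64)  # dummy score
print(len(kept))                        # 64
```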
Keywords/Search Tags: Scene Graph Generation, Context, Recurrent Neural Network, Feature Alignment, Feature Fusion, Object Pair Proposal