| Scene Graph Generation (SGG) aims to provide a graphical representation of the objects in an image and the relationships between them. SGG has recently emerged as a promising approach for bridging the vision and natural language domains. However, the complexity of relationship characterization and the imbalanced nature of the training data make SGG a challenging task in computer vision. This dissertation develops several models that target the graph properties and inductive biases of SGG. The details are as follows: (1) Three properties of the scene graph have been underexplored in recent work: the edge direction information, the difference in priority between nodes, and the long-tailed distribution of relationships. Accordingly, we propose a graph property sensing network that fully explores these three properties for SGG. First, we propose a direction-aware message passing module that augments each node feature with node-specific contextual information and encodes the edge direction information via a tri-linear model. Second, we count the number of triplets associated with each node to represent its priority and use this priority as a weighting factor in a node priority sensitive loss. Third, we mitigate the long-tailed distribution problem by softening the frequency of relationships and allowing it to be adjusted for each subject-object pair according to their visual appearance. (2) Existing SGG methods assume only homophily in the scene graph and ignore heterophily. Accordingly, we propose a heterophily learning network to explore the scene graph's heterophily property comprehensively. First, we propose an adaptive reweighting message passing module, which adaptively integrates the information from different layers to exploit both the heterophily and the homophily among objects. Second, we introduce a relationship feature propagation module that efficiently explores the heterophilic connections between relationships via a high-pass graph filter to refine the relationship representation. Third, we introduce symbolic information into the heterophily-aware message passing module, improving its performance on complex scenes. (3) Existing SGG methods mainly suffer from ambiguous object representations. Accordingly, we propose a regularized unrolling network (RU-Net) to address this issue. We first study the relation between graph message passing (GMP) and graph Laplacian denoising (GLD) from the perspective of the unrolling technique, determining that GMP can be formulated as a solver for GLD. Based on this observation, we propose an unrolled message passing module and introduce an-based graph regularization to suppress spurious connections between nodes. (4) We propose a layout-specific knowledge reasoning network to alleviate the label imbalance problem in SGG. Compared with existing methods, its main innovation is the use of a layout-specific prior to regularize the generated scene graph. First, we develop a message passing module that uses the layout of each pair of objects to aid in the prediction of their correlation coefficients. Second, we propose an object concurrence-sensitive loss to facilitate the prediction of object correlations. This loss incorporates layout-specific information about object concurrence into the predicted correlation coefficients. Third, we design a layout-specific relationship bias and employ it as prior knowledge to enhance relationship prediction. Extensive evaluation on three popular datasets, i.e., Visual Genome, Open Images, and Visual Relationship Detection, demonstrates the effectiveness and superiority of the proposed models. |
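
The observation in (3), that message passing can be read as a solver for graph Laplacian denoising, can be illustrated with a minimal sketch. The code below is not the RU-Net module. It assumes the textbook GLD objective, 0.5 * ||X - Y||_F^2 + 0.5 * lam * tr(X^T L X) with the combinatorial Laplacian L = D - A, and unrolls plain gradient descent on it. All names (`graph_laplacian`, `unrolled_message_passing`, `closed_form_denoising`) and the values of `lam`, `step`, and `num_layers` are hypothetical choices made for illustration.

```python
import numpy as np


def graph_laplacian(adj):
    """Combinatorial Laplacian L = D - A of an undirected graph."""
    return np.diag(adj.sum(axis=1)) - adj


def unrolled_message_passing(noisy, adj, lam=1.0, step=0.1, num_layers=200):
    """Unrolled gradient descent on the assumed GLD objective.

    Expanding one update with L = D - A gives
        x <- (1 - step) * x + step * noisy
             - step * lam * (D @ x) + step * lam * (A @ x),
    so every iteration acts like one message passing layer: a node keeps
    part of its own feature, is pulled toward its observation, and
    aggregates its neighbours' features through the adjacency matrix A.
    """
    lap = graph_laplacian(adj)
    x = noisy.copy()
    for _ in range(num_layers):
        grad = (x - noisy) + lam * (lap @ x)  # gradient of the objective
        x = x - step * grad
    return x


def closed_form_denoising(noisy, adj, lam=1.0):
    """Exact minimiser of the same objective: solve (I + lam * L) X = Y."""
    lap = graph_laplacian(adj)
    return np.linalg.solve(np.eye(len(adj)) + lam * lap, noisy)


# Toy example (hypothetical data): a 4-node path graph with 3-dim features.
adj = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)
rng = np.random.default_rng(0)
noisy = rng.normal(size=(4, 3))

unrolled = unrolled_message_passing(noisy, adj)
exact = closed_form_denoising(noisy, adj)
print("max abs difference:", np.abs(unrolled - exact).max())
```

On this toy graph the unrolled iterations approach the closed-form denoiser, which is the sense in which a stack of message passing layers behaves like a GLD solver. The step size must be small enough for gradient descent to converge (below 2 divided by the largest eigenvalue of I + lam * L). A learned model would typically replace the fixed `lam` and `step` with trainable, per-layer parameters.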