Research On Feature Fusion Methods For Scene Recognition And Understanding

Posted on: 2023-11-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Y Zou
GTID: 1528306845996779
Subject: Software engineering
Abstract/Summary:
Scene recognition and understanding provides basic object and location context about man-made and natural scenes for intelligent applications such as smart cities, autonomous vehicles, and mobile robots, and helps those systems make appropriate and reasonable decisions. Scene understanding can be further divided into sub-tasks, including scene classification, scene attribute recognition, and scene parsing. The increasing difficulty of these tasks places higher demands on both the representational power of scene features and the discriminative ability of scene models. On the one hand, the complex structure of scene images and background noise make intra-class inconsistency particularly prominent. On the other hand, the number of scene categories, attributes, and objects keeps growing, and the recognition and parsing performance of existing methods is still insufficient. Thus, the more general task of scene understanding remains a major challenge. Feature fusion can remove redundant and irrelevant information and let multiple features complement one another, making it possible to design scene understanding models with higher performance and greater robustness. Building on feature fusion theory, combined with deep learning models, graph learning models, and other mathematical models, this dissertation studies the problems and challenges of the three sub-tasks in scene understanding and proposes classification, recognition, and parsing algorithms for them. The contributions fall into the following four aspects:

(1) To address intra-class inconsistency in complex scene classification, an adaptive nonnegative feature fusion (AdaNFF) method is proposed to improve the classification of complex scene images. AdaNFF integrates nonnegative matrix factorization, adaptive feature fusion, and feature fusion boosting into an end-to-end process for feature learning and classification of scene images. Firstly, for the nonnegative features of scene images, an adaptive feature fusion method based on nonnegative matrix factorization is established to deal with intra-class inconsistency. Secondly, a feature fusion boosting algorithm is proposed that builds on single-feature or multi-feature fusion results to further improve image feature representation. Finally, a normalized l2-norm classifier and a multi-layer perceptron classifier are trained to predict the labels of scene images. All classifiers are validated on a scene classification benchmark. Experimental results suggest that AdaNFF can effectively handle complex scene classification problems with large intra-class inconsistency and achieves good classification performance.

(2) To address large-scale scene classification, a Neuro-MaxEnt fusion architecture search (NMFAS) method is proposed, which effectively reduces the classification error rate on large-scale scene data. Based on deep convolutional neural networks, NMFAS improves existing neural architecture search by extending the set of neural network feature fusion operations, so that the search can find the best scene classification architecture. Firstly, the search space of differentiable architecture search is expanded with additional feature fusion operations, including maximum, convolution, multiplication, and 3D pooling. Then, all feature fusion operations are generalized to accept more input streams, which improves architectural compatibility. Finally, to offset the complexity introduced by the expanded search space, a regularization term based on maximum entropy is proposed to reduce search cost and avoid overfitting of the architecture parameters. Experimental results suggest that, compared with other advanced methods, NMFAS improves model performance through its feature fusion operations and reduces the cost of architecture search by speeding up the search phase.

(3) For modeling scene attribute representations, a feature fusion method based on mini-batch minimum simplex estimation (MMSE) is proposed to strengthen scene attribute representation and improve attribute recognition performance. MMSE builds a simplex representation model of scene attributes by introducing a linear mixing model, then converts the scene feature learning problem into a minimum simplex estimation problem and solves it, thereby achieving multi-attribute recognition of scene images. Firstly, scene images are modeled with the linear mixing model, and a mini-batch minimum simplex estimation algorithm is proposed to learn attribute-based scene representations from complex scene image data. Then, a two-stage multi-feature fusion method is proposed to further improve the feature representation of scene attributes. Finally, the fast convergence and nonnegativity-preserving properties of the nonnegative matrix factorization algorithm are exploited to improve computational speed on large-scale scene datasets. Experimental results on scene attribute recognition suggest that MMSE outperforms other advanced scene attribute recognition methods.

(4) A self-supervised feature fusion-based graph convolution network (SFGCN) model is proposed to improve the parsing accuracy of multi-scale objects in scene images. Built on a graph convolution network, SFGCN adds three modules to achieve pixel-level semantic label parsing of scene images: k-neighbor-based spatial graph convolution, self-supervised attention based on spectral graph convolution fusion, and domain-adaptive scene graph pooling. Firstly, the model uses a hierarchical grid to construct scene graph feature data, and then builds the network on the k-neighbor spatial graph convolution operation to learn global features. Then, because local semantic labels are difficult to parse, a graph attention module based on self-supervised feature fusion is proposed. By combining spectral graph convolution with an attention mechanism, it adds self-supervised information to the training of local model weights to enhance local feature learning. Finally, to maintain in-domain consistency across multi-domain scene images, a multi-domain adaptive scene graph pooling method is proposed to address the intra-class inconsistency caused by differences among the scene images themselves. Experimental results on several public datasets suggest that the proposed method can effectively parse multi-scale objects semantically and is superior to other advanced methods.
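As a rough illustration of the idea behind contribution (1), the sketch below fuses several nonnegative feature sets by concatenating them and factorizing the result with plain nonnegative matrix factorization (standard multiplicative updates). It is not the AdaNFF algorithm: the adaptive weighting, the fusion boosting step, and the classifiers described above are omitted, and the function name `nmf_fuse`, its parameters, and the random example features are all assumptions made for this example.

```python
import numpy as np


def nmf_fuse(features, rank, n_iter=200, eps=1e-9, seed=0):
    """Fuse nonnegative feature matrices with plain NMF (hypothetical helper).

    features: list of arrays, each of shape (n_samples, d_i), entries >= 0.
    Returns (W, H, errors), where W (n_samples, rank) is the fused
    nonnegative representation, H (rank, sum d_i) is the shared basis,
    and errors holds the reconstruction error after each iteration.
    """
    # Simple fusion by concatenation; this is an assumption, not the
    # adaptive scheme described in the abstract.
    X = np.hstack(features)
    if np.any(X < 0):
        raise ValueError("all features must be nonnegative")

    rng = np.random.default_rng(seed)
    n_samples, n_dims = X.shape
    W = rng.random((n_samples, rank)) + eps
    H = rng.random((rank, n_dims)) + eps

    errors = []
    for _ in range(n_iter):
        # Multiplicative updates keep W and H nonnegative at every step.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(X - W @ H))
    return W, H, errors


# Two made-up nonnegative descriptors for 50 images.
rng = np.random.default_rng(1)
color_hist = rng.random((50, 8))
texture = rng.random((50, 6))

W, H, errors = nmf_fuse([color_hist, texture], rank=4)
print(W.shape, errors[0], errors[-1])
```

Each row of `W` is a compact nonnegative code for one image that draws on both descriptors, and could be passed to any downstream classifier. The rank and iteration count here are arbitrary choices for the example.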
Keywords/Search Tags: Scene Understanding, Scene Classification, Scene Attribute Recognition, Scene Parsing, Feature Fusion