Research On Model Performance Optimization For Visual Analysis Tasks

Posted on: 2023-06-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: P Z Ren
GTID: 1528306845452014
Subject: Software engineering
Abstract/Summary:
Computer vision analysis is an important research area in artificial intelligence. It mainly covers image and video tasks such as classification, detection, segmentation, and action recognition. Research on these visual tasks matters for many real-world applications, such as face recognition, autonomous driving, visual question answering, and behavioral analysis. In current machine learning research, model performance has always been one of the main indicators by which researchers measure the quality of a model. Although machine learning algorithms have achieved very good performance in various visual analysis tasks, the following challenges remain in model design:

(1) Machine learning models have poor robustness: they are overly sensitive to noise in the dataset, and even small amounts of data noise often lead to model failure.
(2) Deep learning models lack generality: in visual tasks, a model adapts poorly to multiple tasks, and a model designed for upstream tasks (e.g., image recognition and classification) often performs poorly in downstream tasks (e.g., object detection, semantic segmentation and instance segmentation).
(3) The degree of automation in deep learning model design is low: current automatic design methods are computationally expensive and have obvious application limitations, which makes it difficult to keep up with the surge in application scenarios (e.g., the video field).

To address these problems, the main research contents and contributions of this dissertation are as follows:

(1) To address the poor robustness caused by noise in the feature dimension and view dimension of image datasets, this dissertation takes the spectral clustering algorithm as an example to study the robustness of traditional machine learning models. Specifically, the following solutions are proposed for different data attributes.
1) To address the poor robustness caused by noise and information redundancy in the feature dimension of image data, this dissertation proposes Single-view Spectral Clustering based on Intrinsic Subspace Learning (SSC-ISL). It first maps high-dimensional data into a low-dimensional subspace through a row-sparse transformation matrix, and then constructs an affinity graph in this intrinsic subspace. The row-sparse transformation matrix allows the method to effectively suppress the influence of noise and outliers in the feature dimension on model performance. We validate the robustness of the proposed method on a noisy synthetic dataset, where it outperforms related clustering methods by 47%. Furthermore, the method achieves an average improvement in clustering accuracy of nearly 5% over state-of-the-art methods on six publicly available datasets.
2) Because view quality is inconsistent in multi-view datasets, consistency graph learning is susceptible to noise within individual views. To address this, this dissertation proposes the Auto-weighted Multi-view Spectral Clustering based on View Quality (AMSC) algorithm. It measures the distance between each view and the consistency graph using the 1-norm and adaptively assigns corresponding weights to the different views, thereby suppressing noise in the view dimension. In robustness evaluation experiments, the proposed method achieves a performance gain of up to 498.55% on the NMI evaluation metric compared to state-of-the-art methods.

(2) To address the generality challenge across upstream and downstream visual tasks, this dissertation proposes the Dynamic Multi-scale Window Visual Transformer (DMW-ViT) model. By designing a window self-attention module with dynamic multi-scale windows, the method resolves the inconsistent information-scale requirements of upstream and downstream tasks, and thus improves the compatibility and versatility of the model across them. We verify the generality of the method on an image classification dataset for the upstream task and on two downstream tasks. Compared with Swin-T, the method improves by 0.7% on the upstream task (i.e., image classification) and by 1.1% on average on downstream tasks (i.e., image segmentation, object detection and instance segmentation).

(3) To address the difficulty and low efficiency of manually designing deep learning models for video tasks, this dissertation proposes a Neural Architecture Search Temporal Convolutional (NAS-TC) method for complex action recognition. The method explicitly separates the 3D convolution along the spatial and temporal dimensions, and reduces the amount of computation by customizing a dedicated search space on the temporal dimension for video tasks. This makes it possible to apply neural architecture search to computation-intensive long-video tasks. We verify the effectiveness of the method on three public long-video action recognition datasets, and the experimental results show that it can achieve effective automatic model design in the video field. Compared with Timeception, NAS-TC improves by nearly 2.2% on average across the three public datasets.
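The view-weighting idea described for AMSC above can be illustrated with a small NumPy sketch. It is not the dissertation's algorithm: the function names, the entrywise-median consensus graph, and the inverse-distance weighting rule are all assumptions chosen for illustration. The sketch only shows how measuring each view's 1-norm distance to a shared graph can down-weight a noisy view.

```python
import numpy as np


def weight_views_by_l1_distance(graphs, eps=1e-12):
    """Hypothetical helper: weight per-view affinity graphs by closeness to a consensus.

    graphs: sequence of V affinity matrices, each of shape (n, n).
    Returns (weights, fused), where weights has shape (V,) and sums to 1.
    """
    graphs = np.asarray(graphs, dtype=float)              # shape (V, n, n)
    # Assumption: use the entrywise median as the consensus graph. It minimizes
    # the (unweighted) sum of entrywise 1-norm distances to the views.
    consensus = np.median(graphs, axis=0)                 # shape (n, n)
    # Entrywise 1-norm distance between each view and the consensus.
    dists = np.abs(graphs - consensus).sum(axis=(1, 2))   # shape (V,)
    # Assumption: weights are inversely proportional to the distance.
    inv = 1.0 / (dists + eps)
    weights = inv / inv.sum()
    # Fuse the views into one graph using the learned weights.
    fused = np.tensordot(weights, graphs, axes=1)         # shape (n, n)
    return weights, fused


def make_demo_views(seed=0):
    """Build three toy views of a two-cluster graph; the third is heavily noisy."""
    rng = np.random.default_rng(seed)
    clean = np.kron(np.eye(2), np.ones((3, 3)))           # 6x6 block-diagonal graph
    views = []
    for scale in (0.05, 0.05, 0.8):
        noise = rng.random((6, 6))
        noise = (noise + noise.T) / 2                      # keep the graph symmetric
        views.append(clean + scale * noise)
    return views


weights, fused = weight_views_by_l1_distance(make_demo_views())
print("view weights:", weights)
```

In this toy setting the heavily corrupted third view receives the smallest weight, so it contributes least to the fused graph that a spectral clustering step would then consume.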
Keywords/Search Tags: Machine Learning, Spectral Clustering, Multi-View Clustering, Visual Transformer, Neural Architecture Search