| Vision is the most important human sense. We capture information through vision in order to observe and understand the world, and images are the carrier of that visual information. Because images present information in the most intuitive way, they are used very frequently in human communication. To monitor and control the loss of visual information, computational methods are needed to assess image quality qualitatively or quantitatively. Some existing models assess image quality by using reference images and manually extracted image features, which makes them poorly suited to practical application scenarios. How to quantify the visual quality of a distorted image with an end-to-end computational model, without the corresponding reference image, is of great research value and is the most important research task in the field of image quality assessment. In recent years, deep learning has been widely used in computer vision because of its powerful ability to model image content. Following the design principles of deep learning, this paper proposes two end-to-end no-reference image quality assessment models, built on multi-scale feature learning and on the combination of global and local feature information. The research content and results of this paper are as follows: (1) Based on feature learning at multiple image patch scales, this paper proposes an end-to-end no-reference image quality assessment model named MODEL1. MODEL1 encodes image patches at multiple scales through spatial pooling, uses the self-attention mechanism to learn feature information at each patch scale, and maps the extracted features to per-scale quality scores through parallel regression layers. A multi-scale quality score fusion module then maps these per-scale scores to the final predicted quality score of the image. To reduce the amount of computation, MODEL1 performs self-attention within shifted windows. Compared with multiple state-of-the-art no-reference image quality assessment models, MODEL1 shows strongly competitive performance on three synthetically distorted image quality assessment databases (LIVE, CSIQ, and LIVE MD) and one authentically distorted image quality assessment database (LIVE Challenge), with high prediction accuracy and monotonicity. It also shows the highest generalization performance in cross-database validation. (2) Continuing the research on multi-scale feature learning, and based on the fusion of global and local features at multiple scales, this paper proposes another end-to-end no-reference image quality assessment model named MODEL2. MODEL2 uses ResNet50 to encode the image at two scales, learns the global features of the image through the self-attention mechanism, learns the local features through a shallow convolutional neural network, and then fuses the global and local features to predict visual quality at each scale. In addition, the image features at the two scales are fused into mixed-scale features, which are used to predict visual quality at the mixed scale. Finally, a multi-layer perceptron maps the quality scores at the three scales to the final visual quality score of the image. Compared with multiple state-of-the-art no-reference image quality assessment models, MODEL2 performs best on all four databases (LIVE, CSIQ, LIVE MD, and LIVE Challenge), and its predictions have the highest accuracy and monotonicity. It also shows the highest generalization performance in cross-database validation. |
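The abstract reports prediction accuracy and monotonicity without naming the metrics behind them. In image quality assessment these two properties are commonly measured by the Pearson linear correlation coefficient (PLCC) and the Spearman rank-order correlation coefficient (SROCC) between predicted scores and subjective scores. Assuming that convention applies here, the following is a minimal pure-Python sketch of both measures. The function names and the sample values are illustrative and do not come from the paper.

```python
import math


def pearson(x, y):
    """Pearson linear correlation (PLCC): a common measure of prediction accuracy."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation (SROCC): a common measure of prediction monotonicity."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical predicted scores and subjective scores for four images.
predicted = [1.0, 2.0, 3.0, 4.0]
subjective = [1.0, 4.0, 9.0, 16.0]

plcc = pearson(predicted, subjective)
srocc = spearman(predicted, subjective)
print(f"PLCC={plcc:.4f}  SROCC={srocc:.4f}")
```

In this toy case the predictions rank the images in exactly the same order as the subjective scores, so SROCC is 1, while PLCC is slightly lower because the relationship between the two score lists is not linear. This is why the two measures are usually reported together.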