
Deep Neural Network Models Towards fMRI Visual Decoding in Human Visual Cortex During Natural Image Stimulation

Posted on: 2021-10-02 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: K Qiao | Full Text: PDF
GTID: 1364330647457280 | Subject: Information and Communication Engineering
Abstract/Summary:
Vision plays an irreplaceable role in human survival, life, evolution and development, and the study of human visual function has long been a central topic in brain science. Within this field, exploring how the visual cortex processes visual scenes, analyzing how neural activity in the visual cortex represents those scenes, and decoding visual scenes from that neural activity are very important problems. They bear directly on understanding the brain's visual information processing mechanisms, on building robust and interpretable machine vision models, and on advancing artificial intelligence for vision. Functional Magnetic Resonance Imaging (fMRI) offers a high-spatial-resolution, highly reliable and non-invasive way to monitor neural activity in the human visual cortex and has become an important tool for research on human visual function. Natural images contain complex scenes with diverse targets, so decoding natural images from fMRI signals is a cutting-edge and difficult problem. Deep neural network (DNN) models are among the best-performing computer vision models and are the most similar to the hierarchical information processing of the human visual cortex; conversely, human vision is a strong source of inspiration for research on DNN models for visual computing. This dissertation therefore uses DNN models to study the decoding of fMRI visual information during natural image stimulation, and systematically explores the relationships and differences between the hierarchical processing of visual information in DNN models and in the human visual cortex from the perspectives of network structure, task and representation characteristics. The work provides new understandings and perspectives at the intersection of deep learning and fMRI visual decoding, and has reference value for research on the mechanisms, methods and technologies related to functional information processing in the human visual cortex.

The dissertation focuses on the scientific problem of how to construct, with DNNs, a computational model that accords with the information representation characteristics of the visual cortex and accurately decodes the scenes of natural image stimuli. Since the manner and ability of visual information representation in DNN models are affected by many factors, the work proceeds from multiple perspectives, including deep learning training methods, visual tasks, network structure and representation characteristics. It first builds visual encoding models from DNNs and their features to characterize the information processing of low-level and high-level visual regions and to accurately predict fMRI voxel responses in the visual cortex to natural image stimuli; it then constructs visual decoding models that progress from reconstructing low-level feature content to reconstructing scene content with high-level semantics, continuously improving the accuracy of fMRI visual decoding during natural image stimulation. The main research results are as follows.

1. Aiming at low-level visual regions, an End-to-End Convolutional Regression Network-based Visual Encoding Model (ETECRN-VEM) is proposed. How to construct an image representation model that accords with the information representation characteristics of the visual cortex is a key issue for visual encoding. Existing encoding models first use a DNN pre-trained for image recognition to extract image features, and then map those features to the fMRI voxel responses of the visual cortex through a linear model in a voxel-by-voxel manner. However, with this two-stage approach it is difficult to judge in advance which layer of features has a good linear relationship with the voxel responses of a specific visual region of interest (ROI), so features from different network depths have to be tried when building the encoding model. The two-stage approach therefore carries considerable uncertainty in constructing an image representation model, and the resulting encoding model has difficulty describing the information representation characteristics of a specific visual ROI. In addition, voxel-by-voxel encoding is very inefficient. To address these two problems, this dissertation introduces end-to-end training that drives a DNN to learn an image representation model directly from fMRI data, which better matches the characteristics of a specific visual ROI. When all voxels in a ROI are encoded simultaneously, selective voxel optimization strategies are designed to reduce the interference of invalid voxels with low signal-to-noise ratio (SNR) on the overall encoding, yielding an end-to-end visual ROI encoding model. Experimental results show that the proposed model better encodes approximately 80% of the voxels in V1 and 60%-70% of the voxels in V2 and V3, and that encoding performance and efficiency in the low-level visual regions are significantly improved.
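As a rough illustration of the end-to-end encoding idea (not the dissertation's exact architecture), the following PyTorch sketch maps an input image directly to the voxel responses of one visual ROI and trains the whole network with a regression loss; the layer sizes, the SNR-based voxel weighting and all tensors below are placeholder assumptions.

# Hypothetical end-to-end ROI encoding sketch (PyTorch); layer sizes and the
# SNR-based voxel weighting are illustrative assumptions, not the thesis code.
import torch
import torch.nn as nn

class ROIEncoder(nn.Module):
    """Map a natural image directly to the fMRI responses of all voxels in one ROI."""
    def __init__(self, n_voxels: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.regressor = nn.Linear(64 * 4 * 4, n_voxels)  # one output per voxel

    def forward(self, images):
        h = self.features(images)
        return self.regressor(h.flatten(1))

def weighted_mse(pred, target, voxel_weights):
    # Down-weight low-SNR voxels so they do not dominate the joint ROI loss.
    return ((pred - target) ** 2 * voxel_weights).mean()

# Usage sketch: images [B,3,H,W], voxels [B,n_voxels], weights derived from voxel SNR.
model = ROIEncoder(n_voxels=500)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
images = torch.randn(8, 3, 96, 96)
voxels = torch.randn(8, 500)
weights = torch.rand(500)          # placeholder for SNR-derived weights
loss = weighted_mse(model(images), voxels, weights)
loss.backward(); optimizer.step()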
2. Aiming at high-level visual regions, an Image Caption Features-based Visual Encoding Model (ICF-VEM) is proposed. How to construct a representation model for high-level image semantics is a key issue for encoding high-level visual regions. Because the scale of fMRI data is small, the end-to-end visual ROI encoding model has difficulty automatically learning the more complex and abstract representations of the high-level visual cortex. Existing encoding models mainly use DNNs pre-trained on image classification to extract image features, but image classification only requires identifying the key objects in a scene, so features driven by that task have difficulty characterizing the information representation of the high-level visual regions. To address this problem, this dissertation introduces a higher-level, more semantic task, image captioning, to drive the DNN to construct an image representation model that better accords with the representation characteristics of the high-level visual regions, and extracts more complex and abstract semantic features to better encode the fMRI voxel responses of the high-level visual cortex. In addition, using the relevance between image caption features and a large vocabulary of semantic words, a semantic interpretation of voxels in high-level visual regions is obtained. Experimental results show that the proposed model has an advantage for about 60% of the voxels in almost all high-level visual areas, achieves higher encoding performance, and reveals that the high-level visual regions represent the objects in natural image scenes, their attributes, and the relationships between objects.
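As a simplified illustration of this kind of caption-feature encoding (assuming the caption features have already been extracted offline from a pre-trained captioning network; the feature and voxel matrices below are random placeholders, not the dissertation's data), a ridge regression can map caption-driven features to voxel responses:

# Hypothetical caption-feature encoding sketch: X_caption stands for features taken
# from a pre-trained image captioning network (one row per stimulus) and Y_voxels
# for the corresponding fMRI responses of a high-level ROI.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X_caption = rng.standard_normal((1200, 512))   # placeholder caption features
Y_voxels = rng.standard_normal((1200, 300))    # placeholder voxel responses

X_tr, X_te, Y_tr, Y_te = train_test_split(X_caption, Y_voxels, test_size=0.2, random_state=0)

# One linear map fitted jointly for all voxels (Ridge handles multi-output targets).
encoder = Ridge(alpha=10.0)
encoder.fit(X_tr, Y_tr)
Y_pred = encoder.predict(X_te)

# Per-voxel encoding accuracy: correlation between predicted and measured responses.
corr = [np.corrcoef(Y_pred[:, v], Y_te[:, v])[0, 1] for v in range(Y_te.shape[1])]
print("median voxel correlation:", np.median(corr))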
3. For simple images, a Capsule Network-based Visual Reconstruction Model (CapsNet-VRM) is proposed. Two-stage visual reconstruction based on DNN features is an effective approach for simple image reconstruction: the fMRI voxel responses are first mapped to intermediate network features, and the predicted features are then inversely mapped back to an image. Reconstruction accuracy is affected by the information completeness and reversibility of the intermediate features, so how to construct a reversible image feature that serves as an intermediate bridge and accords with the information representation characteristics of the visual cortex is a key issue for the accurate reconstruction of simple images. The convolutional neural network (CNN) structure produces representations that are invariant to translation and rotation, and therefore tends to lose low-level feature information related to object position and orientation during image representation, which reduces reconstruction accuracy on the low-level feature content of images. To address this problem, from the perspective of network structure, CapsNet is introduced to construct a complete and reversible low-level feature bridge through equivariant representation. The corresponding capsule features are predicted from the fMRI voxel responses, and accurate reconstruction of the simple image is completed by the inverse transformation. Experimental results show that the proposed model improves the structural similarity index by about 10%, significantly improves the reconstruction of the low-level feature content of simple images, and enables feature interpretation and analysis of fMRI voxels in the low-level visual cortex by visualizing the capsule features.

4. Aiming at the low-level features of natural images, an Alternating Autoencoder-based Visual Reconstruction Model (AAE-VRM) is proposed. Visual encoding and visual reconstruction are two exactly opposite problems, and how to construct a feature space that accords with the information representation characteristics of the visual cortex is a key problem common to both. Existing methods, however, usually construct visual encoding and visual reconstruction models separately and thereby ignore the close relationship between the two. To address this problem, this dissertation proposes to construct the visual encoding and visual reconstruction models alternately and cyclically: a better encoding model assists the construction of the reconstruction model, and a better reconstruction model in turn assists the construction of the encoding model. First, the visual encoding network and the visual reconstruction network are connected in the two possible orders to form two opposite autoencoders, which auto-encode images and fMRI voxel responses respectively and assist the supervised training of visual encoding and visual reconstruction. Then, based on semi-supervised learning, the visual encoding and visual reconstruction models are trained alternately; through their mutual promotion and iterative enhancement, a visual reconstruction model more in line with the information characteristics of the visual cortex is constructed. Experimental results show that the proposed model reaches close to 90% recognition accuracy on the reconstructed low-level features, achieving higher accuracy in reconstructing the low-level feature content of natural images.

5. Aiming at the semantics of natural image scenes, a Bidirectional Recurrent Neural Network-based Visual Classification Model (BRNN-VCM) is proposed. Different levels of visual regions are connected to each other under bottom-up and top-down visual mechanisms to jointly represent visual input. Existing visual classification models treat the fMRI voxels of all visual regions as a single whole fed into the classifier; without exploiting the correlations between different visual areas, they have difficulty describing the bottom-up and top-down information representation characteristics of the visual cortex. To address this problem, this dissertation adopts a BRNN model, regards the topologically connected visual regions of the visual cortex as a spatial sequence, and feeds the fMRI voxel responses of each visual ROI as a node of that sequence into the BRNN, constructing a decoding model that accords with the bottom-up and top-down flow of visual information in the visual cortex. By modelling the fMRI sequence data, feature information within and between visual regions is extracted, and the semantics of natural image scenes are decoded from the fMRI visual information. Experimental results show that the classification accuracy of the proposed model is about 5% higher than that of existing models, which verifies the correlation between the two-way topology of the visual cortex and the semantic representation of visual scenes.
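To make the ROI-sequence idea concrete, here is a small, hypothetical PyTorch sketch (the ROI count, voxel dimensions and class count are made-up placeholders, not the dissertation's setup) in which each visual ROI contributes one step of a spatial sequence processed by a bidirectional LSTM before classification:

# Hypothetical BRNN-style decoder: each visual ROI (e.g. V1, V2, V3, V4, ...) is
# projected to a common dimension and treated as one step of a spatial sequence.
import torch
import torch.nn as nn

class ROISequenceClassifier(nn.Module):
    def __init__(self, roi_voxel_counts, hidden=128, n_classes=10):
        super().__init__()
        # One linear projection per ROI, since ROIs have different voxel counts.
        self.proj = nn.ModuleList([nn.Linear(n, hidden) for n in roi_voxel_counts])
        self.brnn = nn.LSTM(hidden, hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, roi_responses):
        # roi_responses: list of tensors, one [batch, n_voxels_i] per ROI, in cortical order.
        steps = [proj(x) for proj, x in zip(self.proj, roi_responses)]
        seq = torch.stack(steps, dim=1)          # [batch, n_rois, hidden]
        out, _ = self.brnn(seq)                  # forward and backward passes over the ROI sequence
        return self.classifier(out.mean(dim=1))  # pool over the ROI sequence

# Usage sketch with placeholder shapes.
model = ROISequenceClassifier(roi_voxel_counts=[500, 400, 350, 300], n_classes=10)
batch = [torch.randn(8, n) for n in (500, 400, 350, 300)]
logits = model(batch)                            # [8, 10]
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 10, (8,)))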
6. Aiming at the content of natural image scenes, a Generative Adversarial Network-based Bayesian Visual Reconstruction Model (GAN-BVRM) is proposed. Using a GAN is an effective way to improve the naturalness of the high-level features of reconstructed images, but it is often difficult to simultaneously preserve the fidelity of the low-level features; accounting for both low-level fidelity and high-level naturalness is the main difficulty in accurately reconstructing natural image scene content. To address this problem, based on a Bayesian formulation, this dissertation first uses BRNN-VCM to decode the semantic category of the image scene from the fMRI voxel responses and feeds it to the conditional generator of a pre-trained GAN, which generates natural images from an input random noise vector; the generated image is then fed into ETECRN-VEM to evaluate, in the low-level feature space, how well it fits the measured fMRI voxel responses. All modules of GAN-BVRM are differentiable neural networks, so the generator's noise input vector can be updated iteratively through gradient backpropagation to maximize the match between the generated image's predicted responses and the measured fMRI voxel responses of the visual cortex. Finally, the optimized noise vector is fed into the image generator to obtain the reconstructed image, which preserves both the fidelity of the low-level features and the naturalness of the high-level features of natural images. Experimental results show that the proposed model improves the average perceptual similarity metric by about 10%, significantly improving the reconstruction accuracy of natural image scene content.
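The core optimization loop can be illustrated with a short, hypothetical PyTorch sketch; the generator and encoder below are toy stand-ins (not the pre-trained conditional GAN or ETECRN-VEM themselves), and the sketch only shows how the noise vector is updated by backpropagating a voxel-fit loss while the decoded class condition stays fixed:

# Hypothetical sketch of a GAN-BVRM-style search over the generator's noise input.
# `ToyGenerator` stands in for a pre-trained conditional image generator and
# `ToyEncoder` for a pre-trained image-to-voxel encoding model; both are frozen.
import torch
import torch.nn as nn

class ToyGenerator(nn.Module):          # placeholder conditional generator
    def __init__(self, z_dim=128, n_classes=10):
        super().__init__()
        self.embed = nn.Embedding(n_classes, z_dim)
        self.net = nn.Sequential(nn.Linear(2 * z_dim, 3 * 64 * 64), nn.Tanh())
    def forward(self, z, y):
        x = torch.cat([z, self.embed(y)], dim=1)
        return self.net(x).view(-1, 3, 64, 64)

class ToyEncoder(nn.Module):            # placeholder image-to-voxel encoding model
    def __init__(self, n_voxels=500):
        super().__init__()
        self.net = nn.Linear(3 * 64 * 64, n_voxels)
    def forward(self, images):
        return self.net(images.flatten(1))

generator, encoder = ToyGenerator(), ToyEncoder()
for p in list(generator.parameters()) + list(encoder.parameters()):
    p.requires_grad_(False)             # only the noise vector is optimized

measured = torch.randn(1, 500)          # placeholder measured fMRI responses
category = torch.tensor([3])            # class decoded by the classification model
z = torch.randn(1, 128, requires_grad=True)
opt = torch.optim.Adam([z], lr=0.05)

for step in range(200):
    opt.zero_grad()
    image = generator(z, category)
    loss = nn.functional.mse_loss(encoder(image), measured)  # voxel-fit loss
    loss.backward()
    opt.step()

reconstruction = generator(z, category).detach()  # final reconstructed image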
Keywords/Search Tags: functional Magnetic Resonance Imaging (fMRI), Visual Decoding, Deep Neural Network (DNN), Image Representation, End-to-End Training, Alternating Optimization, Capsule Network (CapsNet), Recurrent Neural Network (RNN), Generative Adversarial Network (GAN)