In recent years, with the development of modern neuroimaging and artificial intelligence, neuroscientists have been able to use these technologies to decode the brain, that is, to read out the content of brain activity. Common neuroimaging technologies include electroencephalography (EEG), magnetoencephalography (MEG), and functional magnetic resonance imaging (fMRI). Among these, fMRI is non-invasive, offers high spatial resolution, and can localize brain regions, so it is widely used in brain decoding research. Although brain decoding based on fMRI signals has developed rapidly in the past decade, some problems remain, such as low classification decoding accuracy, blurry reconstruction decoding, and a lack of research on language decoding. To address these problems, this dissertation uses natural images as visual stimuli, uses fMRI to record the neural activity of the visual cortex, establishes mapping models between visual stimuli and brain functional signals, and studies the theories and methods of visual perception decoding from three aspects: classification decoding, reconstruction decoding, and language decoding. The main contents are as follows:

(1) Classification decoding based on a recurrent neural network. To address the problem of low classification accuracy, a classification decoding model based on a long short-term memory network (LSTM) was proposed to classify the categories of stimulus images from visual neural activity. The results showed that the decoding accuracy obtained from multi-time-point visual activity was higher than that obtained from peak-time visual activity alone, demonstrating that the LSTM-based decoding model can extract temporal information from multi-time-point visual activity and thereby improve classification performance. In addition, comparisons across visual cortices showed that the classification accuracy of the higher visual cortex was significantly higher than that of
the lower visual cortex, confirming that the higher visual cortex contains more information useful for classification decoding than the lower visual cortex.

(2) Reconstruction decoding based on a progressive growing generative adversarial network. To address the problem of blurry reconstructed images, a reconstruction framework consisting of a latent feature extractor, a latent feature decoder, and a natural image generator was proposed to generate high-resolution natural images from visual activity. In this framework, the latent feature extractor extracts the latent features of images; the latent feature decoder predicts the latent features of images from the neural activity of the higher visual cortex; and the natural image generator combines the predicted latent features with the neural activity of the lower visual cortex to generate reconstructed images progressively, from low resolution up to high resolution. The results showed that the reconstructed images were similar to the stimulus images, indicating that the reconstruction model can not only generate high-resolution images from visual activity but also keep the semantic information consistent with the stimulus images.

(3) Reconstruction decoding based on a similarity-conditioned generative adversarial network. To address the problem of insufficient sample size, an end-to-end reconstruction decoding model was proposed to generate natural images from visual activity. The model included an image feature extractor, a brain feature extractor, and a conditional generative adversarial network. First, the image feature extractor and the brain feature extractor extracted the latent features of natural images and visual activity, respectively. Then, a similarity loss was constructed by judging whether a natural image and a visual activity matched, and this loss was added to the total loss of
the model. Next, the latent features of visual activity were fed into the conditional generative adversarial network as conditions to generate the reconstructed images. Finally, a matching-or-not strategy was used to train the model. The results showed a high similarity between the reconstructed images and the stimulus images. Therefore, the similarity loss and the matching-or-not training strategy enable the reconstruction decoding model to learn latent features that carry effective information, thereby improving reconstruction performance.

(4) Language decoding based on a recurrent neural network. To address the lack of research on language decoding, a language decoding model based on LSTM was proposed to generate language from visual activity. The model included an image encoder, an fMRI encoder, and a language decoder. First, the image encoder and the fMRI encoder extracted the latent features of natural images and visual activity, respectively. The two latent features were then weighted by transfer factors, and the weighted sum was fed into the language decoder to generate phrases or sentences describing the natural image. During training, the weight was gradually transferred from the latent features of images to the latent features of visual activity; at test time, only the latent features of visual activity were used to generate phrases or sentences. The results showed that the higher visual cortex achieved higher language decoding performance than the lower visual cortex, indicating that the effective information used to generate language lies mainly in the higher visual cortex.

(5) Language decoding based on the Transformer. To address the problem of low language decoding accuracy, a dual-channel language decoding model based on the Transformer was proposed, which
again generated language from visual activity. The model consisted of an image extractor, an image encoder, a neural extractor, a neural encoder, and a language decoder. First, the image extractor and the image encoder worked together to extract the latent features of the natural image, while the neural extractor and the neural encoder worked together to extract the latent features of the visual activity. The similarity between the two latent features was then computed to obtain a similarity loss, which was added to the total loss of the model. Finally, the two latent features were weighted by transfer factors, and the weighted sum was fed into the language decoder to generate phrases or sentences. As in the previous language decoding study, the weight was gradually transferred from the latent features of images to the latent features of visual activity during training, and only the latent features of visual activity were used at test time. Comparisons of different training strategies showed that the progressive transfer strategy achieved the highest decoding performance, and comparisons with and without the similarity loss showed that the similarity loss improved language decoding accuracy. Therefore, the Transformer-based dual-channel language decoding model achieves higher language decoding performance.
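The transfer-factor weighting shared by studies (4) and (5) can be sketched in a few lines. The linear schedule and feature sizes below are illustrative assumptions; the abstract states only that the weight shifts gradually from image features to visual-activity features during training.

```python
import numpy as np

def mixed_feature(img_feat, fmri_feat, epoch, total_epochs):
    """Weight image and fMRI latent features by a transfer factor that
    shifts from the image channel to the fMRI channel during training.
    The linear schedule here is an assumption for illustration."""
    t = min(epoch / total_epochs, 1.0)        # transfer factor in [0, 1]
    return (1.0 - t) * img_feat + t * fmri_feat

img, fmri = np.ones(4), np.zeros(4)
print(mixed_feature(img, fmri, epoch=0, total_epochs=10))   # all image weight
print(mixed_feature(img, fmri, epoch=10, total_epochs=10))  # all fMRI weight
```

At test time only the fMRI features are available, which is exactly the `t = 1` endpoint of this schedule, so the decoder never sees an input distribution it was not trained on.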
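The similarity loss used in studies (3) and (5) amounts to rewarding matched image/activity pairs and penalizing mismatched ones. The cosine measure and hinge form below are assumptions; the abstract specifies only that the loss is built from a match-or-not judgment between a natural image and a visual activity.

```python
import numpy as np

def cosine(a, b):
    # cosine similarity between two latent feature vectors
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def similarity_loss(img_feat, fmri_feat, matched):
    """Matched pairs: push similarity toward 1; mismatched pairs: push
    similarity down to 0 or below (hinge form is an assumption)."""
    s = cosine(img_feat, fmri_feat)
    return 1.0 - s if matched else max(0.0, s)

v, w = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(similarity_loss(v, v, matched=True))    # 0.0 (identical, matched)
print(similarity_loss(v, w, matched=False))   # 0.0 (orthogonal, mismatched)
```

Adding this term to the total loss forces the brain feature extractor to produce latents that line up with the image latents only for the correct stimulus, which is the stated reason the model learns features carrying effective information.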