Font Size: a A A

Sparse Representation and Deep Learning for Image and Video Reconstruction

Posted on:2017-05-19Degree:Ph.DType:Thesis
University:Northwestern UniversityCandidate:Iliadis, MichaelFull Text:PDF
GTID:2448390005467231Subject:Computer Science
Abstract/Summary:
Techniques from sparse signal representation have attracted great attention in image and video reconstruction. This thesis starts with introducing the sparse representation model and then discusses two applications namely face identification and video compressive sensing.;First we describe an adaptation and extension to the sparse representation model for face identification. In such extension the sparsity-based approaches are effectively combined with additional least-squares steps that utilize more information, in order to achieve performance improvements with little additional cost.;Then, we consider an iterative method to address face identification with block occlusions. The approach utilizes a robust and sparse representation based on two characteristics in order to model contiguous errors (e.g., block occlusion) effectively. The first fits to the errors a distribution described by a tailored loss function. The second describes the error image as having a specific structure (resulting in low-rank). We will show that this joint characterization is effective for describing errors with spatial continuity. Our approach is computationally efficient due to the utilization of the Alternating Direction Method of Multipliers (ADMM). A special case of our fast iterative algorithm leads to the robust representation method which is normally used to handle non-contiguous errors (e.g., pixel corruption). Extensive results on representative face databases document the effectiveness of our method over existing robust representation methods with respect to both identification rates and computational time.;Another sparse representation problem we will investigate in this thesis is video compressive sensing. We introduce a novel video compressive sensing framework based on multiple measurement vectors which is suitable for signals with temporal correlation such as video sequences. Experimental results on two video sequences exhibiting fast motion and occlusions, show the advantages of the proposed method over the state-of-the-art in video compressive sensing.;This thesis presents also a deep learning approach for video compressive sensing. The proposed formulation enables recovery of video frames in a few seconds at significantly improved reconstruction quality compared to previous approaches. Our investigation starts by learning a linear mapping between video sequences and corresponding measured frames which turns out to provide promising results. We then extend the linear formulation to deep fully-connected networks and explore the performance gains using deeper architectures. Our analysis is always driven by the applicability of the proposed framework on existing compressive video architectures. Extensive simulations on several video sequences document the superiority of our approach both quantitatively and qualitatively.;Finally, our analysis offers insights into understanding how dataset sizes and number of layers affect reconstruction performance while raising a few points for future investigation. Finally, the deep learning framework for video compressive sensing is extended to a novel encoder-decoder neural network model called DeepBinaryMask. The proposed framework is an end-to-end model where the sensing matrix is trained along with the video reconstruction. The encoder learns the binary elements of the sensing matrix and the decoder is trained to reconstruct the video sequence. The reconstruction performance is found to improve when using the trained sensing mask from the network across a wide variety of compressive sensing reconstruction algorithms. Our analysis and discussion offers insights into understanding the characteristics of the trained mask designs that lead to the improved reconstruction quality.
Keywords/Search Tags:Video, Reconstruction, Representation, Sparse, Deep learning, Image, Trained
Related items