
Learning Structured Probabilistic Model For Heterogeneous Data Compression

Posted on: 2015-02-12  Degree: Doctor  Type: Dissertation
Country: China  Candidate: W R Dai  Full Text: PDF
GTID: 1228330452966575  Subject: Information and Communication Engineering
Abstract/Summary:
With the development of data collection, network services, and storage techniques, massive volumes of heterogeneous data with complex structures are being generated. Learning a structured probabilistic model consists of using a probabilistic graphical model to represent the complex structures in heterogeneous data, performing model-based inference for learning, and adopting reasoning algorithms to optimize the learning process. Through learning and optimization, a structured prediction model exploits the complex structure to make predictions for sets of prediction tasks simultaneously. Such a method generates less information than the combination of individual predictions, which makes it suitable for heterogeneous data compression. Hence, generalized context modeling (GCM) is established to capture complex structures in heterogeneous data. On this basis, structured probabilistic models and their learning are applied to heterogeneous data compression applications, e.g. genomic data compression, lossless image coding, and intra-frame video coding.

Firstly, this dissertation proposes generalized context modeling, which extends the contexts from the suffix of predicted subsequences in classical context modeling to arbitrary combinations of symbols in all directions, via combinatorial structuring and multi-directional extension. A model tree for GCM is constructed to address the selection of contexts; it comprises a combinatorial structuring of finite-order combinations of predicted symbols in multiple directions. For optimal prediction, a normalized maximum likelihood (NML) function is developed to estimate the structures and parameters of GCM. Moreover, the GCM class is refined by context pruning to obtain the optimal class of models in terms of the minimum description length (MDL) principle. Consequently, the estimated probability for prediction is derived based on the selected models.
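To make the baseline concrete, the following is a minimal sketch of the classical suffix-based context modeling that GCM generalizes: an order-k model that counts symbol occurrences after each k-symbol suffix and predicts with a smoothed conditional probability. Class and parameter names are illustrative, not from the dissertation; GCM would replace the suffix contexts with multi-directional symbol combinations selected via NML and MDL pruning.

```python
from collections import defaultdict

class SuffixContextModel:
    """Classical order-k context model: predicts the next symbol from the
    k-symbol suffix. GCM generalizes the context set from suffixes to
    arbitrary multi-directional combinations of symbols."""

    def __init__(self, order, alphabet):
        self.order = order
        self.alphabet = alphabet
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, sequence):
        # Count each symbol occurrence conditioned on its k-symbol suffix.
        for i in range(self.order, len(sequence)):
            ctx = tuple(sequence[i - self.order:i])
            self.counts[ctx][sequence[i]] += 1

    def prob(self, ctx, symbol):
        # Laplace-smoothed conditional probability P(symbol | context).
        c = self.counts[tuple(ctx)]
        total = sum(c.values()) + len(self.alphabet)
        return (c[symbol] + 1) / total

model = SuffixContextModel(order=2, alphabet="ab")
model.update("ababab")
p_a = model.prob("ab", "a")  # "a" always follows context "ab" here
p_b = model.prob("ab", "b")
```

A richer context set concentrates the predictive distribution, which is exactly what lowers the code length under an MDL-style criterion.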
The upper bound of model redundancy for GCM is proven to be independent of the size of the heterogeneous data.

In nature, GCM is a structured probabilistic model, as it considers interdependencies among symbols in prediction. When applied to non-ASCII files in the Calgary corpus and to executable files, maximum likelihood estimation for GCM improves compression performance. It is therefore promising to learn structured probabilistic models for better performance in heterogeneous data compression. For validation, we applied the learning of structured probabilistic models to heterogeneous data, e.g. genomes, images, and video.

This dissertation proposes the learning of a structured probabilistic model for genomic data compression. The proposed scheme optimizes the differences between target and reference sequences with a hierarchical prediction structure for wavelet coding. The selected references are obtained by minimizing the estimated coding cost for fragments of nucleotides specified by various configurations of side information, e.g. their sizes and matching offsets. Specifically, the distribution of the difference sequences is concentrated around zero, which is desirable for the subsequent wavelet coding. Furthermore, Markov chains are generated for each fragment to represent the interdependencies of side information among its adjacent sub-fragments. A belief propagation (BP) procedure is adopted to estimate and update the conditional distribution of the side information for each fragment. In summary, the proposed scheme is efficient for compression, as it balances the accuracy of prediction and the overhead of specifying the references. Experimental results show that the proposed scheme outperforms all existing benchmarks by a noticeable margin.

This dissertation proposes the learning of a structured probabilistic model for lossless image coding.
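The reference-selection step above can be sketched as follows: pick, among candidate reference fragments, the one whose difference against the target has the lowest estimated coding cost, yielding a difference sequence concentrated at zero. The cost proxy, integer nucleotide mapping, and function names here are illustrative assumptions; the dissertation's scheme additionally weighs the side-information overhead (fragment sizes, matching offsets) and feeds the differences into wavelet coding.

```python
def difference_cost(target, reference):
    """Proxy for coding cost: sum of absolute symbol differences.
    A good reference yields a difference sequence concentrated at zero."""
    return sum(abs(t - r) for t, r in zip(target, reference))

def select_reference(target, candidates):
    """Pick the candidate reference fragment with minimum estimated cost."""
    return min(candidates, key=lambda ref: difference_cost(target, ref))

# Nucleotides mapped to integers (A=0, C=1, G=2, T=3), illustrative only.
target = [0, 1, 2, 3, 0, 1]
candidates = [[0, 1, 2, 3, 0, 0], [3, 3, 3, 3, 3, 3]]
best = select_reference(target, candidates)
diff = [t - r for t, r in zip(target, best)]  # mostly zeros
```

A zero-concentrated difference sequence is cheap to code, so minimizing this cost directly trades prediction accuracy against reference overhead.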
The proposed scheme exploits spatial statistical correlations for optimal prediction directly based on 2-D contexts, and formulates data-driven structural interdependencies to make the prediction error coherent with the underlying probability distribution for coding. Under joint constraints for local coherence, max-margin Markov networks are incorporated to combine support vector machines structurally and make a max-margin estimation for a correlated region. Specifically, the scheme aims to produce multiple predictions in the blocks, with the model parameters learned such that the distinction between the actual pixel and all possible estimations is maximized. It is proved that, as the sample size grows, the prediction error is asymptotically upper-bounded by the training error under a decomposable loss function. The proposed scheme outperforms most benchmark predictors reported.

This dissertation proposes the learning of a structured probabilistic model for intra-frame video coding. The proposed scheme is integrated with High Efficiency Video Coding (HEVC) intra coding to serve as an alternative mode for rate-distortion optimization, which simultaneously predicts blocks of pixels with optimal rate-distortion. The proposed scheme incorporates a max-margin Markov network (M3N) to regulate and optimize multiple block predictions. Specifically, the proposed scheme optimizes a set of predictions by associating a log-Laplacian loss function with the joint distribution of succeeding DCT coefficients. In solution and prediction, expectation propagation utilizes a function family to approximate the actual distribution of the residual. Meanwhile, since the convergence conditions of BP are unknown, the underlying Markov network structure is optimized to find states that achieve global optimization by using expectation propagation (EP).
As the sample size grows, the average prediction error is asymptotically upper-bounded by the training error under the decomposable loss function. The proposed scheme obtains bitrate reduction and achieves better visual quality in comparison to HEVC intra coding.
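The log-Laplacian loss mentioned above can be illustrated with a small sketch: the negative log-likelihood of prediction residuals under a zero-mean Laplacian, the kind of loss the scheme associates with DCT coefficient residuals. The scale parameter and residual values are illustrative assumptions; the dissertation couples this loss to an M3N over block predictions rather than scoring residual vectors in isolation.

```python
import math

def log_laplacian_loss(residuals, b):
    """Negative log-likelihood of residuals under a zero-mean Laplacian
    with scale b: n*log(2b) + sum(|r|)/b. Lower is better for coding."""
    n = len(residuals)
    return n * math.log(2 * b) + sum(abs(r) for r in residuals) / b

# A prediction whose residuals concentrate near zero scores lower,
# i.e. its residual block is cheaper to code.
good = [0, 1, -1, 0, 2]
bad = [5, -7, 6, -4, 8]
loss_good = log_laplacian_loss(good, b=2.0)
loss_bad = log_laplacian_loss(bad, b=2.0)
```

Minimizing this loss over a set of block predictions pushes the residual distribution toward the sharp zero-centered shape that rate-distortion optimization rewards.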
Keywords/Search Tags: Structured probabilistic model, reasoning algorithm, heterogeneous data compression, context model