
Research On Statistical Learning And Statistical Control Process For Matrix Data

Posted on: 2021-07-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y F Ye
Full Text: PDF
GTID: 1480306503482604
Subject: Statistics
Abstract/Summary:
Supervised learning tasks arise frequently in machine learning, pattern recognition, image processing and data mining. Statistical process control (SPC), in turn, is used to detect change points and raise an alarm, and has been applied in economics and finance, metabolomics, network analysis, data quality, medical diagnosis, environmental measurement and other fields. It is therefore natural to combine SPC with statistical learning. Real-world data such as digital images, MRI scans and electroencephalography signals, however, are naturally represented as matrices, so the study of matrix data is of both theoretical and practical importance. This dissertation studies learning algorithms, learning theory and optimal control charts for matrix data, using tools from mathematical analysis, probability theory and stochastic analysis. The main contributions are as follows.

The first chapter introduces the research background and significance, together with the domestic and international state of research on supervised matrix learning and change point detection. It also briefly states the research problems and the main contents of the dissertation.

The second chapter first defines the matrix Hilbert space, which extends the scalar inner product to a matrix inner product. To address the classification problem, we construct a matrix-based hyperplane and propose the kernel support matrix machine. The hyperplane is determined by two elements: the regression matrix and the weight matrix. Matrix kernel functions are then applied to capture nonlinear relationships. Generalization bounds are derived from the Rademacher complexity with respect to a probability distribution. We demonstrate the merits of the proposed method through extensive experiments on simulated data and on a number of real-world datasets.

The third chapter introduces the concept of multi-distance and proposes the multi-distance support matrix machine. More specifically, the multi-distance is defined as an array that contains the products of the columns and rows of the sample and the regression matrix, so it carries much more information than the conventional scalar distance. We compare the time complexity of the proposed method with that of other advanced algorithms. We further present a theoretical analysis of generalization bounds for i.i.d. and non-i.i.d. processes based on different classifiers, in terms of Rademacher complexity and Vapnik-Chervonenkis dimension (VC dimension). Finally, we evaluate the performance of the proposed method in numerical and real-data experiments.

The fourth chapter discusses the change point detection problem and presents the probability sum measure for finite matrix sequences. We focus on calculating the detection delay rate, which is the probability that the stopping time T is greater than the change-point time. We then prove that the control chart whose charting statistics are log-likelihood ratios with dynamic control limits is optimal for detecting a change in the distribution of a finite number of matrix observations. Numerical simulations validate the optimality of the proposed control chart.

In the last chapter, we present support matrix clustering. Using its distance function, we construct the control chart T_SMC with dynamic control limits and prove the optimality of T_SMC under the measure m*(T). We further discuss the limit state of T_SMC: when the sample size N goes to infinity, it is equivalent to a control chart with constant control limits. To handle the case in which the location and the value of the change point are unknown, we propose a performance index B(T) and the chart T*_SMC, which is inspired by the concept of a multi-chart. Moreover, we present an estimate of B(T*_SMC) as the number of observations tends to infinity. Numerical experiments are conducted to illustrate the detection performance of the proposed control charts.
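
As a rough illustration of a log-likelihood-ratio control chart with dynamic control limits for a finite sequence of matrix observations, the following Python sketch may help. It is not the construction from the dissertation, whose exact charting statistic and limits are not given in this abstract. Everything specific here is an assumption made for the example: independent Gaussian entries with unit variance, known pre-change and post-change mean matrices `M0` and `M1`, a CUSUM-style recursion, a linearly decreasing limit sequence, and the hypothetical names `log_likelihood_ratio`, `run_chart` and `tau`.

```python
import numpy as np


def log_likelihood_ratio(X, M0, M1):
    """log f1(X) / f0(X), assuming independent N(mean, 1) entries."""
    return 0.5 * (np.sum((X - M0) ** 2) - np.sum((X - M1) ** 2))


def run_chart(observations, M0, M1, limits):
    """Return the 1-based alarm time, or None if no alarm is raised.

    The statistic follows a CUSUM-style recursion (an assumption of this
    sketch) and is compared with a time-varying control limit c_k.
    """
    stat = 0.0
    for k, (X, c_k) in enumerate(zip(observations, limits), start=1):
        stat = max(stat, 0.0) + log_likelihood_ratio(X, M0, M1)
        if stat >= c_k:
            return k
    return None


# Simulated finite sequence: N matrices, mean shift from M0 to M1 at time tau.
rng = np.random.default_rng(0)
N, tau, shape = 40, 21, (4, 5)
M0 = np.zeros(shape)
M1 = np.full(shape, 0.8)
observations = [rng.normal(M0 if k < tau else M1, 1.0) for k in range(1, N + 1)]

# Hypothetical dynamic control limits: stricter early, looser near the end.
limits = np.linspace(12.0, 6.0, N)

alarm_time = run_chart(observations, M0, M1, limits)
print("alarm raised at observation:", alarm_time)
```

Repeating the simulation many times and counting how often the alarm comes after `tau` gives a Monte Carlo estimate of the probability that the stopping time exceeds the change-point time, in the spirit of the detection delay rate mentioned above.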
Keywords/Search Tags:kernel support matrix machine, supervised learning, reproducing kernel matrix Hilbert space, matrix kernel function, multi-distance support matrix machine, generalization bound, Rademacher complexity, Vapnik-Chervonenkis dimension, optimal control chart