
A Time-change Method For Factor Inference And Covariance Estimation Of High Frequency High Dimensional Data

Posted on: 2023-07-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Peng
Full Text: PDF
GTID: 1520307028970789
Subject: Financial statistics and risk management
Abstract/Summary:
This thesis proposes a principal component analysis and covariance matrix estimation method for high-frequency, high-dimensional financial data. The method works under a class of high-dimensional diffusion models with factor structure and uses time change sampling based on the observable market factor. The relevant limit theorems are proved.

With the development of information technology, financial data have grown at an explosive rate, and we have entered the "big data era". Big data in financial markets can be divided into high-frequency data and high-dimensional data. High-frequency data are data with a short sampling interval; they tend to be non-stationary and to exhibit the leverage effect. High-dimensional data are data whose dimension is comparable to the sample size, and they often have a factor structure: most of the volatility in a basket of stocks can be explained by a few factors. Under factor models, the asymptotic theories for high-dimensional data in the existing literature often assume that the time-series samples are stationary and weakly dependent, or even independent. These assumptions are often not applicable to high-frequency data. This thesis assumes that the data follow a diffusion model and that the volatility of the common factors in the market is proportional to the volatility of the market factor. The time change sampling scheme samples at a sequence of random time points, so that the factor returns are asymptotically independent and identically distributed. Statistical inference for high-frequency, high-dimensional data based on time change sampling requires new theory.

Large-dimensional random matrix theory is an important branch of high-dimensional data inference theory. Under the factor structure, some researchers have used it to develop the asymptotic theory of divergent spiked eigenvalues of high-dimensional covariance matrices, namely high-dimensional principal component analysis theory. However, large-dimensional random matrix theory under the factor model needs to assume independence between the rows or columns of the data matrix, and sampling at calendar time often does not satisfy this assumption. Using the time change sampling scheme, Chapter 2 of this thesis extends large-dimensional random matrix theory under the factor model to high-frequency, high-dimensional data and resolves the problem of time-series dependence in high-frequency data. The result is a limit theory for the divergent spiked eigenvalues of the sample covariance matrix and their corresponding eigenvectors. Based on this theory, we can study changes in the factor structure of a basket of stocks across periods and test hypotheses about such changes; this is the change point test of factor structure. We can also study the economic meaning of the eigenvectors corresponding to the principal components in a specific time window, and how this meaning changes over time.

Covariance matrix estimation theory is another important branch of high-dimensional data inference theory. Under the factor structure, some scholars have proposed high-dimensional covariance matrix estimators based on conditional sparsity. However, the existing high-dimensional estimation theories need to assume that the time-series samples are stationary and weakly dependent, assumptions that are often not satisfied when sampling at calendar time. Chapter 3 of this thesis, based on the time change sampling scheme, extends covariance matrix estimation theory under the factor model to high-frequency, high-dimensional data and relaxes the assumptions on time-series dependence and stationarity. We propose a covariance matrix estimator based on time change sampling, prove its consistency, and apply it to the construction of a minimum-variance portfolio of a basket of stocks.

Financial assets are traded at random times, so if the sampling frequency is high, missing values may occur between two sampling times. Some researchers have studied missing value imputation methods based on the factor structure. Chapter 4 of this thesis presents a missing value imputation procedure for financial data under the factor structure. This method broadens the scope of the covariance matrix estimator in practical applications and improves its robustness.

Specifically, in Chapter 2 we use a time change sampling scheme to obtain the data matrix under a class of high-dimensional diffusion models with factor structure, and then obtain the sample covariance matrix. We establish the consistency and asymptotic normality of the divergent spiked eigenvalues of the sample covariance matrix, as well as the consistency of the corresponding eigenvectors. We construct an eigenvalue-ratio test statistic based on the central limit theorem. Simulation results verify the theorems. In the empirical analysis, we use tick-by-tick data for the entire Chinese market from July 2013 to August 2021 to study three representative, mutually exclusive stock portfolios in the Chinese market: CSI 300, CSI 500 and CSI 1000. We study how their factor structure changes over time. In addition, we select the individual stocks that remained CSI 300 constituents throughout the sample period and construct a balanced panel of 112 stocks. We also study how the factor structure of these stocks and the economic meaning of the principal components change over time.

Chapter 3 presents the time change POET covariance matrix estimator under the same model. We prove the convergence rates of the estimation error in the elementwise norm and the relative F-norm. We also prove the convergence rate, in the spectral norm, of the error of the inverse of this estimator. In simulations, our estimators have smaller errors than covariance matrix estimators that do not use time change sampling. In the empirical analysis, we use the panel data of the aforementioned 112 stocks to estimate the covariance matrix over a rolling window of 125 trading days and compute the out-of-sample weekly returns of the minimum-variance portfolio. The results show that our model has the smallest annualized volatility over the 7.67-year portfolio evaluation window.

Chapter 4, building on Chapter 3, proposes the "three-step" data imputation method and the projection time change POET covariance matrix estimator. Simulation studies show that the new estimator has smaller estimation error when there are missing values.
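
To make the idea of time change sampling concrete, the following Python sketch samples a simulated panel at the random times when the realized variance of the market factor crosses equally spaced levels, then extracts the leading eigenpairs of the sample covariance matrix of the time-changed returns. The abstract does not state the exact sampling rule, so the equal-realized-variance rule used here is an illustrative assumption, as are the one-factor simulation design and the function names `time_change_indices`, `time_changed_returns` and `leading_eigenpairs`. It is not the thesis's estimator.

```python
import numpy as np


def time_change_indices(market_log_price, n_samples):
    """Observation indices at which the market factor's cumulative realized
    variance first reaches n_samples equally spaced levels.
    (Assumed sampling rule, for illustration only.)"""
    r = np.diff(market_log_price)
    cum_rv = np.cumsum(r ** 2)  # cum_rv[k] = realized variance up to price k + 1
    levels = cum_rv[-1] * np.arange(1, n_samples + 1) / n_samples
    k = np.searchsorted(cum_rv, levels, side="left")
    # Shift to price indices and guard against rounding at the final level.
    idx = np.minimum(k + 1, len(market_log_price) - 1)
    # Prepend the start point; np.unique drops repeats caused by large jumps.
    return np.unique(np.concatenate(([0], idx)))


def time_changed_returns(log_prices, idx):
    """Log returns of every asset between consecutive time-changed points."""
    return np.diff(log_prices[idx], axis=0)


def leading_eigenpairs(returns, k):
    """Top-k eigenvalues and eigenvectors of the (uncentered) sample covariance."""
    cov = returns.T @ returns / returns.shape[0]
    vals, vecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]
    return vals[top], vecs[:, top]


# Hypothetical one-factor panel with time-varying market volatility.
rng = np.random.default_rng(0)
T, p, n = 20_000, 30, 200
vol = 0.01 * (1.0 + 0.8 * np.sin(np.linspace(0.0, 6.0 * np.pi, T - 1)))
market_r = vol * rng.standard_normal(T - 1)
beta = rng.uniform(0.5, 1.5, size=p)  # assumed factor loadings
stock_r = np.outer(market_r, beta) + 0.003 * rng.standard_normal((T - 1, p))

market = np.concatenate(([0.0], np.cumsum(market_r)))
log_prices = np.vstack([np.zeros(p), np.cumsum(stock_r, axis=0)])

idx = time_change_indices(market, n)
R = time_changed_returns(log_prices, idx)
vals, vecs = leading_eigenpairs(R, 3)
print("leading eigenvalues:", vals)
```

Under this rule every sampling interval carries roughly the same amount of market realized variance, which is one way to read the claim that the factor returns become approximately identically distributed after the time change. In this simulated one-factor panel the first eigenvalue should stand well apart from the rest, and the first eigenvector should point close to the direction of `beta`.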
Keywords/Search Tags: Factor Analysis, Change Point Test, Covariance Matrix Estimation, Missing Data Imputation, High-Frequency High-Dimensional Data, Time Change, Time Series Dependency