| The covariance structure of financial data has attracted considerable attention in theoretical studies. With the development of information technology and financial markets, high-frequency data has become increasingly available. Research on the integrated covariance matrix lies at the core of high-frequency data analysis and plays a critical role in risk management, portfolio allocation, and asset pricing. This paper investigates the integrated covariance matrix and focuses mainly on several difficulties that arise in high-frequency data: high-dimensional data with multiple transactions, high-dimensional data with microstructure noise, high-dimensional asynchronous transaction data, and high-dimensional heteroscedastic data. Using random matrix theory, we study the theoretical properties, hypothesis tests, and applications of the sample matrices corresponding to the high-dimensional integrated covariance matrix under various settings. Compared with traditional low-frequency financial data, high-frequency financial data has distinctive features and research value of its own. The sample size of high-frequency data is very large, which provides a great deal of valuable information about the financial market. As financial markets develop, more and more assets become available for investment, so the traditional assumption of a fixed, limited number of assets is no longer suitable. Moreover, microstructure noise contaminates high-frequency data because of the short recording interval, so a de-noising method must be applied to extract useful information. Owing to the recording mechanism and heavy trading, multiple transactions are increasingly observed; that is, several transactions often occur within a single recording interval, and the order of the transactions between two adjacent recording points is lost. The existing literature shows that multiple transactions affect the theoretical results for one-dimensional financial data. The influence of multiple transactions on high-dimensional 
data remains to be explored further. Another challenge in high-frequency data analysis is asynchronous trading, which means that different stocks have different numbers of transactions within the same time stamp. We discuss multiple transactions in high-dimensional data in Chapter 2 and Chapter 3. Another feature of the integrated covariance matrix is the presence of spiked eigenvalues. Empirical studies show that the top few eigenvalues of the sample matrix for high-frequency data are much larger than the others, which provides evidence for the financial factor model. It is often believed that the few largest eigenvalues capture the most important overall features of the data, which are critical in many applications, for example principal component analysis. We consider the spiked model in Chapter 4. We derive the central limit theorem for a single sample eigenvalue and the limiting joint distribution of the spiked eigenvalues. We also show that the sample eigenvalues are consistent estimators of the factor part, and we study the consistency of the eigenvectors corresponding to the spiked eigenvalues. We further give two-sample test statistics for testing whether the market structure has changed. Specifically, we begin with high-frequency, high-dimensional data with multiple transactions. Using random matrix theory, we study how multiple transactions affect the time-variation adjusted realized covariance matrix proposed by Zheng and Li (2011). The results show that the traditional theory is no longer valid in the presence of multiple transactions: the limiting spectral distribution of the matrix depends not only on the limiting spectral distribution of the integrated covariance matrix but also on the number of multiple transactions. Afterwards, we consider financial data affected by noise and multiple transactions at the same time. We use the pre-averaging method to deal with microstructure noise. The results show that the pre-averaging method can eliminate the effects of 
microstructure noise and multiple transactions simultaneously. Additionally, the limiting spectral distributions of the sample matrix and the integrated covariance matrix determine each other solely through the Marčenko–Pastur equation. Notably, the pre-averaging method can also eliminate the impact of asynchronous transactions: the limiting spectral distribution of the sample matrix is the same as that of the sample matrix without asynchronous trading. Simulation results show that the proposed matrix performs well in finite samples. Based on the theoretical results in Chapter 2, we further consider the estimation of the integrated covariance matrix in Chapter 3. We use the Frobenius norm to characterize the distance between two matrices. We consider the class of rotation-equivariant estimators, shrink the sample eigenvalues so that the sample matrix is closer to the integrated covariance matrix as a whole, and derive the optimal shrinkage function. We prove that the estimator based on the optimal shrinkage function achieves the optimal loss for portfolio allocation. Instead of estimating the whole shrinkage function, we use a data-splitting method in which different parts of the data are used to estimate the eigenvalues and the eigenvectors. The results show that the proposed data-splitting estimator attains the same asymptotic efficiency as the ideal estimator. Even when the dimension is larger than the effective number of observations, the proposed estimator is still asymptotically positive definite. Simulation results show that the estimator has good finite-sample performance. For the global minimum variance portfolio strategy, we find that the strategy based on the proposed estimator has the same exposure limit as the one based on the latent integrated covariance matrix, and that the two strategies have the same convergence rate for the actual risk. The empirical studies show that the proposed estimator has good out-of-sample performance in portfolio 
allocation. In Chapter 4, we consider the behavior of relevant statistics of the sample matrix when the integrated covariance matrix has spiked eigenvalues that tend to infinity. We consider a rectangular structure for the latent log-price process, which includes some special factor models, and we retain the heteroscedastic structure of high-frequency data in the new model. Based on the high-frequency data, we show the consistency of the spiked eigenvalues of the sample matrix. In particular, when the data comes from a factor model, the sample spiked eigenvalues are also consistent estimators of the factor part. We also derive the central limit theorem for a single eigenvalue and show the convergence of the joint distribution of the first few leading eigenvalues. Further, we show that the sample eigenvectors corresponding to the spiked eigenvalues are consistent for the population eigenvectors. Based on the one-sample results, we consider the two-sample hypothesis testing problem and give the corresponding test statistics. We further apply the test to S&P 500 high-frequency data. In Chapter 5, we conclude the paper and outline several possible extensions, mainly including extensions of the factor models, of the data, of the eigenvector-related theory, of the integrated covariance matrix to settings with jumps, and of alternative methods for handling multiple transactions. |
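
The spiked-eigenvalue phenomenon described for Chapter 4 can be illustrated with a small simulation. The following is a minimal sketch and not the method of this paper: it assumes a hypothetical one-factor model with constant volatility, synchronous observations, a single transaction per interval, and no microstructure noise. The function names and parameter values are illustrative assumptions.

```python
import numpy as np


def simulate_factor_log_prices(n=500, p=100, seed=0):
    """Simulate log prices of p assets at n + 1 equally spaced times on [0, 1].

    Assumption (illustrative only): returns follow a one-factor model with
    constant unit volatility, i.e. no heteroscedasticity, noise, or jumps.
    """
    rng = np.random.default_rng(seed)
    dt = 1.0 / n
    loadings = rng.normal(1.0, 0.2, size=p)             # factor loadings, one per asset
    factor_returns = rng.normal(0.0, np.sqrt(dt), size=n)
    idio_returns = rng.normal(0.0, np.sqrt(dt), size=(n, p))
    returns = np.outer(factor_returns, loadings) + idio_returns
    # Prepend the initial log price 0 and accumulate the returns.
    return np.vstack([np.zeros(p), np.cumsum(returns, axis=0)])


def realized_covariance(log_prices):
    """Realized covariance: sum of outer products of the log-return vectors."""
    returns = np.diff(log_prices, axis=0)
    return returns.T @ returns


log_prices = simulate_factor_log_prices()
rcv = realized_covariance(log_prices)

# eigvalsh returns eigenvalues in ascending order; reverse to get the largest first.
eigenvalues = np.linalg.eigvalsh(rcv)[::-1]
print("three largest eigenvalues:", eigenvalues[:3])
print("ratio of largest to second largest:", eigenvalues[0] / eigenvalues[1])
```

In this toy setting the largest eigenvalue is driven by the common factor and separates clearly from the remaining eigenvalues, which is the kind of pattern the empirical studies cited above report for high-frequency data.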