Research on Asset Return Prediction Based on Ensemble Learning

Posted on: 2024-06-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y Z Zhang
GTID: 2569307052987369
Subject: Quantitative Economics
Abstract/Summary:
The core issue in asset return prediction is pricing assets reasonably. Traditional asset pricing models, built upon Markowitz's mean-variance theory and supplemented by arbitrage pricing theory (APT), have developed into an effective research framework over several decades. With the advent of the big data era, however, traditional asset pricing research faces two urgent problems: (1) a growing number of anomaly factors has emerged, forming a "factor zoo"; (2) traditional linear models (OLS) fail to identify the nonlinear relationships among anomaly factors. Boosting-family machine learning algorithms such as XGBoost, LightGBM, and CatBoost have shown outstanding performance across many tasks, retain machine learning's strength in prediction, and handle high-dimensional input data well. The tree structures within these algorithms capture nonlinear relationships through "branching", which makes them suitable alternative tools for asset pricing research. Combining ensemble learning with asset pricing research therefore has significant theoretical and practical implications.

This thesis proceeds as follows. First, it summarizes and synthesizes traditional asset pricing theory and ensemble learning theory, and proposes a theoretical research framework combining the two. The analysis shows that the best point of integration between ensemble learning models and traditional asset pricing models is the module that predicts future asset returns. Accordingly, the thesis uses a set of 167 features as input and employs ensemble learning algorithms to predict cross-sectional asset returns. The empirical section then examines the ensemble learning predictions from two perspectives. The first is the performance of ensemble learning in the A-share market: whether ensemble learning has predictive power there, and whether the nonlinear nature of the ensemble learning models improves investment performance. The second is the economic interpretation of the ensemble learning predictions. Finally, research suggestions are proposed based on the empirical results.

The main conclusions are as follows:

(1) The ensemble learning models XGBoost, LightGBM, and CatBoost have predictive power in the A-share market. Long-short portfolios constructed from the models' predictions earned significant returns, and they also earned significant excess returns when tested against the Fama-French three-factor and five-factor models. After controlling for other factors in the A-share market, the pricing effect of the ensemble learning models remains significant. Among the three models, CatBoost performed best, with a monthly return of 3.14% for the equal-weighted long-short portfolio and 2.54% for the market-value-weighted portfolio.

(2) Two baseline models were set up for comparison: the linear regression model (OLS) commonly used in traditional research, and a single-factor model built on the best-performing factor among all factors (MACD60, a factor constructed with the 60-day moving average indicator). The monthly returns of the ensemble learning models were generally higher than those of both baselines. The monthly returns of CatBoost, LightGBM, and XGBoost were 48%, 34.9%, and 22.6% higher than that of the linear model (OLS), respectively, and 57%, 43%, and 30% higher than that of the single-factor model, respectively.

(3) The analysis of economic significance showed that CatBoost still earns significant excess returns after transaction costs are taken into account. The factor importance analysis showed that CatBoost favored quality, technical, and sentiment factors constructed from market trading data. Finally, the analysis of mispriced stocks showed that CatBoost predicted low-volatility, high-liquidity "blue-chip stocks" better, and obtained its excess returns from them.

Advanced machine learning techniques should therefore be used to study nonlinear relationships in asset return research, in order to improve model effectiveness, establish explanatory frameworks, and deepen the understanding of the intrinsic logic of financial markets.
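
The long-short portfolios mentioned in conclusion (1) can be illustrated with a minimal sketch. It assumes a single monthly cross-section, a top-versus-bottom sort on predicted returns, and a 10% cutoff for each leg; the abstract does not state the breakpoints actually used, and the function name `long_short_return` and its parameters are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def long_short_return(
    predicted: Sequence[float],
    realized: Sequence[float],
    market_cap: Sequence[float],
    quantile: float = 0.1,          # assumed cutoff; not stated in the abstract
    value_weighted: bool = False,
) -> float:
    """One-month return of a long-short portfolio sorted on predicted returns."""
    n = len(predicted)
    if not (len(realized) == n and len(market_cap) == n):
        raise ValueError("all inputs must have the same length")
    k = max(1, int(n * quantile))   # number of stocks in each leg
    if 2 * k > n:
        raise ValueError("quantile too large: the two legs would overlap")

    # Rank stocks by predicted return, lowest first.
    order = sorted(range(n), key=lambda i: predicted[i])
    short_leg, long_leg = order[:k], order[-k:]

    def leg_return(leg: Sequence[int]) -> float:
        # Equal weights, or weights proportional to market capitalization.
        weights = [market_cap[i] if value_weighted else 1.0 for i in leg]
        total = sum(weights)
        return sum(w * realized[i] for w, i in zip(weights, leg)) / total

    # Buy the stocks with the highest predictions, sell the lowest.
    return leg_return(long_leg) - leg_return(short_leg)


if __name__ == "__main__":
    # Toy cross-section of ten stocks (made-up numbers, not thesis data).
    predicted = [0.05, -0.02, 0.01, 0.03, -0.04, 0.00, 0.02, -0.01, 0.04, -0.03]
    realized = [0.06, -0.01, 0.00, 0.02, -0.05, 0.01, 0.01, 0.00, 0.02, -0.03]
    caps = [100, 50, 80, 60, 10, 40, 70, 30, 300, 30]
    print(long_short_return(predicted, realized, caps, quantile=0.2))
    print(long_short_return(predicted, realized, caps, quantile=0.2,
                            value_weighted=True))
```

Averaging this quantity over all months in a sample would give a mean monthly long-short return of the kind reported above; the equal-weighted and value-weighted variants differ only in how stocks inside each leg are weighted.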
Keywords/Search Tags: Ensemble learning, asset pricing, return prediction, ensemble factor, economic interpretation