| Guided by the goal of “preventing and defusing major risks”, risk prevention and control in the financial sector has proceeded in an orderly manner, and defusing capital market risks is an important part of that work. Since the outbreak of COVID-19, factory closures, debt defaults, litigation and similar incidents have increased significantly. In a complex and volatile economic environment, fully identifying, preventing and defusing the risks of listed companies has become a major practical issue. Financial distress prediction for listed companies has long been a focus of academic attention, and scholars have explored different indicators to improve its accuracy. Existing studies show that financial indicators reveal the financial status of listed companies and can, to a certain extent, accurately predict financial distress. In recent years, the rise of natural language processing has brought text indicators into consideration as well, and studies have demonstrated the usefulness of text information. However, research on text information is still at an exploratory stage, and the contribution of text information to financial distress prediction has rarely been examined. This paper therefore takes the litigation announcements of listed companies as its starting point, constructs a comprehensive indicator system for financial distress prediction that integrates financial indicators and text indicators, and investigates the predictive value of litigation announcement text by comparing the predictive performance of financial indicators alone with that of the comprehensive indicators. A company designated ST (*ST) is defined as a company in financial distress. The research sample comprises 4,308 non-financial A-share listed companies from 2012 to 2020, including 3,820 normal companies and 488 companies in financial distress. Using data from years T-2, T-3 and T-4, the paper predicts whether a company will fall into
financial distress in year T. This paper focuses on two issues: first, the selection and construction of predictive indicators, including financial indicators and text indicators; second, the construction of the prediction model. For the financial indicators, the paper draws on previous literature and selects 32 indicators that reflect a company’s debt-paying ability, profitability, operating capacity, cash flow and growth ability. Principal component analysis is then used for feature selection and dimensionality reduction, and 13 financial indicators are finally extracted, which effectively addresses the problems of slow training and over-fitting. For the text indicators, the paper applies the dictionary method to a corpus of 13,730 litigation announcements to extract features and quantify the indicators. The “information content” index is constructed from the total number of words in the text, and the “uncertainty” index is constructed by combining the domestic HowNet sentiment dictionary and the foreign LM sentiment dictionary. The “readability” index, which is used to measure the text of the litigation announcement, is constructed based on the THUOCL dictionary of Tsinghua University. Finally, for model construction, to make the research more objective and convincing, the paper uses the random forest algorithm and the logistic model, both mainstream methods in academia, for prediction and comparative analysis, and improves the performance and generalization ability of the models with grid search and 10-fold cross-validation. By comparing the predictive performance of financial indicators alone with that of the comprehensive indicators that include text indicators, the paper investigates whether litigation announcement text has incremental value. The results show that the predictive ability of the random forest algorithm is better than that of the
logistic model. However, whichever method is used, adding text indicators improves the predictive performance of the model to a certain extent, and when information sources are scarce, text indicators make a greater marginal contribution. This indicates that the litigation announcements disclosed by listed companies are not boilerplate: they carry information content and practical value for financial distress prediction and can serve as an effective supplement to financial data. The study also finds that, compared with the other two text indicators, the “readability” index makes a greater incremental contribution, so mining and analyzing the readability of litigation announcement text can give better warning of the financial risks of listed companies. Therefore, full attention should be paid to the value of text information such as litigation announcements, and efforts should be made to improve the disclosure system for major litigation matters of listed companies and to increase the weight and contribution of text information in financial forecasting. This research has both theoretical and practical significance. In theory, the paper investigates the incremental value of text indicators for financial distress prediction from the perspective of litigation, which not only broadens the research perspective of financial distress prediction but also provides evidence for the usefulness of text information. In practice, the paper affirms the predictive value of the litigation announcements of listed companies and improves the precision of financial distress prediction. The research conclusion is of great significance to the risk prediction and prevention of listed companies. It also provides useful information for investors’ investment decisions, urges them to pay more attention to the litigation-related matters of listed companies, enhances their awareness of risk
prediction and ability to distinguish truth from falsehood, and improves the rigor and effectiveness of decision-making. |
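
The comparison described in the abstract (financial indicators alone versus financial plus text indicators, under both a random forest and a logistic model, tuned by grid search with 10-fold cross-validation) can be illustrated with a minimal sketch. It is not the paper's implementation. The data are synthetic, and the sample size, the hyperparameter grids, the ROC AUC scoring metric, the reading of "13 financial indicators" as 13 principal components, and all function and variable names are assumptions made for illustration only.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

SEED = 0
N_FIN, N_TXT = 32, 3  # 32 financial columns, 3 text columns (as in the abstract)

# Synthetic stand-in for the real sample (assumption): about 11% "distressed" firms.
X_fin, y = make_classification(n_samples=600, n_features=N_FIN, n_informative=10,
                               weights=[0.89, 0.11], random_state=SEED)
# Synthetic text indicators, weakly related to the label by construction.
rng = np.random.default_rng(SEED)
X_txt = rng.normal(size=(600, N_TXT)) + 0.5 * y[:, None]
X_all = np.hstack([X_fin, X_txt])


def build_search(model, param_grid, use_text):
    """Grid search over a pipeline: PCA on financial columns, optional text columns."""
    fin_steps = Pipeline([("scale", StandardScaler()),
                          ("pca", PCA(n_components=13))])  # assumed: 13 components
    transformers = [("fin", fin_steps, slice(0, N_FIN))]
    if use_text:
        transformers.append(("txt", StandardScaler(), slice(N_FIN, N_FIN + N_TXT)))
    # Columns not listed in a transformer are dropped (ColumnTransformer default).
    pipe = Pipeline([("features", ColumnTransformer(transformers)),
                     ("model", model)])
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=SEED)
    return GridSearchCV(pipe, param_grid, scoring="roc_auc", cv=cv)


# Small illustrative grids (assumption); the paper's grids are not given.
MODELS = {
    "random_forest": (RandomForestClassifier(n_estimators=50, random_state=SEED),
                      {"model__max_depth": [3, None]}),
    "logistic": (LogisticRegression(max_iter=1000),
                 {"model__C": [0.1, 1.0]}),
}


def compare(X, y):
    """Return the best cross-validated AUC for each (model, feature set) pair."""
    results = {}
    for name, (model, grid) in MODELS.items():
        for label, use_text in (("financial_only", False),
                                ("financial_plus_text", True)):
            search = build_search(model, grid, use_text)
            search.fit(X, y)
            results[(name, label)] = search.best_score_
    return results


results = compare(X_all, y)
for key in sorted(results):
    print(key, round(results[key], 3))
```

Fitting the scaler and PCA inside the pipeline means they are re-estimated on each training fold, so the cross-validated scores are not contaminated by the held-out fold. The gap between the `financial_plus_text` and `financial_only` scores for a given model is one simple way to read the "incremental value" of the text indicators on real data.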