| Enterprise risk refers to the range of possible outcomes that may directly or indirectly affect a company’s profits owing to uncertain factors such as market competition, macroeconomic conditions, legal policies, and internal or external operations. Enterprise risk comes in various types, such as competition risk, operational management risk, product technology risk, and economic risk; each type contains a collection of risk terms, which together form a risk vocabulary. Accurately identifying risk terms can help companies uncover hidden risks, analyze the conditions under which risks occur, and trace risk sources so as to manage risks better. In addition, risk term recognition can help investors grasp and evaluate a company’s risks in a timely manner and formulate risk investment strategies. Enterprise risk term recognition therefore has important practical significance. Currently, research on enterprise risk term recognition is still at an early stage of development and lacks mature domain datasets. At the same time, enterprise risk term corpora are highly specialized: they contain a large number of proprietary terms and domain vocabulary, with complex word formation rules and diverse expressions. In response to these issues, this paper conducts research in the field of enterprise risk, with the following specific content: (1) To address the current lack of mature named entity recognition datasets in the enterprise risk field, this paper collected annual reports of information industry companies and carried out preliminary work such as text selection, format conversion, and risk field extraction. It then preprocessed and annotated the unstructured text to transform it into structured text and constructed the Enterprise risk terms Named Entity Recognition dataset (ErtNER), filling a gap in this research field and laying the data foundation for subsequent research. (2) In view of the characteristics of enterprise risk term corpora, this paper uses deep learning algorithms to construct a 
neural network model based on the scaled dot-product attention mechanism combined with an auxiliary classification layer to identify risk terms. It also designs comparative and ablation experiments to test the model’s performance. The experimental results show that the precision, recall, and F1 values of the model on the ErtNER dataset are 90.56%, 92.34%, and 91.44%, respectively, which are 1.2%, 0.98%, and 1.09% higher than those of the baseline model RoBERTa-wwm+CRF, indicating that the model performs strongly on the ErtNER dataset. (3) To address the problem that the Long Short-Term Memory network captures only temporal information and cannot fully extract text features, this paper introduces an improved Gated-Dilated Convolutional Neural Network to compensate for the limited feature extraction ability of a single model. It also uses a bilinear multi-head attention mechanism, which has stronger feature fusion ability, to dynamically fuse multi-dimensional text features and prevent feature leakage as far as possible. Moreover, this paper designs comparative experiments from multiple angles to test the model’s performance. The experimental results show that the dual feature extraction model recognizes risk terms better than a single model. |
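
The scaled dot-product attention mentioned in (2) can be illustrated with a minimal sketch. This is not the paper's model: it omits the learned projections, the pretrained encoder, and the auxiliary classification layer, and the function names `softmax` and `scaled_dot_product_attention` are hypothetical names chosen for illustration. It only shows the standard operation, in which each query is compared with every key, the scores are divided by the square root of the key dimension, and the resulting softmax weights are used to average the values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Standard scaled dot-product attention on plain Python lists.

    queries: list of vectors of length d_k
    keys:    list of vectors of length d_k
    values:  list of vectors (one per key)
    Returns one output vector per query.
    """
    d_k = len(keys[0])
    scale = math.sqrt(d_k)
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    # Illustrative toy input only; these numbers are not from the paper.
    q = [[1.0, 0.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[1.0, 2.0], [3.0, 4.0]]
    print(scaled_dot_product_attention(q, k, v))
```

When two keys are identical, they receive equal weights, so the output is the plain average of their values; this gives a quick sanity check for the sketch.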