Over the past 30 years, primary lung cancer (PLC) has been the fastest-growing malignant tumor in China. As thoracoscopic techniques have developed and matured, surgery for early-stage lung cancer has made significant progress, but postoperative complications still cannot be avoided entirely. Proactive preoperative and postoperative monitoring of high-risk patients can significantly reduce the incidence of complications. However, such monitoring depends on obtaining structured data on the relevant risk factors in a timely and accurate way. This paper therefore designs and implements a deep-learning-based system for structuring the text of lung cancer cases.

Scholars have already studied systems for structuring case text, but four challenges remain. First, the structured attributes are complex, and organizing them requires domain expertise. Second, the data come mainly from electronic medical records, so the sample size is small and manual annotation is difficult. Third, there is a semantic gap between in-domain text and general-domain text. Fourth, the clinical setting demands high predictive performance from the model.

This paper addresses these challenges as follows. For Challenge 1, attribute grading is applied to the more than 50 attribute fields in the 5 major categories involved in pathological diagnosis to identify the few fields that cannot be extracted accurately. These fields are then mapped to the NLP tasks of text classification and sequence labeling, which simplifies the attribute structure. For Challenge 2, a multi-task learning paradigm is adopted that integrates information from related tasks through hard parameter sharing, making full use of the small samples. Three datasets were constructed, "noise" and "sampling" data augmentation were applied to enlarge the classes with few samples, and a "rule+manual" annotation method was proposed to reduce the annotation effort. For Challenge 3, a joint text-structuring model, Multi BGLC, is proposed. It uses BERT pre-trained on a general corpus as the encoder and a decoder composed of GCNN, LSTM, and CRF layers, and it is fine-tuned on lung cancer case data to narrow the semantic gap. The GCNN performs attribute discrimination, while the LSTM and CRF perform attribute extraction. For Challenge 4, ablation experiments were designed to explore how different technical choices affect the model's predictive performance.

The results show the following. When the class distribution was imbalanced, data augmentation significantly improved the model, and "sampling" produced more diverse data than "noise", with an average gain of 6.47 percentage points in macro-averaged F1. Of the word-embedding encoders compared, Word2Vec performed significantly worse than BERT and ERNIE-3. The GCNN+LSTM+CRF decoder significantly outperformed the other five decoder combinations, raising macro-averaged F1 by 7.05 percentage points on average. Overall, the Multi BGLC model with "sampling" data augmentation performed best, reaching macro-averaged F1 scores of 95.15%, 97.59%, 97.89%, and 99.91% on the four fields, respectively. These experiments verify the effectiveness and accuracy of the method.

Finally, this paper implements the lung cancer case text structuring system. The candidate models were evaluated comprehensively in terms of parameter count, training time, inference time, and other criteria; the most suitable model was selected to be called through the interface, and its parameters were saved. JavaScript, CSS, and HTML were used to design the front-end and back-end call logic, so that the model can be used through the software. The proposed method can effectively help doctors understand and analyze lung cancer case text, supporting clinical practice and medical decision-making.
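The "rule+manual" annotation method can be pictured as a two-stage pipeline: rules pre-annotate the text, and human annotators then review and correct the output. The sketch below is a minimal illustration under stated assumptions: it uses character-level BIO tags (a common choice for Chinese clinical text) and two hypothetical rules, `SIZE` and `LOBE`, written for English example text. The paper's actual rules, label set, and fields are not given in the abstract.

```python
import re

# Hypothetical rules: each maps a label to a regex. These are illustrative
# only; the real system's rules and labels are not specified.
RULES = {
    "SIZE": re.compile(r"\d+(?:\.\d+)?\s*cm"),
    "LOBE": re.compile(r"(?:left|right) (?:upper|middle|lower) lobe"),
}

def pre_annotate(text: str) -> list[tuple[str, str]]:
    """Character-level BIO pre-annotation produced by the rules.

    An annotator would then correct these tags by hand
    (the "manual" half of "rule+manual").
    """
    tags = ["O"] * len(text)
    for label, pattern in RULES.items():
        for m in pattern.finditer(text):
            start, end = m.span()
            # Skip spans that overlap an earlier match so tags stay consistent.
            if any(t != "O" for t in tags[start:end]):
                continue
            tags[start] = f"B-{label}"
            for i in range(start + 1, end):
                tags[i] = f"I-{label}"
    return list(zip(text, tags))

pairs = pre_annotate("mass in right upper lobe, 2.5 cm")
print(pairs[8], pairs[26])  # ('r', 'B-LOBE') ('2', 'B-SIZE')
```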
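The abstract names two augmentation families, "noise" and "sampling", without defining them. The following is a minimal sketch under two assumptions: "sampling" is taken to mean random oversampling of minority classes with replacement, and "noise" is taken to mean random character deletion. The function names `oversample` and `add_noise` are hypothetical. Both functions operate on classification samples; augmenting tagged sequences would also require keeping the tags aligned with the text.

```python
import random
from collections import Counter

def oversample(samples, labels, seed=0):
    """Assumed 'sampling' augmentation: duplicate randomly chosen examples
    of each smaller class until every class matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y

def add_noise(text, p=0.1, seed=0):
    """Assumed 'noise' augmentation: drop each character with probability p
    to create a perturbed copy of the text."""
    rng = random.Random(seed)
    kept = [ch for ch in text if rng.random() >= p]
    return "".join(kept) or text  # never return an empty string

xs, ys = oversample(["a", "b", "c", "d"], ["pos", "pos", "pos", "neg"])
print(Counter(ys))  # Counter({'pos': 3, 'neg': 3})
```

Oversampling only repeats existing examples, while noise creates new variants. Which variant yields more useful diversity for a given field is an empirical question of the kind the paper's ablation addresses.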
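Hard parameter sharing means both tasks run on one shared set of encoder weights, with only small task-specific heads on top, and training minimizes a combined loss. The toy sketch below is illustrative only. It replaces BERT with a single linear+tanh layer, replaces the GCNN and LSTM+CRF decoders with linear heads, and assumes a weighted sum of the two losses with a hypothetical weight `alpha`. Its purpose is to show how one shared representation feeds both the attribute-discrimination loss and the attribute-extraction loss.

```python
import math

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def cross_entropy(logits, gold):
    return -math.log(softmax(logits)[gold])

def linear(W, x):
    # W is a list of rows (out_dim x in_dim); x is a vector of length in_dim.
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def shared_encoder(token_vecs, W_shared):
    # Stand-in for BERT: the SAME weights W_shared encode tokens for both tasks.
    return [[math.tanh(h) for h in linear(W_shared, v)] for v in token_vecs]

def joint_loss(token_vecs, attr_label, tag_labels, W_shared, W_cls, W_tag, alpha=0.5):
    """Hypothetical multi-task loss with hard parameter sharing."""
    hidden = shared_encoder(token_vecs, W_shared)
    # Task 1, attribute discrimination: mean-pool the tokens, then classify.
    pooled = [sum(col) / len(hidden) for col in zip(*hidden)]
    loss_cls = cross_entropy(linear(W_cls, pooled), attr_label)
    # Task 2, attribute extraction: one tag per token (CRF omitted here).
    loss_tag = sum(cross_entropy(linear(W_tag, h), t)
                   for h, t in zip(hidden, tag_labels)) / len(hidden)
    # Gradients of both terms would flow into W_shared during training.
    return alpha * loss_cls + (1 - alpha) * loss_tag, loss_cls, loss_tag
```

Because `W_shared` appears in both loss terms, each task acts as extra training signal for the other. This is the mechanism by which multi-task learning can help when each individual task has few labeled samples.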
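GCNN is usually understood as a stack of gated convolutional layers. In each layer, the output is a convolution multiplied elementwise by a sigmoid gate: h = (X*W + b) ⊙ sigmoid(X*V + c), where * denotes convolution over the token sequence. The pure-Python sketch below implements one such layer with "same" padding. It assumes this standard gated-linear-unit formulation, which the abstract does not spell out. For attribute discrimination, the layer's outputs would typically be pooled over time and passed to a classifier.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def conv1d(seq, kernel, bias):
    """'Same'-padded 1-D convolution over a sequence of d_in-dim vectors.
    kernel[j][o][i] is the weight from input channel i at offset j
    to output channel o."""
    k = len(kernel)
    pad = k // 2
    d_in = len(seq[0])
    padded = [[0.0] * d_in] * pad + seq + [[0.0] * d_in] * (k - 1 - pad)
    out = []
    for t in range(len(seq)):
        window = padded[t:t + k]
        out.append([
            bias[o] + sum(kernel[j][o][i] * window[j][i]
                          for j in range(k) for i in range(d_in))
            for o in range(len(bias))
        ])
    return out

def gated_conv(seq, W, b, V, c):
    """One gated convolutional layer: (X*W + b) elementwise-times sigmoid(X*V + c)."""
    a = conv1d(seq, W, b)  # linear path
    g = conv1d(seq, V, c)  # gate path
    return [[ai * sigmoid(gi) for ai, gi in zip(ar, gr)] for ar, gr in zip(a, g)]

# Width-3 summing kernel with an all-zero gate (sigmoid(0) = 0.5).
print(gated_conv([[1.0], [2.0], [3.0]], [[[1.0]]] * 3, [0.0], [[[0.0]]] * 3, [0.0]))
# → [[1.5], [3.0], [2.5]]
```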
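In an LSTM+CRF tagger, the LSTM produces a score for each tag at each position, and the CRF adds tag-to-tag transition scores. Decoding therefore picks the best whole tag sequence rather than the best tag at each position independently. The sketch below shows Viterbi decoding for a linear-chain CRF and assumes the emission and transition scores are already computed. The BIO tag set and the large negative scores used to forbid invalid tag sequences are illustrative assumptions, not details from the paper.

```python
def viterbi(emissions, transitions, start, end):
    """Return (best tag path, its score) under a linear-chain CRF.

    emissions[t][j]: score of tag j at position t (e.g. from the LSTM)
    transitions[i][j]: score of moving from tag i to tag j
    start[j] / end[j]: scores for starting / ending in tag j
    """
    n_tags = len(start)
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backptrs = []
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for j in range(n_tags):
            # Best previous tag i for arriving at tag j at position t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptr.append(best_i)
        score = new_score
        backptrs.append(ptr)
    final = [score[j] + end[j] for j in range(n_tags)]
    best = max(range(n_tags), key=lambda j: final[j])
    path = [best]
    for ptr in reversed(backptrs):  # follow back-pointers to recover the path
        best = ptr[best]
        path.append(best)
    path.reverse()
    return path, max(final)

# Tags: 0 = O, 1 = B, 2 = I. O -> I and starting with I are forbidden.
NEG = -1e4
trans = [[0.0, 0.0, NEG], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(viterbi([[1.0, 0.5, 0.0], [0.0, 0.0, 2.0]], trans, [0.0, 0.0, NEG], [0.0] * 3))
# → ([1, 2], 2.5)
```

Per-position argmax on these emissions would give the invalid sequence O, I. The transition penalty makes Viterbi return B, I instead, which illustrates why a CRF layer on top of the LSTM helps with span extraction.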
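The reported results are macro-averaged F1 scores. F1 is computed for each class and then averaged without weighting, so rare classes count as much as frequent ones. This makes the metric sensitive to the class imbalance that the augmentation experiments target. The sketch below computes macro F1 over label-level predictions. The abstract does not say whether extraction fields are scored at the token or entity level, so that choice is an assumption here.

```python
from collections import Counter

def macro_f1(gold, pred):
    """Macro-averaged F1: per-class F1, then the unweighted mean over classes."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted p where it was wrong
            fn[g] += 1  # missed the true class g
    f1s = []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

print(macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"]))  # ≈ 0.733
```

A change "in percentage points" is simply the difference between two such scores multiplied by 100.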