| Chinese medical records contain rich medical information and are an important resource for disease prediction, clinical diagnosis assistance, drug mining, and other applications. However, most medical records are unstructured text that is difficult for computers to process. Although deep learning models have made significant advances in relation extraction in recent years, they are not well suited to practical scenarios. When faced with a continuous data stream, directly training a model on new relation categories can lead to "catastrophic forgetting" of old knowledge. Re-annotating all electronic medical records would be costly because of the specialized medical terminology and special symbols they contain. Moreover, Chinese medical records are relatively scarce, and some may be unavailable due to copyright or privacy issues. To address these problems, this paper proposes an incremental relation extraction method for medical records. The main contributions of this paper can be summarized as follows: (1) Due to privacy concerns, some medical record relation data cannot be used. In addition, the distribution of relation types is imbalanced, with samples of common diseases far outnumbering those of rare but important diseases, which causes the model to overfit on a small number of old relation categories. To address this issue, a distillation loss is used to help the model retain its discriminative ability for old relation categories even with limited data. Furthermore, an attention loss is used to prevent bias when learning new relation data caused by the imbalance between old and new relation data. (2) The learning order in incremental relation extraction is also important. If the feature distributions of the previous and current relation clusters differ greatly, the model may struggle to fit the data of the new relation categories. To address this issue, a task ordering module is designed to calculate the similarity between relation clusters. Sorting the relation clusters by similarity minimizes the influence and interference of old knowledge, allowing the model to adapt quickly to new relation data. (3) As the number of relation categories in incremental learning increases, so does the number of categories with similar feature distributions, which can easily confuse the model. To address this issue, a bias verification loss function is designed to improve the model's ability to discriminate between similar relation categories. To validate the effectiveness of the proposed method on incremental relation extraction tasks, experiments were conducted on three publicly available datasets. The results show that the proposed method achieved average accuracies of 82.2%, 73.2%, and 67.7% on the Lifelong FewRel, Lifelong TACRED, and Lifelong CMeIE datasets, respectively, improvements of 0.7%, 0.8%, and 1.5% over the best baseline methods. |
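
As a rough illustration of the distillation loss named in contribution (1), the sketch below computes a generic temperature-scaled knowledge distillation term between the outputs of a previous model and an updated model. It is not the paper's formulation, which the abstract does not specify: the function names `softmax` and `distillation_loss`, the `temperature` parameter, and the choice to restrict the new model's logits to the old relation categories are all assumptions made for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an (assumed) temperature parameter."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(old_logits, new_logits, temperature=2.0):
    """Cross-entropy between softened old-model and new-model distributions.

    old_logits: scores from the previous model over the old relation categories.
    new_logits: scores from the updated model; only the leading entries that
        correspond to the old categories are compared (an assumption here).
    """
    n_old = len(old_logits)
    p_old = softmax(old_logits, temperature)
    p_new = softmax(new_logits[:n_old], temperature)
    return -sum(p * math.log(q) for p, q in zip(p_old, p_new))


if __name__ == "__main__":
    old = [2.0, 0.0, -1.0]           # previous model, 3 old categories
    kept = [2.0, 0.0, -1.0, 5.0]     # updated model that preserved old scores
    drifted = [-1.0, 0.0, 2.0, 5.0]  # updated model that drifted on old categories
    print(distillation_loss(old, kept) < distillation_loss(old, drifted))
```

The term is smallest when the updated model reproduces the previous model's distribution over the old categories, so adding it to the training objective penalizes drift on old relations while new ones are learned.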