
Character Relationship Extraction And Classification For Chinese Literary Works

Posted on: 2023-12-09
Degree: Master
Type: Thesis
Country: China
Candidate: R Tao
GTID: 2555307061450334
Subject: Cyberspace security
Abstract/Summary:
The rapidly developing Internet carries many potential risks of cyberattack, which threaten the data and privacy security of users, enterprises, and governments. Extracting entity relations from massive security intelligence data plays an important role in constructing network-security knowledge graphs and has therefore become a hotspot in information extraction. Entity relation extraction is a research topic in information extraction whose task is to extract the semantic relations between entities in natural language text and format them as structured relation triples. Person relationship extraction is a fine-grained subtask of entity relation extraction and is important for downstream tasks such as character-relationship knowledge graphs and relational reasoning and question answering. The input text for character relationship extraction can be a sentence, a paragraph, or a complete story. Previous studies have mostly focused on English short-text corpora drawn from news or encyclopedias, and work on long Chinese literary texts with complex character relationships is lacking. Novels are a representative form of literary work. Their complex character relationships and the twists of their plots resemble those found in criminal statements. Character relationship extraction techniques developed for novels can therefore be applied to criminal investigation, where constructing the network of person relationships in a case can make the law enforcement process more intelligent. Based on this analysis, this research performs character relationship extraction for Chinese literary works. The specific tasks are as follows:

(1) To address the lack of datasets for character relationship classification in Chinese literary works, this study takes the Chinese novel "Ordinary World" as its text data and constructs a qualified dataset through sentence segmentation, name recognition, sentence selection, and tag matching.

(2) Because noisy sentences in literary text do not reflect the relationship between characters, this study, inspired by generative adversarial networks, introduces an adversarial learning framework to train a sentence-level noise classifier that removes noise from the dataset.

(3) A multi-feature person relationship classification model, called MF-CRC, is designed. Specifically, the vector encoded by the pre-trained language model BERT is fed into a BiLSTM model to obtain a deep semantic feature of the sample sentence. In addition, a Bayesian classification algorithm identifies the genders of the characters in each sample sentence, and the name feature is derived from whether the two target characters share the same last name. Then, using a relationship indicator table, the relationship indicator feature of each sample sentence is obtained through semantic similarity matching. Finally, the three features are concatenated to train the person relationship classification model.

(4) Comprehensive comparison and ablation experiments are designed on the dataset constructed in this work and the proposed model. The results show that the model outperforms the other baselines, which reflects the effectiveness of the model framework. In addition, the extracted character relationships are further processed by the proposed RSP algorithm for relationship applicability. Finally, a character relationship extraction system for Chinese literary works is implemented; its output is a visual graph of character relationships.
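As a rough illustration of the feature concatenation described in (3), the sketch below builds a name feature from a shared-surname check and joins it with two other feature vectors. It is not the thesis implementation. The function names, the compound-surname list, the gender encoding (1 = male, 0 = female), the folding of gender into the name feature, and the placeholder semantic and indicator vectors are all assumptions made for illustration. In the thesis, the semantic feature comes from BERT and BiLSTM, and gender comes from a Bayesian classifier.

```python
from typing import List, Sequence

# Hypothetical, incomplete list of two-character surnames (assumption).
COMPOUND_SURNAMES = ("欧阳", "司马", "诸葛", "上官")


def surname(full_name: str) -> str:
    """Return the surname, checking the compound-surname list first."""
    for candidate in COMPOUND_SURNAMES:
        if full_name.startswith(candidate) and len(full_name) > len(candidate):
            return candidate
    return full_name[:1]


def name_feature(name_a: str, gender_a: int, name_b: str, gender_b: int) -> List[float]:
    """Assumed layout: [same_surname, gender_a, gender_b]."""
    same_surname = 1.0 if surname(name_a) == surname(name_b) else 0.0
    return [same_surname, float(gender_a), float(gender_b)]


def concat_features(
    semantic: Sequence[float],
    name: Sequence[float],
    indicator: Sequence[float],
) -> List[float]:
    """Concatenate the three feature vectors into one classifier input."""
    return [*semantic, *name, *indicator]


# Placeholder vectors standing in for the real semantic and indicator features.
semantic = [0.12, -0.40, 0.88]
indicator = [0.0, 1.0]

# Two characters from "Ordinary World" who share the surname 孙.
name = name_feature("孙少安", 1, "孙少平", 1)
sample = concat_features(semantic, name, indicator)
```

The first-character surname rule is a simplification. A fuller system would need a proper surname lexicon and handling for nicknames and titles, which are common in novels.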
Keywords/Search Tags: Chinese literature, character relationship extraction, name feature, relationship indicator feature, BERT