Research on Key Technologies for Constructing a Chemical Dangerous Goods Knowledge Graph Based on Deep Learning

Posted on: 2024-09-24
Degree: Master
Type: Thesis
Country: China
Candidate: B L Wang
GTID: 2531307139476554
Subject: Materials and Chemical Engineering (Professional Degree)
Abstract/Summary:
As computer technology develops rapidly in the chemical industry, practitioners increasingly need to acquire large amounts of knowledge about dangerous chemicals quickly. The difficulty is that the source knowledge is complex and redundant, which makes it hard to extract. To address this, this thesis uses knowledge graph construction techniques to organize and systematize knowledge about dangerous chemicals, transforming unstructured knowledge into structured knowledge. Datasets and prior research on dangerous chemicals are currently scarce, which makes it difficult to extract knowledge from unstructured text and build a dangerous chemical knowledge graph. The thesis therefore focuses on producing a dangerous chemical dataset and on named entity recognition, relation extraction, and related techniques for constructing a dangerous chemical knowledge graph system. The main achievements are:

(1) This thesis constructs a dataset of dangerous chemical texts in two stages: data acquisition and data labeling. For acquisition, web crawlers and OCR-based character recognition are the main tools, and several filtering rules are designed to clean the data. For labeling, the Brat and YEDDA annotation tools are used, and a total of 3,942 entities and 1,756 relationships between entities are labeled.

(2) For the named entity recognition task, building on the word grid (lattice) concept introduced by FLAT, this thesis proposes embedding part-of-speech (POS) information into word grids and replacing the traditional convolution operation with a convolutional gang. This idea is implemented in the Lexicon-Cross-Transformer model, and the thesis then combines the advantages of CRF and Transformer to construct a Lexicon-Cross-Transformer+CRF model for the dangerous chemical dataset. Repeated in-depth experiments show that, with time complexity taken into account, the Lexicon-Cross-Transformer+CRF model achieves better named entity recognition results on the dangerous chemical dataset than other mainstream models.

(3) Word grids help the model extract word boundary information. This thesis enriches them by embedding POS information, since words carry not only semantics but also POS. POS captures both similarity and difference: entities with the same POS tend to have more similar properties, while entities with different POS are easier to tell apart. For example, two nouns are more closely related to each other than a verb is to an adjective.

(4) The radicals of Chinese characters carry useful features. Building on the MECT model, this thesis improves the CNN used to extract radical features. A single CNN is limited in feature extraction: it may pick up unimportant information and miss important information. The thesis replaces the single CNN with a convolutional gang comprising three types of convolution (traditional convolution, dilated convolution, and 1D deformable convolution), so that features are extracted at multiple levels and radical feature extraction is strengthened.
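The three-branch idea in (4) can be illustrated with a minimal pure-Python sketch. This is not the thesis's implementation. The function names (`conv1d`, `deformable_conv1d`, `conv_gang`), the zero padding, the hand-supplied offsets, and the choice to combine branches by stacking their outputs per position are all assumptions made for illustration. In a trained model the kernels and the deformable offsets would be learned, and the abstract does not say how the branches are merged.

```python
import math
from typing import List, Sequence


def conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int = 1) -> List[float]:
    """Zero-padded, same-length 1D convolution; dilation > 1 spreads the taps apart."""
    center = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            pos = i + (j - center) * dilation
            if 0 <= pos < len(x):
                acc += w * x[pos]
        out.append(acc)
    return out


def sample_linear(x: Sequence[float], pos: float) -> float:
    """Linearly interpolate x at a fractional position; values outside x count as zero."""
    lo = math.floor(pos)
    frac = pos - lo

    def at(i: int) -> float:
        return x[i] if 0 <= i < len(x) else 0.0

    return (1.0 - frac) * at(lo) + frac * at(lo + 1)


def deformable_conv1d(
    x: Sequence[float],
    kernel: Sequence[float],
    offsets: Sequence[Sequence[float]],
) -> List[float]:
    """1D deformable convolution: tap j at output i is shifted by offsets[i][j].

    Offsets are passed in by hand here (an assumption); a real model predicts them.
    """
    center = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            pos = i + (j - center) + offsets[i][j]
            acc += w * sample_linear(x, pos)
        out.append(acc)
    return out


def conv_gang(
    x: Sequence[float],
    kernel: Sequence[float],
    offsets: Sequence[Sequence[float]],
    dilation: int = 2,
) -> List[List[float]]:
    """Run the three branches and stack their outputs per position (assumed merge rule)."""
    standard = conv1d(x, kernel)
    dilated = conv1d(x, kernel, dilation=dilation)
    deformed = deformable_conv1d(x, kernel, offsets)
    return [list(triple) for triple in zip(standard, dilated, deformed)]


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0, 5.0]
    taps = [1.0, 1.0, 1.0]
    half_shift = [[0.5] * len(taps) for _ in signal]
    for features in conv_gang(signal, taps, half_shift):
        print(features)
```

With all offsets set to zero, the deformable branch reduces to the standard convolution, which makes a convenient sanity check. The dilated branch widens the receptive field without adding taps, and the deformable branch samples at shifted, possibly fractional positions.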
Keywords/Search Tags: chemical dangerous goods dataset, entity recognition, relation extraction, dilated convolution, deformable convolution