| The chemical industry has made a great contribution to the economic development of our country. At the same time, the flammable and explosive nature of chemical products leads to frequent chemical accidents, which cause severe damage to enterprises and the environment. Chemical data is highly specialized, complex, and knowledge-dense. Quickly and accurately obtaining useful knowledge from massive data, and thereby reducing unsafe factors in chemical production and the occurrence of accidents, has become a difficult problem for the chemical industry. Chemical information extraction is an important technical means of solving this problem, and entity relationship extraction is one of its essential technologies. This thesis studies entity relationship extraction technology for the chemical industry. The main research contents are as follows: (1) To obtain the chemical-domain text data set and entity dictionary needed for entity relationship extraction in the chemical industry, a chemical-domain data collection and processing subsystem was designed and implemented. To address the multi-source, heterogeneous nature of relationship extraction data in the chemical industry, the subsystem collects data in two ways: crawling online web data and importing offline documents. For online data, the subsystem automatically crawls chemical industry web pages from Baidupedia and then uses XPath and regular expression rules to extract the page text; for offline data, it extracts the data using the data conversion operation proposed in this thesis. The data collected by the subsystem is used to construct the chemical-domain entity dictionary and to perform relationship annotation and extraction. (2) To facilitate relationship labeling of chemical industry data, a crowdsourcing-based relationship labeling algorithm is designed and a corresponding relationship labeling subsystem is implemented. The subsystem identifies the entities in the text, collects manual labels through crowdsourcing, scores the relationships between the entities according to the annotation results, decides whether a relationship exists between the entities by comparing the relationship score against a threshold, and stores the result. The subsystem makes it easy to label chemical industry data and to generate high-quality training sets. Experiments verify the performance of the annotation algorithm under different threshold settings; with an appropriate threshold, its F1 value reaches a maximum of 92.26%. (3) To identify entity information and the semantic relationship categories between entities in unstructured text more accurately, a Chinese entity relationship extraction model for the chemical industry, BiGRU-Att-PCNN, is proposed on the basis of a hybrid neural network. In this model, a BiGRU (Bi-directional Gated Recurrent Unit) first captures the contextual, word-order-related information of the text sequence; an attention mechanism then automatically focuses on the sequence features that most influence the relationship; a PCNN (Piecewise Convolutional Neural Network) then learns the relevant contextual features from the adjusted sequence in order to extract the relationship; finally, the Ranger optimizer replaces the original Adam optimizer for training. The model achieves an F1 value of 85.36% on a Chinese data set from the chemical field, and the experiments show that the method performs well. (4) Based on the above research, an entity relationship extraction system for the chemical industry is designed and implemented. The system provides data collection, relationship annotation, entity relationship extraction, storage, and query functions for the chemical industry.
|
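The threshold-based decision in the crowdsourced labeling step of item (2) can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so the score used here (the fraction of annotators who marked a pair as related) is an assumption, as are the function names `score_relations` and `label_relations`, the vote format, and the example entities and threshold.

```python
from collections import defaultdict


def score_relations(annotations):
    """Score each entity pair by the fraction of "related" votes.

    `annotations` is a list of (head, tail, has_relation) votes. Both the
    vote format and the fraction-based score are assumptions made for
    illustration; the thesis may score pairs differently.
    """
    yes = defaultdict(int)
    total = defaultdict(int)
    for head, tail, has_relation in annotations:
        total[(head, tail)] += 1
        if has_relation:
            yes[(head, tail)] += 1
    return {pair: yes[pair] / total[pair] for pair in total}


def label_relations(annotations, threshold=0.6):
    """Keep the entity pairs whose score reaches the threshold."""
    scores = score_relations(annotations)
    return {pair for pair, score in scores.items() if score >= threshold}


# Hypothetical votes from three annotators on two entity pairs.
votes = [
    ("benzene", "flammable liquid", True),
    ("benzene", "flammable liquid", True),
    ("benzene", "flammable liquid", False),
    ("benzene", "water", False),
    ("benzene", "water", True),
    ("benzene", "water", False),
]
accepted = label_relations(votes, threshold=0.6)
```

Under this assumed scoring, raising the threshold trades recall for precision in the generated training set, which is presumably why the abstract reports F1 as a function of the threshold.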
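The "piecewise" part of the PCNN named in item (3) usually refers to piecewise max pooling: the feature sequence is split into three segments by the positions of the two entities, and each segment is max-pooled separately. The sketch below shows only that pooling step in plain Python. It is not the thesis's implementation; the function name `piecewise_max_pool`, the segment boundaries, and the zero padding for empty segments are assumptions.

```python
def piecewise_max_pool(features, head_pos, tail_pos):
    """Max-pool a sequence of feature vectors in three segments.

    `features` holds one equal-length vector per token. The two entity
    positions split the sequence into three pieces: tokens up to and
    including the first entity, tokens after it up to and including the
    second entity, and the remaining tokens. These boundaries are one
    common convention and are assumed here.
    """
    first, second = sorted((head_pos, tail_pos))
    segments = [
        features[: first + 1],
        features[first + 1 : second + 1],
        features[second + 1 :],
    ]
    dim = len(features[0])
    pooled = []
    for segment in segments:
        if segment:
            # Column-wise maximum over the tokens of this segment.
            pooled.extend(max(column) for column in zip(*segment))
        else:
            # Empty segment: pad with zeros (an assumption).
            pooled.extend([0.0] * dim)
    return pooled


# Hypothetical 2-dimensional features for a 5-token sentence,
# with the entities at token positions 1 and 3.
features = [[0.1, 0.5], [0.9, 0.2], [0.3, 0.4], [0.7, 0.8], [0.2, 0.6]]
pooled = piecewise_max_pool(features, head_pos=1, tail_pos=3)
# pooled == [0.9, 0.5, 0.7, 0.8, 0.2, 0.6]
```

Pooling each segment separately keeps some information about where a feature occurs relative to the two entities, which a single max over the whole sentence would discard.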