Terminology Extraction For New Energy Vehicles Based On Deep Learning

Posted on: 2019-08-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Zhang
GTID: 2392330623969003
Subject: Computer Science and Technology
Abstract/Summary:
Survey data show that sales of new energy vehicles in China have risen continuously since 2013. An investigation of the new energy vehicle industry finds, however, that little research has been done on building a domain terminology thesaurus for this field. Patent texts are practical and novel, are among the most valuable carriers of scientific and technical information, and reflect virtually every industrial innovation. This thesis therefore aims to build a terminology database for the new energy vehicle domain from patents in that domain.

Current methods for extracting domain-specific terminology rely mainly on linguistic rules, statistical methods, or a combination of the two. Applied to new energy vehicle terminology, these traditional methods have three main problems: inaccurate word segmentation adds noise to text mining, the text representation is complex, and terms with nested structure are hard to find. To address these problems, this thesis proposes deep-learning-based terminology extraction for the new energy vehicle domain and recasts terminology extraction as a sequence tagging problem. Two extraction models are proposed: a cascaded conditional random field (Cascaded Conditional Random Fields, CCRF) model and a BLSTM_ATTENTION_CRF model.

The main innovations of this thesis are as follows:

(1) The CCRF model uses two layers of conditional random fields (Conditional Random Fields, CRF). The two layers define their annotation schemes and feature templates separately, one at the character level and one at the word level, which effectively addresses the problem of inaccurate word segmentation.

(2) A BLSTM_ATTENTION_CRF model based on deep learning is proposed. The model uses word embeddings to represent the new energy vehicle patent text, which reduces the dimensionality of the data representation. It uses a bidirectional long short-term memory network (Bi-directional LSTM, BLSTM) to mine implicit features. It uses an attention mechanism to obtain a semantic encoding from the attention probability distribution over the nodes of the input sequence, which reduces the loss of feature information and the likelihood of redundancy during tagging. Finally, a CRF layer is added after the softmax layer of the model, so that the model takes the dependencies between output tags into account and extracts terms with nested structure more efficiently.

(3) To further improve the accuracy of domain terminology extraction, a correction model based on dictionaries and rules is proposed, which effectively handles terms with nested and compound structures.

Experiments show that the deep-learning-based term extraction model extracts most of the domain terms and identifies most of the terms that contain nested structure. The proposed model achieves better recognition results than the other models it was compared with.
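To make the sequence-tagging formulation and the role of the CRF layer concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the thesis's implementation: the B/I/O tag set, the English tokens, and all scores are hypothetical, and the abstract does not specify the annotation scheme actually used. The sketch decodes the best tag sequence with the Viterbi algorithm, combining per-token scores (such as a BLSTM might produce) with tag-to-tag transition scores (which a CRF would learn), and then reads the terms off the tags.

```python
from typing import Dict, List, Tuple

# Hypothetical tag set: B = term begins, I = term continues, O = outside any term.
TAGS = ["B", "I", "O"]


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float]) -> List[str]:
    """Return the tag sequence with the highest total emission + transition score."""
    # best[t][tag] = score of the best path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in TAGS}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in TAGS:
            # Pick the previous tag that gives the best path into `cur`.
            prev = max(TAGS, key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0))
            scores[cur] = best[-1][prev] + transitions.get((prev, cur), 0.0) + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    tag = max(TAGS, key=lambda k: best[-1][k])
    path = [tag]
    for pointers in reversed(back):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]


def extract_terms(tokens: List[str], tags: List[str]) -> List[str]:
    """Collect each B I I ... run as one term; an I with no preceding B is ignored."""
    terms: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            if current:
                terms.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:
            if current:
                terms.append(" ".join(current))
            current = []
    if current:
        terms.append(" ".join(current))
    return terms


# Hypothetical example data (not from the thesis).
tokens = ["the", "lithium", "battery", "pack", "overheats"]
emissions = [
    {"B": 0.0, "I": 0.0, "O": 2.0},
    {"B": 1.0, "I": 1.2, "O": 0.5},  # taken alone, this token prefers I
    {"B": 0.0, "I": 1.5, "O": 1.0},
    {"B": 0.0, "I": 1.6, "O": 1.5},
    {"B": 0.0, "I": 0.0, "O": 2.0},
]
# A term cannot continue straight after O, so O -> I gets a large penalty.
transitions = {("O", "I"): -10.0}

greedy_tags = [max(TAGS, key=scores.get) for scores in emissions]
crf_tags = viterbi(emissions, transitions)
print(greedy_tags)                      # ['O', 'I', 'I', 'I', 'O']
print(crf_tags)                         # ['O', 'B', 'I', 'I', 'O']
print(extract_terms(tokens, crf_tags))  # ['lithium battery pack']
```

Choosing each tag independently yields an invalid sequence in which I follows O, so no term is recovered. Decoding with transition scores yields a well-formed sequence and the full multi-word term, which is the kind of output-tag dependence that the CRF layer is described as capturing.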
Keywords/Search Tags: domain terminology extraction, attention mechanism, bidirectional long short-term memory network, conditional random field, dictionary, rules