
Lexeme Extending For Distributed Representations Based On Knowledge Base

Posted on: 2018-11-14
Degree: Master
Type: Thesis
Country: China
Candidate: J C Chen
Full Text: PDF
GTID: 2348330518995431
Subject: Information and Communication Engineering
Abstract/Summary:
In recent years, distributed representations have been widely used in natural language processing (NLP) tasks such as POS tagging, machine translation, and word sense disambiguation, where they have proven highly effective. However, word embeddings trained through unsupervised learning on large-scale text corpora struggle to capture the rich annotation information available in knowledge bases and semantic dictionaries, even though the entity relations and unique hierarchical structure of a knowledge base are clearly valuable for NLP tasks. How to integrate distributed representations with a knowledge base effectively, so as to improve word embeddings and address problems such as polysemy and lexeme extending (a lexeme here denotes a particular sense of a word), has therefore attracted intense research interest. This thesis addresses that task.

Building on the autoencoder framework of the AutoExtend model [18], we propose a new semi-supervised, hierarchical word embedding model that fuses a knowledge base with unsupervised word embeddings, improving performance on polysemy, expansion, and related tasks. Although AutoExtend and related work have achieved some success, they exploit only a limited set of knowledge-base entity relations and are valid only for in-vocabulary terms. In this thesis we therefore make full use of knowledge-base features to propose and implement two models that improve word and lexeme embeddings while preserving efficiency and parallelizability. Experiments on word similarity, word expansion, and named entity recognition show that our models perform better than existing approaches. The main contributions of this thesis are as follows:

(1) RetroExtend, a semi-supervised model based on a semantic dictionary, which improves in-vocabulary word and lexeme embeddings. The model follows the autoencoder framework of AutoExtend, analyzing the entity relations among words and lexemes in the encoding-decoding process through graph-based learning.

(2) OOVExtend, a semi-supervised model based on hierarchical structure, which resolves polysemy for out-of-vocabulary (OOV) words and extends the corresponding lexeme embeddings. The model first uses the hierarchical structure of the knowledge base to match an OOV word to its closest synsets, and then computes the OOV lexeme vectors from the synset embeddings learned by RetroExtend; a weight matrix is learned by minimizing the reconstruction loss of the decoding step that maps the matched synset vectors back to the original word vectors.

(3) Using these models, we combine knowledge bases such as WordNet and PPDB with corpora such as Wikipedia and GoogleNews to tackle word similarity, word expansion, and named entity recognition. Results are evaluated on standard datasets such as WS353 and SCWS with the AvgSim and AvgSimC metrics. Compared with existing models, ours performs better; for example, Spearman correlation on the word similarity task improves by 2%~3%. In sum, the proposed algorithms are feasible and improve performance on problems such as polysemy and lexeme extending.
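The autoencoder constraint underlying AutoExtend and RetroExtend can be illustrated with a minimal NumPy sketch. The core idea is that each word vector is reconstructed as the sum of its lexeme (sense) vectors through a membership matrix, and the lexeme vectors are chosen to drive the reconstruction loss toward zero. The vocabulary, membership matrix, and dimensionality below are hypothetical toy values, not data from the thesis, and the least-squares solve stands in for the model's actual graph-based training.

```python
import numpy as np

# Toy setup (hypothetical data): 3 words, 4 lexemes (senses), dim 5.
# Membership matrix M[i, j] = 1 if lexeme j is a sense of word i.
rng = np.random.default_rng(0)
word_vecs = rng.normal(size=(3, 5))            # pretrained word embeddings
M = np.array([[1, 1, 0, 0],                    # word 0 has lexemes 0 and 1
              [0, 0, 1, 0],                    # word 1 has lexeme 2
              [0, 0, 0, 1]], dtype=float)      # word 2 has lexeme 3

# Constraint: each word vector is the sum of its lexeme vectors.
# A least-squares fit recovers lexeme vectors that reconstruct the
# word vectors through the membership matrix.
lexeme_vecs, *_ = np.linalg.lstsq(M, word_vecs, rcond=None)

# Reconstruction loss ||M @ lexemes - words||^2, driven toward zero.
loss = float(np.sum((M @ lexeme_vecs - word_vecs) ** 2))
print(loss)
```

Because the system is underdetermined here (more lexemes than words), an exact minimum-norm solution exists and the loss reaches essentially zero; in the full model, shared lexemes and knowledge-base relations couple the equations so the trade-off is non-trivial.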
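The OOV matching step of OOVExtend can likewise be sketched. The thesis matches an OOV word to its closest synsets via the knowledge base's hierarchical structure; as a simplified stand-in for that criterion, the sketch below uses cosine similarity between a (hypothetical) OOV vector and candidate synset vectors, which is an assumption of this illustration rather than the thesis's exact procedure.

```python
import numpy as np

def nearest_synset(oov_vec, synset_vecs):
    """Return the index of the synset most similar to the OOV vector.

    Cosine similarity is used here as a simplified matching criterion;
    the model described above matches via the knowledge-base hierarchy.
    """
    norms = np.linalg.norm(synset_vecs, axis=1) * np.linalg.norm(oov_vec)
    sims = synset_vecs @ oov_vec / norms
    return int(np.argmax(sims))

# Hypothetical 2-D synset embeddings and an OOV word vector.
synsets = np.array([[1.0, 0.0],
                    [0.0, 1.0]])
best = nearest_synset(np.array([0.9, 0.1]), synsets)
print(best)  # → 0
```

Once the closest synsets are found, the matched synset vectors would be mapped back toward the word-vector space by the learned weight matrix described above.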
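For the evaluation metrics named above, AvgSim averages the cosine similarity over all pairs of sense vectors of two words (AvgSimC additionally weights each pair by the probability of the sense given the context). A minimal sketch of AvgSim, with hypothetical sense vectors:

```python
import numpy as np

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def avg_sim(senses_u, senses_v):
    # AvgSim: mean cosine similarity over all pairs of sense vectors.
    sims = [cos(u, v) for u in senses_u for v in senses_v]
    return sum(sims) / len(sims)

# Hypothetical sense vectors: word u has two senses, word v has one.
u = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
v = [np.array([1.0, 0.0])]
print(avg_sim(u, v))  # → 0.5, i.e. (1 + 0) / 2
```

The reported Spearman correlation is then computed between these model similarities and the human ratings in datasets such as WS353 and SCWS.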
Keywords/Search Tags: word embedding, knowledge base, lexeme extending