Compositional Representations For Phrase Basing On Word Embedding

Posted on:2018-11-30

Degree:Master

Type:Thesis

Country:China

Candidate:J J Wu

Full Text:PDF

GTID:2348330518996339

Subject:Computer Science and Technology

Abstract/Summary:

Distributed representations (word embeddings), which embed words into low-dimensional vectors, have been widely studied in recent years. Representing words as continuous vectors makes lexical computation, especially word similarity measurement, much more convenient, and has achieved state-of-the-art performance in many NLP applications. As a result, more and more researchers have begun to study how to obtain representations for larger linguistic units such as phrases, sentences and documents. In the hierarchical structure of language, a phrase is a direct composition of words, so studying the representation of phrase structure has considerable theoretical and practical value.

First, because phrases are composed of words, word representations are the basis of phrase representations. Second, compositional structure is the most important linguistic information embedded in a phrase. A key problem is therefore how to combine word embeddings with compositional structure to learn phrase representations. Based on this view and on related research, this thesis carries out the following work.

A compositional model of phrase representation based on an autoencoder is proposed. This model can be used to train representations for phrases of different grammatical categories, and it embeds grammatical information into their vectors.

A variety of experiments are conducted on phrase representations for both Chinese and English, and some general conclusions are obtained. First, the vector representations of Chinese and English phrases learned by this model differ according to grammatical category; as a result, phrases of the same grammatical category tend to cluster together in vector space. Second, a phrase vector learned by this model is a composition of word vectors that is expressed as a particular direction in vector space, and different combinations of grammatical categories lead to different directions. These results show that the model can embed phrase structure information into phrase representations.

The phrase representations are also applied to several natural language processing tasks: phrase emotion classification, phrase similarity and text sentiment analysis. Experiments show that the phrase representations perform better on tasks such as phrase emotion classification, which require more structural information, but still need improvement in semantic representation, which is the focus of future work.
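
The abstract does not give the model's architecture, so the following is only a minimal sketch of one common way an autoencoder can compose two word vectors into a phrase vector: encode the concatenated pair into a single vector of the same dimension, then decode it back and measure the reconstruction error. The dimension, the toy embeddings, the random weights, and all function names are assumptions made for illustration and are not taken from the thesis; no training step is shown.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Return a rows x cols matrix of small random weights (untrained)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def compose(w_enc, b_enc, left, right):
    """Encode the concatenation [left; right] into one phrase vector."""
    hidden = matvec(w_enc, left + right)
    return [math.tanh(h + b) for h, b in zip(hidden, b_enc)]


def reconstruct(w_dec, b_dec, phrase):
    """Decode a phrase vector back into an approximation of [left; right]."""
    return [o + b for o, b in zip(matvec(w_dec, phrase), b_dec)]


def reconstruction_error(w_enc, b_enc, w_dec, b_dec, left, right):
    """Squared error between the word pair and its reconstruction.

    Training an autoencoder would minimize this quantity over many phrases.
    """
    phrase = compose(w_enc, b_enc, left, right)
    recon = reconstruct(w_dec, b_dec, phrase)
    return sum((r - t) ** 2 for r, t in zip(recon, left + right))


def cosine(a, b):
    """Cosine similarity, a usual way to compare embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


rng = random.Random(0)
dim = 4  # hypothetical embedding size, chosen only to keep the example small

# Hypothetical toy word embeddings; real ones would come from a trained model.
embeddings = {
    "red": [0.2, -0.1, 0.4, 0.3],
    "apple": [0.5, 0.1, -0.2, 0.6],
}

w_enc = make_matrix(dim, 2 * dim, rng)  # encoder: 2*dim -> dim
b_enc = [0.0] * dim
w_dec = make_matrix(2 * dim, dim, rng)  # decoder: dim -> 2*dim
b_dec = [0.0] * (2 * dim)

phrase_vec = compose(w_enc, b_enc, embeddings["red"], embeddings["apple"])
error = reconstruction_error(
    w_enc, b_enc, w_dec, b_dec, embeddings["red"], embeddings["apple"]
)
```

Because the phrase vector has the same dimension as the word vectors, it can be compared with other words or phrases using `cosine`, or composed again with another vector. How the thesis injects grammatical category information into this kind of model is not stated in the abstract and is not modeled here.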

Keywords/Search Tags:

phrase vector, Autoencoder, word embedding

Related items

1	Research On Disambiguation Of Chinese Words And Phrases
2	The Research On Topical Phrase Mining For Patent
3	Research On Keyphrase Extraction Algorithm Based On Word Embeddings Learning
4	Multi-prototype Word Vector Based On Context Word Embedding
5	Rule-based And Statistical-based Combination Of Bilingual Parallel Sentence, The Phrase Alignment Method
6	Methods For Phrase-based Text Mining And Analysis
7	Topic Modeling Research Based On Word Embedding And Generative Neural Networks
8	Research On Hybrid Recommendation Algorithm Enhancement By Stacked Denoising Autoencoder And Users' Labels
9	Methods Of Phrase-based Question Understanding And Answer Generation
10	Chinese Prepositional Phrase Recognition Based On Fine-grained Phrase Information