| Distributional representations (word embedding) which embeds words into low dimensional vectors has been widely studied in recent years. Using continuous vectors to represent words as lexical computing, especially the word similarity measure, provides a great convenience and achieves state of the art performance in many NLP applications. Therefore, more and more researchers begin to focus on how to get representations for bigger sentiment unit such as phrase, sentence and document. From the view of hierarchical structure of language, phrase structure is a direct composition of words. Therefore, it is of great theoretical and practical value to study the representation of phrase structure.First of all, as phrase structure is composed of words, the word representation is the basis of phrase representation. Secondly, the compositional structure is the most important linguistic information embeds in phrase. So it’s a key problem to study how to combine word embedding and compositional structure to learn a phrase representation. Based on this view and relevant research, our paper mainly carried out the following work.A compositional model of phrase representation based on Autoencoder is proposed. This model can be used to train phrase representations with different grammatical categories and embeds grammatical information into their vectors.In this paper, a variety of experiments are conducted on phrase representation of both Chinese and English, and some of general conclusions are obtained,including: Firstly, the vector representation of Chinese and English phrase learned from this model differs according to their grammatical category, as a result, the vector representations of phrases with different grammatical categories tend to be clustered together in vector space; Secondly, phrase vectors learned from this model is a composition of word vectors, which is expressed in vector space in a particular direction, and different composition of grammatical categories lead to different directions. These results prove our model can embeds phrase structure information into phrase representations.We also apply these phrase representations to several natural language processing tasks such as phrase emotion classification,phrase similarity and text sentiment analysis. Experiments show that the phrase representation achieves better performance in task lik phrase emotion classification, since this task request for more structure information. and still needs more improvement in semantic representation,which is the future work we focus on. |