
Research On Distributed Representation Learning Of Chinese Word

Posted on: 2015-12-07    Degree: Master    Type: Thesis
Country: China    Candidate: X Q Hou    Full Text: PDF
GTID: 2295330461983862    Subject: Probability theory and mathematical statistics
Abstract/Summary:
Word representation is one of the key issues in natural language processing. It is an important prerequisite for building syntactic and semantic analysis models, and it also affects the accuracy and robustness of many NLP application systems, including information retrieval and question-answering systems. Furthermore, when processing large-scale real-world Chinese datasets, the word representation method plays a key role in the efficiency and performance of a system.

This thesis focuses on three word representation strategies: one-hot representation, distributional representation based on latent semantic information, and distributed representation based on a neural language model. The one-hot representation is the most widely used in Chinese information processing; many Chinese chunking systems based on maximum entropy models and conditional random fields use word-related one-hot vectors as features. This representation is simple but high-dimensional, and the corresponding feature matrix is very sparse. To make up for this shortcoming, the latter two strategies map words into low-dimensional real-valued vectors. The difference between them is that the distributional representation mainly employs matrix decomposition techniques, while the distributed representation regards the word vectors as a hidden layer in a neural network.

We concentrate on the neural-network-based strategy, specifically the neural language model proposed by Bengio (2003), and conduct numerical simulations on a large Chinese corpus of 5 million characters, annotated manually by Shanxi University. The simulations show that the maximum and minimum elements of the distributed word-representation matrix grow larger and smaller, respectively, as the number of iterations increases. This phenomenon is in line with the results reported in Turian (2010). Furthermore, we analyze the phenomenon theoretically and give a sufficient condition under which the matrix is unbounded.

We also study the connection between the distributed word representation and word meaning. By drawing element histograms of the vectors of typical English and Chinese polysemous words, we preliminarily observe that the more ambiguous a word is, the more peaks its histogram has, and that Chinese and English words show a similar trend.

To compare the distributional and distributed representations, we conduct word-clustering experiments with both. The experiments show that the distributed word representation finds more precise ten nearest neighbors for a word than the distributional word representation.

Finally, the one-hot and distributed representations are compared on the boundary identification task of Chinese Base Chunks. With sliding-window word features of size [-2, 2], the F value is 38.72%; replacing the one-hot word features with distributed word features raises it to 70.51%, and scaling the distributed word features raises it further to 70.74%. After adding part-of-speech features over the same [-2, 2] window, the F values are 82.35% with one-hot word features and 85.90% with distributed word features. These results indicate that the distributed representation of Chinese words benefits the boundary identification of Chinese Base Chunks.
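The following is a minimal sketch of a Bengio-style feed-forward neural language model that treats the word-vector matrix as a shared hidden layer and tracks its extreme elements across iterations, the quantity whose growth the abstract discusses. The toy corpus, vocabulary size, dimensions, and learning rate are illustrative assumptions, not the settings used in the thesis.

```python
import numpy as np

# Minimal Bengio-style NNLM sketch: n-gram context -> tanh hidden layer -> softmax
# over the vocabulary. All hyperparameters and the toy corpus are hypothetical.
np.random.seed(0)

corpus = [0, 1, 2, 3, 1, 2, 4, 0, 3, 2, 1, 4]   # toy word-id sequence
V, m, h, n = 5, 8, 16, 3                        # vocab size, embedding dim, hidden dim, context length
C = 0.01 * np.random.randn(V, m)                # word-vector matrix (the representation of interest)
H = 0.01 * np.random.randn(n * m, h)
U = 0.01 * np.random.randn(h, V)
lr = 0.1

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

for epoch in range(50):
    for t in range(n, len(corpus)):
        ctx, target = corpus[t - n:t], corpus[t]
        x = C[ctx].reshape(-1)             # concatenate the context word vectors
        a = np.tanh(x @ H)                 # hidden layer
        p = softmax(a @ U)                 # predicted distribution over next word
        dy = p.copy(); dy[target] -= 1.0   # gradient of cross-entropy loss
        dU = np.outer(a, dy)
        da = (U @ dy) * (1 - a ** 2)
        dH = np.outer(x, da)
        dx = H @ da
        U -= lr * dU; H -= lr * dH
        for i, w in enumerate(ctx):        # update the shared word vectors
            C[w] -= lr * dx[i * m:(i + 1) * m]
    # track the extreme elements of the word-vector matrix across iterations
    print(f"epoch {epoch:2d}  max={C.max():+.4f}  min={C.min():+.4f}")
```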
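As a sketch of the neighbor comparison between the two low-dimensional representations, the snippet below retrieves the ten nearest neighbors of a query word by cosine similarity from a word-vector matrix. The matrix and vocabulary are random placeholders; in practice the rows would come from either the distributional or the distributed representation.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = [f"word_{i}" for i in range(1000)]     # hypothetical vocabulary
W = rng.standard_normal((len(vocab), 50))      # one row vector per word (placeholder)

def ten_nearest(query_idx, W, vocab):
    # cosine similarity of the query row against every row
    q = W[query_idx]
    sims = W @ q / (np.linalg.norm(W, axis=1) * np.linalg.norm(q) + 1e-12)
    order = np.argsort(-sims)
    return [vocab[i] for i in order if i != query_idx][:10]

print(ten_nearest(0, W, vocab))
```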
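The chunking experiment replaces one-hot word features in a [-2, 2] sliding window with distributed word vectors, optionally scaled. Below is a sketch of that feature construction; the `embeddings` lookup, the scaling factor, and the toy sentence are assumptions for illustration, and the one-hot variant would instead emit the window words themselves as discrete features for the CRF or maximum entropy chunker.

```python
import numpy as np

def window_features(tokens, embeddings, dim, left=2, right=2, scale=1.0):
    """Concatenate the (scaled) vectors of the words in a [-left, right] window."""
    feats = []
    for i in range(len(tokens)):
        vecs = []
        for j in range(i - left, i + right + 1):
            if 0 <= j < len(tokens):
                vecs.append(scale * embeddings.get(tokens[j], np.zeros(dim)))
            else:
                vecs.append(np.zeros(dim))   # padding outside the sentence
        feats.append(np.concatenate(vecs))
    return np.array(feats)

# toy usage with made-up 4-dimensional embeddings
emb = {"我": np.ones(4), "喜欢": 2 * np.ones(4), "自然语言": 3 * np.ones(4)}
X = window_features(["我", "喜欢", "自然语言"], emb, dim=4)
print(X.shape)   # (3, 20): five window positions times 4 dimensions per position
```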
Keywords/Search Tags: word representation, neural language model, distributed word representation, Chinese Base Chunk