
Research On Hierarchical Multilabel Classification For Chinese Lexicon Based On Word Embedding

Posted on: 2022-07-17
Degree: Master
Type: Thesis
Country: China
Candidate: L X Yuan
Full Text: PDF
GTID: 2545306839491344
Subject: Electronic and communication engineering
Abstract/Summary:
With the advent of deep learning and big data technology, artificial intelligence has developed rapidly. Semantic classification is a basic task of Natural Language Processing (NLP). According to granularity, it can be divided into lexical classification, short-text classification, and text classification. Current research on semantic classification focuses mainly on short-text and text classification, while research on lexical classification remains insufficient. However, lexical classification can play an important role in NLP tasks such as word sense disambiguation and information extraction. Therefore, this thesis studies Chinese lexical classification. To better represent word meaning, an external knowledge base is introduced to build a lexical classification model for the Chinese Linguistic Inquiry and Word Count (LIWC) lexicon and to study automatic classification of the Chinese lexicon.

Because each entry in a Chinese lexicon carries limited information, this thesis studies the effectiveness of conventional machine learning algorithms, such as collaborative filtering and matrix decomposition, for lexical label prediction. The collaborative filtering algorithm mainly models the relationships between words and between words and classification labels. The model is simple, yet it performs well. To learn the relationships among classification labels, this thesis proposes a matrix decomposition algorithm. Inspired by the GloVe algorithm, it constructs a co-occurrence matrix of classification labels according to the dependencies among the labels. Results show that the model can exploit the dependencies between labels and improve the accuracy of Chinese lexical classification.

A Seq2Seq model is also introduced to study the effect of deep learning on the Chinese lexical classification task. To make use of deep learning, this thesis introduces knowledge bases such as HowNet and the Modern Chinese Dictionary. When classifying the different senses of a word, the model, combined with the attention mechanism, can make accurate use of the sememes and dictionary definitions in the knowledge bases. To learn the relationships between classification labels accurately, this thesis introduces a Conditional Random Field (CRF) layer, inspired by the named entity recognition task. Experiments are designed to study the influence of the CRF layer and the external knowledge bases on classification accuracy. The results show that the accuracy of hierarchical multilabel classification of the Chinese lexicon is greatly improved by using the external knowledge bases, the attention mechanism, and CRF, and that the proposed method is effective for Chinese lexical classification. This thesis presents an idea and method for Chinese lexical classification based on HowNet and the Modern Chinese Dictionary, which offers theoretical significance and reference value for automatic Chinese lexical classification.
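The label co-occurrence matrix mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the toy lexicon, the label names, and the function names `build_label_cooccurrence` and `glove_weight` are hypothetical, and the sketch assumes that two labels "co-occur" whenever they are assigned to the same word. The weighting function follows the form used in the original GloVe algorithm, with its usual defaults; whether the thesis applies the same weighting is not stated.

```python
from collections import Counter
from itertools import combinations


def build_label_cooccurrence(word_labels):
    """Count, for every ordered label pair, how many words carry both labels."""
    counts = Counter()
    for labels in word_labels.values():
        # Deduplicate and sort so each unordered pair is visited once per word.
        for a, b in combinations(sorted(set(labels)), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1  # keep the matrix symmetric
    return counts


def glove_weight(x, x_max=100.0, alpha=0.75):
    """GloVe-style weighting f(x): damps rare pairs, caps frequent ones at 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0


# Hypothetical toy lexicon: each word lists a parent category and its children.
toy_lexicon = {
    "happy": ["affect", "posemo"],
    "sad": ["affect", "negemo", "sadness"],
    "angry": ["affect", "negemo", "anger"],
}

cooc = build_label_cooccurrence(toy_lexicon)
print(cooc[("affect", "negemo")])  # parent and child share two words
print(cooc[("posemo", "negemo")])  # never assigned to the same word
```

In a GloVe-style factorization, label vectors would then be fitted so that their dot products approximate the logarithm of these counts, with `glove_weight` scaling each pair's contribution to the loss.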
Keywords/Search Tags:Chinese lexical classification, machine learning, Seq2Seq model, knowledge bases