The Research On Identification Of Chinese Varieties In The Greater China Region

Posted on:2021-04-05

Degree:Master

Type:Thesis

Country:China

Candidate:Y M Sun

GTID:2415330620468764

Subject:Intelligent information processing

Abstract/Summary:

Automatic language recognition is the first step in language processing and language understanding. Accurately detecting the language of a document is a key step in many natural language processing tasks, such as automatic text classification, machine translation, and multilingual data collection. As research on automatic language recognition has advanced in recent years, distinct languages can now be identified with high accuracy. Recognizing variants of a single language, however, remains challenging, because language resources for such variants are relatively scarce and the variants are linguistically very close to one another. Owing to regional, historical, cultural, and social factors, the Chinese used across the Greater China region differs in vocabulary, grammar, and pragmatics; these regional forms are variants of modern Chinese in the broad sense. Unlike traditional linguistic studies, this thesis approaches Chinese variant recognition in the Greater China region from the perspective of computational linguistics and natural language processing, and analyzes the differences among these variants. The research has two main parts.

(1) Construction of a Chinese variant recognition model for the Greater China region by integrating classic text classification models

This thesis integrates classic text classification methods, including traditional machine learning approaches and deep learning-based models. Specifically, a majority voting algorithm is used to build a new Chinese variant recognition model for the Greater China region, which is then applied to news articles from the region. Experiments were conducted on the collected, labeled corpus datasets. The results show that the ensemble model combines the strengths of the individual models to achieve better performance.

(2) Construction of a Chinese variant recognition model for the Greater China region based on the SENet (Squeeze-and-Excitation Networks) attention mechanism

Inspired by classic single-model text classifiers that incorporate an attention mechanism, this thesis builds a recognition model for Chinese variants in the Greater China region based on the SENet attention mechanism. The SENet attention mechanism captures the differences among the variants by dynamically increasing the weights of important discriminative words, and the original word vector features are also incorporated during training. Compared with classic text classification methods, the SENet-based model recognizes Chinese variants in the Greater China region significantly better, and a detailed visualization analysis of the experimental results further verifies the effectiveness of the attention model.
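
The majority-voting ensemble in part (1) can be illustrated with a minimal Python sketch. It assumes each base classifier has already produced one region label per document and simply takes the most frequent label. The function name `majority_vote`, the labels "CN", "HK", "TW", the three prediction lists, and the earliest-model tie-break rule are all hypothetical, since the abstract does not specify the base models, the label set, or how ties are handled.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-model label predictions by hard majority voting.

    predictions[m][i] is the region label that base model m assigns to document i.
    Ties are broken in favour of the label from the earliest model in the list
    (a hypothetical rule; the abstract does not say how ties are handled).
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_docs = len(predictions[0])
    if any(len(p) != n_docs for p in predictions):
        raise ValueError("every model must label the same documents")

    combined = []
    for i in range(n_docs):
        votes = [p[i] for p in predictions]  # one vote per base model
        counts = Counter(votes)
        top = max(counts.values())
        # first label, in model order, that reaches the highest vote count
        combined.append(next(v for v in votes if counts[v] == top))
    return combined


if __name__ == "__main__":
    # Hypothetical outputs of three base classifiers on four news articles.
    model_a = ["CN", "TW", "HK", "TW"]
    model_b = ["CN", "HK", "HK", "CN"]
    model_c = ["TW", "TW", "HK", "HK"]
    print(majority_vote([model_a, model_b, model_c]))  # ['CN', 'TW', 'HK', 'TW']
```

With three voters, a label wins outright whenever two models agree; only the last article, where all three models disagree, falls back to the tie-break rule.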

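The squeeze-and-excitation reweighting in part (2) can likewise be sketched in plain Python. This is a generic SE block adapted to text under stated assumptions, not the architecture from the thesis: each word position of a fixed-length (padded) sentence is treated as a channel, each word vector is squeezed to its mean, the result passes through a two-layer bottleneck (ReLU, then sigmoid, biases omitted) to give one gate per word, and each word vector is scaled by its gate. Keeping the original vector alongside the gated one by concatenation is only an assumed reading of "the original word vector features are also incorporated". The names `se_reweight`, `w1`, and `w2` and the toy weights are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def _sigmoid(x: float) -> float:
    # numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def _matvec(w: Matrix, v: List[float]) -> List[float]:
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def se_reweight(x: Matrix, w1: Matrix, w2: Matrix) -> Tuple[List[float], Matrix]:
    """Squeeze-and-excitation over the word positions of one padded sentence.

    x:  L word vectors, each of dimension d (word positions act as channels)
    w1: (L // r) x L reduction weights; w2: L x (L // r) expansion weights
    Returns one gate per word and, for each word, [original vector ; gated vector]
    (the concatenation is an assumption, not taken from the thesis).
    """
    # Squeeze: average each word vector over its d dimensions -> one scalar per word
    z = [sum(vec) / len(vec) for vec in x]
    # Excitation: bottleneck with ReLU, then sigmoid -> gate in (0, 1) per word
    hidden = [max(0.0, a) for a in _matvec(w1, z)]
    gates = [_sigmoid(a) for a in _matvec(w2, hidden)]
    # Scale each word vector by its gate and keep the original features alongside
    features = [vec + [g * v for v in vec] for vec, g in zip(x, gates)]
    return gates, features


if __name__ == "__main__":
    # Toy sentence of L = 3 words with d = 2, reduction ratio r = 3 (hypothetical weights)
    x = [[1.0, 3.0], [0.0, 0.0], [2.0, 2.0]]
    w1 = [[0.5, 0.5, 0.5]]
    w2 = [[1.0], [-1.0], [0.0]]
    gates, feats = se_reweight(x, w1, w2)
    print([round(g, 3) for g in gates])  # [0.881, 0.119, 0.5]
    print(feats[2])                      # [2.0, 2.0, 1.0, 1.0]
```

Because `w1` has one column per word position, this per-word variant needs a fixed (padded) sentence length; in a trained model the weights, plus biases, would be learned jointly with the classifier rather than set by hand.
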
Keywords/Search Tags:

Language Identification, The Greater China Region, Chinese Variants, Ensemble Method, SENet, Attention Mechanism

Related items

1	Research On Generation Method Of Poems And Song Ci Combining Attention Mechanism And Conditional Variational Encoder
2	Research On Movie Recommendation Method Based On Multi-head Attention Mechanism
3	Undergo The Wind And Rain, Change In Nirvana
4	An analysis of the three modern Chinese orchestras in the context of cultural interaction across Greater China
5	Effect Of Variance And Attention On Ensemble Perception Of Different Stimuli Levels
6	Task Relevance Modulates The Necessity Of Visual Attention To Ensemble Perception
7	Chinese-Thai Bilingual Neural Machine Translation Method Based On Tree-structured Attention Mechanis
8	A Memetics Study On The Generative Mechanism And Evolution Of Network Violent Language Variants
9	Perceiving the world through group-colored glasses: Effects of self-categorization and group identification on attention and information processing
10	Language maintenance and language shift among Chinese American young adults in the greater New York City area