
Research On Segmentation And Part-of-Speech In Four Social Insurances And One Housing Fund

Posted on: 2021-03-30
Degree: Master
Type: Thesis
Country: China
Candidate: H Zhang
GTID: 2507306047482274
Subject: Computer Science and Technology
Abstract/Summary:
"Four insurances and one fund" is the collective name for several kinds of benefits that employers provide to employees. The "Social Insurance Law of the People's Republic of China" and other laws, regulations, and government documents related to the four insurances and one fund set out the relevant provisions. These laws and regulations are the basis on which the relevant legal entities enjoy rights and perform obligations, and the relevant government documents are the code of conduct for those entities. Using computers to analyze, reason about, and summarize social insurance laws and regulations helps in inspecting and improving them. It also gives the relevant entities a new channel for learning and understanding these laws, regulations, and government documents, and helps them to know and use them. Because laws, regulations, and government documents are written in natural language, analyzing and processing them with natural language processing technology is the first and an important part of this work.

This thesis studies word segmentation and part-of-speech tagging in the natural language processing of texts in the four insurances and one fund domain, and provides basic methods and data for subsequent work. It designs a segmentation and tagging scheme for this domain. The word segmentation uses an unsupervised domain word segmentation method suited to the four insurances and one fund domain. The research makes full use of the available information, analyzes the characteristics of the domain corpus, sets up a separate word segmentation scheme for each of the three types of words that play a major role in domain text, and combines the results of the three. Non-standard words are segmented by regular-expression matching. Domain compound words are segmented with a dictionary-based method, where the dictionary is built by an unsupervised vocabulary extraction algorithm based on central words, and the central words are acquired with web crawler technology. Common words are segmented with open-source open-domain word segmentation systems.

The thesis also designs a part-of-speech tag set for the four insurances and one fund domain, subdividing the categories that carry a large amount of information: non-standard words, domain compound words, prepositions, and conjunctions. The tag set is oriented to practical applications and benefits subsequent processing steps. Part-of-speech tagging uses a method based on string matching. The method is designed on top of the word segmentation scheme and can complete the tagging without a dictionary.
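The three-way segmentation scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The regular expressions, the tiny `DOMAIN_WORDS` dictionary, the tag names (`nsw`, `domain`, `common`), and the character-level `fallback_segment` function are all assumptions made for illustration. In practice the dictionary would come from the unsupervised extraction step, and the fallback would be an open-source open-domain segmenter. The sketch also shows one way a coarse tag could be assigned from the segmentation path alone, without a part-of-speech dictionary.

```python
import re

# Hypothetical patterns for non-standard words (dates, percentages, amounts, numbers).
NSW_PATTERN = re.compile(
    r"\d{4}年\d{1,2}月(?:\d{1,2}日)?"  # dates such as 2021年3月30日
    r"|\d+(?:\.\d+)?%"                 # percentages such as 8%
    r"|\d+(?:\.\d+)?元"                # money amounts such as 300元
    r"|\d+(?:\.\d+)?"                  # bare numbers
)

# Hypothetical domain dictionary; a real one would be extracted from the corpus.
DOMAIN_WORDS = {"基本养老保险", "住房公积金", "缴费基数", "失业保险"}
MAX_LEN = max(map(len, DOMAIN_WORDS))


def fallback_segment(text):
    """Placeholder for an open-domain segmenter: one token per character."""
    return list(text)


def segment(text):
    """Return (word, coarse_tag) pairs using the three schemes in priority order."""
    tokens = []
    buffer = []  # characters not claimed by the regex or the dictionary

    def flush():
        # Hand the pending characters to the open-domain fallback.
        if buffer:
            tokens.extend((w, "common") for w in fallback_segment("".join(buffer)))
            buffer.clear()

    i = 0
    while i < len(text):
        # 1. Non-standard words: regular-expression match at position i.
        m = NSW_PATTERN.match(text, i)
        if m:
            flush()
            tokens.append((m.group(), "nsw"))
            i = m.end()
            continue
        # 2. Domain compound words: forward maximum matching against the dictionary.
        for n in range(min(MAX_LEN, len(text) - i), 1, -1):
            if text[i:i + n] in DOMAIN_WORDS:
                flush()
                tokens.append((text[i:i + n], "domain"))
                i += n
                break
        else:
            # 3. Everything else is left for the common-word segmenter.
            buffer.append(text[i])
            i += 1
    flush()
    return tokens


print(segment("住房公积金缴费基数的8%"))
```

Trying the regular expression first and the dictionary second is a design choice of this sketch. The abstract only states that the results of the three schemes are combined, not in what order.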
Keywords/Search Tags: four social insurances and one housing fund, Natural Language Processing, Chinese Word Segmentation, Part-of-Speech Tagger