| With the continuous development of China's economy and society, laws and policies in the field of social insurance (medical/maternity insurance, endowment insurance, unemployment insurance and work injury insurance) and the housing fund have been continually adjusted to adapt to the economic conditions of the country and of its various districts. The inevitable result of this adjustment is that a large number of domain law and policy texts are produced across time periods and districts, which gives rise to semantic shifts in domain terminology. Studying how to obtain these shifts automatically is meaningful for two reasons. First, the shifts are closely related to people's everyday lives; if the semantic shifts of terminology in these fields can be obtained automatically, people's lives can be greatly facilitated. Second, there is already a large number of laws and regulations on the four insurances and the housing fund, and this number can be expected to grow over time, so computerized automatic acquisition is a natural demand. The goal of this thesis is to automatically extract domain terminologies and their time- and district-related semantics from unstructured law and policy texts and to organize them into five-tuples of the form (domain, terminology, time, district, selected-from). Together, these five elements describe the semantic shifts of a domain terminology and clearly indicate how its semantics change over time and across districts. Domain terminologies are first extracted from the law and policy texts; their time- and district-related semantics are then extracted from the text on the basis of these terminologies and combined with the corresponding source (the law or policy from which they were selected) to form a five-tuple. To extract domain terminologies, this thesis proposes a hybrid method based on rule matching (part-of-speech and part-of-speech combination rules), statistical filtering (multi-dimensional pointwise mutual information) and filtering with a modified prototype network (word-level embedding). Based on these terminologies, a second hybrid approach, combining rules (exclusion rules and matching rules) with a modified prototype network (sentence-level embedding), extracts their time- and district-related semantics from the laws and policies on the four insurances and the housing fund. Finally, the results are organized into five-tuples, and a method is provided for describing the semantic shifts of domain terminologies using the five-tuples and the knowledge they contain; this method clearly represents how the semantics of domain terms shift over time and across districts. At the end of the thesis, we show how these five-tuples are used in auditing methods. More precisely, we use the five-tuples to help comptrollers choose the auditing methods that they might want to use and modify. |
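
As an illustration of the (domain, terminology, time, district, selected-from) form, the following is a minimal Python sketch of how such five-tuples might be stored and grouped so that the records for one terminology can be read in order of time and district. It is not the implementation used in the thesis: the names `FiveTuple` and `group_by_term`, the representation of time as a year string, and all sample values (`term-1`, `District A`, `Policy P1`, and so on) are hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class FiveTuple(NamedTuple):
    """One extracted record; field names follow the five-tuple form in the text."""
    domain: str         # e.g. one of the four insurances or the housing fund
    terminology: str    # the extracted domain terminology
    time: str           # assumed here to be a year string such as "2010"
    district: str       # district in which the law or policy applies
    selected_from: str  # the law or policy the record was selected from


def group_by_term(
    records: Iterable[FiveTuple],
) -> Dict[Tuple[str, str], List[FiveTuple]]:
    """Group records by (domain, terminology), ordered by time, then district."""
    grouped: Dict[Tuple[str, str], List[FiveTuple]] = defaultdict(list)
    for record in records:
        grouped[(record.domain, record.terminology)].append(record)
    for key in grouped:
        grouped[key].sort(key=lambda r: (r.time, r.district))
    return dict(grouped)


# Hypothetical placeholder records, not taken from any real law or policy.
records = [
    FiveTuple("unemployment insurance", "term-1", "2015", "District B", "Policy P2"),
    FiveTuple("unemployment insurance", "term-1", "2010", "District A", "Policy P1"),
    FiveTuple("housing fund", "term-2", "2012", "District A", "Policy P3"),
]

grouped = group_by_term(records)
for (domain, term), history in grouped.items():
    for r in history:
        print(domain, term, r.time, r.district, r.selected_from)
```

Grouping on (domain, terminology) and sorting by time and district is only one possible way to lay the records out; it is chosen here because it places the entries for a single terminology side by side, which is the comparison the five-tuple description is meant to support.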