Bayesian Estimation Of Bipartite Matchings For Chinese Record Linkage

Posted on: 2018-09-07
Degree: Master
Type: Thesis
Country: China
Candidate: X T Lu
GTID: 2370330515453669
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Record matching, the task of merging two or more databases in the absence of a unique identifier, is a challenging problem. In this thesis, we generate records containing a Chinese name, age, and occupation that follow realistic characteristics such as their frequency distributions. We then create duplicate records by modifying the originals and assign each duplicate the id of its original record, so that the matching rate can be calculated and used to assess the performance of the model. Because computing the distance between records differs for Chinese and English records, we define our own edit distance for data pre-processing. We argue that assuming the matching statuses of record pairs to be independent is unreasonable, and instead take a bipartite matching between the two datafiles as the parameter of interest. We choose a prior distribution for this parameter and derive a variety of point estimators under different loss functions. Finally, we describe a Gibbs sampling algorithm for simulating the posterior distribution and evaluate our approach to Chinese record linkage in a variety of scenarios. To assess the performance of the model, we compute precision and recall and conclude that our approach works well.
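As a rough illustration of two steps named in the abstract, the sketch below computes a character-level edit distance between Chinese names and scores an estimated bipartite matching by precision and recall. The thesis defines its own edit distance, which is not given here, so the standard Levenshtein distance is used as a stand-in. The function names `levenshtein`, `is_bipartite_matching`, and `precision_recall`, and the dictionary representation of a matching, are assumptions made for this example.

```python
from typing import Dict, Tuple


def levenshtein(a: str, b: str) -> int:
    """Standard Levenshtein distance, counted per Unicode character.

    Stand-in only: the thesis's custom edit distance is not specified here.
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def is_bipartite_matching(matching: Dict[int, int]) -> bool:
    """Check that no record in file B is matched to more than one record in file A.

    Keys are record indices in file A, values are record indices in file B.
    """
    return len(set(matching.values())) == len(matching)


def precision_recall(estimated: Dict[int, int],
                     truth: Dict[int, int]) -> Tuple[float, float]:
    """Precision and recall of the estimated links against the true links."""
    correct = sum(1 for i, j in estimated.items() if truth.get(i) == j)
    precision = correct / len(estimated) if estimated else 1.0
    recall = correct / len(truth) if truth else 1.0
    return precision, recall


# Hypothetical toy data: one substituted character, then one dropped character.
d_sub = levenshtein("张伟", "张玮")
d_del = levenshtein("王小明", "王明")

# Hypothetical true and estimated matchings between two small files.
truth = {0: 0, 1: 1, 2: 3, 3: 4}
estimated = {0: 0, 1: 2, 2: 3}
precision, recall = precision_recall(estimated, truth)
```

In this toy case two of the three estimated links are correct and two of the four true links are recovered. Representing a matching as a dictionary makes the one-to-one constraint a simple uniqueness check on its values, which is the structural restriction that distinguishes a bipartite matching from independent pairwise match decisions.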
Keywords/Search Tags:Bayesian estimation, Records matching, Records generation, Gibbs Sampling algorithm, Error analysis