In recent decades, research interest in truth discovery in data mining has grown steadily, and many novel algorithms have emerged. These algorithms can be categorized from different perspectives. For example, they can be divided into single-truth discovery algorithms and multi-truth discovery algorithms. Different data sources observe the same objects from different perspectives, so the data they output vary. Observations from a highly reliable source are themselves more likely to be accurate. Weighting data sources by their reliability when estimating the truth from such heterogeneous sources therefore improves the accuracy of the estimated truth. In a multi-source environment, truth discovery algorithms aim to find the truth among the data provided by different sources. A single-truth discovery algorithm finds one true value from the data provided by the sources. In contrast, a multi-truth discovery algorithm finds a set of true values that together form the final truth. Both single-truth and multi-truth discovery algorithms depend on estimating the reliability of the data sources, which shows how important source reliability estimation is to truth discovery. Moreover, most of the literature does not clearly define the reliability of the final estimated truth. Most studies instead use labels or scores to evaluate the performance of truth discovery algorithms. However, is the truth produced by a truth discovery algorithm reliable, and if so, how reliable is it? Given the observations from different data sources and the estimated truth, how can we determine which sources are reliable, and to what degree?
Few studies have addressed these two problems. In a multi-source setting, this paper studies how to estimate the truth and its reliability, and how to estimate each data source's reliability from that truth. We propose RWMSC, a mean-shift truth discovery algorithm weighted by data source reliability, for both truth discovery and source reliability estimation. The method iteratively updates the truth estimates and the source reliabilities until convergence. Mean shift is used to estimate the truth, and anomaly detection is used to estimate the reliability of each data source. In addition to truth estimation, this paper proposes an EM-based method for estimating the reliability of the truth. This method first discretizes the observation data into events and then estimates the key parameters with the EM algorithm. Finally, it applies Bayes' formula to compute the posterior probability that an event belongs to the truth given the observed values, and it takes this posterior probability as the reliability of the truth. To evaluate the proposed methods, we compare them experimentally with several mainstream truth estimation methods. Simulation experiments on an artificial dataset and a public dataset show that our methods outperform the other methods in the accuracy of both the estimated truths and their reliability.
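The two procedures described above can be illustrated with a minimal Python sketch. The abstract does not specify the kernel, the bandwidth, the anomaly score, the event definition, or the EM model, so everything below is an assumption rather than the authors' RWMSC. The sketch assumes one-dimensional continuous values, a complete source-by-object matrix, and a Gaussian kernel. For the anomaly-detection step, it assumes a hypothetical median-scaled error score that is mapped to a weight through `exp(-score)`. For truth reliability, it swaps in a generic latent-class (Dawid-Skene-style) EM over binary "source lies within `delta` of the estimate" events. All function names (`rwmsc_sketch`, `truth_reliability_em`, and so on) are hypothetical.

```python
import math

def weighted_median(values, weights):
    """Weighted median, used as a robust starting point for mean shift."""
    pairs = sorted(zip(values, weights))
    half, acc = sum(weights) / 2.0, 0.0
    for x, w in pairs:
        acc += w
        if acc >= half:
            return x
    return pairs[-1][0]

def weighted_mean_shift(values, weights, bandwidth, tol=1e-9, max_iter=200):
    """Climb to a mode of the source-weighted Gaussian kernel density (1-D)."""
    t = weighted_median(values, weights)
    for _ in range(max_iter):
        k = [w * math.exp(-0.5 * ((x - t) / bandwidth) ** 2) for x, w in zip(values, weights)]
        denom = sum(k)
        if denom == 0.0:  # every kernel underflowed; keep the current estimate
            return t
        t_new = sum(ki * x for ki, x in zip(k, values)) / denom
        if abs(t_new - t) < tol:
            return t_new
        t = t_new
    return t

def rwmsc_sketch(obs, bandwidth=1.0, tol=1e-6, max_iter=50):
    """obs[s][o]: value from source s for object o (complete matrix assumed).
    Alternates weighted mean shift (truth) with an ASSUMED outlier score (reliability)."""
    n_src, n_obj = len(obs), len(obs[0])
    w = [1.0] * n_src
    truths = [float("inf")] * n_obj
    for _ in range(max_iter):
        new_truths = [weighted_mean_shift([obs[s][o] for s in range(n_src)], w, bandwidth)
                      for o in range(n_obj)]
        # Hypothetical anomaly score: mean |error| scaled by each object's median |error|.
        scores = [0.0] * n_src
        for o in range(n_obj):
            errs = [abs(obs[s][o] - new_truths[o]) for s in range(n_src)]
            scale = max(sorted(errs)[n_src // 2], 1e-6)
            for s in range(n_src):
                scores[s] += errs[s] / scale / n_obj
        raw = [math.exp(-min(sc, 50.0)) for sc in scores]  # clip so no weight is exactly 0
        total = sum(raw)
        w = [r * n_src / total for r in raw]
        done = max(abs(a - b) for a, b in zip(new_truths, truths)) < tol
        truths = new_truths
        if done:
            break
    return truths, w

def truth_reliability_em(obs, truths, delta, max_iter=200, eps=1e-3):
    """Posterior P(estimate for object o is the truth | events), latent-class EM (assumed model)."""
    n_src, n_obj = len(obs), len(obs[0])
    # Discretize: event 1 if source s lies within delta of the estimated truth.
    ev = [[1 if abs(obs[s][o] - truths[o]) <= delta else 0 for o in range(n_obj)]
          for s in range(n_src)]

    def clip(p):
        return min(max(p, eps), 1.0 - eps)

    # pi = P(estimate correct), a[s] = P(e=1 | correct), b[s] = P(e=1 | wrong)
    pi, a, b = 0.5, [0.8] * n_src, [0.2] * n_src
    post = [0.5] * n_obj
    for _ in range(max_iter):
        # E-step: Bayes' rule, computed as a log-odds d = log(P(wrong, e) / P(correct, e)).
        for o in range(n_obj):
            d = math.log(1.0 - pi) - math.log(pi)
            for s in range(n_src):
                if ev[s][o]:
                    d += math.log(b[s]) - math.log(a[s])
                else:
                    d += math.log(1.0 - b[s]) - math.log(1.0 - a[s])
            post[o] = 1.0 / (1.0 + math.exp(max(min(d, 700.0), -700.0)))
        # M-step: re-estimate the prior and the per-source event rates.
        z1 = sum(post)
        z0 = n_obj - z1
        pi = clip(z1 / n_obj)
        a = [clip(sum(p * e for p, e in zip(post, ev[s])) / max(z1, 1e-12)) for s in range(n_src)]
        b = [clip(sum((1 - p) * e for p, e in zip(post, ev[s])) / max(z0, 1e-12))
             for s in range(n_src)]
    return post
```

In this sketch, the EM step is informative only when objects show different event patterns. If every object has the same pattern, then `a[s]` equals `b[s]` after the first M-step, and the posterior stays at the prior implied by the initialization. Replacing the exponential weight mapping or the latent-class model with the paper's actual anomaly-detection rule and event definition would change the numbers, but not the alternating structure.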