| The open sharing of government data has become an inevitable trend of the times. In this process, data quality is one of the key factors that determine the value of the data, so how to assess and improve the quality of government data objectively and accurately has become a current research hotspot. Many scholars have already achieved substantial results in both data quality evaluation and data quality improvement. However, existing models and algorithms still have shortcomings. For example, the analytic hierarchy process (AHP) is overly subjective when calculating weights, whereas the entropy weight method depends entirely on objective data; traditional vector space models require a large number of feature terms to match an entire text; and the interrelationships among indicators are seldom taken into account. With these problems in mind, this paper focuses on two topics, the quality assessment and the quality improvement of government data, and addresses the deficiencies of these models and algorithms along the way. The main contributions are as follows. (1) For the quality evaluation of government data, this paper first uses an accelerated genetic algorithm to improve the AHP, so that the judgment matrix is optimal and passes the consistency test in a single pass; the method can also adjust all elements of the matrix. Second, the weights calculated by the entropy weight method are corrected using the conflict coefficients between indicators. The two sets of weights are then combined into subjective-objective combined weights. Finally, based on the combined weights, a multi-level fuzzy comprehensive evaluation method is used to assess the quality of government data. The experimental results show that the weights obtained by this combination method accurately adjust the proportion of each indicator's weight, making the final quality evaluation results more objective and realistic. (2) In terms of
improving the quality of government data, this paper targets three major quality problems: duplicate data, missing data, and abnormal data. For duplicate data, it introduces the concept of text segments into the vector space model to reflect the significance of feature terms at different positions, then calculates the similarity between text-segment vectors to identify approximately or exactly duplicated records. For missing data, it replaces the traditional Euclidean distance with the Mahalanobis distance when finding nearest neighbors in the k-nearest neighbor algorithm; this considers not only the differences between record values but also the correlations among them, and therefore estimates missing values more accurately. For abnormal data, it follows the classic Pauta (3σ) criterion and transforms the Bessel formula used within it to make the iterative removal of multiple outliers more efficient. Finally, experimental comparison and analysis show that all three methods outperform the traditional methods. |
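The consistency requirement on the AHP judgment matrix can be illustrated with a minimal stdlib-only Python sketch. It assumes the principal eigenvector is approximated by power iteration and uses Saaty's standard random index (RI) table to compute the consistency ratio CR = CI / RI, which a corrected matrix must bring below the usual 0.1 threshold. The accelerated genetic algorithm that performs the correction is not reproduced here, and `ahp_weights` is a hypothetical helper name.

```python
# Saaty's random consistency index RI for matrix orders 1..9
RANDOM_INDEX = [0.0, 0.0, 0.58, 0.90, 1.12, 1.24, 1.32, 1.41, 1.45]


def ahp_weights(matrix, iterations=100):
    """Return (weights, lambda_max, CR) for a positive reciprocal judgment matrix."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        # Power iteration: multiply by the matrix, then renormalise to sum 1
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    # Estimate the principal eigenvalue as the mean of (A w)_i / w_i
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0  # consistency index
    ri = RANDOM_INDEX[n - 1]
    cr = ci / ri if ri > 0 else 0.0  # consistency ratio
    return w, lambda_max, cr


if __name__ == "__main__":
    # Hypothetical judgment matrix for three quality indicators
    a = [[1, 3, 5], [1 / 3, 1, 2], [1 / 5, 1 / 2, 1]]
    weights, lam, cr = ahp_weights(a)
    print([round(x, 3) for x in weights], round(lam, 3), round(cr, 3))
```

A perfectly consistent matrix (a_ij = w_i / w_j) yields lambda_max = n and CR = 0; the further a matrix drifts from that, the larger CR becomes.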
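One possible shape of the entropy weight step with a conflict-coefficient correction is sketched below. The entropy part follows the standard formulation (column proportions p_ij, entropy e_j = -(1/ln m) Σ_i p_ij ln p_ij, weight proportional to 1 - e_j). The conflict coefficient is assumed to be the CRITIC-style sum Σ_k (1 - r_jk) over Pearson correlations r_jk, and multiplying it into the entropy weights is an assumed correction rule, not necessarily the one used in this paper. Indicator values are assumed positive and benefit-type; all function names are hypothetical.

```python
import math


def entropy_weights(data):
    """Standard entropy weight method for an m x n matrix of positive values."""
    m, n = len(data), len(data[0])
    k = 1.0 / math.log(m)
    divergence = []
    for j in range(n):
        column = [row[j] for row in data]
        total = sum(column)
        p = [x / total for x in column]
        entropy = -k * sum(pi * math.log(pi) for pi in p if pi > 0)
        divergence.append(1.0 - entropy)  # degree of divergence d_j = 1 - e_j
    s = sum(divergence)
    return [d / s for d in divergence]


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def conflict_corrected_weights(data):
    """Entropy weights scaled by a CRITIC-style conflict coefficient (assumed rule)."""
    n = len(data[0])
    columns = [[row[j] for row in data] for j in range(n)]
    w = entropy_weights(data)
    # Conflict of indicator j: sum over k of (1 - r_jk); r_jj = 1 contributes 0
    conflict = [sum(1.0 - pearson(columns[j], columns[k]) for k in range(n))
                for j in range(n)]
    raw = [wj * cj for wj, cj in zip(w, conflict)]
    total = sum(raw)
    return [r / total for r in raw]


if __name__ == "__main__":
    # Hypothetical scores of four datasets on three quality indicators
    data = [[0.9, 0.7, 0.2], [0.8, 0.9, 0.5], [0.6, 0.4, 0.9], [0.95, 0.6, 0.3]]
    print(entropy_weights(data), conflict_corrected_weights(data))
```

Indicators that are strongly positively correlated with the others carry overlapping information and receive a smaller conflict coefficient, so their share of the corrected weight shrinks.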
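The combined weighting and a two-level fuzzy comprehensive evaluation can be sketched with the weighted-average operator b_k = Σ_j w_j r_jk. Assumptions: the subjective and objective weights are merged linearly with a hypothetical preference coefficient `alpha = 0.5`; the membership matrices R (indicators by evaluation grades) are already given; each first-level factor's result vector becomes one row of the second-level membership matrix; and the final grade is chosen by the maximum-membership principle. The grade names and numbers are illustrative only.

```python
def combine_weights(subjective, objective, alpha=0.5):
    """Linear subjective/objective combination; alpha is a hypothetical preference."""
    mixed = [alpha * s + (1 - alpha) * o for s, o in zip(subjective, objective)]
    total = sum(mixed)
    return [x / total for x in mixed]


def fuzzy_evaluate(weights, membership):
    """Weighted-average operator: b_k = sum_j w_j * r_jk over evaluation grades k."""
    grades = len(membership[0])
    return [sum(w * row[k] for w, row in zip(weights, membership))
            for k in range(grades)]


def two_level_evaluate(groups, top_weights):
    """groups: (indicator_weights, membership_matrix) for each first-level factor.

    Each group's result vector becomes one row of the second-level membership matrix.
    """
    second_level = [fuzzy_evaluate(w, r) for w, r in groups]
    return fuzzy_evaluate(top_weights, second_level)


if __name__ == "__main__":
    grades = ["good", "fair", "poor"]  # hypothetical evaluation grades
    groups = [([0.6, 0.4], [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3]]),
              ([1.0], [[0.1, 0.6, 0.3]])]
    b = two_level_evaluate(groups, [0.7, 0.3])
    print(b, grades[b.index(max(b))])  # maximum-membership principle
```

With weights that sum to 1 and membership rows that sum to 1, every result vector also sums to 1, so it can be read directly as a membership distribution over the grades.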
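The text-segment idea for duplicate detection can be sketched as a position-weighted average of cosine similarities between per-segment term-frequency vectors. Assumptions: each record is already split into segments (here, one segment per field), tokens are whitespace-separated (Chinese text would first need word segmentation), and both the position weights and the 0.9 threshold are hypothetical values, not ones taken from the paper.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment_vectors(segments):
    """One term-frequency vector per text segment (whitespace tokens, lower-cased)."""
    return [Counter(seg.lower().split()) for seg in segments]


def segment_similarity(rec_a, rec_b, position_weights):
    """Position-weighted mean of the per-segment cosine similarities."""
    va, vb = segment_vectors(rec_a), segment_vectors(rec_b)
    score = sum(w * cosine(x, y) for w, x, y in zip(position_weights, va, vb))
    return score / sum(position_weights)


def is_duplicate(rec_a, rec_b, position_weights, threshold=0.9):
    """Flag two records as (approximate) duplicates above a hypothetical threshold."""
    return segment_similarity(rec_a, rec_b, position_weights) >= threshold


if __name__ == "__main__":
    weights = [0.5, 0.3, 0.2]  # hypothetical: earlier segments matter more
    a = ["Beijing Municipal Bureau of Statistics", "annual population report 2020",
         "published by statistics office"]
    b = ["Beijing Municipal Bureau of Statistics", "annual population report 2020",
         "published by the statistics office"]
    print(segment_similarity(a, b, weights), is_duplicate(a, b, weights))
```

Comparing segment by segment means a shared term only counts when it appears in the corresponding position, which a single whole-text vector cannot express.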
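k-nearest-neighbor imputation with the Mahalanobis distance d(x, y) = sqrt((x - y)^T S^-1 (x - y)) can be sketched as below for a record with one missing attribute. Assumptions: the covariance matrix S is estimated from the complete records over the observed attributes and must be non-singular; the missing value is filled with the unweighted mean of the k nearest records; the helper names are hypothetical.

```python
import math


def covariance(rows):
    """Sample covariance matrix (n - 1 denominator) of equal-length rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting, for small dense matrices."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mahalanobis(x, y, inv_cov):
    """sqrt((x - y)^T S^-1 (x - y)) for a precomputed inverse covariance S^-1."""
    diff = [a - b for a, b in zip(x, y)]
    q = sum(diff[i] * inv_cov[i][j] * diff[j]
            for i in range(len(diff)) for j in range(len(diff)))
    return math.sqrt(max(q, 0.0))  # guard against tiny negative rounding


def knn_impute(complete, target, missing_col, k=3):
    """Estimate target[missing_col] as the mean of its k Mahalanobis-nearest records."""
    observed = [j for j in range(len(target)) if j != missing_col]

    def project(row):
        # Keep only the attributes that are present in the target record
        return [row[j] for j in observed]

    inv_cov = invert(covariance([project(r) for r in complete]))
    ranked = sorted(complete,
                    key=lambda r: mahalanobis(project(r), project(target), inv_cov))
    return sum(r[missing_col] for r in ranked[:k]) / k


if __name__ == "__main__":
    complete = [[1.0, 2.0, 3.1], [2.0, 1.0, 2.9], [3.0, 4.0, 7.0],
                [4.0, 3.0, 7.2], [5.0, 6.0, 11.1], [6.0, 5.0, 10.8]]
    print(knn_impute(complete, [5.5, 5.5, None], missing_col=2, k=2))
```

Because S^-1 rescales and decorrelates the attributes, two attributes that move together are not double-counted, which is the advantage over the Euclidean distance that the text refers to.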
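The Pauta (3σ) screening loop can be sketched as follows. The "transformed" Bessel formula is assumed here to be the running-sum form s² = (S₂ - S₁²/n) / (n - 1), with S₁ = Σx and S₂ = Σx², so that removing an outlier only updates S₁, S₂ and n instead of recomputing the deviations over all remaining points; this rearrangement is an illustration, not necessarily the one derived in the paper, and `pauta_filter` is a hypothetical name.

```python
import math


def pauta_filter(values):
    """Iterative 3-sigma (Pauta) screening, one outlier removed per pass.

    Bessel-corrected variance in running-sum form:
        s^2 = (S2 - S1^2 / n) / (n - 1)
    so each removal updates S1, S2 and n in O(1) (assumed rearrangement).
    Returns (kept values in sorted order, removed outliers in removal order).
    """
    kept = sorted(values)  # sorted so the most extreme points sit at the two ends
    n = len(kept)
    s1 = sum(kept)
    s2 = sum(v * v for v in kept)
    outliers = []
    while n > 2:
        mean = s1 / n
        s = math.sqrt(max(0.0, (s2 - s1 * s1 / n) / (n - 1)))
        # The candidate is whichever end lies farther from the mean
        idx = 0 if mean - kept[0] > kept[-1] - mean else -1
        if abs(kept[idx] - mean) <= 3 * s:
            break  # no remaining point violates the 3-sigma bound
        x = kept.pop(idx)
        outliers.append(x)
        n -= 1
        s1 -= x
        s2 -= x * x
    return kept, outliers


if __name__ == "__main__":
    readings = [10.1, 9.8, 10.0, 10.2, 9.9, 10.3, 9.7, 10.0, 10.1, 9.9,
                10.2, 9.8, 10.0, 10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 50.0]
    print(pauta_filter(readings)[1])
```

Two practical caveats: with the Bessel-corrected s, no point can exceed 3s unless there are at least 11 values, since max |x - mean| / s ≤ (n - 1) / √n; and the S₂ - S₁²/n form can lose precision through cancellation when the values are large relative to their spread.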