| China's economy and society have developed rapidly in recent years. Contract management has become a key element of management across industries and a basis for the work of government departments, and it now supports both scientific decision-making and early warning. The rapid growth of a highly integrated global economy and of computational science has propelled society into the era of big data, and how to leverage big data technology to better serve the public is a major focus of current research. As an important part of university statistics, contract statistics still suffer from low quality, low accuracy and serious data distortion. These problems reduce the usefulness of the data and, in particular, limit its role in supporting university decision-making, so effectively assessing the quality of contract data has both theoretical and practical value. In this context, this thesis investigates methods for evaluating contract data quality. First, based on a literature review, the thesis analyzes the means and problems of contract management, collates the theory and applications of anomaly detection, and describes the methods and characteristics of data quality assessment. Second, four key categories of contracts were selected from a university's contract statistics, namely purchase and sales contracts, repair and construction contracts, contracts for horizontal research projects and contracts for non-horizontal research projects, and the quality of the university's contract statistics was assessed by combining machine learning techniques with manual experience. The assessment began with simple descriptive statistics of the four key contract types, from which a preliminary judgement was made on the distribution of anomalies. Then two suitable unsupervised learning algorithms, the Isolation Forest algorithm and 
the LOF algorithm, were chosen based on the characteristics of the contract statistics. After the anomalies were detected, human experience was used to determine whether they were consistent with the local and overall patterns of change behind them. Finally, the unsupervised K-means algorithm was used to assess the quality of the contract statistics, evaluating data quality by the percentage of anomalous data. The results show that both unsupervised anomaly detection algorithms are highly accurate in identifying anomalies in contract statistics; the Isolation Forest algorithm is considerably simpler to use than the other algorithms because it requires almost no parameter configuration, while the LOF algorithm is more sensitive to anomalies in localised data. The outliers identified by the two algorithms accurately reflect the social context or patterns behind them, and the university's contract statistics are therefore tentatively considered to be of good quality. The university's purchase and sales contract statistics are available only for the first half of 2021 and the data are sparse, so it is not possible to judge whether the outliers identified for 2021 are impurity points. Among the horizontal research project cooperation contract statistics, the value of the university's contracts in 2020 is unusually high, which is inconsistent with the social background at that time, and is therefore judged to be an impurity point. The anomalies in the remaining periods are not impurity points. Finally, the K-means-based quality assessment shows that the percentage of abnormal data in all four key contract categories is small, indicating that the vast majority of data fluctuations are within the normal range, i.e. the quality of the university's contract statistics is good. The main contribution of this 
thesis is a machine-learning-based anomaly detection and quality assessment method for evaluating the quality of a university's contract statistics. The experimental results show that the machine learning approach used in this thesis can effectively identify anomalies and assess the quality of the data. |
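
To make the LOF-based detection step described in the abstract more concrete, the following is a minimal sketch of the Local Outlier Factor computation on one-dimensional data, followed by the percentage of flagged points. It is not the thesis implementation: the function name `lof_scores`, the sample values in `contract_totals`, the neighbourhood size `k=2`, and the cut-off `THRESHOLD = 1.5` are all assumptions chosen for illustration. The K-means assessment step is not reproduced here; the ratio below is derived from the LOF flags only.

```python
from statistics import mean


def lof_scores(values, k=2):
    """Local Outlier Factor for 1-D data.

    Simplifications (assumptions of this sketch): distance ties are broken
    by index order, and the input is assumed to contain no duplicate values
    (duplicates could make a reachability distance zero).
    """
    n = len(values)
    # For each point, its k nearest neighbours as (distance, index) pairs.
    neighbours = []
    for i in range(n):
        dists = sorted(
            (abs(values[i] - values[j]), j) for j in range(n) if j != i
        )
        neighbours.append(dists[:k])
    # k-distance: distance from each point to its k-th nearest neighbour.
    k_dist = [nb[-1][0] for nb in neighbours]
    # Local reachability density: inverse of the mean reachability distance,
    # where reach_dist(a, b) = max(k_dist(b), d(a, b)).
    lrd = [
        1.0 / mean([max(k_dist[j], d) for d, j in neighbours[i]])
        for i in range(n)
    ]
    # LOF: mean density of the neighbours relative to the point's own density.
    return [
        mean([lrd[j] for _, j in neighbours[i]]) / lrd[i]
        for i in range(n)
    ]


# Hypothetical yearly contract totals (illustrative only, not thesis data).
contract_totals = [10.0, 10.4, 11.0, 11.9, 12.1, 30.0]

scores = lof_scores(contract_totals, k=2)
THRESHOLD = 1.5  # assumed cut-off; scores near 1 indicate inliers
flagged = [i for i, s in enumerate(scores) if s > THRESHOLD]
anomaly_ratio = len(flagged) / len(contract_totals)

print("flagged indices:", flagged)
print("anomaly ratio:", round(anomaly_ratio, 3))
```

In this sketch only the last value receives a score far above 1, because its neighbours sit in a much denser region than it does. A flagged point would still need the manual review the abstract describes before being treated as an impurity point. In practice, a library implementation such as scikit-learn's `LocalOutlierFactor` would likely be preferable to a hand-written one.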