
Research On Dimensionality Reduction And Classification Method For Food Safety Data

Posted on: 2021-05-09; Degree: Master; Type: Thesis
Country: China; Candidate: X N Chen; Full Text: PDF
GTID: 2381330626953879; Subject: Computer Science and Technology
Abstract/Summary:
With the progress of society and the improvement of living standards, people demand ever higher food quality and safety. At the same time, the food safety situation is serious and has become a topic of wide concern. Food safety data contain a huge amount of information, and how to use data analysis and mining techniques to solve the problems facing the food safety industry has become a focus of current research. However, as the food safety field has developed, the large-scale, multi-class, and high-dimensional character of its data sources has seriously reduced the processing efficiency of traditional techniques. On the one hand, the curse of dimensionality in food safety data makes classification results unsatisfactory. On the other hand, although traditional dimensionality reduction techniques do reduce dimensionality, they weaken the ability to classify the resulting low-dimensional data.

To further improve the efficiency of data mining on food safety data, this thesis studies dimensionality reduction and classification techniques for such data and analyzes the modeling theory, advantages, and disadvantages of traditional dimensionality reduction and classification methods. Based on the sources and characteristics of food safety data and on the existing defects of dimensionality reduction and classification techniques, and taking food-related data sets as the main research object, dimensionality reduction and classification methods for food safety-related data are studied. The main work of this thesis is summarized as follows:

(1) A principal component analysis algorithm based on mutual information credibility is proposed. On food safety data sets, the traditional principal component analysis (PCA) algorithm takes too long, gives mediocre dimensionality reduction results, and cannot meet practical classification requirements. By studying mutual information from different angles, the idea of mutual information credibility is introduced. First, feature selection is performed on the data matrix using the comprehensive credibility of mutual information, and then the PCA algorithm is used to reduce the dimensionality. The algorithm increases the contribution of the low-dimensional data to category judgment while preserving the dimensionality reduction results on the high-dimensional food data set.

(2) A principal component analysis dimensionality reduction algorithm based on intra-class and inter-class distance is proposed. To improve the dimensionality reduction results on high-dimensional food safety data and the discriminative ability of the low-dimensional representation, intra-class and inter-class distances are introduced. By minimizing the intra-class distance and maximizing the inter-class distance, the data projection matrix is optimized, and the principal component analysis algorithm based on information entropy is improved. While preserving the dimensionality reduction results on high-dimensional food data sets, the algorithm increases the contribution of the low-dimensional data to category judgment.

(3) An improved C4.5 algorithm based on sample selection and cosine similarity is proposed. For large-scale food data sets, the goal is to improve classification accuracy and reduce training time. First, a statistical optimal sample size strategy is used to determine the optimal sample size; then, using the classification accuracy of the C4.5 algorithm as the iteration criterion, the sample set of that size is refined to determine the optimal training set. Next, the cosine similarity between attributes is calculated, the attribute values of highly similar attribute pairs in the training sample set are merged, and the training set is updated. Finally, the best splitting attributes are selected according to the C4.5 algorithm and a decision tree is built, improving the algorithm's execution efficiency and classification accuracy on food-related data sets.
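The two-stage idea in item (1), selecting features by their mutual information with the class label and then applying PCA, can be illustrated with a minimal sketch. This is not the thesis's algorithm: the "comprehensive credibility" score is not defined in the abstract, so plain mutual information estimated by equal-width binning is used here as a stand-in. The function names, the `bins` parameter, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def mutual_information(x, y, bins=8):
    """Estimate I(x; y) in nats for one continuous feature x and class labels y.

    Assumption of this sketch: x is discretized into equal-width bins.
    """
    edges = np.histogram_bin_edges(x, bins=bins)
    x_binned = np.digitize(x, edges[1:-1])          # bin index in 0..bins-1
    classes, y_idx = np.unique(y, return_inverse=True)
    joint = np.zeros((bins, classes.size))
    np.add.at(joint, (x_binned, y_idx), 1)          # joint count table
    joint /= joint.sum()                            # joint probabilities
    px = joint.sum(axis=1, keepdims=True)           # marginal of x
    py = joint.sum(axis=0, keepdims=True)           # marginal of y
    nz = joint > 0                                  # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))


def select_then_pca(X, y, n_features, n_components, bins=8):
    """Keep the n_features columns with the highest mutual information
    with y, then project the kept columns onto their top principal components."""
    scores = np.array(
        [mutual_information(X[:, j], y, bins) for j in range(X.shape[1])]
    )
    keep = np.sort(np.argsort(scores)[::-1][:n_features])
    Xc = X[:, keep] - X[:, keep].mean(axis=0)       # center before PCA
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:n_components].T                    # low-dimensional representation
    return Z, keep, scores


# Synthetic stand-in data: columns 0 and 1 depend on the label, the rest are noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=300)
informative = 3.0 * y[:, None] + rng.normal(size=(300, 2))
noise = rng.normal(size=(300, 6))
X = np.hstack([informative, noise])

Z, keep, scores = select_then_pca(X, y, n_features=4, n_components=2)
print("kept columns:", keep)
print("reduced shape:", Z.shape)
```

Filtering before PCA means the principal components are computed only from columns that carry label information, which is one plausible reading of how such a pre-selection could help the low-dimensional data remain useful for classification.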
Keywords/Search Tags: Food safety, Big data, Data mining, Data dimensionality reduction, Classification