
Recognition Of The Quality Of Wine Based On Data Mining

Posted on: 2011-12-05
Degree: Master
Type: Thesis
Country: China
Candidate: C X Lin
Full Text: PDF
GTID: 2191360305494746
Subject: Statistics
Abstract/Summary:
With the development of the domestic wine industry, both the number and the scale of wine companies are increasing. However, China's wine industry still faces fierce competition from imported wines, as well as market disorder caused by the lack of a quality assessment system. To address these problems, this thesis first discusses the deficiencies of manual wine tasting, and then proposes a methodology that uses data mining techniques to improve the recognition rate of wine quality. This may benefit the quality control of wines as well as the stable development of the Chinese wine market.

In data mining, unbalanced data are common. The minority class has less influence on overall prediction accuracy than the majority class, so even when overall accuracy is high, minority-class samples may go unrecognized and the classification rules that identify them may be ignored. The innovation of this thesis lies in building models on balanced samples extracted from the unbalanced data and then using those models to predict the test samples. This process is repeated N times (e.g., N = 1000), and the final prediction is made by voting. The method greatly improved the recognition rate of low-quality wine.

Based on this sampling scheme, the thesis first compares discriminant analysis, support vector machines, classification and regression trees, and random forests for wine quality recognition. Among these methods, random forests perform best, with a higher overall recognition rate and a higher identification rate for low-quality wine than the other methods. The random forest model was also shown to be stable. Second, random forests provide the average importance of each variable. The variable importance ranking indicates that potassium sulfate and alcohol are important factors influencing wine quality. This suggests that an increase in potassium sulfate and/or alcohol tends to result in higher-quality wine, so the ranking can also help guide the production of higher-quality wine.

Finally, an outlier detection method is applied to detect low-quality wine samples. Unfortunately, its identification performance is poor: only 30% of the low-quality samples are identified. Outlier detection can therefore only complement the results of wine quality identification. Nevertheless, the results show that it does improve the identification rate.
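
The balanced-sampling and voting procedure described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the function names, the synthetic two-feature data, and the class labels are hypothetical, and a simple nearest-centroid rule stands in for the classifiers the thesis actually compares (discriminant analysis, support vector machines, CART, and random forests). Balancing is assumed here to mean undersampling every class down to the size of the smallest one.

```python
import random
from collections import Counter


def balanced_subsample(X, y, rng):
    """Draw the same number of samples from every class (undersampling)."""
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    n_min = min(len(rows) for rows in by_class.values())
    Xb, yb = [], []
    for label, rows in by_class.items():
        for features in rng.sample(rows, n_min):
            Xb.append(features)
            yb.append(label)
    return Xb, yb


def fit_centroids(X, y):
    """Stand-in base learner (an assumption): one mean vector per class."""
    counts = Counter(y)
    sums = {}
    for features, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroid(centroids, features):
    """Assign the label of the closest class centroid (squared Euclidean)."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def vote_predict(X_train, y_train, X_test, n_rounds=1000, seed=0):
    """Repeat balanced sampling + fitting n_rounds times; majority vote."""
    rng = random.Random(seed)
    votes = [Counter() for _ in X_test]
    for _ in range(n_rounds):
        Xb, yb = balanced_subsample(X_train, y_train, rng)
        model = fit_centroids(Xb, yb)
        for i, features in enumerate(X_test):
            votes[i][predict_centroid(model, features)] += 1
    return [v.most_common(1)[0][0] for v in votes]


# Synthetic, made-up data: 200 "normal" samples and only 10 "low" samples.
data_rng = random.Random(42)
X_train = [[data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)] for _ in range(200)]
X_train += [[data_rng.gauss(3, 0.5), data_rng.gauss(3, 0.5)] for _ in range(10)]
y_train = ["normal"] * 200 + ["low"] * 10

preds = vote_predict(X_train, y_train, [[0.0, 0.0], [3.0, 3.0]], n_rounds=100)
print(preds)
```

Because every round trains on equally many samples from each class, the minority class carries as much weight as the majority class in each individual model, while repeating the draw lets the vote use most of the majority-class data over time.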
Keywords/Search Tags: Wine Quality Recognition, Discriminant Analysis, Support Vector Machine, Classification and Regression Trees, Random Forests, Outlier Detection