Research On Random Forest With Application To Image Classification Based On Hadoop

Posted on: 2018-05-22
Degree: Master
Type: Thesis
Country: China
Candidate: Q N Zhang
GTID: 2348330542479599
Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of Internet and digital multimedia technology, the number of digital images keeps growing. As an important part of image processing and its applications, image classification is the basis of image retrieval, object detection, and information filtering, and automatic computer-based analysis and classification of images is attracting increasing attention. Random forest, an ensemble classifier, has been applied to image classification with good performance. However, when faced with massive image classification tasks, random forest suffers from high time cost, an outdated file system, and a lagging processing architecture. Hadoop is one of the mainstream distributed computing platforms and is widely used for big data processing. Implementing massive image classification on Hadoop allows the algorithms to run in parallel, exploiting Hadoop's computing and storage capacity to improve classification efficiency. On this basis, the thesis studies random forest and its application to image classification, and designs a parallel implementation of the whole classification process. First, the thesis summarizes the basic approaches to image classification, with particular emphasis on the SIFT feature and the BoVW (bag of visual words) model. It then introduces random forest, including its base classifier, working mechanism, tree-growing strategy, and evaluation measures. For the stage in which decision trees are combined into a random forest, the thesis discusses in detail the principles and factors of ensemble classification, and proposes an approach in which a confusion-matrix-based similarity measure is used to remove poor trees and thereby perform model selection for the random forest. Experiments show that random forest with confusion-matrix-based model selection achieves better classification performance. Finally, the thesis designs parallel implementations of the random forest classifier, SIFT feature extraction, and the BoVW model. Experimental results show that random-forest image classification on Hadoop reduces execution time and improves the efficiency of the algorithms while maintaining good classification performance. The thesis closes with a summary and outlook for random forest applied to image classification on Hadoop.
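The abstract does not say how the confusion-matrix-based similarity is defined or how poor trees are chosen. The sketch below is one possible reading, not the thesis's method. It assumes each tree's predictions on a held-out validation set are available, builds one confusion matrix per tree, drops trees below an accuracy floor, and drops any tree whose matrix is nearly identical (by cosine similarity) to that of a better tree already kept. The function names and the `min_acc` and `max_sim` thresholds are hypothetical.

```python
from math import sqrt


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    m = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m


def accuracy(m):
    """Fraction of samples on the diagonal of a confusion matrix."""
    total = sum(sum(row) for row in m)
    return sum(m[i][i] for i in range(len(m))) / total


def cosine_similarity(a, b):
    """Cosine similarity of two confusion matrices, flattened to vectors."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    dot = sum(x * y for x, y in zip(fa, fb))
    norm = sqrt(sum(x * x for x in fa)) * sqrt(sum(y * y for y in fb))
    return dot / norm if norm else 0.0


def select_trees(tree_preds, y_true, n_classes, min_acc=0.5, max_sim=0.99):
    """Return indices of trees to keep (assumed selection rule, see lead-in)."""
    mats = [confusion_matrix(y_true, p, n_classes) for p in tree_preds]
    # Visit trees from most to least accurate so the better tree of a
    # near-duplicate pair is the one that survives.
    order = sorted(range(len(mats)), key=lambda i: accuracy(mats[i]), reverse=True)
    kept = []
    for i in order:
        if accuracy(mats[i]) < min_acc:
            continue  # weak tree
        if any(cosine_similarity(mats[i], mats[j]) > max_sim for j in kept):
            continue  # redundant with a tree already kept
        kept.append(i)
    return sorted(kept)


# Toy validation set with three classes and four trees.
y_true = [0, 0, 1, 1, 2, 2]
tree_preds = [
    [0, 0, 1, 1, 2, 2],  # accurate tree
    [0, 0, 1, 1, 2, 2],  # duplicate of the first tree
    [0, 1, 1, 1, 2, 0],  # less accurate, but makes different errors
    [2, 2, 0, 0, 1, 1],  # always wrong
]
print(select_trees(tree_preds, y_true, n_classes=3))  # prints [0, 2]
```

In this toy run the duplicate tree and the always-wrong tree are removed. Ordering by accuracy first is a design choice of the sketch, so that diversity is only traded against trees that are at least as good.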
Keywords/Search Tags: Random Forest, Image Classification, Hadoop, Confusion Matrix, BoVW