| In the real world, there are large amounts of unlabeled data, such as medical images, web data, and video data. In the era of big data, this situation is even more prominent, and labeling large amounts of unlabeled data is expensive. Active learning is an effective method for addressing this problem and is one of the active research topics in machine learning and data mining. Within the framework of classification, this thesis studies the active learning problem based on the online sequential extreme learning machine. The thesis makes two main contributions. (1) It studies the impact of the random weight distribution on the performance of the extreme learning machine (ELM) and reaches two conclusions: (a1) for different problems or data sets, drawing the random weights from a uniform distribution on [-1,1] is not always optimal; (a2) in ELM, there is no essential difference between initializing the input weights and hidden-node biases with a uniform distribution and initializing them with a Gaussian distribution. (2) It proposes an active learning algorithm based on the online sequential extreme learning machine. The proposed algorithm has three advantages: (b1) because incremental learning is built into the online sequential extreme learning machine, the algorithm can significantly improve the efficiency of the learning system; (b2) the algorithm uses instance entropy as a heuristic to measure the importance of unlabeled instances, and this measure models well the information that a sample contributes to classification; (b3) a K-nearest neighbor classifier is used as the oracle to label the selected instances, so the oracle is independent of the classifier that evaluates the importance of the unlabeled instances. The experimental results show that the proposed algorithm learns quickly and labels the selected instances accurately. |
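
The entropy heuristic in (b2) can be illustrated with a minimal sketch. It assumes the classifier's outputs have already been converted into a class-probability vector for each unlabeled instance (for example by a softmax, which is an assumption here and not stated in the abstract). The function names `instance_entropy` and `select_most_uncertain` and the sample values are hypothetical and chosen only for illustration.

```python
import math

def instance_entropy(probs):
    """Shannon entropy (in nats) of one instance's class-probability vector."""
    # Terms with p == 0 contribute nothing, so they are skipped.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_most_uncertain(prob_rows, k):
    """Return the indices of the k instances with the highest entropy."""
    ranked = sorted(range(len(prob_rows)),
                    key=lambda i: instance_entropy(prob_rows[i]),
                    reverse=True)
    return ranked[:k]

# Hypothetical class probabilities for four unlabeled instances, three classes.
pool = [
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.34, 0.33, 0.33],  # nearly uniform, high entropy
    [0.70, 0.20, 0.10],
    [0.50, 0.50, 0.00],
]
picked = select_most_uncertain(pool, 2)
```

In an active learning loop of the kind the abstract describes, the instances in `picked` would then be passed to the oracle for labeling and used to update the classifier incrementally.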