In this thesis we focus on the problem of combining multiple clusterings without access to the original features of the data. This process is known in the literature as ensemble clustering or consensus clustering, and it has been widely used with promising results in many applications. In particular, we are interested in the probabilistic model-based approach to this problem, and we evaluate cluster membership within this paradigm.

First, we present the Iterative Voting Consensus (IVC) algorithm, which can be viewed as a limiting case of the mixture-of-Gaussians model. We show that it performs better when initialized with the IVC++ seeding technique.

Second, from the frequentist point of view, we model the ensemble of clusterings as a mixture of Generalized Bernoulli distributions and estimate the model parameters with the EM algorithm under the maximum-likelihood criterion. We demonstrate that the IVC++ seeding technique also improves the performance of the EM algorithm.

Finally, we propose a genetic-algorithm-based EM algorithm (GA-EM) for learning the mixture of Generalized Bernoulli models. This algorithm automatically selects the number of mixture components using the MDL criterion. It benefits from both the genetic algorithm and the EM algorithm by combining them into a single procedure, and it explores the search space more thoroughly than the traditional EM algorithm.
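For the model selection performed inside GA-EM, an MDL-type criterion typically balances fit against model complexity; the exact form used in the thesis may differ, but a standard version, assuming $M_K$ free parameters for a $K$-component mixture fitted to $n$ objects with maximized likelihood $L(\hat{\Theta}_K)$, is:

```latex
\mathrm{MDL}(K) \;=\; -\log L\!\left(\hat{\Theta}_K \mid X\right)
\;+\; \frac{M_K}{2}\,\log n
```

The GA-EM procedure keeps the candidate whose fitted mixture minimizes this score, which is how the number of components is selected automatically.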
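IVC operates directly on the matrix of base-clustering labels, alternating between majority-vote cluster centers and Hamming-distance reassignment. The following is a minimal sketch under those assumptions; the function name, the random initialization (the thesis instead advocates IVC++ seeding), and the reseeding of empty clusters are my own illustrative choices, not the thesis's implementation.

```python
import numpy as np

def iterative_voting_consensus(X, K, n_iter=100, seed=0):
    """Sketch of Iterative Voting Consensus on a label matrix.

    X[i, h] is the label object i received from base clustering h.
    Alternates between (1) recomputing each consensus cluster's center
    as the plurality label per base clustering and (2) reassigning each
    object to the center at the smallest Hamming distance.
    """
    rng = np.random.default_rng(seed)
    n, H = X.shape
    labels = rng.integers(0, K, size=n)          # random initial partition
    for _ in range(n_iter):
        # Step 1: plurality-vote centers, one per consensus cluster
        centers = np.zeros((K, H), dtype=int)
        for k in range(K):
            members = X[labels == k]
            if len(members) == 0:                # reseed an empty cluster
                members = X[rng.integers(0, n, size=1)]
            for h in range(H):
                vals, counts = np.unique(members[:, h], return_counts=True)
                centers[k, h] = vals[counts.argmax()]
        # Step 2: reassign objects by Hamming distance to the centers
        dist = (X[:, None, :] != centers[None, :, :]).sum(axis=2)  # (n, K)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                                 # partition is stable
        labels = new_labels
    return labels
```

Because both steps only compare discrete labels, this alternation is exactly a k-means-style procedure in Hamming space, which is why IVC can be read as a limiting case of a Gaussian mixture fit.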
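The mixture of Generalized Bernoulli distributions treats each object's vector of base-clustering labels as a draw from one of K component-wise categorical distributions. A minimal EM sketch follows; the function name, the Dirichlet random initialization, and the small smoothing constant are my own choices for illustration, not the thesis's implementation (which uses IVC++ seeding).

```python
import numpy as np

def em_mixture_categorical(X, K, n_cats, n_iter=100, seed=0):
    """EM for a mixture of Generalized Bernoulli (categorical) distributions.

    X[i, h] is the label object i received from base clustering h, an
    integer in 0..n_cats[h]-1.  K is the number of mixture components,
    i.e. the number of consensus clusters.
    """
    rng = np.random.default_rng(seed)
    n, H = X.shape
    pi = np.full(K, 1.0 / K)                       # mixing weights
    # theta[h][k, c]: probability that component k emits label c in clustering h
    theta = [rng.dirichlet(np.ones(n_cats[h]), size=K) for h in range(H)]
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space for stability
        log_r = np.tile(np.log(pi), (n, 1))        # (n, K)
        for h in range(H):
            log_r += np.log(theta[h][:, X[:, h]]).T
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: maximum-likelihood updates (with tiny smoothing)
        pi = r.mean(axis=0)
        for h in range(H):
            counts = np.stack([r[X[:, h] == c].sum(axis=0)
                               for c in range(n_cats[h])], axis=1)  # (K, C_h)
            theta[h] = (counts + 1e-9) / (counts.sum(axis=1, keepdims=True)
                                          + 1e-9 * n_cats[h])
    return r.argmax(axis=1), pi, theta
```

The hard consensus partition is read off from the posterior responsibilities via `argmax`, which is the cluster-membership evaluation the probabilistic paradigm provides.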