| Clustering problems are classical problems in computer science and operations research,and have extremely broad practical applications.By designing approximation algorithms,we study the variants of the spherical k-means,facility location problem and correlation clustering problem.The main research contents include spherical k-means with penalties,fault-tolerant facility location problem with penalties,two-stage fault-tolerant facility location problem with penalties,uncertain correlation clustering problem and capacitated correlation clustering prob-lem.Based on the LP-rounding technique、initialization technique and greedy augmentation technique,we propose approximation algorithms as well as the corresponding theoretical anal-ysis for above variants.In the spherical k-means with penalties,we are given a data set which is distributed on a unit sphere,an integer k,the distance between two data,each date can be clustered or penal-ized.The goal of this problem is to partition the data set into penalized set and clustered set.Furthermore,we cluster the data in the clustered set into k clusters so as to minimize the within-cluster sum of distance and penalty cost.The proposed model effectively avoids the excessive influence of individual points on the whole.The crucial challenge of solving this problem is to select a set of center data such that we can analyze the algorithm from theoretical perspec-tive.In this thesis,we give a 2max{2,M}(1+M)(lnk+2)-approximation algorithm based on an initialization method on general instances.Furthermore,we prove that above algorithm can achieve a 2max{3,M+1} approximation ratio on separable instances with probability[(4 max{5,M+1})1-k]/2,where M is the ratio of the maximal penalty cost and the minimal one of the given data set.In the fault-tolerant facility location problem with penalties,we are given a facility set,a client set,the facility cost for each facility and the connection cost between each facility and each client,a connection requirement,a weight vector as well as the penalty vector of each client.Each requirement of a client is either connected to an opened facility or rejected by paying the penalty cost.The goal is to open some facilities,reject some requirements of clients,and connect the remain requirements of clients to some opened facilities such that the total cost including the facility cost,the weighted connection cost as well as the penalty costs is minimized.In this thesis,we design approximation algorithms based on LP-rounding technique,the crucial challenge of this technique is that the optimal fractional optimal solution can not response of the structure of the solution,thus it is difficult to identify the penalized clients.In order to overcome this challenge,we construct a feasible solution based on the fractional optimal solution,which does not increase the cost but can better reflect the solution.Then,we identify the penalized clients based on the constructed fractional feasible solution by a parameter.Finally,we obtain an integer feasible solution by selecting the opened facilities with certain skills.We approve that our algorithm is an 4-approximation algorithm and then we apply the randomized rounding technique to improve the approximation to 3.16,which is further improved to 2.408 by the greedy augmentation technique.In the two-stage stochastic fault-tolerant facility location problem,we are given a facility set and a client set,the facility cost of each facility in two stages as well as the connection cost between each facility and client,K scenarios together with the corresponding clients set as well as the probability distributions,a connection requirement of each client in each scenario as well as the weight vector.We open facilities in two stages,the facilities opened in the first stage can serve all the clients well the facilities opened in the second stage can only serve the client in the corresponding scenario.The goal of this problem is to open facilities in two stages and ensure the clients in each scenario are connected to a certain number of open facilities so as to minimize the facility cost in the first stage and the expectation of summation cost including the facility cost and the weighted connection cost generated by K scenarios.We design approximation algorithms based on LP-rounding technique,the crucial challenge of this problem is that the optimal fractional optimal solution can not response of the structure of the solution.In order to overcome this challenge,we construct a fractional feasible solution which can better reflect the structure of the solution based on the optimal fractional solution.In this thesis,we present a deterministic 5-approximation algorithm by above technique and further improve the ratio to 3.8617 by using a standard randomization rounding technique.In the uncertain correlation clustering problem,we are given a complete graph and the probability of each edge being a positive edge.The goal of this problem is to partition the vertices into some clusters so as to minimize the expected number of positive edges whose end-points lie in different clusters plus the expected number of negative edges whose endpoints lie in the same cluster.The proposed model effectively overcomes the limitation that the current algorithms can only deal with the problem of correlation clustering on certain graphs.We de-sign approximation algorithms based on LP-rounding technique,the crucial challenge of this problem is how to identify the set of points which should be clustered into a cluster based on the fractional optimal solution.In order to overcome this challenge,we introduce an undetermined parameter to give a clustering criterion,and obtain a 6-approximation algorithm by selecting proper parameter.In the capacitated correlation clustering problem,we are given a complete graph G-(V,E),where V is the vertex set and E is the edge set.Each edge is labeled by+or-de-pending on the similarity of the associated vertex.Each vertex v corresponds to an upper bound Uv of capacity.The goal of this problem is to partition the vertices into some clusters so as to minimize the number of positive edges whose endpoints lie in different clusters plus the number of negative edges whose endpoints lie in the same cluster.Furthermore,for each cluster C we have |C| ≤ minv∈CUv.Uv.In this thesis,we introduce a parameter α∈(0,1/2]and provide a(1/(1-α),2/α)-bicriteria approximation algorithm based on LP-rounding technique.Name-ly,the solution returned by the algorithm has the cost that is at most 2/α times the optimum,and for each cluster C in the solution,we have |C| ≤minv∈CUv/(1-α).Furthermore,we give a 2U-approximation algorithm by limiting the number of vertices in each cluster,where U=maxv∈VUv. |