Font Size: a A A

Theoretical Analysis And Efficient Algorithms Of Partitioning Problems For Big Data Computing

Posted on:2023-04-16Degree:DoctorType:Dissertation
Country:ChinaCandidate:B L NingFull Text:PDF
GTID:1528306839481564Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
As the rapid development of big data,more and more share-nothing big data computing systems have appeared.For those systems,one of the core fundamental problems is data partitioning,which is also one important fundamental research problem in computer science.Briefly speaking,the problem of data partitioning is to partition big data into several small parts with the same size such that the data after partitioning satisfy some specified constraints or the cost of partitioning is minimized.In real systems,the size-balanced partitioning strategy is the basic of load balancing,and the optimization of computation and communication can be usually guaranteed by satisfying special constraints and minimizing the partitioning cost.There are two research challenges for big data partitioning,one is brought by the resource constraints of big data computing,and the other one is brought by the computational hardness of complex structured data partitioning.Therefore,it is of high research and practical values to study fundamental theories and efficient algorithms of data partitioning problems for big data computing.Focusing on relational,tree structured and graph structured data,the thesis studies the data partitioning problems for big data computing,aims to solve fundamental theoretical problems of data partitioning,and provides efficient partitioning algorithms for big data with different types and sizes.(1)First,the range partitioning problem on relational data is studied.In real applications,the scale of relational data is too large to be solved efficiently even by linear time algorithms,therefore,sublinear algorithms are highly needed.As far as we know,there still lacks of lower bound results of sublinear partitioning algorithms and sublinear partitioning algorithms in the I/O model.Therefore,both theoretical lower bounds and efficient partitioning algorithms with sublinear cost are studied.In the RAM model,by proving a better lower bound for any sublinear range partitioning algorithm,it is shown that current partitioning algorithms have achieved a near optimal performance.In the I/O model,by proving a linear lower bound for the external range partitioning problem,it is shown to be impossible to design sublinear algorithms.When the input satisfies partially independent constraints,an efficient external sublinear partitioning algorithm is designed.Compared with previous methods,experimental results are used to verify the performance of the proposed algorithm.(2)Then,the extended partitioning problems on tree structured data are studied.In real applications,the extended partitioning problems on tree structured data are important but ignored.Therefore,the intractability of extended tree partitioning problems is studied theoretically,and efficient partitioning algorithms with linear costs are designed.The formal definitions of the balanced extended partitioning problem(6)-VBTP for short)and the connected extended partitioning problem(VBTP for short)are introduced.Based on the techniques of dynamic programming and greedy selection,both polynomial time and linear time partitioning algorithms for VBTP are designed.By analyzing the computational complexity and inapproximability,it is shown that 6)-VBTP is highly hard to solve.By relaxing the balancing requirements,an approximation algorithm with nearly linear time cost for 6)-VBTP is designed.Experimental results are used to verify the time performance of the proposed algorithm.(3)Finally,the balanced partitioning problems on graph structured data are studied.The classical balanced graph partition problem is not proper for big data computing,thus,it is highly needed to study novel balanced graph partitioning problems and design partitioning algorithms with performance guarantee.Therefore,two variants of the balanced graph partition problems aiming to optimize specific workloads and motif computation are proposed,and theoretical analysis and efficient approximation algorithms are studied.To optimize specific workloads,a semidefinite programming based bi-criteria approximation algorithm is designed.For optimizing motif computation,the computational complexity and inapproximability of the corresponding partition problem are proved.Then,for the case that the motif is a triangle,a semidefinite programming based bi-criteria approximation algorithm is designed and extended to the general case.Compared with previous methods,experimental results are used to verify the performance of the proposed algorithm.
Keywords/Search Tags:data partitioning, computational complexity, approximation algorithm, sublinear algorithm, efficient algorithm
PDF Full Text Request
Related items