
Research On Middleware For Extending Analytics Capability Of PostgreSQL Based On Spark

Posted on: 2018-09-13 | Degree: Master | Type: Thesis
Country: China | Candidate: M Y Cai
GTID: 2359330566455726 | Subject: Computer application technology
Abstract/Summary:
With the development of technology, the amount of data is growing rapidly in scientific research, Internet applications and many other fields. These data contain a large amount of decision-relevant information, which many enterprises value. To extract useful information from the data, industry and academia have proposed more and more data analysis algorithms. Traditional relational databases cannot meet the needs of such complex data analysis, so new big data analysis platforms have emerged. However, traditional relational databases fit better with enterprises' existing system architecture, and their ease of use and maintainability make existing systems easy to maintain. Therefore, compared with adopting a new platform, extending the query capabilities of a relational database is more suitable for enterprises. But relational databases are relatively inefficient when dealing with big data. The problem addressed in this thesis is how to improve the data analysis capability of relational databases. Existing solutions include building MPP database clusters and building SQL query engines on big data analysis platforms, but both have their limitations. We propose a new solution: an intermediate middleware that connects a database with a big data analysis platform to form a heterogeneous analysis system, allowing the relational database to execute large-scale data analysis on the big data platform. With this system, users can run complex analyses through simple SQL queries. The middleware design consists of four modules: the communication protocol, the interface design, data transmission and data processing. With this design, the system is loosely coupled: the database and the big data analysis platform can analyze data independently or be used together through the middleware. Because data storage and computation are physically isolated, the system can be extended to other big data analysis platforms and is highly scalable. For the implementation, we choose the open-source relational database PostgreSQL and the distributed computing framework Spark as the basic components of the system. We then implement the middleware according to the design, together with its interfaces to PostgreSQL and Spark. Finally, we verify the availability and efficiency of the system when executing complex data analysis. We also use the system to recommend information to users of bus Wi-Fi, which further validates its efficiency and practicality.
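The abstract names four middleware modules (communication protocol, interface design, data transmission, data processing) without showing how they interact. The following is a minimal sketch, not the thesis's implementation, of one way they could fit together. Everything in it is a hypothetical assumption: the JSON request format, the batch-based transfer, the `AnalysisBackend` interface, the `count_by_key` analysis, and the `wifi_log` table. `LocalBackend` stands in for Spark and a plain callable stands in for a PostgreSQL query, so the sketch runs without either system.

```python
import json
from abc import ABC, abstractmethod
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple  # one result row from the database side


def encode_request(analysis: str, query: str, params: dict) -> bytes:
    """Communication protocol (hypothetical): a JSON message naming the analysis,
    the SQL that selects its input rows, and the analysis parameters."""
    return json.dumps({"analysis": analysis, "query": query, "params": params}).encode("utf-8")


def decode_request(payload: bytes) -> dict:
    """Parse and validate a request produced by encode_request."""
    msg = json.loads(payload.decode("utf-8"))
    for field in ("analysis", "query", "params"):
        if field not in msg:
            raise ValueError(f"missing field: {field}")
    return msg


def batches(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Data transmission: ship rows in fixed-size batches rather than one by one."""
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly shorter, batch
        yield batch


class AnalysisBackend(ABC):
    """Interface design: any analysis platform (Spark or another engine) plugs in here."""

    @abstractmethod
    def run(self, analysis: str, data: Iterator[List[Row]], params: dict) -> list:
        ...


class LocalBackend(AnalysisBackend):
    """Hypothetical in-process stand-in for Spark so the sketch runs without a cluster."""

    def run(self, analysis, data, params):
        if analysis != "count_by_key":
            raise NotImplementedError(analysis)
        key_col = params.get("key_col", 0)
        counts: dict = {}
        for batch in data:
            for row in batch:
                counts[row[key_col]] = counts.get(row[key_col], 0) + 1
        return sorted(counts.items())


class Middleware:
    """Data processing: pull rows from the database side and stream them to the backend."""

    def __init__(self, fetch: Callable[[str], Iterable[Row]], backend: AnalysisBackend, batch_size: int = 1000):
        self.fetch = fetch          # stands in for executing SQL on PostgreSQL
        self.backend = backend      # stands in for submitting work to Spark
        self.batch_size = batch_size

    def handle(self, payload: bytes) -> list:
        req = decode_request(payload)
        rows = self.fetch(req["query"])
        return self.backend.run(req["analysis"], batches(rows, self.batch_size), req["params"])


# Usage with a hypothetical in-memory "wifi_log" table of (user_id, access_point) rows.
fake_table = [("u1", "ap3"), ("u2", "ap3"), ("u1", "ap7")]
mw = Middleware(lambda sql: iter(fake_table), LocalBackend(), batch_size=2)
print(mw.handle(encode_request("count_by_key", "SELECT user_id, ap FROM wifi_log", {"key_col": 1})))
# prints [('ap3', 2), ('ap7', 1)]
```

Because `Middleware` depends only on the `AnalysisBackend` interface, swapping `LocalBackend` for a Spark-backed class would leave the request path unchanged. This mirrors the loose coupling and extensibility to other platforms that the abstract claims.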
Keywords/Search Tags: data analysis, relational database system, big data analysis system, middleware