
The Design And Implementation Of Mlab On DataStudio System

Posted on: 2018-03-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y Yang
Full Text: PDF
GTID: 2348330515988649
Subject: Engineering
Abstract/Summary:
With the development of the information age, data volumes are rising sharply, and processing and analysing that data quickly has become a pressing real-world need. More and more fields in the economy and in science are involved in big data applications. The basic attributes of big data, including volume, velocity, and variety, have become increasingly pronounced. At the same time, mining the value of data has become more urgent, and big data is increasingly used in data analysis.

The mainstream technologies currently used for big data analysis include Hadoop and its related components, Python, Matlab, and Spark, which computes in memory. Hadoop and its components are powerful. Python is efficient to develop with and highly maintainable, but its runtime efficiency is not high. Matlab is also powerful, but it is not free. Spark provides both offline and real-time processing, performs its computation in memory, and includes the MLlib algorithm package for data analysis.

This thesis introduces the algorithm laboratory of a big data storage and processing platform called DataStudio. DataStudio is based on Hadoop and provides data storage, processing, migration, scheduling, and other functions. However, data analysis on the platform could only be done through MapReduce or SQL, both of which require users to write code. To improve the platform's means of data analysis, we developed MLAB. MLAB provides a web-based visual interface to algorithms for machine learning users, who only need to select an algorithm model and its parameters. Because Spark is open source, integrates well, and computes in memory, we chose the Spark MLlib algorithm package to support the laboratory's algorithms. The algorithm laboratory uses the J2EE architecture and the MVC design pattern, together with Oozie, Quartz, and Hibernate.

With MLAB complete, the DataStudio platform gains a better method of data analysis and faster handling of analysis tasks, which improves the platform's competitiveness and usefulness. It also makes data analysis more efficient for users: they do not need to write code, and the platform is robust and reusable.
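
The abstract says that users only select an algorithm model and its parameters instead of writing code. A minimal sketch of that idea is given below, assuming a hypothetical algorithm catalogue and a hypothetical `build_job_spec` helper. None of these names, defaults, or paths come from the thesis, and the sketch does not call Spark itself; it only shows how a form selection might be validated and turned into a job description that a backend could later submit.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# Hypothetical catalogue: algorithm name -> default parameters.
# The entries are illustrative assumptions, not taken from the thesis.
ALGORITHMS: Dict[str, Dict[str, Any]] = {
    "kmeans": {"k": 2, "max_iterations": 20},
    "logistic_regression": {"max_iterations": 100, "reg_param": 0.0},
}


@dataclass
class JobSpec:
    """Description of one analysis job, built from a web-form selection."""
    algorithm: str
    input_path: str
    params: Dict[str, Any] = field(default_factory=dict)


def build_job_spec(algorithm: str,
                   input_path: str,
                   user_params: Optional[Dict[str, Any]] = None) -> JobSpec:
    """Validate the user's selection and merge it with the defaults."""
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unknown algorithm: {algorithm}")
    defaults = ALGORITHMS[algorithm]
    user_params = user_params or {}
    # Reject parameters the chosen algorithm does not declare.
    unknown = set(user_params) - set(defaults)
    if unknown:
        raise ValueError(
            f"unsupported parameters for {algorithm}: {sorted(unknown)}")
    # User-supplied values override the defaults.
    return JobSpec(algorithm, input_path, {**defaults, **user_params})


# Example selection: k-means with a user-chosen number of clusters.
spec = build_job_spec("kmeans", "/data/example.csv", {"k": 5})
```

In a design like this, supporting another algorithm means adding a catalogue entry rather than writing new analysis code, which is one way to read the reusability claim in the abstract.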
Keywords/Search Tags: MLAB, Spark, Oozie, Quartz