Research And Implementation Of Cache And Fault-Tolerance Optimization Strategy Based On Spark

Posted on:2021-11-10

Degree:Master

Type:Thesis

Country:China

Candidate:Z Y Zhao

GTID:2518306308478554

Subject:Computer Science and Technology

Abstract/Summary:

With the continuous growth in data volume and in Internet users' demand for real-time results, data processing frameworks based on distributed in-memory computing have become the preferred tool for both data-driven business and scientific research. As the memory available to computing systems increases, in-memory data processing can make use of large memory spaces. However, the relatively high price of RAM still limits memory resources. An important problem is therefore to predict when, and which, data should be loaded into the cache to reduce computational waiting time, and to predict data access patterns accurately enough to guide cache management. This thesis builds on Spark, the most widely used in-memory computing framework. First, it designs and implements a resource-aware cache management method. This method extends existing approaches, which rely on data dependencies derived from application semantics. It draws on combinatorial optimization theory and runtime cluster resource usage to select the optimal cache replacement strategy, which improves memory utilization and reduces application running time. Second, it proposes a fault-tolerance mechanism that lets Spark recover lost data quickly. The mechanism builds a comprehensive recovery cost model from the characteristics of each data unit and uses a checkpoint-setting strategy to make the recovery of data-processing tasks more efficient. Finally, resource usage during application execution is captured dynamically through non-intrusive secondary development of the Spark source code. A pluggable cache management module connects the proposed resource-aware cache management method to existing Spark deployments. A storage error generator is also developed to test how efficiently the proposed fault-tolerance mechanism restores the computation. Comparative experiments and analysis of the results show that the proposed cache management and fault-tolerance optimization strategies improve the computing efficiency of the big data analysis framework.
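The abstract does not give the form of the recovery cost model or the checkpoint-setting rule. The Python sketch below is a rough illustration only. It assumes a lineage-based cost: recovering a lost data unit costs its own recomputation time plus the recovery cost of its parents, while restoring a checkpointed unit costs only a fixed read from reliable storage. A simple greedy threshold rule then decides which units to checkpoint. `DataUnit`, `plan_checkpoints`, `threshold`, and `read_cost` are hypothetical names. The greedy rule is a stand-in and does not reproduce the thesis's actual strategy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class DataUnit:
    """Hypothetical data unit (e.g. one RDD) in an application's lineage graph."""
    name: str
    compute_time: float  # assumed cost of recomputing this unit from its parents
    parents: List[str] = field(default_factory=list)


def plan_checkpoints(
    units: List[DataUnit], threshold: float, read_cost: float = 1.0
) -> Tuple[Set[str], Dict[str, float]]:
    """Greedy, threshold-based checkpoint selection (illustrative assumption).

    `units` must be listed in topological order (parents before children).
    A unit's recovery cost is its own compute time plus the recovery cost of
    each parent; a checkpointed unit costs only `read_cost` to restore.
    Summing over parents counts a shared ancestor once per path, so on a
    DAG the estimate is a conservative upper bound.
    """
    recovery: Dict[str, float] = {}
    checkpointed: Set[str] = set()
    for unit in units:
        cost = unit.compute_time + sum(recovery[p] for p in unit.parents)
        if cost > threshold:
            # Too expensive to rebuild from lineage: checkpoint it, so that
            # losing it later only costs a read from reliable storage.
            checkpointed.add(unit.name)
            cost = read_cost
        recovery[unit.name] = cost
    return checkpointed, recovery


if __name__ == "__main__":
    # A linear lineage a -> b -> c -> d with made-up compute times.
    lineage = [
        DataUnit("a", 3),
        DataUnit("b", 4, ["a"]),
        DataUnit("c", 5, ["b"]),
        DataUnit("d", 2, ["c"]),
    ]
    chosen, costs = plan_checkpoints(lineage, threshold=8, read_cost=1)
    print(chosen)  # {'c'}
    print(costs)
```

The threshold caps the recomputation that losing any single unit can trigger. Each selected unit adds one checkpoint write, and this sketch does not model those write costs.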

Keywords/Search Tags:

memory computing, Spark, cache management, fault-tolerance

Related items

1	Research On Fault Tolerance And Improvement Strategy For Storage Layer Under In-memory Computing Environment
2	Research On Memory Management And Cache Replacement Policies In Spark
3	Research On Memory Management And Fault Tolerance Mechanisms Based On NVRAM
4	Research On Memory Data Management Technology In Spark
5	Research And Implementation Of Memory Optimization Based On Parallel Computing Engine Spark
6	Research On Adaption Method Of Cloud Fault Tolerance Services Based On User Requirement And Resource Constriction
7	Research On Spark Performance Optimization Technology For In-Memory Computing
8	Design And Implementation Of Fault Tolerance Technology For Distributed System
9	Research On Memory Optimization Technology Of Spark Computing Engine
10	Adaptive Memory Management Research Based On In-Memory Computing Characteristics In Spark