Research And Implementation Of Distributed Computing System For Big Data Credit Investigation

Posted on: 2023-01-28
Degree: Master
Type: Thesis
Country: China
Candidate: X X Guo
GTID: 2568306914982369
Subject: Computer technology
Abstract/Summary:
In big data credit investigation, the data is ultra-high-dimensional, massive, drawn from a wide range of sources, and structurally complex. Extracting the considerable value held in credit reporting data is difficult, and it is a common concern among credit practitioners. With the rapid development of big data technology, a variety of computing engines have emerged, divided into batch processing and stream processing engines to serve different business scenarios. Using data to innovate credit reporting services helps expand the breadth and depth of the credit reporting market, and the move toward big data credit reporting is inevitable. However, tasks in this scenario mix real-time and offline workloads. Preliminary technical research and analysis shows that a single distributed computing engine cannot meet the growing demand, and that problems exist in both task development and resource management. A distributed computing system for big data credit reporting therefore faces the following challenges: (1) complex tasks are costly to develop, the system is difficult to maintain, and code reusability is low; (2) distributed resource management is not flexible enough, the scheduling strategy is fixed, and the nature of the tasks and the cluster load are not fully considered; (3) there is no unified, visual distributed computing system, so the big data credit analysis process is neither convenient nor flexible, and data value is produced inefficiently.

To address these problems, this thesis explores and analyzes task development and resource management, and implements a distributed computing system for big data credit reporting. The main research contents are the following three points: (1) Design and implementation of a visual business flow development tool that unifies batch and stream processing: built on Apache Beam for unified batch and stream programming, the tool lets tasks be connected to one another to form complex business flows. Once a flow has been constructed, an appropriate distributed computing engine can be selected to execute it. The tool improves the reusability of project code, reduces the development cost of complex tasks, and facilitates the development of big data credit services; (2) Exploration and implementation of a dynamic scheduling strategy for mixed tasks: with Hadoop YARN as the unified manager of big data resources, the strategy takes the nature of each task into account and sets up separate offline and real-time queues to improve scheduling fairness between the two types of task. It also takes queue resource pressure into account and adjusts the policy dynamically as the load changes, so as to moderately reduce task execution time under different conditions; (3) Design and implementation of a distributed computing system for big data credit reporting: a unified, web-based visual development environment is built for big data credit reporting developers, providing data management, task management, and resource management capabilities.

This research relies on the National Key Research and Development Program project "Big Data Credit Investigation and Intelligent Evaluation Technology", and provides a distributed computing system that integrates data management, task management, and resource management for big data credit investigation. The effectiveness and practicability of the system are also verified.
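
The dual-queue idea in point (2) can be illustrated with a small sketch. The abstract does not give the actual scheduling algorithm, so everything here is an assumption made for illustration: the `QueueLoad` structure, the `rebalance` function, the `min_share` parameter, and the rule itself (each queue's share is proportional to its current demand, with a guaranteed floor). How the resulting shares would be applied to YARN is deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class QueueLoad:
    """Observed load of one scheduler queue (hypothetical structure)."""
    name: str
    pending_memory_mb: int  # memory requested by tasks still waiting
    used_memory_mb: int     # memory held by running tasks


def rebalance(offline: QueueLoad, realtime: QueueLoad,
              min_share: float = 0.2) -> Dict[str, float]:
    """Return capacity shares (summing to 1.0) for the two queues.

    Assumed rule, not the thesis's: a queue's share is proportional to
    its demand (used + pending memory), clamped so that neither queue
    falls below min_share.
    """
    if not 0.0 <= min_share <= 0.5:
        raise ValueError("min_share must be between 0 and 0.5")

    offline_demand = offline.used_memory_mb + offline.pending_memory_mb
    realtime_demand = realtime.used_memory_mb + realtime.pending_memory_mb
    total = offline_demand + realtime_demand

    if total == 0:
        # Idle cluster: fall back to an even split.
        offline_share = 0.5
    else:
        offline_share = offline_demand / total

    # Clamp so both queues keep their guaranteed minimum.
    offline_share = max(min_share, min(1.0 - min_share, offline_share))
    return {offline.name: offline_share, realtime.name: 1.0 - offline_share}


if __name__ == "__main__":
    # A heavy offline backlog pushes the offline queue to the upper bound,
    # while the real-time queue keeps its guaranteed floor.
    shares = rebalance(
        QueueLoad("offline", pending_memory_mb=8000, used_memory_mb=1000),
        QueueLoad("realtime", pending_memory_mb=500, used_memory_mb=500),
    )
    print(shares)
```

Calling `rebalance` periodically with fresh load figures gives a simple form of load-driven policy adjustment; the floor keeps real-time tasks from being starved when a large offline backlog arrives.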
Keywords/Search Tags:big data credit investigation, distributed computing, batch stream unification, mixed task, resource management