| In the era of financial technology, data has become a core asset and a source of competitive advantage for enterprises, and big data technologies are already shaping the development of the financial sector. Credit, an important branch of finance, involves complicated business processes, and the volume of data to be processed each day reaches the petabyte (PB) level. In addition, production data within an enterprise is relatively scattered, and there is no unified specification for the data processing workflow. This paper designs and implements a Spark-based financial big data processing system that addresses these pain points and helps enterprises extract the most value from mining and using their data. The main work of this paper is as follows: (1) By comparing the advantages and disadvantages of traditional data processing methods and the current mainstream big data processing frameworks, a complete data processing workflow is proposed. Flume collects data from multiple sources. The collected data is stored in the Hadoop Distributed File System (HDFS), and Spark SQL then performs offline computation on Hive. Data that requires real-time analysis is sent to Kafka, from which Spark Streaming pulls it for real-time analysis and computation. The results are stored in HBase and MySQL and presented through a visual report interface. (2) Based on this scheme, a Spark-based financial big data processing system is designed and implemented, consisting of a data collection module, a data storage module, a data processing module, a data display module, and a system monitoring module. The data collection module supports collection from multiple data sources and of multiple data types. The data storage module provides storage for massive data and allows historical data to be traced back as business needs require. The data processing module unifies offline and real-time computing capabilities and also optimizes system performance to a certain extent. The data display module implements an indicator dashboard and self-service SQL queries. The system monitoring module monitors the system in real time to ensure its stability. |
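
The data flow described in the abstract can be illustrated with a small, self-contained sketch. It is a toy model in plain Python, not the system's actual implementation: in-memory lists and queues stand in for Flume, HDFS, Kafka, Spark SQL, and Spark Streaming. The record fields (`product`, `amount`, `realtime`), the function names, and the choice to store every record while additionally queuing the real-time ones are all assumptions made for illustration.

```python
from collections import defaultdict, deque


def collect(sources):
    """Merge records from several named sources into one stream (Flume's role)."""
    for name, records in sources.items():
        for record in records:
            yield {"source": name, **record}


def route(records, needs_realtime):
    """Store every record for batch use; also queue the ones needing real-time analysis."""
    store = []       # stand-in for HDFS (assumed to receive all records)
    queue = deque()  # stand-in for a Kafka topic
    for record in records:
        store.append(record)
        if needs_realtime(record):
            queue.append(record)
    return store, queue


def offline_totals(store):
    """Batch aggregation over the full store (the role of Spark SQL on Hive)."""
    totals = defaultdict(float)
    for record in store:
        totals[record["product"]] += record["amount"]
    return dict(totals)


def streaming_totals(queue, batch_size=2):
    """Drain the queue in micro-batches and yield running totals after each batch."""
    running = defaultdict(float)
    while queue:
        batch = [queue.popleft() for _ in range(min(batch_size, len(queue)))]
        for record in batch:
            running[record["product"]] += record["amount"]
        yield dict(running)


# Hypothetical sample data; field names are illustrative only.
sources = {
    "app_log": [
        {"product": "loan", "amount": 100.0, "realtime": True},
        {"product": "card", "amount": 40.0, "realtime": False},
    ],
    "core_db": [
        {"product": "loan", "amount": 250.0, "realtime": True},
    ],
}

store, queue = route(collect(sources), lambda r: r["realtime"])
batch_result = offline_totals(store)
stream_results = list(streaming_totals(queue, batch_size=1))
```

Here `batch_result` plays the part of an offline report written to a result store, while `stream_results` holds the running totals a dashboard would refresh after each micro-batch. In a real deployment these two paths would run as separate Spark jobs rather than in one process.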