| With the rapid development of China's radio and television information technology and industry, radio and television program content and video services have become increasingly abundant. To address the low efficiency with which users find and match personalized content when interacting with large-screen TVs, Guiguang Network intends to make full use of big data and artificial intelligence technology to build an intelligent recommendation system dedicated to radio and television. Such a system depends on very large volumes of data: user preferences are discovered by analyzing this data, and personalized content is then recommended according to those preferences to improve user experience and increase user stickiness. Traditional relational databases such as MySQL and Oracle can no longer meet these demands for large-scale data storage and analysis, whereas the emergence of NoSQL databases provides a solid foundation for storing and processing massive data. Therefore, given the urgent need for data storage and analysis in the radio and television intelligent recommendation system, it is necessary to establish a storage system dedicated to radio and television intelligent recommendation applications. This thesis mainly studies how to use distributed storage and distributed computing technology to build a dedicated recommendation application storage system for Guiguang. Based on the recommendation application architecture, historical data and real-time data are stored and computed in different ways. Because HDFS offers high fault tolerance and high scalability and is well suited to batch processing, the large volume of historical data accumulated over a period of time is stored in HDFS and processed offline with the MapReduce offline computing framework. For streaming data, HBase is selected for storage, and Kafka message middleware is added to forward data, which keeps the system's components simpler; the Spark Streaming stream processing framework is then used to process the real-time data and compute some statistics. Both the distributed storage and the distributed computing technologies scale well as clusters grow, and the failure of a small number of nodes has little impact on the system as a whole, so stability is good. In addition, these big data components can be deployed on ordinary PCs, which effectively controls equipment cost. The thesis describes in detail the design and implementation of the intelligent recommendation application storage system for radio and television. Building on a layered design and cluster services, it ultimately provides Guiguang with a reliable distributed data storage and processing system that supports high concurrency, scalability, and low energy consumption. During implementation, targeted storage optimizations were carried out for three problems: (1) HDFS small-file storage; (2) HBase hotspotting; and (3) multi-thread resource sharing in the clustered service state. Solutions were found through in-depth analysis of these three problems, and experiments demonstrated the feasibility of the technical solutions. Finally, functional and performance tests of the system confirmed its feasibility. |
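
The abstract names HBase hotspotting as one of the optimization targets but does not say how it was resolved. As an illustration only, the sketch below shows one commonly used mitigation, row-key salting, in which a stable hash bucket is prefixed to each key so that writes spread across pre-split regions. The function name `salted_row_key`, the bucket count, and the `bucket|user_id|timestamp` key layout are all hypothetical assumptions and are not taken from the thesis.

```python
import hashlib

# Hypothetical number of pre-split regions; not a value from the thesis.
NUM_BUCKETS = 16


def salted_row_key(user_id: str, timestamp_ms: int, buckets: int = NUM_BUCKETS) -> bytes:
    """Build a row key whose prefix is a stable hash bucket of the user id.

    Keys that would otherwise sort next to each other (for example, keys that
    start with a timestamp or a sequential id) are spread over `buckets` key
    ranges, while all rows of one user stay in the same bucket.
    """
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    bucket = digest[0] % buckets
    # Zero-padded fields keep keys in time order within one user's rows.
    return f"{bucket:02d}|{user_id}|{timestamp_ms:013d}".encode("utf-8")


if __name__ == "__main__":
    for uid in ("user1", "user2", "user3"):
        print(salted_row_key(uid, 1600000000000))
```

The trade-off of this design is that a time-range scan across all users has to issue one scan per bucket, so it suits workloads where reads are mostly per user.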