| With the rapid development of the information society, public awareness of privacy protection is increasing, and encrypted instant messaging tools are becoming more and more popular. Telegram is one of these tools; its communication protocol, MTProto, is designed to ensure secure and reliable data transmission. Precisely because Telegram protects users' privacy so strongly, it has become a gray area for cyberspace regulation. Criminals use it to spread information related to pornography, explosives, terrorism and drugs, which introduces many destabilizing factors into society. It is therefore necessary to monitor and analyze Telegram. This paper carries out research in the following three areas. (1) Design of the key technologies of the distributed data collection subsystem. By consulting relevant materials and running a large number of tests, this paper investigates in depth the technologies available for collecting Telegram data and the inherent limitations of Telegram itself and of its interfaces. To address these limitations, three overall collection schemes are designed to ensure the real-time performance, collection rate and continuity of data collection: an active and passive message collection framework, a real-time data assurance strategy based on events and verification, and a group information pre-collection scheme based on the idea of pooling. (2) Implementation of the distributed data collection subsystem. A distributed data collection system is developed and implemented on the basis of the three overall collection schemes. To make data collection scalable, configurable and flexible, the system is implemented as a task-driven collection system that can flexibly schedule and allocate collection tasks in combination with the monitoring and analysis system. (3) Monitoring and analysis system. Based on the data collected by the distributed data collection system, a secure, real-time, fully functional, highly available and scalable monitoring and analysis system is built using a development model that separates the front end from the back end. It comprises six modules: data search, personal center, attention, data monitoring, collection management and system management. The system provides multiple retrieval methods and statistics for messages, groups, people and files; continuous tracking, monitoring and report push for specific groups, people and keywords; and comprehensive configuration management, task scheduling, and node health monitoring and alarms for the collection system. Extensive cache optimization is applied to shorten the system response time and improve the user experience. Finally, this paper comprehensively tests the distributed data collection subsystem and the upper-layer monitoring and analysis system. The test results verify the real-time performance, collection rate, sustainability and flexibility of the collection system, as well as the functional integrity, security, real-time performance and user experience of the monitoring and analysis system. |