The combination of Internet technology and financial services has formed the Internet finance business model, whose core is the use of Internet technology to improve the efficiency of financial business processing. Compared with other Internet industries, Internet finance carries both Internet and financial risks: not only Internet-industry risks such as computer viruses and hacker attacks, but also traditional financial-industry risks such as money laundering and financial fraud. The Internet finance industry therefore has higher requirements for system security, availability, and stability.

As the day-to-day record of system operation, logs help a business system meet these requirements and play an important role in keeping the system stable. On the one hand, with the rapid development of distributed and microservice architectures, the traditional logging mode is not only inefficient for retrieval but also struggles to guarantee timely writes and data integrity, and it can hardly match the existing scale of services; this problem becomes particularly acute as the business grows. On the other hand, although logs are rich in content, the lack of analysis tools makes it difficult for developers and operators to extract valuable information from them. Machine learning and deep learning have been introduced into log analysis, but such work remains largely research-oriented and faces many difficulties in practical use. For example, in path anomaly analysis, existing work often focuses only on the relationships among log keys, ignores the relationship between log keys and parameters, and performs poorly on distributed systems.

Therefore, targeting the collection and analysis of distributed logs, this thesis designs FC-log, a lightweight and efficient log collector that achieves fast and stable collection and transmission of logs, and proposes TL-MC, a log anomaly analysis model based on components and log keys, which combines component parameters and log keys in its predictions to improve the detection rate of log path anomalies. The main contributions of this thesis are as follows:

1. Design and implement FC-log, a high-performance log collector. To address the integrity and timeliness issues of log collection, a double cache queue is proposed that avoids the data loss service exceptions may cause while improving transfer efficiency. The local disk serves as a degradation fallback: after the system recovers from an exception, the buffered data is synchronized back to the Kafka cluster to ensure data integrity. Optimizations such as asynchronous delivery and customized logging further improve the availability of the system (a minimal sketch of this design is given after this list).

2. Propose TL-MC, a log anomaly analysis model based on components and log keys. In the log parsing stage, TL-parser, a parsing model optimized on the basis of the LCS algorithm, is proposed to meet the requirement of online log parsing. In the anomaly analysis stage, MC-LSTM, an LSTM-based analysis model, is proposed, which uses the correlation between component parameters and log keys for path anomaly detection, addresses the problem that traditional analysis algorithms ignore correlations between features, and supports the multi-threaded detection requirements of distributed systems (sketches of the LCS-based parsing and of such a model are also given after this list).

3. Comparative experiments show that the FC-log collector outperforms logback while consuming fewer resources, and that integrating it into a business system does not affect the system's normal operation. The TL-parser parsing model reduces the time complexity from O(mn) to close to O(n), parsing faster than Drain and Spell while extracting a comparable number of templates. Under the same precision, the MC-LSTM analysis model improves recall by 4% and F1-score by 1% compared with DeepLog and Log_ST.
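
FC-log's implementation is not included in this abstract; the following Python sketch only illustrates the double-cache-queue idea with a local-disk degradation path under stated assumptions. The send callable stands in for a Kafka producer, and the class name, spill-file format, and flush interval are all hypothetical.

    import json
    import os
    import queue
    import threading
    import time

    class DoubleCacheQueue:
        """Two in-memory buffers: application threads append to the
        active queue while a background flusher drains the standby one,
        so a slow or failing sink never blocks business code."""

        def __init__(self, send, spill_dir="fclog-spill", flush_interval=0.5):
            self.send = send                    # in FC-log, a Kafka producer
            self.active = queue.Queue()
            self.standby = queue.Queue()
            self.spill_dir = spill_dir
            self.flush_interval = flush_interval
            os.makedirs(spill_dir, exist_ok=True)
            threading.Thread(target=self._flush_loop, daemon=True).start()

        def append(self, record):
            self.active.put(record)             # cheap and non-blocking for callers

        def _flush_loop(self):
            while True:
                time.sleep(self.flush_interval)
                # A production collector would synchronize this swap.
                self.active, self.standby = self.standby, self.active
                batch = []
                while not self.standby.empty():
                    batch.append(self.standby.get())
                if not batch:
                    continue
                try:
                    self.send(batch)            # normal path: ship to the cluster
                except Exception:
                    self._spill(batch)          # degraded path: local disk

        def _spill(self, batch):
            path = os.path.join(self.spill_dir, f"{time.time_ns()}.jsonl")
            with open(path, "w") as f:
                for record in batch:
                    f.write(json.dumps(record) + "\n")

        def recover(self):
            """Replay spilled batches in order once the sink is healthy,
            restoring the integrity guarantee described above."""
            for name in sorted(os.listdir(self.spill_dir)):
                path = os.path.join(self.spill_dir, name)
                with open(path) as f:
                    self.send([json.loads(line) for line in f])
                os.remove(path)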
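TL-parser's specific optimization is likewise not detailed here; as background, the sketch below shows the plain LCS-based online template matching (in the style of Spell) that such a parser builds on. The dynamic-programming LCS is exactly the O(mn) cost that TL-parser is said to reduce to close to O(n); the threshold value and names are illustrative.

    def lcs_len(a, b):
        """Classic O(mn) dynamic-programming LCS length over token lists."""
        m, n = len(a), len(b)
        dp = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m):
            for j in range(n):
                dp[i + 1][j + 1] = (dp[i][j] + 1 if a[i] == b[j]
                                    else max(dp[i][j + 1], dp[i + 1][j]))
        return dp[m][n]

    class LcsTemplateParser:
        """Minimal online parser: each message is matched against the
        known templates by LCS and either merged into the best match
        or kept as a new template."""

        def __init__(self, threshold=0.5):
            self.templates = []          # each template is a token list
            self.threshold = threshold

        def parse(self, message):
            tokens = message.split()
            best, best_len = None, 0
            for t in self.templates:
                l = lcs_len(tokens, t)
                if l > best_len:
                    best, best_len = t, l
            if best is not None and best_len >= self.threshold * len(tokens):
                merged = self._merge(best, tokens)
                self.templates[self.templates.index(best)] = merged
                return merged
            self.templates.append(tokens)
            return tokens

        @staticmethod
        def _merge(template, tokens):
            # Differing positions become wildcards; the replaced tokens
            # are the parameters a downstream model consumes. For
            # brevity this assumes equal-length token lists.
            return [a if a == b else "<*>" for a, b in zip(template, tokens)]

    parser = LcsTemplateParser()
    parser.parse("Connection from 10.0.0.1 closed")
    print(parser.parse("Connection from 10.0.0.2 closed"))
    # ['Connection', 'from', '<*>', 'closed']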
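MC-LSTM's architecture is not specified in this abstract either; the PyTorch sketch below shows one plausible way to combine a window of log keys with component parameters for next-key prediction. The layer sizes, the concatenation scheme, and the top-k decision rule (borrowed from DeepLog) are all assumptions.

    import torch
    import torch.nn as nn

    class KeyParamLSTM(nn.Module):
        """Illustrative detector: an LSTM over the recent window of log
        keys is concatenated with an encoding of component parameters,
        and the joint vector predicts the next log key."""

        def __init__(self, num_keys, param_dim, embed_dim=32, hidden_dim=64):
            super().__init__()
            self.embed = nn.Embedding(num_keys, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.param_proj = nn.Linear(param_dim, hidden_dim)
            self.head = nn.Linear(hidden_dim * 2, num_keys)

        def forward(self, key_window, params):
            # key_window: (batch, window) int64 log-key ids
            # params:     (batch, param_dim) component-parameter features
            _, (h, _) = self.lstm(self.embed(key_window))
            joint = torch.cat([h[-1], torch.relu(self.param_proj(params))], dim=-1)
            return self.head(joint)       # logits over the next log key

    def is_anomalous(model, key_window, params, actual_next_key, top_k=5):
        """Flag a path anomaly when the observed next key falls outside
        the model's top-k predictions (a DeepLog-style decision rule)."""
        logits = model(key_window, params)
        topk = logits.topk(top_k, dim=-1).indices
        return (topk == actual_next_key.unsqueeze(-1)).any(dim=-1).logical_not()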