| With growing user demand for application services, the traditional monolithic architecture can no longer withstand the load of high access volumes. As a result, the microservice architecture has been widely adopted across industries and has become the mainstream architecture for high-concurrency systems. Although the microservice architecture brings many advantages, it also makes operation and maintenance more difficult. To improve operation and maintenance efficiency and ensure that business systems run reliably, both academia and industry have carried out related research and practice. A microservice system is large in scale, complex in structure, and continuously updated, and it executes a large number of service requests concurrently. How to effectively diagnose faults between microservices and accurately locate the faulty microservice has therefore become one of the key issues in ensuring the reliability and performance of microservice systems. First, this paper uses a time-series-based abnormal service prediction algorithm to forecast the operation and maintenance data of requests between microservices while the microservice cluster is running, and thereby identify abnormal services. The LSTM neural network is a deep learning model suited to processing sequence data, and it performs well in time series forecasting. To improve the efficiency of the algorithm, the scheme improves the early stopping method. Specifically, the early stopping mechanism checks the model's performance on the validation set at the end of each training cycle and stops training if the performance begins to decline; otherwise, training continues until a specified number of training cycles is reached or a given performance threshold is met. With the early stopping mechanism, the efficiency of the algorithm can be improved, the risk of overfitting reduced, and the optimal combination of model parameters found faster. At the same time, by monitoring the business indicator data of abnormal sub-services, this solution can detect and resolve problems in the microservice cluster promptly, improving the reliability and stability of the business cluster. Second, starting from the suspicious microservices, each microservice is treated as a node and the source and destination of each request form an edge, producing an abnormal call graph for a given fault period; this graph makes the call relationships between microservices and the delivery path of abnormal requests easier to understand. Once the abnormal call graph is formed, a machine-learning-based scoring model scores the performance indicators and business indicators of each microservice in the graph to evaluate how each microservice performs when processing requests. In this way, the microservice that is the root cause of the abnormal request is identified. Finally, an online experiment based on an Internet application was conducted to demonstrate the effectiveness of the time-series-based abnormal service prediction algorithm and the abnormal root service location algorithm. The test shows that the system can not only monitor and analyze the operation and maintenance data in the microservice cluster but also display the monitoring data and the call path of abnormal requests through a visual interface, which makes it easier for operation and maintenance personnel to locate the root cause of an abnormality and solve the problem, thereby improving the reliability and stability of the microservice cluster. |
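The early stopping rule described in the abstract can be illustrated with a minimal, framework-agnostic sketch. It is not the paper's improved variant, whose details the abstract does not give. The function name `train_with_early_stopping`, the callbacks `train_one_epoch` and `validation_loss`, and the parameters `max_epochs`, `target_loss`, and `patience` are all assumptions made for illustration; validation loss stands in for "performance", and `patience=1` corresponds to stopping as soon as performance begins to decline.

```python
from typing import Callable, Tuple


def train_with_early_stopping(
    train_one_epoch: Callable[[int], None],   # hypothetical: runs one training cycle
    validation_loss: Callable[[int], float],  # hypothetical: loss on the validation set
    max_epochs: int = 100,                    # assumed cap on training cycles
    target_loss: float = 0.0,                 # assumed performance threshold
    patience: int = 1,                        # assumed: non-improving cycles tolerated
) -> Tuple[int, float]:
    """Train until validation loss degrades, the threshold is met, or the cap is hit.

    Returns (best_epoch, best_loss).
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0

    for epoch in range(max_epochs):
        train_one_epoch(epoch)
        loss = validation_loss(epoch)  # checked at the end of every training cycle

        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1

        if best_loss <= target_loss:
            break  # performance threshold reached
        if epochs_without_improvement >= patience:
            break  # validation performance has started to decline

    return best_epoch, best_loss


if __name__ == "__main__":
    # Simulated validation losses; the value rises at index 3.
    simulated = [0.9, 0.6, 0.4, 0.45, 0.5, 0.3]
    trained_epochs = []
    result = train_with_early_stopping(
        trained_epochs.append, lambda e: simulated[e], max_epochs=len(simulated)
    )
    print(result, trained_epochs)
```

In the simulated run, training stops after the fourth cycle because the loss rises from 0.4 to 0.45, so the later, lower value 0.3 is never seen. A larger `patience` trades extra training time for a chance to get past such temporary increases.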
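The construction of the abnormal call graph can be sketched in the same spirit. The sketch assumes each request record carries a source service, a destination service, a timestamp, and an abnormality flag; the `Request` type, the function `build_abnormal_call_graph`, and the service names in the example are hypothetical and do not come from the paper. The machine-learning-based scoring step is left out because the abstract does not specify it.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, NamedTuple, Set


class Request(NamedTuple):
    """Hypothetical request record; the field names are assumptions."""
    source: str
    destination: str
    timestamp: float
    abnormal: bool


def build_abnormal_call_graph(
    requests: Iterable[Request],
    suspicious: Iterable[str],
    window_start: float,
    window_end: float,
) -> Dict[str, Set[str]]:
    """Return an adjacency map of services reachable from the suspicious ones
    through abnormal requests observed inside the fault period."""
    # One edge per (source, destination) pair of an abnormal request in the window.
    edges: Dict[str, Set[str]] = defaultdict(set)
    for r in requests:
        if r.abnormal and window_start <= r.timestamp <= window_end:
            edges[r.source].add(r.destination)

    # Breadth-first traversal from the suspicious services along call direction.
    graph: Dict[str, Set[str]] = {}
    queue = deque(suspicious)
    while queue:
        service = queue.popleft()
        if service in graph:
            continue  # already expanded
        graph[service] = set(edges.get(service, ()))
        queue.extend(d for d in graph[service] if d not in graph)
    return graph


if __name__ == "__main__":
    observed = [
        Request("gateway", "orders", 10.0, True),
        Request("orders", "inventory", 11.0, True),
        Request("orders", "payments", 12.0, False),  # normal request: no edge
        Request("gateway", "users", 50.0, True),     # outside the fault period
    ]
    print(build_abnormal_call_graph(observed, ["gateway"], 0.0, 20.0))
```

Each node of the resulting graph is a candidate that a scoring model could then rank; leaf nodes such as `inventory` in the example are natural first candidates, since abnormal requests end there.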