
Research On Application Of Improved K-Nearest Neighbor Algorithm In Message Prediction

Posted on: 2023-12-15
Degree: Master
Type: Thesis
Country: China
Candidate: D X Cao
GTID: 2532306845989059
Subject: Computer technology
Abstract/Summary:
The communication network in an aircraft carrier system is relatively complex, and a large number of datagrams are transmitted over it. These datagrams generally use custom application-layer protocols. To obtain the data entity in a datagram, the system must first identify the datagram's protocol type and then parse it according to the corresponding protocol format. For flexibility and other reasons, the application-layer protocols in the aircraft carrier system are designed so that a datagram can be identified only by polling the local protocols and attempting to match each one in turn. This trial-and-error matching is very time-consuming: analyzing a single datagram may require trial matches against hundreds of protocols, which severely restricts system performance in a high-concurrency communication network. Investigation shows that the sequence of datagrams in this type of network follows regular patterns over time. This thesis therefore builds a model that can be embedded in the system to learn these patterns and predict the protocol type of unknown datagrams, speeding up analysis by reducing the number of trial matches.

Three main families of methods are currently used for prediction: time series forecasting methods, traditional machine learning methods, and deep learning methods. The communication network of the aircraft carrier system has changeable scenarios, demands high-speed real-time processing, and offers few features in the datagrams themselves; it also requires the algorithm model to be embeddable in the system. Methods such as deep learning require a large amount of sample data for training, and a trained model cannot be transferred to another communication scenario, so they are unsuitable for changeable scenarios. Traditional K-nearest neighbors is a lazy learning method that needs no training and can be deployed quickly in any changeable scenario, but it produces only a single prediction result and its neighbor search is inefficient, so it is unsuitable for high-concurrency scenarios. To solve these problems, this thesis proposes the model Speed Up Prefix Matching K Nearest Neighbors (SUPMKNN), which both speeds up neighbor search in the K-nearest neighbor algorithm and makes its output diverse and reliable.

The main results of this thesis are as follows.

First, Speed Up Prefix Matching K Nearest Neighbors is proposed. The model suits data sets in which individual records have few features but the data sequence is strongly regular. It replaces the traditional approach of computing distances and then selecting the nearest neighbors: trading space for time, it extracts and caches the rules of the sequence in depth, hits the K neighbors directly, and extracts multiple features of those neighbors from the sequence rules. Finally, multiple linear regression and logistic regression are used to compute the similarity of each neighbor, making the predictions reliable and timely.

Second, a complete communication network simulation system is built, and the algorithm is applied and verified on it. The simulation system comprises a datagram real-time analysis system, an XML protocol random generation system, and a datagram random sending system. The datagram real-time analysis system is similar to the packet capture tool Wireshark: it captures custom datagrams, matches them against protocols, and parses and stores them, so that the effect of the algorithm can be measured. The XML protocol random generation system and the datagram random sending system generate random protocols and datagrams in batches to ensure the diversity and authenticity of the simulation data. The datagrams generated in the simulation environment built from these two systems are received by the datagram real-time analysis system and form the data set of this experiment.

Finally, SUPMKNN is compared with the commonly used polling and frequency algorithms and significantly improves the efficiency of datagram analysis. A comparative experiment with a graph neural network shows that SUPMKNN is no less accurate than the trained graph neural network model. SUPMKNN is then applied to the system, where it greatly improves throughput.
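The idea of caching sequence rules so that likely protocol types are tried first can be illustrated with a minimal Python sketch. This is not the thesis's SUPMKNN implementation, which the abstract does not specify in detail: the class name `PrefixCachePredictor`, the helpers `trial_order` and `count_trials`, the `max_context` parameter, and the protocol labels are all hypothetical, and the regression-based similarity scoring is omitted. The sketch only shows how a lookup table keyed on recent protocol types (space traded for time) could reduce the number of trial matches compared with plain polling.

```python
from collections import Counter, defaultdict


class PrefixCachePredictor:
    """Hypothetical helper: caches which type followed each recent context."""

    def __init__(self, max_context=3):
        self.max_context = max_context
        # context tuple -> Counter of protocol types observed right after it
        self.table = defaultdict(Counter)
        self.history = []

    def observe(self, protocol_type):
        # Record the new type under every context length up to max_context.
        for n in range(1, self.max_context + 1):
            if len(self.history) >= n:
                context = tuple(self.history[-n:])
                self.table[context][protocol_type] += 1
        self.history.append(protocol_type)
        # Keep only as much history as the longest context needs.
        self.history = self.history[-self.max_context:]

    def ranked_candidates(self):
        # Longest matching context first; within a context, most frequent first.
        ranked = []
        for n in range(min(self.max_context, len(self.history)), 0, -1):
            context = tuple(self.history[-n:])
            followers = self.table.get(context, Counter())
            for protocol_type, _ in followers.most_common():
                if protocol_type not in ranked:
                    ranked.append(protocol_type)
        return ranked


def trial_order(predictor, all_protocols):
    """Predicted types first, then the remaining protocols in polling order."""
    ranked = predictor.ranked_candidates()
    return ranked + [p for p in all_protocols if p not in ranked]


def count_trials(order, true_type):
    """Number of match attempts made until the true protocol is reached."""
    return order.index(true_type) + 1


if __name__ == "__main__":
    # Assumed toy traffic: three protocol types repeating in a fixed cycle.
    protocols = ["P%d" % i for i in range(100)]
    predictor = PrefixCachePredictor(max_context=2)
    for seen in ["P97", "P98", "P99"] * 5:
        predictor.observe(seen)

    next_type = "P97"  # the cycle continues with P97
    print("polling trials:  ", count_trials(protocols, next_type))
    print("predicted trials:", count_trials(trial_order(predictor, protocols), next_type))
```

In this toy cycle the predicted ordering reaches the correct protocol on the first attempt, whereas plain polling in list order needs 98 attempts. Real traffic is less regular, which is presumably why the thesis scores several candidate neighbors rather than relying on a single lookup.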
Keywords/Search Tags:K nearest neighbors, Datagram protocol matching, Linear regression, Logistic regression