| With the development of artificial intelligence, communication jamming technology with cognitive capabilities has become a research focus. Cognitive communication jamming requires the jammer to learn autonomously from changes in the environment so that it can provide optimal jamming strategies. In the complex and changeable battlefield environment, the use of various anti-jamming technologies has greatly increased the difficulty of effective interference. How to use complex and changeable battlefield electromagnetic information to make interference waveform decisions intelligently has therefore become an urgent problem. In response, and based on the theory of reinforcement learning, this thesis focuses on intelligent interference waveform decision-making under non-cooperative conditions. The main innovations are as follows. First, to address the large number of interactions and the long learning time of the optimal interference waveform generation algorithm for static targets, a method is proposed that sets a fixed or dynamic penalty threshold and marks multiple similar interference waveforms. This method combines the multi-armed bandit model with negative reinforcement learning and jointly penalizes groups of interference waveforms with poor interference effect, which raises the probability that waveforms with good interference effect are selected and effectively reduces the number of random explorations by the jammer. Simulation results show that, compared with the algorithm without a penalty threshold, the proposed method increases the learning speed by about 3 times while greatly improving interference efficiency. Second, to address the problem that the state of the communication target changes when various anti-interference technologies are adopted in actual communication countermeasures, a multi-state
adaptive generation method for the optimal interference strategy is proposed. This method extends the state of the jammed target from static to dynamic and, based on the Markov decision model, uses Q-Learning and SARSA to achieve continuous tracking and precise interference. Simulation results show that, compared with the multi-armed bandit algorithm in the multi-state setting, the algorithm achieves higher interference efficiency. Finally, to address the difficulty of obtaining interference feedback under non-cooperative conditions in the complex and changeable battlefield electromagnetic environment, three methods for obtaining interference feedback are proposed: 0-1 feedback, channel utilization feedback, and data packet change rate feedback. These methods extend the reconnaissance target from a single jammed node to the entire communication network and use an energy detection algorithm to evaluate the effect of an interference waveform from the perspective of changes in communication state, channel, and packet volume. Simulation results show that using these feedback signals, obtained under non-cooperative conditions, as the reward function to guide the jammer in learning the optimal interference waveform achieves the same effect as learning under cooperative conditions with the packet loss rate as the reward function. Based on reinforcement learning, this thesis studies intelligent interference waveform decision-making under non-cooperative conditions and makes full use of the fact that reinforcement learning requires no prior information. In the complex and changeable electromagnetic environment of the battlefield, a jammer capable of accurate, rapid, dynamically adaptive, and real-time intelligent interference has important practical application value. |
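
The abstract names Q-Learning and SARSA as the learners for the multi-state case but does not give their update rules. The sketch below shows the standard textbook forms of both updates, only to illustrate how they differ. Everything specific in it is an assumption rather than the thesis's implementation: the table layout (target states as rows, candidate interference waveforms as columns), the helper names `make_q_table` and `epsilon_greedy`, and the values of `alpha`, `gamma`, and `epsilon`. The reward `r` stands in for whichever interference feedback is available, for example the 0-1 feedback.

```python
import random


def make_q_table(n_states, n_waveforms):
    """Hypothetical layout: one row per target state, one column per waveform."""
    return [[0.0] * n_waveforms for _ in range(n_states)]


def epsilon_greedy(q_row, epsilon, rng):
    """Explore a random waveform with probability epsilon, else pick the best one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


def q_learning_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the best waveform in the next state."""
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the waveform actually chosen next."""
    target = r + gamma * q[s_next][a_next]
    q[s][a] += alpha * (target - q[s][a])
```

The only difference between the two rules is the bootstrap term: Q-Learning uses the maximum value in the next state, while SARSA uses the value of the action the jammer actually takes next, so SARSA's estimates reflect the cost of its own exploration.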