| Radar plays a decisive role in modern electronic warfare. Electronic countermeasures (ECM) techniques have been developed to jam the radar, and they pose a serious threat to its survival. According to the direction of arrival (DOA) of the signal, jamming can be classified into mainlobe jamming and sidelobe jamming. Unlike sidelobe jamming, mainlobe jamming receives the same antenna gain as the target returns and cannot be distinguished from the target signal by its DOA. In addition, as electronic warfare develops, jammers are becoming smarter. Future jammers will be capable of learning and adapting, which places higher demands on antijamming research. To address the challenges of mainlobe jamming and smart jammers, we apply deep reinforcement learning and game theory to investigate the design of antijamming strategies in the frequency domain from the perspective of active and dynamic confrontation. The main work of this paper is summarized as follows. 1. For the scenario of a frequency agile radar facing a mainlobe suppression jammer, a method for learning antijamming strategies based on deep reinforcement learning is proposed. In the electromagnetic game, the jammer usually adopts a strategy and takes actions based on the information it intercepts from the radar. Because jamming strategies are complex, it is difficult to derive antijamming strategies from a specific mathematical model. Therefore, a Markov decision process (MDP) is used to describe the interaction between the radar and the jammer, and the states, actions and rewards are designed accordingly. Proximal policy optimization (PPO) is used to solve this MDP. Simulation results show that, by interacting with the jammer, the radar can learn efficient antijamming strategies against different jamming strategies. In addition, to handle the case in which 
the jammer may adopt multiple different jamming strategies, a unified antijamming strategy design method based on policy distillation is also proposed. With this method, multiple antijamming strategies, each designed to counter a corresponding jamming strategy, can be transferred into a single deep neural network, which enables the radar to counter multiple jamming strategies at the same time. 2. For the problem that uncertainties exist when the radar or the jammer senses or intercepts the actions of its opponent, a robust antijamming strategy learning method based on imitation learning and Wasserstein robust reinforcement learning (WR2L) is proposed. During the interaction, the radar needs to sense the frequency spectrum to infer the action of the jammer, and the jammer needs to intercept the radar signal and measure its parameters. Both procedures are subject to uncertainty. If these uncertainties are ignored when the antijamming strategies are trained, applying the strategies directly in real electronic warfare leads to a mismatch between the training and test environments, and the antijamming performance degrades. Given a jamming strategy, imitation learning is used to express it as a set of parameters, which serve as the reference dynamic parameters. The dynamic parameters are then obtained by perturbing the reference dynamic parameters. Finally, based on WR2L, the robust antijamming strategy is obtained by solving a max-min problem whose variables are the parameters of the radar policy and the dynamic parameters. This method improves the robustness of the radar antijamming strategies and mitigates the influence of errors caused by sensing or interception. 3. To deal with the competition between the radar and a smart jammer, a radar antijamming strategy design method based on game theory is proposed. With the development 
of cognitive electronic warfare, jammers are becoming smarter. To cope with this, game theory is used to model the interaction between the radar and the jammer. The radar and the jammer are the players in the game; the radar's action is its waveform in the frequency domain, and the jammer's action is its power spectral density. The utility in this game is the mutual information between the received signal and the random target impulse response. A Stackelberg game is a special extensive-form game with perfect information, which means that the radar and the jammer can obtain the opponent's information perfectly and take their actions in sequence. For the Stackelberg game, the Stackelberg equilibrium (SE) strategies are derived for the cases in which the radar and the jammer, respectively, act as the leader. On this basis, the existence condition of a Nash equilibrium (NE) in the egalitarian game is investigated, and the meaning of the SE strategies is pointed out. 4. To address multiple rounds of interaction and imperfect information in the competition between the frequency agile radar and the jammer, an antijamming strategy learning method based on Neural Fictitious Self-Play (NFSP) is proposed. In the electromagnetic game, the radar and the jammer always interact over multiple rounds. In addition, because of the limited space on the enemy aircraft, the jammer works in a transmit/receive time-sharing mode to achieve good isolation. As a result, the jammer cannot intercept all of the radar's information, which gives rise to imperfect information. To handle this, an imperfect-information extensive-form game is used to model the interaction between the radar and the jammer, and a game tree describes the details of the game. Based on this model, NFSP is used to solve for the strategies of the radar and the jammer. With exploitability as the evaluation metric, the simulation results show that 
the strategies of the radar and the jammer converge to an approximate NE. |
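
The exploitability metric mentioned in the last contribution can be illustrated on a much smaller problem than the one studied here. The sketch below is not the NFSP method of this work: it assumes a hypothetical zero-sum matrix game in which the radar picks one of four sub-bands, the jammer picks one sub-band to jam, and the radar scores 1 whenever the jammer misses. It uses classical (tabular) fictitious play instead of neural networks, and it takes exploitability to be the sum of both players' best-response gains (some references halve this value). All function names and the payoff matrix are assumptions made for illustration.

```python
def radar_payoffs(payoff, jammer_mix):
    """Expected radar payoff of each radar action against a jammer mixed strategy."""
    return [sum(p * q for p, q in zip(row, jammer_mix)) for row in payoff]


def jammer_losses(payoff, radar_mix):
    """Expected radar payoff conceded by each jammer action against a radar mixed strategy."""
    return [sum(radar_mix[i] * payoff[i][j] for i in range(len(payoff)))
            for j in range(len(payoff[0]))]


def exploitability(payoff, radar_mix, jammer_mix):
    """Best radar reply value minus the value left after the best jammer reply.

    The result is zero exactly when the pair of mixed strategies is a Nash
    equilibrium of the zero-sum game, and positive otherwise.
    """
    return max(radar_payoffs(payoff, jammer_mix)) - min(jammer_losses(payoff, radar_mix))


def normalize(counts):
    """Turn action counts into an empirical mixed strategy."""
    total = sum(counts)
    return [c / total for c in counts]


def fictitious_play(payoff, rounds):
    """Classical fictitious play: each side best-responds to the opponent's empirical mix."""
    radar_counts = [0] * len(payoff)
    jammer_counts = [0] * len(payoff[0])
    radar_counts[0] = jammer_counts[0] = 1  # arbitrary initial actions
    for _ in range(rounds):
        gains = radar_payoffs(payoff, normalize(jammer_counts))
        losses = jammer_losses(payoff, normalize(radar_counts))
        # Radar maximizes its payoff; jammer minimizes the radar's payoff.
        radar_counts[max(range(len(gains)), key=gains.__getitem__)] += 1
        jammer_counts[min(range(len(losses)), key=losses.__getitem__)] += 1
    return normalize(radar_counts), normalize(jammer_counts)


# Hypothetical toy game: radar payoff is 1 when the jammed band differs from the radar band.
NUM_BANDS = 4
payoff = [[0.0 if i == j else 1.0 for j in range(NUM_BANDS)] for i in range(NUM_BANDS)]

uniform = [1.0 / NUM_BANDS] * NUM_BANDS
fixed_band = [1.0] + [0.0] * (NUM_BANDS - 1)  # a radar that never hops

radar_avg, jammer_avg = fictitious_play(payoff, rounds=5000)

print("uniform hopping:", exploitability(payoff, uniform, uniform))
print("fixed carrier:  ", exploitability(payoff, fixed_band, uniform))
print("fictitious play:", exploitability(payoff, radar_avg, jammer_avg))
```

In this toy game, uniform hopping by both sides has exploitability 0, while a radar that stays on one band against a uniform jammer has exploitability 0.75. The averaged fictitious-play strategies should end up close to the uniform mix, which is the sense in which a decreasing exploitability indicates convergence toward an approximate NE.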