| The development of active information collection technology can improve autonomy and intelligence in tracking, monitoring, inspection, search and rescue, and environmental perception. Distributed filtering and distributed decision-making are two important subproblems in active information collection. This thesis considers the multi-agent tracking problem for controlled targets and focuses on the distributed filtering and distributed decision-making algorithms for this scenario. In information collection, each agent obtains target information through its sensors. Because of the limitations of sensor technology, however, the readings generally contain random noise. A filtering algorithm estimates the true state of the target from the noisy data. In the tracking problem, observations are available only when the target is close to the agent, so how an agent can efficiently use information from its neighbors to achieve accurate and consistent estimates is also an important issue. In distributed decision-making, on the other hand, the actions of each agent must be planned at every moment to obtain the best group information collection efficiency. When the observation range is limited, each agent must make decisions based on its observation history and action history. When the communication distance is limited, each agent must cooperate with a constantly changing set of neighbors to improve overall performance. In view of these difficulties, the main work of this thesis is as follows. Distributed filtering: Existing distributed filtering algorithms achieve satisfactory results when observations are always available or when observation arrivals follow a Bernoulli distribution, but they tend to diverge in the tracking problem. Through careful identification, the availability of an agent's observation of the target in the tracking problem is modeled as a Markov process. Then, based on the Kalman consensus filtering algorithm, a distributed filtering algorithm for Markov intermittent observations is proposed. However, the computational and communication complexity of this algorithm is high. An approximation algorithm whose computational complexity is linear in the number of agents is therefore proposed. It is proved theoretically that the estimation error of the approximation algorithm is bounded with probability 1, and a sufficient condition for convergence and an upper bound on the expected error are given. The relationship between this upper bound and the randomness of the target dynamics, the randomness of the observation equation, the number of agents, the dimension of the target state, the topology of the communication graph, and the probability of regaining observations is obtained. The robustness of the approximation algorithm with respect to the topology of the communication network and the parameters of the Markov chain is then verified by numerical simulation. Distributed decision-making: Existing methods are difficult to adapt to restricted observation and communication conditions, and they are also difficult to scale to environments with different numbers of agents and targets. This thesis compares the characteristics of different methods and proposes a distributed tracking algorithm based on multi-agent reinforcement learning. First, the tracking problem is modeled as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP). Then a visit frequency map describing the action history is used to facilitate exploration, a belief map describing the observation history is designed to facilitate tracking, a reward function that jointly considers environment exploration and tracking is proposed, and triggered communication rules are designed to realize collaborative exploration and the assignment of tracking targets. Unlike existing methods, the visit frequency map and the belief map decouple the state representation from the environmental characteristics and the number of targets, and the triggered communication rules are suitable for environments with different numbers of agents and limited communication range. Finally, a series of simulation experiments is conducted to verify the tracking effectiveness of the proposed method for targets with different performance and its scalability to different numbers of agents and targets. |
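
The effect of Markov intermittent observations on a filter can be illustrated with a minimal sketch. It is not the distributed consensus algorithm of the thesis: it assumes a single agent, a scalar linear target model, and a two-state Markov chain that switches the observation on and off. The function name `simulate_markov_kf` and all parameters (`a`, `q`, `r`, `p_regain`, `p_lose`) are hypothetical and chosen only for illustration.

```python
import random


def simulate_markov_kf(steps=200, a=1.05, q=0.1, r=0.5,
                       p_regain=0.6, p_lose=0.3, seed=0):
    """Scalar Kalman filter with Markov intermittent observations.

    Illustrative assumptions (not taken from the thesis):
      x_{k+1} = a * x_k + w_k,  w_k ~ N(0, q)   target dynamics
      y_k     = x_k + v_k,      v_k ~ N(0, r)   observation, when available
      gamma_k in {0, 1} follows a two-state Markov chain:
        P(1 -> 0) = p_lose, P(0 -> 1) = p_regain
    Returns the squared estimation errors and the error covariances.
    """
    rng = random.Random(seed)
    x = 0.0                # true target state
    x_hat, P = 0.0, 1.0    # estimate and its error covariance
    gamma = 1              # 1: target observed, 0: observation lost
    sq_errors, covs = [], []
    for _ in range(steps):
        # Target moves according to the assumed linear dynamics.
        x = a * x + rng.gauss(0.0, q ** 0.5)
        # Observation availability evolves as a Markov chain.
        if gamma == 1:
            gamma = 0 if rng.random() < p_lose else 1
        else:
            gamma = 1 if rng.random() < p_regain else 0
        # Prediction step is always executed.
        x_hat = a * x_hat
        P = a * a * P + q
        # Measurement update only when the target is observed.
        if gamma == 1:
            y = x + rng.gauss(0.0, r ** 0.5)
            K = P / (P + r)
            x_hat = x_hat + K * (y - x_hat)
            P = (1.0 - K) * P
        sq_errors.append((x - x_hat) ** 2)
        covs.append(P)
    return sq_errors, covs


if __name__ == "__main__":
    errs, covs = simulate_markov_kf()
    print("mean squared error:", sum(errs) / len(errs))
    print("final covariance:", covs[-1])
```

In this sketch, lowering `p_regain` lengthens the stretches without observations, during which the covariance grows through the prediction step alone. This is the single-agent analogue of the dependence on the probability of regaining observations described in the abstract.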