Imitation learning is a method for learning an optimal policy from expert demonstrations without relying on environmental reward signals. Generative adversarial imitation learning (GAIL) combines the decision-making ability of imitation learning with the representational power of generative adversarial networks, and has become a research hotspot in the field of imitation learning by demonstrating strong performance and good generalizability on high-dimensional continuous control tasks. However, GAIL assumes that the expert demonstrations are generated by a single-modal expert policy, and its adversarial training process leads to shortcomings in modal representation capability, algorithmic stability, and sample utilization, which seriously limit its application to complex real-world tasks. To address the problems arising in the adversarial training process of GAIL, this paper proposes imitation learning frameworks that solve the mode collapse problem, improve training stability, and increase sample utilization. The specific research falls into the following three areas:

ⅰ. To address the mode collapse problem of GAIL, a multi-modal imitation learning algorithm with cosine similarity is proposed, which adds an encoder to the GAIL framework and replaces the single policy with multiple policies. Using multiple policies allows different modal policies to be updated without affecting each other. In addition, the encoder automatically extracts modal features from the expert demonstrations, and cosine constraints are constructed from these modal features. Under the cosine-term constraint, the distributions of samples of the same modality are drawn closer together, while the distributions of samples of different modalities are pushed as far apart as possible, so that the agent can learn a multi-modal policy well.

ⅱ. To address the instability of GAIL training, dual-discriminator generative adversarial imitation learning with soft update is proposed. The method adds a second discriminator to the GAIL framework. The two discriminators are trained separately with different loss functions and are updated via soft update, which makes the GAIL training process more stable.

ⅲ. To address the low utilization of generated samples in GAIL, generative adversarial imitation learning with experience replay is proposed, which adds a replay buffer to the GAIL framework. The method uses the Wasserstein distance as a metric to select instructive generated samples and stores them in the replay buffer. In subsequent training, generated samples from the replay buffer are added to the discriminator's training samples, improving the utilization of generated samples and speeding up the GAIL training process.
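The cosine-term constraint described in ⅰ can be sketched as follows. This is a minimal illustration only, assuming the encoder maps each trajectory to a feature vector and that the constraint pulls same-modality features toward cosine similarity 1 while penalizing positive similarity between different modalities; the exact loss form used in the thesis is not specified here, and all function names are hypothetical.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def cosine_modal_loss(features, modal_labels):
    """Hypothetical pairwise cosine constraint: same-modality pairs are
    pushed toward similarity 1, different-modality pairs toward <= 0."""
    loss, pairs = 0.0, 0
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            s = cosine_sim(features[i], features[j])
            if modal_labels[i] == modal_labels[j]:
                loss += 1.0 - s          # pull same-modality features together
            else:
                loss += max(0.0, s)      # push different modalities apart
            pairs += 1
    return loss / max(pairs, 1)

# Toy check: two aligned same-modality features and one orthogonal
# different-modality feature incur zero penalty.
feats = [np.array([1.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])]
print(cosine_modal_loss(feats, [0, 0, 1]))
```

In practice this term would be added to the encoder/policy objective alongside the adversarial loss, weighted by a coefficient.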
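The soft update in ⅱ is, in general form, a Polyak-style blend of parameters. The sketch below assumes each discriminator's parameters are a flat vector and that the two discriminators are trained with different loss functions (e.g., cross-entropy versus least-squares); the variable names and the value of `tau` are illustrative assumptions, not the thesis's settings.

```python
import numpy as np

def soft_update(target, online, tau=0.01):
    """Polyak-style soft update: target <- (1 - tau) * target + tau * online."""
    return (1.0 - tau) * target + tau * online

# Hypothetical illustration: two discriminators with separately trained
# "online" parameters; their slowly moving target copies are blended in
# each step rather than copied outright, smoothing the adversarial signal.
d1_online, d2_online = np.array([1.0, 2.0]), np.array([3.0, 4.0])
d1_target, d2_target = np.zeros(2), np.zeros(2)
for _ in range(5):  # gradient steps on the two different losses go here
    d1_target = soft_update(d1_target, d1_online, tau=0.5)
    d2_target = soft_update(d2_target, d2_online, tau=0.5)
# After 5 blends with tau=0.5, targets have moved (1 - 0.5**5) of the way.
```

With a small `tau`, the targets change slowly, which is what damps the oscillations of plain adversarial training.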
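The replay mechanism in ⅲ can be sketched as below. This assumes, for simplicity, 1-D sample features of equal batch size (so the Wasserstein-1 distance reduces to the mean gap between sorted samples) and that a generated batch counts as "instructive" when its distance to the expert batch falls below a threshold; the class name, the threshold rule, and the eviction policy are all illustrative assumptions.

```python
import numpy as np

def wasserstein_1d(x, y):
    """Empirical 1-D Wasserstein-1 distance via sorted matching.
    Assumes x and y contain the same number of samples."""
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))

class GenerativeReplayBuffer:
    """Hypothetical buffer storing instructive generated batches."""

    def __init__(self, capacity, threshold):
        self.capacity, self.threshold = capacity, threshold
        self.batches = []

    def maybe_store(self, generated, expert):
        """Keep the batch if it lies close to the expert distribution."""
        if wasserstein_1d(generated, expert) < self.threshold:
            self.batches.append(generated)
            if len(self.batches) > self.capacity:
                self.batches.pop(0)  # evict the oldest batch

    def sample(self):
        """Draw one stored batch to mix into the discriminator's data."""
        return self.batches[np.random.randint(len(self.batches))]
```

During training, batches drawn from the buffer would be concatenated with fresh generated samples when updating the discriminator, so informative past samples are reused instead of discarded.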