Abstract: Unmanned aerial vehicles (UAVs) have gradually replaced manned aircraft in combat, offering advantages such as high effectiveness and flexible autonomy, and multi-UAV cooperative combat mission planning has therefore attracted widespread attention. This paper presents an end-to-end intelligent planning method for multi-UAV cooperative attack based on deep reinforcement learning (DRL), which overcomes shortcomings of traditional mission planning algorithms, such as their dependence on static, low-dimensional, simple scenarios and the limited computing power available on board. The suppression of enemy air defenses (SEAD) mission planning problem is modeled as a Markov decision process. A SEAD intelligent planning model based on the proximal policy optimization (PPO) algorithm is established, and a general intelligent planning architecture is proposed. We introduce three training tricks, domain randomization, policy entropy maximization, and lower-layer network parameter sharing, to improve the effectiveness and generalization performance of PPO. Simulation results show that the DRL-based model achieves fast and accurate planning through offline training and online planning and adapts to unknown, continuous, high-dimensional environments, providing a new idea for intelligent planning research.
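The abstract refers to the PPO algorithm with a policy-entropy maximization trick. A minimal sketch of the standard PPO clipped surrogate loss with an entropy bonus is shown below; the function name and the coefficient values (`clip_eps`, `entropy_coef`) are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, probs, clip_eps=0.2, entropy_coef=0.01):
    """Clipped PPO surrogate loss with an entropy bonus.

    ratio:     pi_new(a|s) / pi_old(a|s), shape (batch,)
    advantage: estimated advantages, shape (batch,)
    probs:     action probabilities of the new policy, shape (batch, n_actions)
    """
    # Clipped surrogate objective: min(r * A, clip(r, 1-eps, 1+eps) * A)
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    surrogate = np.minimum(unclipped, clipped)
    # Entropy of the action distribution; the bonus encourages exploration
    entropy = -np.sum(probs * np.log(probs + 1e-8), axis=-1)
    # Loss to minimize: negative surrogate minus the entropy bonus
    return -(surrogate + entropy_coef * entropy).mean()
```

In actor-critic implementations this term is typically combined with a value-function loss; the entropy coefficient trades off exploration against exploitation, which matches the abstract's stated goal of improving generalization.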