Abstract: Air warfare is moving toward unmanned and autonomous operation, and autonomous decision-making methods are an important enabler of future air combat. Because of dimensionality limits, traditional air combat decision-making methods cannot handle continuous actions or long-horizon decision problems. Based on the Actor-Critic method, this paper proposes a unified architecture for continuous decision-making in air combat. Drawing on air combat training experience, the state space, action space, reward, and training subjects are designed, and several continuous-action-space reinforcement learning algorithms are tested in air combat scenarios with high uncertainty, with the learning effect verified visually. The results show that the proposed architecture achieves long-horizon value optimization under continuous actions: the agent makes optimal decisions in complex air combat situations, attains a high kill rate against randomly maneuvering targets, and produces highly reasonable air combat maneuver trajectories.
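To make the Actor-Critic idea concrete, the following is a minimal sketch of a one-step actor-critic update with a continuous (Gaussian) action. It is not the architecture proposed in this paper. The class name `LinearActorCritic`, the scalar state, the linear actor and critic, the fixed exploration standard deviation, and all hyperparameter values are assumptions chosen only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class LinearActorCritic:
    """Toy 1-D agent (hypothetical): policy mean = w*s + b, value V(s) = v*s + c."""
    w: float = 0.0          # actor weight
    b: float = 0.0          # actor bias
    v: float = 0.0          # critic weight
    c: float = 0.0          # critic bias
    sigma: float = 0.5      # fixed exploration std (assumption)
    gamma: float = 0.99     # discount factor; values near 1 favor long-horizon returns
    lr_actor: float = 0.01
    lr_critic: float = 0.1

    def mean(self, s: float) -> float:
        return self.w * s + self.b

    def value(self, s: float) -> float:
        return self.v * s + self.c

    def act(self, s: float, rng: random.Random) -> float:
        # Sample a continuous action from a Gaussian centered on the actor output.
        return rng.gauss(self.mean(s), self.sigma)

    def update(self, s: float, a: float, r: float, s_next: float, done: bool) -> float:
        # One-step TD target; bootstrap from the critic unless the episode ended.
        target = r + (0.0 if done else self.gamma * self.value(s_next))
        td_error = target - self.value(s)

        # Gradient of log N(a | mean, sigma^2) with respect to the mean,
        # evaluated before any parameter is changed.
        score = (a - self.mean(s)) / self.sigma ** 2

        # Critic: semi-gradient TD(0) step on V(s).
        self.v += self.lr_critic * td_error * s
        self.c += self.lr_critic * td_error

        # Actor: policy-gradient step using the TD error as the advantage estimate.
        self.w += self.lr_actor * td_error * score * s
        self.b += self.lr_actor * td_error * score
        return td_error
```

In a training loop one would repeatedly call `act` to pick an action, step the environment, and pass the resulting transition to `update`. Practical continuous-control algorithms replace the linear functions with neural networks and add stabilizing mechanisms that this sketch omits.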