Visualizing the Loss Landscape of GAIL

This post visualizes the loss landscapes of imitation policies (IL policies) trained with GAIL, and of their discriminators, in three common environments: CartPole, LunarLander, and Walker2d from MuJoCo. The expert policy for CartPole and LunarLander is a simple Double DQN, while the expert for Walker2d, which has a continuous action space, is a DDPG policy. The imitation policies are the same policies employed by their