Learning to imitate: using GAIL to imitate PPO

In reinforcement learning, the agent usually receives a reward for each action it executes in the environment, and its goal is to maximize its cumulative reward over multiple steps. It selects actions based on observations that it has to learn to interpret. In this post, we are going to explore a new field called imitation learning: the agent …