
Challenging the memory of RL agents
Reinforcement learning agents are usually trained to maximize their rewards by taking actions in an environment following a Markov Decision Process (MDP). A Markov Decision Process is simply a model that defines the state of an environment by its current state, actions, and rewards, including also its possible future states. The key point is that agents know information from the present and can approximately predict … Continue reading Challenging the memory of RL agents