Challenging the memory of RL agents

Reinforcement learning agents are usually trained to maximize their rewards by taking actions in an environment modeled as a Markov Decision Process (MDP). An MDP describes an environment through its current state, the available actions, the rewards, and the possible future states. The key point is that agents know information from the present and can approximately predict … Continue reading Challenging the memory of RL agents

Exploring Transformer Model for Reinforcement Learning

The multilayer perceptron (MLP) is widely used in RL to implement a learnable agent that is trained in a given environment with a specific algorithm. Recent work in NLP has shown that the Transformer can replace and outperform the MLP in most tasks, which has led to its adoption in areas outside NLP such as computer vision. However, in RL the Transformer architecture is still not widely adopted, and agents … Continue reading Exploring Transformer Model for Reinforcement Learning

Visualizing Loss Landscape of GAIL

This post aims to visualize the loss landscape of some imitation policies (IL policies) trained with GAIL, and of their discriminators, in three common environments: CartPole, LunarLander, and Walker2d from MuJoCo. The expert policy for CartPole and LunarLander is a simple Double DQN, while the expert for Walker2d, which supports continuous actions, is a DDPG policy. The imitation policies are the same policies employed by their … Continue reading Visualizing Loss Landscape of GAIL

Learning to imitate: using GAIL to imitate PPO

Usually, in reinforcement learning, the agent is provided with a reward according to the action it executes to interact with the environment and its goal is to optimize its total cumulative reward over multiple steps. Actions are selected according to some observations the agent has to learn to interpret. In this post, we are going to explore a new field called imitation learning: the agent … Continue reading Learning to imitate: using GAIL to imitate PPO

Automatic code generator for training Reinforcement Learning policies

Generate custom template code to train your reinforcement learning policy using a simple web UI built with Streamlit. It includes different environments and can be expanded to support multiple policies and frameworks with highly flexible hyperparameter customization. The generated code can be easily downloaded as a .py file or Jupyter Notebook so as to immediately start training your model or use it as a … Continue reading Automatic code generator for training Reinforcement Learning policies

Adversarial policies: attacking TicTacToe multi-agent environment

In a previous post we discussed how an attacker can fool image classification models by injecting adversarial noise directly into the input images. Similarly, in this post we are going to see how it is possible to attack deep reinforcement learning agents in multi-agent environments (where two or more agents interact within the same environment) such that one or more agents are … Continue reading Adversarial policies: attacking TicTacToe multi-agent environment

Teaching AI to play Snake with Reinforcement Learning

It is well known that two of the most fascinating fields of computer science are gaming and artificial intelligence. The gaming field saw its origins back in the 1970s, when gaming consoles such as the Atari 2600, along with graphics on computer screens and home computer games, were introduced to the general public, giving birth to different kinds of arcade games like Pong and Pac-Man. In … Continue reading Teaching AI to play Snake with Reinforcement Learning

Introduction to Deep Reinforcement Learning

Deep Reinforcement Learning is the result of the combination of two well-known machine learning approaches: Deep Learning and Reinforcement Learning. Its main goal is to create a single agent able to handle any human-level task while achieving super-human results on it. A famous AI implementing this technique is AlphaGo, which in March 2016 defeated for the first time in history a 9-dan … Continue reading Introduction to Deep Reinforcement Learning

Balancing CartPole with policy gradients algorithm

In this post, we are going to analyze a type of reinforcement learning algorithm called policy gradients and we will use it to train an agent to balance a pole on a moving cart. This type of environment is also known as CartPole. In the field of reinforcement learning, we have an agent making observations and taking actions within an environment in order to receive … Continue reading Balancing CartPole with policy gradients algorithm