Reinforcement Learning

A central question we investigate is how robots can interact with their surroundings and learn by trial and error. Reinforcement Learning (RL) is the mathematical framework for studying such behavior, and this video gives a brief overview of RL research in our lab.
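
To make the trial-and-error setting concrete, here is a minimal sketch of the standard agent-environment loop, written with the Gymnasium library and its CartPole-v1 task purely as an illustration; the environment and the random action selection are assumptions for the example, not the robot systems studied in our lab.

    # Minimal sketch of the RL agent-environment loop (illustrative only).
    # Assumes the `gymnasium` package; CartPole-v1 stands in for a robot task.
    import gymnasium as gym

    env = gym.make("CartPole-v1")
    obs, info = env.reset(seed=0)
    total_reward = 0.0

    for step in range(200):
        # A real agent would choose actions from a learned policy; here we act randomly.
        action = env.action_space.sample()
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        if terminated or truncated:
            obs, info = env.reset()

    print("return collected by a random policy:", total_reward)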

Learning World Models

A robot needs a model of the world to plan its course of action. A straightforward tool for this is classical mechanics. However, the equations of motion for robot interaction with complex objects (think, for example, of a cloth or a rope) are too difficult to write down and solve. An alternative approach is to learn a model of the world, by learning to predict how the world will change once the robot interacts with it. This video gives a brief overview of world-models research in our lab.
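
As a rough illustration of what learning a world model means, the sketch below fits a small neural network to predict the next state from the current state and action, using synthetic transition data. The network architecture, the toy data, and all names are assumptions chosen for the example, not the models used in our lab.

    # Sketch: learn a one-step dynamics model s_{t+1} ~ f(s_t, a_t).
    # Synthetic transitions and a tiny MLP are used purely for illustration.
    import torch
    import torch.nn as nn

    state_dim, action_dim = 4, 2
    model = nn.Sequential(
        nn.Linear(state_dim + action_dim, 64),
        nn.ReLU(),
        nn.Linear(64, state_dim),
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Pretend these transitions were collected by the robot interacting with the world.
    states = torch.randn(1024, state_dim)
    actions = torch.randn(1024, action_dim)
    next_states = states + 0.1 * actions.sum(dim=1, keepdim=True)  # toy "physics"

    for epoch in range(100):
        pred = model(torch.cat([states, actions], dim=1))
        loss = nn.functional.mse_loss(pred, next_states)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # The trained model can now be queried to predict the effect of an action,
    # which is the basic ingredient for planning with a learned world model.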

Explore to Generalize in Zero-Shot RL
Researchers:
Ev Zisselman, Itai Lavie, Daniel Soudry, Aviv Tamar

The generalization problem in RL is how to train a policy on a set of training tasks so that it solves an unseen (but similar) test task. The challenge is that RL algorithms strongly overfit to the training tasks. In this work we discovered that learning Maximum-Entropy exploration generalizes better than learning to maximize reward. We use this insight to set a new SOTA on the ProcGen benchmark, and to significantly improve on hard games such as Maze and Heist.
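
The idea of rewarding exploration rather than task reward can be illustrated with a simple nonparametric state-entropy bonus: the agent is rewarded for visiting states that are far, in feature space, from states it has seen recently. The sketch below is a generic k-nearest-neighbor entropy bonus, not the specific algorithm from this paper; all names and constants are assumptions for the example.

    # Sketch: a k-nearest-neighbor state-entropy bonus for exploration.
    # Visiting states far from previously seen states yields a higher intrinsic reward.
    import numpy as np

    def knn_entropy_bonus(new_state, visited_states, k=5):
        """Intrinsic reward ~ log distance to the k-th nearest visited state."""
        if len(visited_states) < k:
            return 1.0  # encourage exploration while the buffer is still small
        dists = np.linalg.norm(np.asarray(visited_states) - new_state, axis=1)
        kth_dist = np.sort(dists)[k - 1]
        return float(np.log(kth_dist + 1e-6))

    # Toy usage: states near the buffer get a low bonus, novel states a high one.
    buffer = [np.random.randn(4) * 0.1 for _ in range(100)]
    print(knn_entropy_bonus(np.zeros(4), buffer))       # low: close to visited states
    print(knn_entropy_bonus(np.ones(4) * 5.0, buffer))  # high: far from visited states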

Meta Reinforcement Learning with Finite Training Tasks
Researchers:
Zohar Rimon, Aviv Tamar, and Gilad Adler

In meta reinforcement learning (meta RL), an agent learns from a set of training tasks how to quickly solve a new task. The optimal meta RL policy is well defined, and here we explore how many training tasks are required to guarantee approximately optimal behavior with high probability. Key to our approach is the implicit regularization of kernel density estimation methods, which we use to estimate the task distribution. We further demonstrate that this regularization is useful in practice, when 'plugged into' the state-of-the-art VariBAD meta RL algorithm.
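
To illustrate the density-estimation idea, the sketch below fits a Gaussian kernel density estimate to a small set of sampled task parameters and then draws smoothed tasks from it for meta-training; the kernel bandwidth is the kind of implicit regularization referred to above. The task parameterization, the use of SciPy's gaussian_kde, and all constants are illustrative assumptions, not the exact procedure used with VariBAD.

    # Sketch: estimate the task distribution from a finite set of training tasks
    # with kernel density estimation, then sample new tasks from the smoothed estimate.
    import numpy as np
    from scipy.stats import gaussian_kde

    rng = np.random.default_rng(0)

    # Suppose each task is described by a 2D parameter (e.g., a goal location),
    # and we have observed only a finite set of training tasks.
    train_tasks = rng.normal(loc=0.0, scale=1.0, size=(20, 2))

    # gaussian_kde expects data with shape (dimension, n_samples).
    kde = gaussian_kde(train_tasks.T)

    # Tasks sampled from the KDE interpolate between the training tasks,
    # which regularizes the meta-training distribution.
    smoothed_tasks = kde.resample(5, seed=1).T
    print(smoothed_tasks)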