[NEW] REGENT will be presented at the NeurIPS 2024 workshops on Adaptive Foundation Models and Open World Agents. See you in Vancouver!
Do generalist agents only require large models pre-trained on massive amounts of data to rapidly adapt to new environments?
We propose a novel approach to pre-train relatively small models and adapt them to unseen environments via in-context learning, without any finetuning. Our key idea is that retrieval offers a powerful bias for fast adaptation. Indeed, we demonstrate that even a simple retrieval-based 1-nearest-neighbor agent is a surprisingly strong baseline against today's state-of-the-art generalist agents.
From this starting point, we construct a semi-parametric agent, REGENT, that trains a transformer-based policy on sequences of queries and retrieved neighbors. REGENT generalizes to unseen robotics and game-playing environments via retrieval augmentation and in-context learning, significantly outperforming today's state-of-the-art generalist agents while using up to 3x fewer parameters and up to an order of magnitude fewer pre-training datapoints.
We evaluate all agents in two problem settings: JAT/Gato Environments and ProcGen Environments. Select the problem setting in the dropdown menu.
REGENT is pre-trained on data from many training environments (left). REGENT is then deployed on the held-out environments (right) with a few demonstrations from which it can retrieve states, rewards, and actions to use for in-context learning.
REGENT, shown in the GIF below, can be described in the following four steps. Use the restart gif button to replay the gif from the start.
(1) A query state (from the unseen environment during deployment or from training environments' datasets during pre-training) is processed for retrieval. (2) The n nearest states from a few demonstrations in an unseen environment or from a designated retrieval subset of pre-training environments' datasets are retrieved. These states, and their corresponding previous rewards and actions, are added to the context in order of their closeness to the query state, followed by the query state and previous reward. (3) The predictions from the REGENT transformer are combined with the first retrieved action. (4) At deployment, only the predicted query action is used. During pre-training, the loss from predicting all actions is used to train the transformer.
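To make the context construction in steps (1) and (2) concrete, here is a minimal sketch in Python. It assumes states can be embedded as flat vectors and compared with Euclidean distance; the function and variable names are illustrative stand-ins, not the actual tokenized inputs or the released implementation.

```python
import numpy as np

def build_context(query_state, prev_reward,
                  retrieval_states, retrieval_rewards, retrieval_actions, n=4):
    """Retrieve the n nearest states and assemble REGENT's input context."""
    retrieval_states = np.asarray(retrieval_states, dtype=np.float32)
    query_state = np.asarray(query_state, dtype=np.float32)

    # (1)-(2): nearest states from the few demonstrations (deployment) or the
    # designated retrieval subset (pre-training), ordered closest-first.
    dists = np.linalg.norm(retrieval_states - query_state, axis=1)
    order = np.argsort(dists)[:n]

    # Each retrieved neighbor contributes its previous reward, state, and action;
    # the query contributes only its previous reward and state, since its action
    # is what the transformer must predict.
    context = [(retrieval_rewards[i], retrieval_states[i], retrieval_actions[i])
               for i in order]
    context.append((prev_reward, query_state, None))

    # Also return the index of the closest neighbor, whose action is later
    # combined with the transformer's prediction in step (3).
    return context, int(order[0])
```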
Retrieve-and-Play (R&P), our simple yet strong baseline against today's state-of-the-art generalist agents, retrieves the nearest state (s') and plays the corresponding action (a').
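Here is a minimal sketch of R&P, again assuming vector states compared with Euclidean distance; the class and argument names are illustrative, not the exact implementation.

```python
import numpy as np

class RetrieveAndPlay:
    def __init__(self, demo_states, demo_actions):
        # demo_states: (N, d) array of states from the few demonstrations
        # demo_actions: length-N sequence of the actions taken in those states
        self.demo_states = np.asarray(demo_states, dtype=np.float32)
        self.demo_actions = demo_actions

    def act(self, query_state):
        # Find the single nearest demonstration state s' ...
        query = np.asarray(query_state, dtype=np.float32)
        dists = np.linalg.norm(self.demo_states - query, axis=1)
        nearest = int(np.argmin(dists))
        # ... and play its corresponding action a'.
        return self.demo_actions[nearest]
```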
We plot the normalized return obtained by all methods in various unseen environments for varying numbers of demonstrations. Use the dropdown menu to toggle among the four figures. We compare with JAT (from Hugging Face), an open-source reproduction of Gato (from Google DeepMind). Its 'All Data' variant is pre-trained on 5-10x the data used by REGENT. We also compare with JAT/Gato after it is finetuned on the few demonstrations available in each unseen environment. On the ProcGen environments, we compare with MTT (from Meta), which was pre-trained on an order of magnitude more data than REGENT.
We observe that R&P and REGENT generalize well to unseen Metaworld and Atari environments. R&P is a surprisingly strong baseline, but REGENT improves on it consistently. JAT/Gato cannot generalize to most unseen environments. REGENT (and even R&P) outperforms even the 'All Data' variants of JAT/Gato, which were pre-trained on 5-10x the amount of data. Both JAT/Gato models struggle even after finetuning, whereas REGENT improves further after finetuning, even with only a few demonstrations. To highlight REGENT's ability to adapt, we note that we vary the number of states in the Atari demonstrations only up to 25k; the closest generalist policy that finetunes to new Atari environments, MGDT (from Google DeepMind), requires 1M transitions.
We plot examples of a few inputs and outputs of REGENT for two states in a rendered rollout in the unseen atari-pong and metaworld-bin-picking environments. Use the dropdown menu to toggle between the two. The restart gif button replays the gif from the start.
REGENT leverages its in-context learning capabilities and its interpolation with R&P either to make a simple decision and predict the same action as R&P (see blue box on the right), or to predict better actions at key states (see black box on the left), which leads to the better overall performance seen in the results above.
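As one hedged illustration of this interpolation for discrete action spaces, the transformer's action distribution could be mixed with a one-hot distribution on the R&P action; the mixing weight below is an assumption for illustration, not the exact rule used by REGENT.

```python
import numpy as np

def combine_with_rnp(transformer_probs, rnp_action, num_actions, lam=0.5):
    # lam is a hypothetical mixing weight, chosen here only for illustration.
    rnp_probs = np.zeros(num_actions, dtype=np.float32)
    rnp_probs[rnp_action] = 1.0  # one-hot on the first retrieved (R&P) action
    mixed = lam * rnp_probs + (1.0 - lam) * np.asarray(transformer_probs, dtype=np.float32)
    return int(np.argmax(mixed))
```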
@inproceedings{
anonymous2024regent,
title={{REGENT}: A Retrieval-Augmented Generalist Agent That Can Act In-Context in New Environments},
author={Anonymous},
booktitle={Submitted to The Thirteenth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=NxyfSW6mLK},
note={under review}
}