RT-2: A New Level of Interaction Between Robots, Language, and Vision

Researchers have developed RT-2, a new model that combines vision and language with robot control, improving a robot's ability to generalize and reason semantically. RT-2 is trained on large volumes of internet data alongside robot demonstration data.
The RT-2 model represents robot actions as text tokens, so they can be learned in the same format as the text that conventional language models are trained on. As a result, RT-2 can interpret commands that were not in its robot training data and carry out complex tasks from user instructions.
For example, RT-2 can determine which object to use as a hammer or which drink would suit a tired person. This opens up new possibilities for more complex and flexible interactions between robots and the surrounding world.
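To make the "actions as text tokens" idea concrete, here is a minimal sketch of one way a continuous robot action could be turned into a token string and back. The 256-bin discretization, the [-1, 1] value range, and the names `encode_action` and `decode_action` are illustrative assumptions, not the paper's actual code.

```python
from typing import List, Sequence

NUM_BINS = 256  # assumed number of discrete bins per action dimension


def encode_action(action: Sequence[float], low: float = -1.0, high: float = 1.0) -> str:
    """Map each continuous action value to a bin index and join them as text."""
    tokens = []
    for value in action:
        clipped = min(max(value, low), high)  # keep the value inside [low, high]
        bin_index = round((clipped - low) / (high - low) * (NUM_BINS - 1))
        tokens.append(str(bin_index))
    return " ".join(tokens)


def decode_action(text: str, low: float = -1.0, high: float = 1.0) -> List[float]:
    """Turn a string of bin indices back into approximate continuous values."""
    return [low + int(tok) / (NUM_BINS - 1) * (high - low) for tok in text.split()]


if __name__ == "__main__":
    # Hypothetical action vector, e.g. end-effector deltas plus a gripper value.
    action = [0.3, -0.7, 0.1, 0.0, 0.0, 0.25, 1.0]
    as_text = encode_action(action)
    print(as_text)                 # a space-separated string of integers
    print(decode_action(as_text))  # values close to the original action
```

Because the action is now just a short string of integers, a language model can emit it like any other text, and a controller can decode it back into a motor command.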
📝 Paper: <a href="https://arxiv.org/abs/2307.15818">https://arxiv.org/abs/2307.15818</a>

#ai #agi #robotics #gpt #llm