
RT-2: A New Level of Interaction Between Robots, Language, and Vision

Researchers at Google DeepMind have introduced RT-2, a model that combines vision and language understanding to control robots.

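<p>RT-2 is a vision-language-action (VLA) model. It carries the knowledge of vision-language models over to robot control, which improves a robot's ability to generalize to new situations and to perform semantic reasoning. RT-2 is built on vision-language models pretrained on large volumes of web data. It is then fine-tuned on a mix of that web data and robot demonstration data.</p>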
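<p>RT-2 represents robot actions as text tokens. Each action dimension is discretized into one of 256 bins, and the whole action is written as a short string of integers. This lets the model learn actions with the same next-token objective and in the same format as ordinary vision-language data. As a result, RT-2 can follow commands that never appeared in its robot training data and carry out more complex tasks from natural-language instructions.</p>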
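<p>To make the action-as-text idea concrete, here is a minimal Python sketch that turns a continuous action vector into a string of integer tokens and back. It assumes uniform binning into 256 bins over a hypothetical normalized range of [-1, 1] for every dimension. The function names are illustrative and do not come from the RT-2 codebase. The sketch also ignores details such as how a discrete episode-termination flag would be encoded.</p>

```python
from typing import List, Sequence

# Assumption: 256 uniform bins per action dimension.
NUM_BINS = 256
# Hypothetical normalized range shared by every continuous action dimension.
LOW, HIGH = -1.0, 1.0


def action_to_tokens(action: Sequence[float]) -> str:
    """Discretize each dimension into a bin index and join the indices as text."""
    bins: List[int] = []
    for value in action:
        # Clip into the assumed range before binning.
        clipped = min(max(value, LOW), HIGH)
        # Map [LOW, HIGH] onto bin indices 0..NUM_BINS-1.
        idx = int((clipped - LOW) / (HIGH - LOW) * NUM_BINS)
        bins.append(min(idx, NUM_BINS - 1))  # HIGH itself falls into the last bin
    return " ".join(str(b) for b in bins)


def tokens_to_action(text: str) -> List[float]:
    """Invert the mapping by returning the center value of each bin."""
    width = (HIGH - LOW) / NUM_BINS
    return [LOW + (int(tok) + 0.5) * width for tok in text.split()]


if __name__ == "__main__":
    tokens = action_to_tokens([-1.0, 0.0, 1.0])
    print(tokens)                    # 0 128 255
    print(tokens_to_action(tokens))  # bin centers near -1.0, 0.0, 1.0
```

Because each value is replaced by the center of its bin, the round trip loses at most half a bin width (here 1/256) per dimension. This quantization is what makes it possible to treat actions as ordinary tokens.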
<p>For example, RT-2 can work out which object could serve as an improvised hammer or which drink would suit a tired person. This opens up possibilities for more flexible interaction between robots and the world around them, including objects and instructions they were never explicitly trained on.</p>
<p>📝 Paper: <a href="https://arxiv.org/abs/2307.15818">https://arxiv.org/abs/2307.15818</a></p>

<p>#ai #agi #robotics #gpt #llm</p>