Microsoft researchers unveiled an experimental framework last week for controlling drones and robots with natural language through ChatGPT, the well-known AI language model developed by OpenAI.
Given commands in natural language, ChatGPT can write specialized code that directs a robot's motion. A human then reviews the results and makes any necessary adjustments until the task is completed successfully.
The findings were published in the paper “ChatGPT for Robotics: Design Principles and Model Abilities,” written by Sai Vemprala, Rogerio Bonatti, Arthur Bucker, and Ashish Kapoor of the Microsoft Autonomous Systems and Robotics Group.
In a demo video, Microsoft shows a robot arm arranging blocks into the Microsoft logo, a drone inspecting the contents of a shelf, and a robot with vision capabilities locating objects.
The robots appear to run code that ChatGPT generated in response to human commands.
The researchers introduced ChatGPT to a custom robotics API so that it could interface with robots. Given an instruction such as “pick up the ball,” ChatGPT can generate robot control code in the same way it would write a poem or finish an essay.
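To make the idea concrete, here is a minimal Python sketch of what such an arrangement might look like. The `SimulatedArm` class, its functions `get_position`, `move_to`, and `grab`, and the `build_prompt` helper are hypothetical stand-ins, not the API from Microsoft's paper. The API is described to the model in plain text, and the model's answer is code that calls only those functions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float, float]

# Hypothetical high-level robot API; names are illustrative assumptions,
# not the functions defined in the ChatGPT for Robotics paper.
@dataclass
class SimulatedArm:
    objects: Dict[str, Point]                 # object name -> (x, y, z) in metres
    gripper_at: Point = (0.0, 0.0, 0.5)
    holding: Optional[str] = None
    log: List[str] = field(default_factory=list)

    def get_position(self, name: str) -> Point:
        """Return the known position of a named object."""
        return self.objects[name]

    def move_to(self, point: Point) -> None:
        """Move the gripper to a 3-D point."""
        self.gripper_at = point
        self.log.append(f"move_to {point}")

    def grab(self) -> None:
        """Close the gripper; it holds any object located exactly at the gripper."""
        for name, pos in self.objects.items():
            if pos == self.gripper_at:
                self.holding = name
        self.log.append(f"grab -> {self.holding}")

# Plain-text description of the API that would be placed in the prompt.
API_DESCRIPTION = """You control a robot arm using only these Python functions:
get_position(name) -> (x, y, z): position of a named object
move_to((x, y, z)): move the gripper to a point
grab(): close the gripper
"""

def build_prompt(instruction: str) -> str:
    """Combine the API description with the user's natural-language request."""
    return API_DESCRIPTION + "\nTask: " + instruction

# The kind of code a model might return for "pick up the ball".
def pick_up_ball(robot: SimulatedArm) -> None:
    ball = robot.get_position("ball")
    above = (ball[0], ball[1], ball[2] + 0.1)  # approach from 10 cm above
    robot.move_to(above)
    robot.move_to(ball)
    robot.grab()
    robot.move_to(above)  # raise the gripper; this toy model does not move the ball itself
```

For example, `pick_up_ball(SimulatedArm(objects={"ball": (0.3, 0.2, 0.0)}))` leaves the arm holding the ball. A small, high-level API like this keeps the generated code short, which plausibly makes it easier for a person to check.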
A human operator first reviews and edits the code for accuracy and safety, then runs the task and assesses how well it performed.
ChatGPT is not an autonomous system, but used this way it speeds up the programming of robot control. As the paper states, “We emphasize that the use of ChatGPT for robotics is not a fully automated process but rather acts as a tool to augment human capacity.”
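The review cycle can be sketched as a simple loop. This is an illustration under stated assumptions, not the paper's tooling: `ask_llm`, `review`, and `run_in_simulation` are hypothetical callbacks standing in for the language model, the human reviewer, and whatever environment the operator uses to try the code, and failures are reported back to the model as text.

```python
from typing import Callable, Optional

def human_in_the_loop(
    instruction: str,
    ask_llm: Callable[[str], str],                # hypothetical: prompt -> generated code
    review: Callable[[str], Optional[str]],       # hypothetical: human edits code, or None to reject
    run_in_simulation: Callable[[str], bool],     # hypothetical: True if the task succeeded
    max_rounds: int = 5,
) -> Optional[str]:
    """Return code that a human approved and that succeeded when run, or None."""
    prompt = instruction
    for _ in range(max_rounds):
        code = ask_llm(prompt)
        edited = review(code)  # the human stays in control of what gets executed
        if edited is None:
            prompt += "\nThe reviewer rejected the previous code. Please try again."
            continue
        if run_in_simulation(edited):
            return edited
        # Feedback on failure goes back to the model as plain text.
        prompt += "\nThe task failed when the code was run. Please fix it."
    return None
```

The loop never executes code the reviewer has not seen, which mirrors the paper's point that the model augments a human operator rather than replacing one.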
Although humans appear to provide most of ChatGPT’s feedback on whether its actions succeed or fail, and to provide it as text, the researchers also report some success in feeding visual data to ChatGPT.
In one example, the researchers had ChatGPT direct a robot to catch a basketball using feedback from a camera.
According to the researchers, ChatGPT was able to estimate how the ball and the sky would appear in the camera image and express that estimate as SVG code.
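As a rough illustration of what such an estimate could look like, the sketch below turns a ball position into an SVG image of the expected camera view. It assumes an ideal pinhole camera with a made-up focal length and resolution, and treats the whole frame as sky; the function names are hypothetical, and this is not the SVG ChatGPT produced in the paper.

```python
from typing import Tuple

def project_to_pixel(point_cam: Tuple[float, float, float], focal_px: float,
                     width: int, height: int) -> Tuple[float, float]:
    """Pinhole projection of a camera-frame point (x right, y down, z forward, metres)."""
    x, y, z = point_cam
    u = focal_px * x / z + width / 2
    v = focal_px * y / z + height / 2
    return u, v

def predicted_view_svg(ball_cam: Tuple[float, float, float], ball_radius_m: float,
                       focal_px: float = 300.0, width: int = 640, height: int = 480) -> str:
    """Return an SVG sketch of the expected camera image: sky background plus the ball."""
    u, v = project_to_pixel(ball_cam, focal_px, width, height)
    r = focal_px * ball_radius_m / ball_cam[2]  # apparent radius shrinks with distance
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        f'<rect width="{width}" height="{height}" fill="skyblue"/>'  # sky fills the frame
        f'<circle cx="{u:.0f}" cy="{v:.0f}" r="{r:.0f}" fill="orange"/>'  # the ball
        '</svg>'
    )
```

For a ball 3 m straight ahead with a 0.12 m radius, the circle lands at the image center (320, 240) with a radius of about 12 pixels.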
This behavior raises the possibility that the large language model (LLM) maintains a model of the world that goes beyond text-based probabilities.
Although the results so far appear rudimentary, they represent early attempts to combine robotic control with the newest wave of AI technology, large language models.
Microsoft says that a ChatGPT-based interface could eventually make robotics accessible to a wider audience.
According to Microsoft, the aim of this research is to determine whether ChatGPT can reason about the physical world, not just text, in order to help with robotics tasks.
The researchers say they want to make it easier for people to interact with robots, without having to learn complex programming languages or the details of robotic systems.