Robotics developer Figure made waves on Wednesday with a video demonstration of its first humanoid robot holding a real-time conversation, thanks to AI from OpenAI.
“With OpenAI, Figure 01 can now have full conversations with people,” Figure said on Twitter, highlighting the robot's ability to understand and respond to human interactions.
The company recently explained that its partnership with OpenAI would bring advanced visual and language intelligence to its robots, enabling “speedy, low-level, dexterous robot actions.”
In the video, Figure 01 interacts with Corey Lynch, a senior AI engineer at Figure, who has the robot perform several tasks in a makeshift kitchen, including identifying apples, plates, and cups.
When Lynch asks the robot for something to eat, it identifies the apple as food and hands it over. Lynch then has Figure 01 collect trash into a basket while answering questions at the same time, demonstrating the robot's ability to multitask.
On Twitter, Lynch explained the Figure 01 demo in more detail.
“Our robot can describe its visual experience, plan future actions, reflect on its memory, and explain its reasoning verbally,” he wrote in an extensive thread.
According to Lynch, the team feeds images from the robot's cameras, along with transcribed speech captured by onboard microphones, into a large multimodal model trained by OpenAI.
Multimodal AI refers to artificial intelligence that can understand and generate different types of data, such as text and images.
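To make that setup concrete, here is a minimal sketch of what such a pipeline could look like, assuming a conversation history that interleaves camera frames with transcribed speech. The `History`, `transcribe`, and `MultimodalModel` names are illustrative stand-ins, not Figure's or OpenAI's actual API:

```python
# Hypothetical sketch: camera frames and transcribed speech accumulate
# into one history that a multimodal model consumes on each turn.
from dataclasses import dataclass, field

@dataclass
class History:
    turns: list = field(default_factory=list)

    def add(self, kind: str, payload) -> None:
        self.turns.append({"kind": kind, "payload": payload})

def transcribe(audio: bytes) -> str:
    """Stand-in for the onboard speech-to-text step."""
    return audio.decode("utf-8", errors="ignore")

class MultimodalModel:
    """Stand-in for a large multimodal model trained on text and images."""
    def respond(self, turns: list) -> str:
        images = sum(1 for t in turns if t["kind"] == "image")
        return f"(reply conditioned on {images} image(s) and the dialogue so far)"

history = History()
history.add("image", b"\x89PNG...")  # latest onboard camera frame
history.add("text", transcribe(b"Can I have something to eat?"))
print(MultimodalModel().respond(history.turns))
```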
Lynch emphasized that Figure 01's behavior is learned, runs at normal speed, and is not remotely controlled.
“The model processes the entire history of the conversation, including past images, to come up with language responses, which are then spoken back to the person via text-to-speech,” Lynch said. “The same model is responsible for deciding which learned, closed-loop behavior to run on the robot to fulfill a given command, loading particular neural network weights onto the GPU and executing a policy.”
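In other words, a single model call produces both what the robot says and which behavior it runs. A hedged sketch of that control flow, with `plan`, `speak`, and the `POLICIES` table as hypothetical placeholders rather than Figure's actual code, might look like this:

```python
# One model call yields both an utterance and the id of a learned,
# closed-loop behavior; the matching policy is then looked up and run.
POLICIES = {
    "hand_over_apple": lambda: print("[policy] executing hand_over_apple"),
    "place_trash_in_basket": lambda: print("[policy] executing place_trash_in_basket"),
}

def plan(history: list) -> tuple:
    """Stand-in for the multimodal model: returns (utterance, behavior id)."""
    return "Sure thing!", "hand_over_apple"

def speak(text: str) -> None:
    """Stand-in for text-to-speech back to the person."""
    print(f"[tts] {text}")

utterance, behavior = plan([{"kind": "text", "payload": "I'm hungry"}])
speak(utterance)          # the language response, spoken aloud
POLICIES[behavior]()      # run the selected learned, closed-loop behavior
```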
Lynch explained that Figure 01 is designed to describe its surroundings succinctly and can apply “common sense” to decisions, such as guessing that dishes on a table will likely go into a drying rack next. It can also translate vague requests, like someone saying they are hungry, into actions, such as offering an apple.
The presentation caused quite a stir on Twitter, with many commenters impressed by Figure 01's capabilities – and more than a few adding it to their list of milestones on the road to the singularity.
“Please tell me your team has seen every Terminator movie,” one replied.
“We need to get John Connor as soon as possible,” added another.
For AI developers and researchers, Lynch provided several technical details.
“All behaviors are driven by neural network visuomotor transformer policies, mapping pixels directly to actions,” said Lynch. “These networks take in onboard images at 10hz and generate 24-DOF actions (wrist poses and finger joint angles) at 200hz.”
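Those two rates imply roughly 20 action ticks per camera frame, with the policy reusing the latest observation between frames. The loop below illustrates that arithmetic with a placeholder policy; none of it is Figure's actual code:

```python
# Illustrative two-rate loop: images at 10 Hz, 24-DOF actions at 200 Hz.
import math

IMAGE_HZ, ACTION_HZ, DOF = 10, 200, 24
STEPS_PER_IMAGE = ACTION_HZ // IMAGE_HZ   # 20 action ticks per camera frame

def policy(frame_id: int, t: int) -> list:
    """Placeholder visuomotor policy: one observation in, 24 joint targets out."""
    return [math.sin(t / ACTION_HZ + j) for j in range(DOF)]

for frame_id in range(3):                 # simulate three 10 Hz camera frames
    for tick in range(STEPS_PER_IMAGE):   # inner 200 Hz control loop
        action = policy(frame_id, frame_id * STEPS_PER_IMAGE + tick)
        assert len(action) == DOF         # wrist poses plus finger joint angles
```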
Figure 01's impressive debut comes as policymakers and global leaders grapple with the spread of AI tools into mainstream circulation. While much of the discussion has centered on large language models like OpenAI's ChatGPT, Google's Gemini, and Anthropic's Claude AI, developers are also looking for ways to give AI a physical form, including humanoid robots.
Figure AI and OpenAI did not immediately respond to Decrypt's request for comment.
“One is utility, which is what Elon Musk and others are pursuing,” UC Berkeley industrial engineering professor Ken Goldberg previously told Decrypt of the motivations behind humanoid robots. “A lot of the work that's going on right now – why people are investing in these companies, like Figure – the hope is that these things can work and adapt,” he said, particularly in the field of space research.
Along with Figure, others working to integrate AI into robotics include Hanson Robotics, which debuted its Desdemona AI robot in 2016.
“Even a few years ago, I would have thought we'd have to wait decades to see a humanoid robot hold a full conversation while planning and carrying out fully learned behaviors,” Lynch said on Twitter. “Obviously, a lot has changed.”
Edited by Ryan Ozawa.