How the AI that drives ChatGPT will be implemented in the physical world

Companies like OpenAI and Midjourney are building chatbots, image generators and other artificial intelligence tools that work in the digital world.

Now, a startup founded by former OpenAI researchers is applying the methods used to develop chatbots to create AI technology that can navigate the physical world.

Covariant, a robotics company headquartered in Emeryville, California, creates ways for robots to pick up, move and sort items as they are transported through warehouses and distribution centers. Its goal is to help robots understand what's happening around them and decide what to do next.

The technology also gives robots a broad understanding of the English language, letting people chat with them much as they would chat with ChatGPT.

The technology, still in development, is not perfect. But it's a clear sign that the artificial intelligence systems that drive chatbots and online image generators will also power machines in warehouses, roads and homes.

Like chatbots and image generators, this robotic technology learns its skills by analyzing huge amounts of digital data. This means that engineers can improve the technology by feeding it more and more data.

Covariant, backed by $222 million in funding, doesn't build robots. It builds the software that powers them. The company aims to deploy its new technology with warehouse robots, providing a roadmap for others to do the same in manufacturing plants and perhaps even on the roads with driverless cars.

The AI systems that drive chatbots and image generators are called neural networks, named after the web of neurons in the brain.

By identifying patterns in large amounts of data, these systems can learn to recognize words, sounds and images, or even generate them themselves. This is how OpenAI built ChatGPT, giving it the power to instantly answer questions, write essays and generate computer programs. It acquired these skills from text taken from the internet. (Several media outlets, including The New York Times, have sued OpenAI for copyright infringement.)

Companies are now building systems that can learn from different types of data simultaneously. By analyzing both a collection of photos and the captions that describe those photos, for example, a system can grasp the relationships between the two. It can learn that the word “banana” describes a curved yellow fruit.
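The idea of learning word-image associations can be illustrated with a toy sketch. Everything here is invented for illustration — each “image” is reduced to a handful of hand-labeled visual attributes and associations are simple co-occurrence counts, whereas real systems learn from raw pixels and text at vast scale:

```python
from collections import Counter, defaultdict

# Toy (image, caption) pairs. Each "image" is reduced to a set of visual
# attributes that a vision model might extract; these examples are invented.
pairs = [
    ({"yellow", "curved", "fruit"}, "a ripe banana on a table"),
    ({"yellow", "curved", "fruit"}, "a banana in a fruit bowl"),
    ({"red", "round", "fruit"}, "a shiny red apple"),
    ({"red", "round", "fruit"}, "an apple on a branch"),
]

# Count how often each caption word co-occurs with each visual attribute.
cooccur = defaultdict(Counter)
for attributes, caption in pairs:
    for word in caption.lower().split():
        for attr in attributes:
            cooccur[word][attr] += 1

def top_attributes(word, n=3):
    """Return the n visual attributes most associated with a word."""
    return [attr for attr, _ in cooccur[word].most_common(n)]

# After counting, "banana" is tied to yellow/curved/fruit (in some order).
print(top_attributes("banana"))
```

Even this crude counting recovers the kind of link the article describes — that “banana” goes with a curved yellow fruit — which hints at how far richer statistical models can grasp such relationships from photo-caption pairs.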

OpenAI used this approach to build Sora, its new video generator. By analyzing thousands of captioned videos, the system learned to generate videos when given a brief description of a scene, such as “a beautifully rendered paper world of a coral reef, teeming with colorful fish and sea creatures.”

Covariant, founded by Pieter Abbeel, a professor at the University of California, Berkeley, and three of his former students, Peter Chen, Rocky Duan and Tianhao Zhang, used similar techniques to build a system driving warehouse robots.

The company helps operate sorting robots in warehouses around the world. It has spent years collecting data – from cameras and other sensors – that shows how these robots work.

“It ingests all sorts of data that is important to robots, which can help them understand and interact with the physical world,” Dr. Chen said.

By combining this data with the massive amounts of text used to train chatbots like ChatGPT, the company has developed AI technology that gives its robots a much broader understanding of the world around them.

After identifying patterns in this mix of images, sensor data and text, the technology gives a robot the power to handle unexpected situations in the physical world. The robot knows how to pick up a banana, even if it has never seen one before.

It can also respond in plain English, much like a chatbot. If you tell it to “pick up a banana,” it knows what that means. If you tell it to “pick up a yellow fruit,” it understands that too.

It can even generate videos that predict what is likely to happen when it tries to pick up a banana. These videos have no practical use in a warehouse, but they do show the robot's understanding of its surroundings.

“If it can predict the next frames of a video, it can identify the right strategy to follow,” Dr. Abbeel said.

The technology, called RFM, for Robotics Foundation Model, makes mistakes, much as chatbots do. Though it often understands what people ask of it, there is always a chance that it will not. It drops things from time to time.

Gary Marcus, an AI entrepreneur and professor emeritus of psychology and neural science at New York University, said the technology could be useful in warehouses and other situations where mistakes are acceptable. But he added that it would be more difficult and riskier to deploy it in manufacturing plants and other potentially dangerous settings.

“It depends on the cost of the mistake,” he said. “If you have a 150-pound robot that can do something dangerous, that cost can be high.”

As companies train this type of system on increasingly large and varied collections of data, researchers believe it will improve quickly.

This is very different from how robots worked in the past. Typically, engineers would program the robots to perform the same precise movement over and over again, such as picking up a box of a certain size or attaching a rivet to a particular spot on a car's rear bumper. But robots couldn't deal with unexpected or random situations.

By learning from digital data – hundreds of thousands of examples of what happens in the physical world – robots can begin to handle the unexpected. And when these examples are combined with language, robots can also respond to text and voice prompts, just as a chatbot would.

This means that, like chatbots and image generators, robots will become more agile.

“What's in digital data can be transferred into the real world,” Dr. Chen said.
