From industrial arm to intelligent butler: how AI is transforming global robotics


New generative AI models give robots new cognitive capabilities. By learning and not by programming, they execute complex orders and adapt to their environment.

The video posted in February peaked at more than 1.3 million views. After receiving their instructions from a human, two humanoid robots coordinate to meticulously arrange groceries in a cupboard or refrigerator, taking into account the shape and storage conditions of each item. In other equally striking videos, the same Helix robot, designed by the American company Figure AI, loads the dishwasher or folds the laundry. Optimus, Tesla's robot, shows off its dancing skills or practices kung fu.

While the veracity and conditions of production of these videos are still open to question, they demonstrate a major advance in the world of robotics.

From the single-task industrial robotic arm…

This disruptive leap is once again driven by generative AI. Derived from the well-known LLMs, new models called VLMs (Vision-Language Models) are emerging, combining vision and language. Complementing them, VLA (Vision-Language-Action) models translate visual and textual data into motor commands, allowing the machine to perform a sequence of actions.

“There is a strong connection between robotics and generative AI,” confirms Jean-Baptiste Mouret, research director at Inria in Nancy in the Hucebot team, which works on the contribution of AI to human-centered robots. “LLMs make it possible to generate and analyze text, images and videos, a versatility that we also ask of a robot. These models benefit, moreover, from the ‘common sense’ provided by LLMs. The robot has a general understanding of the world that lets it grasp its environment and its context. It won’t be necessary to explain everything to it.”

Equipped with unprecedented cognitive capabilities, the AI-enhanced robot can autonomously execute complex orders given in natural language, adapt its behavior to changing environments and make decisions based on sensory information. To take the example of Figure AI’s domestic robot, storing food in a cupboard requires using, among other things, vision and touch to determine what the requested object is and how to reach it.

…to the versatile and autonomous robot

We speak of “zero-shot” learning when a robot can perform new tasks or interact with unknown objects without specific training. To gain intelligence, the machine analyzes, using VLMs, the images from its cameras and the voice instructions captured by its on-board microphones. For Aymeric Bethencourt, who holds a PhD in robotics and is lead architect at IBM, generative AI will in some way give the robot “a brain.”

In a blog post, the expert explains how a robot “learns” to make coffee. From a photo of a coffee machine, a generative AI model such as GPT-4 can generate a set of instructions: take a capsule, open the lever of the machine, insert the capsule, press the button, and so on. These actions are then transformed into motor commands that allow the left wrist (yes, our robot is left-handed) to perform a uniform vertical linear translation and a rotation through a given angle, and likewise for the left shoulder and the other joints.
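The two stages described above can be sketched in a few lines of Python. This is a minimal illustration, not Figure AI's or IBM's actual code: the step list stands in for what a VLM might return from the photo, and the step-to-command mapping is entirely hypothetical.

```python
from dataclasses import dataclass

# Hypothetical plan a VLM might generate from the coffee-machine photo.
PLAN = [
    "take a capsule",
    "open the lever of the machine",
    "insert the capsule",
    "press the button",
]

@dataclass
class MotorCommand:
    joint: str             # e.g. "left_wrist", "left_shoulder"
    translation_m: float   # vertical linear translation, in meters
    rotation_deg: float    # rotation through a given angle, in degrees

# Illustrative mapping from one high-level step to low-level joint commands
# (the VLA's role); values are invented for the example.
STEP_TO_COMMANDS = {
    "open the lever of the machine": [
        MotorCommand("left_wrist", translation_m=0.05, rotation_deg=30.0),
        MotorCommand("left_shoulder", translation_m=0.10, rotation_deg=15.0),
    ],
}

def compile_plan(plan):
    """Translate each natural-language step into motor commands."""
    commands = []
    for step in plan:
        commands.extend(STEP_TO_COMMANDS.get(step, []))
    return commands

commands = compile_plan(PLAN)
```

In a real system every step in the plan would map to commands for many joints; here only the lever-opening step is filled in, to show the shape of the translation.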

“Resource-intensive VLM models, comprising one to one hundred billion parameters, are hosted in the cloud because of on-board computing constraints,” explains Aymeric Bethencourt. “Smaller VLA models, between 100 million and one billion parameters, run locally in order to meet real-time requirements and generate continuous, fluid movements.”
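This split between a large cloud-hosted planner and a small on-board controller can be sketched as follows. Both functions are stubs invented for illustration, assuming a 50 Hz control loop on the robot; only the division of labor reflects the architecture described above.

```python
def plan_in_cloud(goal):
    """Large VLM call (order of 1e9 to 1e11 parameters), hosted remotely.
    Called rarely; the latency of a network round trip is tolerable. Stub."""
    return ["reach", "grasp", "place"]

def control_step(subgoal, sensor_state):
    """Small on-board VLA step (order of 1e8 to 1e9 parameters).
    Must fit within a few milliseconds to keep movement fluid. Stub."""
    return {"subgoal": subgoal, "torques": [0.0] * 7}

def run(goal, hz=50):
    plan = plan_in_cloud(goal)   # slow, high-level planning in the cloud
    tick = 1.0 / hz              # 20 ms control period on the robot
    commands = []
    for subgoal in plan:
        commands.append(control_step(subgoal, sensor_state={}))
        # a real loop would wait out the remainder of `tick` here
    return commands

cmds = run("store the groceries")
```

The design point is that the cloud call sits outside the real-time loop: the robot keeps moving smoothly even if planning takes seconds.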

No Wikipedia of robotics

These models require large quantities of training data in order to teach the robot the multitude of tasks to be accomplished, but also the laws and constraints of the physical world. The problem: “there is no Wikipedia of robotics,” laments Jean-Baptiste Mouret. “The development of LLMs has benefited from the billions of texts produced since the beginning of humanity. The equivalent does not exist in robotics.”

As at Tesla, learning can be supervised by a human operator who, equipped with a virtual reality headset and sensors, performs the gestures that the robot must reproduce. Because this method of teaching is long, laborious and expensive, the use of digital simulations in virtual environments is favored.

“World models,” like Genie 3 from Google DeepMind, generate interactive 3D worlds that are playable in real time. “They make it possible to simulate coherent and photorealistic environments where the persistence of elements is ensured,” adds Aymeric Bethencourt. “This allows robots to acquire a generalized understanding of the physics of the world, thanks to hundreds of millions of videos and simulated examples.” The so-called “Sim-to-Real Transfer” process then consists of applying the knowledge acquired in simulation to the real world.
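One common ingredient of sim-to-real transfer is domain randomization: rather than training in a single idealized simulation, physics and sensor parameters are varied so the learned policy does not overfit one world. A minimal sketch, with invented parameter names and ranges:

```python
import random

def randomized_world(rng):
    """Sample one simulated world. Parameters and ranges are
    illustrative, not from any particular simulator."""
    return {
        "friction": rng.uniform(0.4, 1.2),
        "object_mass_kg": rng.uniform(0.1, 2.0),
        "light_intensity": rng.uniform(0.5, 1.5),
        "camera_noise_std": rng.uniform(0.0, 0.02),
    }

rng = random.Random(0)  # seeded for reproducibility
worlds = [randomized_world(rng) for _ in range(1000)]
# A policy trained across all these variants is then transferred
# to the real robot ("Sim-to-Real Transfer"): the real world looks
# like just one more sample from the distribution.
```
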

Humanoid robots in BMW and Mercedes factories

“Certain limits remain,” says Aymeric Bethencourt. Not to mention the cybersecurity risks, with the possibility of a remote takeover.

For the expert, the current development of VLMs and VLAs is comparable to that of the first LLMs, such as GPT-2 and GPT-3. Like the famous OpenAI chatbot, these models are nevertheless expected to make significant progress in the coming years. In the meantime, the amount of data and computing resources required for training reserves them for a handful of market players.

Players who have the means to match their ambitions. In mid-September, Figure AI raised more than $1 billion, valuing the unicorn at $39 billion. Earlier in the year, Apptronik raised $350 million to deploy its robot in businesses. The two American start-ups have signed contracts with German carmakers BMW and Mercedes-Benz respectively. On X, Elon Musk said that around 80% of Tesla’s value will come from its Optimus humanoid robots.

A market worth more than $62 billion in 2029

According to The Business Research Company, the market for artificial intelligence in robotics is expected to grow from $17.89 billion in 2024 to $62.85 billion in 2029, a compound annual growth rate of 28.6%. Many sectors are potentially interested in these super-robots: the automotive, aerospace and defense industries spring to mind, as do health care and personal services.

“Initially, the value of this type of robot will lie in accomplishing complicated, repetitive or arduous tasks, such as carrying heavy loads, or in remote operation in dangerous environments,” predicts Alexandre Embry, head of the Capgemini AI Robotics and Experiences Lab. Its versatility will allow it to take on different missions: “Depending on needs, a manufacturer can entrust its fleet of robots with quality control, cleaning or maintenance operations.”

The Hucebot team at Inria is working, for its part, on the use of an exoskeleton (a robot that you wear) and on remotely operated robots for dangerous missions such as the decontamination of nuclear sites. It collaborates with firefighters and the Nancy university hospital. “In the case of firefighters, this involves helping them with extrication operations following a car accident,” explains Jean-Baptiste Mouret. “To do this, they must carry heavy and bulky equipment. At the university hospital, cleaning sheets in the laundry requires a lot of handling. In the operating room, surgeons must hold particular positions without trembling.”

“The first deployments will probably concern industrial environments where the risks linked to a wrong action are limited, such as certain automobile assembly lines where the robot operates alone,” estimates Aymeric Bethencourt. “It is only in a second phase, by 2035-2040, that such robots will be able to enter homes to assist individuals on a daily basis.” We will still have to wait a while, then, to be rid of the chore of washing dishes or doing laundry.

Jake Thompson
Growing up in Seattle, I've always been intrigued by the ever-evolving digital landscape and its impacts on our world. With a background in computer science and business from MIT, I've spent the last decade working with tech companies and writing about technological advancements. I'm passionate about uncovering how innovation and digitalization are reshaping industries, and I feel privileged to share these insights through MeshedSociety.com.
