
People direct nearly half of their attention to their conversation partner’s lip movements. In contrast, robots typically have only simplified “caricature” lips and mouths that don’t move in sync with the speech coming from their speakers.
Yuhang Hu and his team at Columbia University saw this as a major limitation, describing facial expression as the “missing link” in robotics.
Bringing Robotic Faces to Life
To address this limitation, they developed a robot that can, for the first time, learn realistic lip movements for tasks like speaking and singing. In demonstrations, the robot successfully articulates words in multiple languages and even performs a song from an AI-generated debut album called Hello World.
The robot learns through observation rather than pre-programmed rules. Initially, it practiced using its 26 facial motors by watching itself in a mirror. Then it learned to mimic human lip movements by analyzing hours of YouTube videos. Like other AI systems, its performance improves with more training.
“When lip-syncing is combined with conversational AI, such as ChatGPT or Gemini, it deepens the connection a robot can form with humans,” Hu explained. “The more the robot observes human interactions, the better it becomes at replicating subtle facial expressions, allowing for richer emotional engagement.”
Creating realistic lip movements in robots is difficult for two main reasons. First, it requires specialized hardware: a flexible facial “skin” and many tiny motors that operate quickly, silently, and precisely. Second, the patterns of lip motion are highly complex, dictated by the sequence of vocal sounds and phonemes.
Humans have about 30 facial and oral muscles beneath the skin that naturally coordinate with the vocal cords and lips. In fact, producing full speech engages 70 to 100 muscles. Robotic faces are typically rigid, with pre-programmed lip movements that look artificial and awkward.
Teaching a Robot to Learn Facial Expressions Through Self-Observation
Hu tackled these challenges by designing a flexible, highly articulated robot face with 26 motors. The robot first learned how its own face moved by observing itself in a mirror. Much like a child experimenting with facial expressions, it generated thousands of random movements. Gradually, it learned to control its motors to create specific expressions—an approach the team calls the “vision-action” language model.
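The article doesn’t spell out the model details, but the core idea of “babbling” in front of a mirror and regressing the observed expression back onto the motor commands that caused it can be sketched compactly. Below is a minimal, hypothetical PyTorch sketch, assuming a face tracker that returns 68 (x, y) landmarks; `send_motor_commands` and `observe_landmarks_in_mirror` are placeholder stand-ins for the real hardware and vision stack, not the team’s actual API.

```python
import torch
import torch.nn as nn

NUM_MOTORS = 26          # the face's motor count, per the article
NUM_LANDMARKS = 2 * 68   # assumed: 68 (x, y) landmarks from a face tracker

def send_motor_commands(cmds: torch.Tensor) -> None:
    """Placeholder: drive the 26 facial motors on real hardware."""

def observe_landmarks_in_mirror() -> torch.Tensor:
    """Placeholder: track the robot's own face in the mirror."""
    return torch.rand(NUM_LANDMARKS)

# Inverse model: observed facial configuration -> the motor commands
# that would produce it.
inverse_model = nn.Sequential(
    nn.Linear(NUM_LANDMARKS, 128), nn.ReLU(),
    nn.Linear(128, NUM_MOTORS), nn.Sigmoid(),  # commands normalized to [0, 1]
)
optimizer = torch.optim.Adam(inverse_model.parameters(), lr=1e-3)

# "Motor babbling": issue random commands, observe the resulting expression
# in the mirror, and learn to predict the commands from the expression.
for step in range(10_000):
    commands = torch.rand(NUM_MOTORS)           # a random facial movement
    send_motor_commands(commands)
    landmarks = observe_landmarks_in_mirror()   # what the mirror shows

    loss = nn.functional.mse_loss(inverse_model(landmarks), commands)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The payoff of such an inverse model is that, once trained, any desired facial configuration can be turned into motor commands without hand-coded rules.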
Once the robot mastered this basic control, it was trained by watching videos of people speaking and singing. This allowed its AI to learn how human lips move in relation to various sounds. Combining these two learning processes, the robot became capable of translating audio directly into realistic lip movements.
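Again as a hedged sketch rather than the team’s actual architecture: the second stage maps audio features to target lip landmarks, and chaining it with the mirror-learned inverse model yields the audio-to-motor pipeline the article describes. The 80 mel-spectrogram bins and the layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

NUM_MOTORS = 26
NUM_LANDMARKS = 2 * 68
N_MELS = 80  # assumed mel-spectrogram bins per audio frame

# Stage 1 (learned in the mirror phase): landmarks -> motor commands.
inverse_model = nn.Sequential(
    nn.Linear(NUM_LANDMARKS, 128), nn.ReLU(),
    nn.Linear(128, NUM_MOTORS), nn.Sigmoid(),
)

# Stage 2 (learned from speaking/singing videos): audio -> lip landmarks.
audio_to_lips = nn.Sequential(
    nn.Linear(N_MELS, 256), nn.ReLU(),
    nn.Linear(256, NUM_LANDMARKS),
)

def lip_sync_frame(mel_frame: torch.Tensor) -> torch.Tensor:
    """One audio frame in, one set of motor commands out: chaining the two
    learned mappings lets sound drive the face with no hand-coded rules."""
    with torch.no_grad():
        target_landmarks = audio_to_lips(mel_frame)
        return inverse_model(target_landmarks)

# Usage: drive the face from a single (here random) audio frame.
commands = lip_sync_frame(torch.rand(N_MELS))
print(commands.shape)  # torch.Size([26]), one value per facial motor
```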
The researchers admit the robot’s lip-syncing is not yet perfect. “We encountered challenges with strong sounds like ‘B’ and with sounds that require pursed lips, like ‘W’,” said Professor Hod Lipson, who leads the team. “However, these skills are expected to improve over time. This technology holds great potential, but we must advance cautiously to maximize benefits while minimizing risks.”
Read the original article on: Inovacao Tecnologica
Read more: Scientists Developed a Robotic Hand that Detaches and Walks on its Own
