
Researchers at the University of Washington have created the first system that embeds tiny cameras into off-the-shelf wireless earbuds, enabling users to converse with an AI model about what they’re seeing. For example, someone could look at a Korean food package and say, “Hey Vue, translate this,” and hear a response like, “The visible text translates to ‘Cold Noodles’ in English.”
The prototype, known as VueBuds, captures low-resolution black-and-white images and sends them via Bluetooth to a smartphone or nearby device. A compact AI model running locally then responds to user queries about the images in roughly a second. To protect privacy, all processing stays on the device, a small indicator light signals when recording is active, and users can instantly delete captured images.
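As a rough sketch of the pipeline described above, the following Python outlines the capture, Bluetooth transfer, and local-model query loop. Every function and the assumed image resolution here are hypothetical placeholders; the researchers' actual code and interfaces are not shown in the article.

```python
import numpy as np

# Hypothetical stand-ins for the earbud capture and the phone-side model;
# the actual VueBuds interfaces are not public, so these are placeholders.
def capture_grayscale_frame(side: str) -> np.ndarray:
    """Simulate one earbud capturing a low-resolution grayscale still."""
    return np.zeros((240, 320), dtype=np.uint8)  # assumed resolution

def local_vlm(images: list, prompt: str) -> str:
    """Stand-in for the compact on-device vision-language model."""
    return f"(model answer to: {prompt!r})"

def answer_visual_query(question: str) -> str:
    # 1. Each earbud captures a small black-and-white still image.
    left = capture_grayscale_frame("left")
    right = capture_grayscale_frame("right")
    # 2. The stills travel over Bluetooth to the phone; keeping them
    #    small and grayscale fits within the link's bandwidth budget.
    # 3. The compact model answers the spoken question entirely on the
    #    phone, so no image ever leaves the device.
    return local_vlm([left, right], question)

print(answer_visual_query("Hey Vue, translate this."))
```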
CHI 2026 Presentation and Proceedings Publication
The researchers presented their work on April 14 at the CHI 2026 conference in Barcelona, and the study appears in the Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems.
“We haven’t seen widespread adoption of smart glasses or VR headsets, partly because many people are uncomfortable wearing them and they raise privacy concerns, like recording high-resolution video and sending it to the cloud,” said senior author Shyam Gollakota, a professor at UW’s Paul G. Allen School of Computer Science & Engineering. “Since earbuds are already widely used, we wanted to explore whether we could bring visual intelligence into small, low-power devices while also addressing those privacy issues.”
Cameras consume significantly more power than the microphones typically found in earbuds, making high-resolution cameras like those used in smart glasses impractical. In addition, Bluetooth can’t handle the constant transmission of large data volumes, so the system isn’t designed to support continuous video streaming.
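A quick back-of-the-envelope calculation shows why still images fit over Bluetooth while continuous video does not. The frame size and usable link rate below are illustrative assumptions, not figures from the study.

```python
# Rough arithmetic behind the Bluetooth constraint. The frame size and
# usable link rate are assumed for illustration, not taken from the paper.
FRAME_BYTES = 320 * 240       # one low-res grayscale still at 1 byte/pixel
LINK_BPS = 1_000_000          # ~1 Mbit/s of usable Bluetooth LE throughput

print(f"one still: {FRAME_BYTES * 8 / LINK_BPS:.2f} s to send")   # ~0.61 s
print(f"30 fps video: {30 * FRAME_BYTES * 8 / 1e6:.0f} Mbit/s")   # ~18 Mbit/s
```

Under these assumptions a single still transfers in well under a second, while 30-frames-per-second video would need roughly eighteen times the link's capacity.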
Low-Power Earbud Cameras and Field-of-View Optimization
The team found that a low-power camera, about the size of a grain of rice, capturing low-resolution black-and-white still images kept battery consumption low and Bluetooth transmission feasible without sacrificing performance.
Placement also posed a challenge. “One key question was whether the user’s face would block too much of the view, and whether cameras in earbuds could reliably capture what the user sees,” said lead author Maruchi Kim, who conducted the work as a doctoral student in UW’s Allen School. The researchers found that angling each camera 5–10 degrees outward yields a combined field of view of 98–108 degrees. Although this setup creates a small blind spot for objects closer than 20 centimeters, that is rarely a problem, since people typically don’t hold items that close when examining them.
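The geometry behind those numbers can be sketched with a simplified model. The per-camera field of view and ear-to-ear baseline below are assumptions chosen to reproduce the reported combined coverage, not measurements from the paper.

```python
import math

CAMERA_FOV_DEG = 88.0   # assumed horizontal field of view per camera
BASELINE_CM = 16.0      # assumed ear-to-ear camera separation

def combined_fov(outward_tilt_deg: float) -> float:
    """Total coverage: each outer FOV edge rotates outward with the tilt."""
    return CAMERA_FOV_DEG + 2 * outward_tilt_deg

def overlap_start_cm(outward_tilt_deg: float) -> float:
    """Distance at which a point straight ahead first falls in BOTH views;
    anything nearer sits in the near-field blind spot."""
    inner_edge = math.radians(CAMERA_FOV_DEG / 2 - outward_tilt_deg)
    return (BASELINE_CM / 2) / math.tan(inner_edge)

for tilt in (5.0, 10.0):
    print(f"tilt {tilt:>4.0f}°: combined FOV {combined_fov(tilt):.0f}°, "
          f"both views cover a point from ~{overlap_start_cm(tilt):.0f} cm out")
```

With an assumed 88-degree camera, a 5–10 degree tilt reproduces the reported 98–108 degree combined view; in this simplified model, points within roughly 10–12 centimeters of the midline escape both cameras, which is broadly consistent with the reported near-field blind spot once face occlusion is also considered.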
They also observed that while the vision-language model could interpret images from each earbud, processing them separately introduced delays. To address this, the system merges the two images by identifying overlapping areas and combining them into a single view. This reduces response time to about one second—fast enough to feel real-time—compared to roughly two seconds when handling the images individually.
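The article does not detail the merging step, but the idea can be illustrated with a simple horizontal-overlap search: align the two grayscale frames where they agree best, blend the shared columns, and query the model once on the composite. This brute-force alignment is a stand-in for the system's real method.

```python
import numpy as np

def merge_views(left: np.ndarray, right: np.ndarray,
                min_overlap: int = 16) -> np.ndarray:
    """Stitch two grayscale frames at the horizontal overlap where they
    agree best; a stand-in for the system's actual alignment step."""
    w = left.shape[1]
    best_ov, best_err = min_overlap, float("inf")
    for ov in range(min_overlap, w // 2):
        # Mean squared difference between left's right edge and right's
        # left edge if the frames overlapped by `ov` columns.
        err = np.mean((left[:, -ov:].astype(float) -
                       right[:, :ov].astype(float)) ** 2)
        if err < best_err:
            best_err, best_ov = err, ov
    # Average the shared columns; keep the unique part of each frame.
    shared = ((left[:, -best_ov:].astype(np.uint16) +
               right[:, :best_ov].astype(np.uint16)) // 2).astype(np.uint8)
    return np.hstack([left[:, :-best_ov], shared, right[:, best_ov:]])

# One model call on the composite (~1 s) instead of one per image (~2 s):
# answer = local_vlm([merge_views(left_img, right_img)], question)
```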
Comparative User Testing and System Accuracy Results
In user testing, 74 participants compared outputs from VueBuds with those from Ray-Ban Meta Glasses across several tasks. Despite relying on lower-resolution images with stronger privacy safeguards, VueBuds performed on par with the Ray-Bans, which use high-resolution images processed in the cloud. Participants favored VueBuds for translations, while the Ray-Bans showed better performance in counting objects.
Sixteen participants also tested VueBuds by wearing the device and evaluating its ability to translate text and answer simple questions about objects. The system achieved 83%–84% accuracy in translation and object identification, and 93% accuracy when identifying a book’s author and title.
The study aimed to assess whether embedding cameras in wireless earbuds is practical. Because the system captures only grayscale images, it cannot handle questions that depend on color information in the scene.
The team aims to incorporate color into the system, although color cameras would demand more power, and to develop specialized AI models tailored to tasks such as translation.
“This work offers a glimpse of what’s possible using a general-purpose language model with camera-equipped wireless earbuds,” Kim said. “Next, we want to evaluate the system more thoroughly for applications such as reading for people with low vision or blindness, or translating text for travelers.”

Read the original article on: Tech Xplore
