Your upcoming earbuds might be capable of translating text and recognizing objects for you.
Researchers from the University of Washington have created a novel prototype system that may transform how individuals engage with artificial intelligence in their everyday lives. Named VueBuds, the system incorporates small cameras into typical wireless earbuds, enabling users to ask an AI model questions about their surroundings in near real-time.
The idea is straightforward yet impactful. A user can look at an object, like a food package labeled in a foreign language, and ask the AI to translate it. The system responds through the earbuds in approximately one second, allowing a smooth, hands-free interaction.
A Different Approach to AI Wearables
In contrast to smart glasses, which have faced adoption challenges linked to privacy issues and design constraints, VueBuds opts for a more discreet method. The system employs low-resolution, black-and-white cameras built into the earbuds to take still images instead of continuous video.
These images are sent via Bluetooth to a linked device, where a small AI model processes them locally. This on-device processing avoids the need to transmit data to the cloud, addressing a major privacy concern with wearable cameras.
To further protect privacy, the earbuds show a visible indicator light while recording and let users instantly delete captured images.
Engineering Around Power and Performance Limits
One of the primary challenges encountered by the research team was power consumption. Cameras require significantly more energy than microphones, making high-resolution sensors, like those in smart glasses, impractical.
To address this, the team utilized a camera approximately the size of a grain of rice, capturing low-resolution grayscale images. This method reduces battery usage and allows for efficient Bluetooth transmission without sacrificing responsiveness.
Placement also played a crucial role. By angling the cameras slightly outward, the system achieves a field of view ranging from 98 to 108 degrees. Although there is a minor blind spot for objects held extremely close, researchers determined this does not impact typical usage.
The system also merges the images from both earbuds into a single frame, which speeds up processing. This lets VueBuds respond in about one second, compared with two seconds when the images are processed separately.
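For readers curious what merging two frames could look like in practice, here is a minimal Python sketch. The article does not describe how the researchers actually combine the images, so the side-by-side layout, the `stitch_side_by_side` function name, and the tiny sample frames are all assumptions made purely for illustration.

```python
from typing import List

# A frame is a list of rows; each row is a list of 0-255 grayscale values.
Frame = List[List[int]]


def stitch_side_by_side(left: Frame, right: Frame) -> Frame:
    """Join two equal-height grayscale frames into one wider frame.

    Hypothetical helper: the real system's merging method is not
    described in the article.
    """
    if len(left) != len(right):
        raise ValueError("frames must have the same number of rows")
    # Append each right-hand row to the matching left-hand row.
    return [l_row + r_row for l_row, r_row in zip(left, right)]


# Two tiny 2x3 stand-in frames (real camera frames would be far larger).
left_frame = [[0, 10, 20], [30, 40, 50]]
right_frame = [[200, 210, 220], [230, 240, 250]]

combined = stitch_side_by_side(left_frame, right_frame)
print(len(combined), len(combined[0]))  # prints "2 6"
```

The point of a single combined frame is that the AI model runs once per question rather than once per earbud, which fits the reported drop from two seconds to about one.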
Performance Compared to Smart Glasses
In evaluations, 74 participants compared VueBuds with smart glasses like Meta’s Ray-Ban models. Despite utilizing lower-resolution images and local processing, VueBuds delivered comparable performance overall.
The report indicated that participants favored VueBuds for translation tasks, while smart glasses were more effective at object counting. In separate tests, VueBuds achieved accuracy rates of approximately 83–84% for translation and object identification, with rates climbing to 93% for recognizing book titles and authors.
Why This Matters and What Comes Next
This research suggests a potential shift in the design of AI-powered wearables. By incorporating visual intelligence into a device that people already use, the system circumvents many of the obstacles encountered by smart glasses.
Nevertheless, limitations still exist. The current system cannot process color, and its capabilities remain in the early stages. The team intends to investigate the addition of color sensors and the development of specialized AI models for tasks like translation and accessibility support.
The researchers are set to present their findings at the Association for Computing Machinery Conference on Human Factors in Computing Systems in Barcelona, providing a glimpse into a future where everyday devices seamlessly evolve into intelligent assistants.
