Designed to aid rather than substitute: exploring Intercall’s real-time AI for professional interpreters.
Among the interpreters who use it, the consensus is clear: finally, a tool designed for the way they actually work. The concept is straightforward. Interpreting is most effective when humans and machines collaborate rather than when machines replace humans.
Interpreting is among the most demanding tasks a person can perform in real time. An interpreter must listen, comprehend, rephrase, and speak almost instantaneously, often while dealing with an unfamiliar accent or a stream of numbers that comes faster than anyone can write. Researchers describe a “tightrope hypothesis,” under which interpreters operate at the limits of their cognitive capacity for most of an assignment, with accuracy beginning to decline as the cognitive load increases. There is also a physical toll: time pressure raises stress and heart rate, and quality drops in sessions that extend beyond half an hour, which is why interpreters typically rotate every 30 minutes.
As the work has migrated online, that pressure has intensified. The interpreting industry is valued at approximately $11.7 billion, according to Nimdzi, with an increasing portion of this work now conducted remotely: via video, through inconsistent audio, and in hybrid meetings that have only emerged in the last five years. The captioning systems interpreters could access were never designed with them in mind. Intercall was created to bridge that gap.
Designed for interpreters, Intercall is a real-time captioning and translation platform built on a human-in-the-loop model, meaning humans maintain control while the software provides support. As interpreters work, Intercall displays everything that is said as on-screen text, in both languages, immediately after it is spoken.
Its founder, Bahodir Rajabov, started from something most products overlook. An interpreter doesn’t need a polished transcript after the meeting; they need the right word in the moment, quickly enough to maintain their focus. That means catching the name that wasn’t heard, the figure that passed too quickly, or the specialized term from a field they don’t frequently work in.
Rajabov began programming at 14 in Bukhara, Uzbekistan, and later joined IBM's generative AI team. He developed Intercall to address an issue that other products on the market had overlooked, and he was the main architect behind its essential components: native audio capture, a low-latency transcription pipeline, the terminology system, and the cross-platform desktop workflow. The captioning systems previously available to interpreters fell behind the speaker by three to five seconds. That may be adequate for subtitles, but it poses a significant problem for medical interpreters conveying symptoms in real time and for court interpreters, where even a half-second delay can alter the meaning of a statement.
As he states: “No one had created anything specifically for interpreters. They excel in their craft but use tools not tailored for them. Captions designed for audiences and translation apps for travelers were inadequate. They simply required a tool designed for their needs.”
Operationally, the breakthrough for Intercall was moving beyond the browser. It runs as native software on the interpreter’s machine, like Word or Zoom, rather than being accessed via a website. Written in C++, it interfaces directly with the operating system and captures audio from any live call, whether on Zoom, Microsoft Teams, or Google Meet. There are no browser extensions, meeting bots, or screen-sharing involved.
Intercall performs three key functions. First, it transcribes conversations in real time, so that the text keeps pace with the speaker instead of lagging behind. “It must feel instantaneous,” Rajabov emphasizes. “If the text falls behind, it’s ineffective, and interpreters will simply discontinue using it.” Second, it captures the details that are most easily missed and most costly to get wrong, such as proper nouns and specialized terms. Interpreters can load up to 600 of their own terms before a session, such as cardiology terminology before a medical appointment or case names before a legal hearing. Third, during multilingual calls, it automatically shifts between languages and dialects, accommodating multiple languages in one conversation, as interpreted discussions actually unfold.
For many interpreters, this single window replaces a collection of tools originally designed for others: meeting captions, notepads, and open translation tabs. The impact shows in the quality of their work: fewer requests for repetition, and the ability to relay long, complex passages without interrupting the conversation's flow. There is also far less of the frantic note-taking that can exhaust interpreters by the day’s end, so attention once spent capturing details returns to the interpreting itself.
Intercall is designed to support rather than take over. It does not perform the interpretation for users. Instead, it surfaces information that might otherwise be overlooked and returns it to the interpreter, who retains control over what only a human can manage: meaning and judgment. This choice placed Intercall in an emerging category that analysts now monitor, focusing on AI tools
Intercall displays live transcription and translation on the screen in both languages while interpreters are at work, capturing names, numbers, and terms that may be missed under cognitive pressure. It is used by over 3,000 interpreters across 18 countries.
