The high cost and scheduling burden of human translators may be about to change.
For as long as language has divided human communities, the work of bridging those divides has depended on rare human skill and considerable expense. This week at InfoComm 2026, ENCO introduces enSpeak — a real-time voice-to-voice translation system that listens to live speech, translates it, and renders it back as natural-sounding audio in multiple languages simultaneously, without a human interpreter in the room. Built atop ENCO's existing captioning and translation platforms, the technology arrives at a moment when AI speech synthesis has matured enough to make multilingual communication feel less like a premium accommodation and more like a basic expectation.
- The persistent cost and logistical burden of hiring professional translators has long made multilingual events a luxury — enSpeak targets that friction directly by automating the entire voice-to-voice pipeline.
- The stakes extend well beyond conference rooms: classrooms, courtrooms, houses of worship, and transportation hubs all face the same fundamental barrier when audiences speak different languages.
- ENCO is threading together three layers (the existing live captioning via enCaption and text translation via enTranslate, plus the new spoken output via enSpeak) into a single workflow that AV integrators can deploy on-premises, in the cloud, or both.
- Audiences can receive translated speech through their phones, hearing aids, or venue audio systems while captions run in parallel, giving organizations multiple ways to reach multilingual listeners at once.
- The technology is landing at InfoComm 2026 with live demonstrations, positioning enSpeak not as a future concept but as a deployable solution ready for the venues and organizations that need it now.
ENCO is arriving at InfoComm 2026 with enSpeak, a real-time voice translation system designed to remove one of the most persistent friction points in professional AV: the cost and complexity of serving multilingual audiences. Rather than displaying translated text alone, enSpeak converts live speech into natural-sounding translated audio that listeners can hear through their phones, hearing aids, or a venue's own sound system — while captions appear alongside for those who want them.
The system builds on technology ENCO first showed at the NAB Show in April. Live audio flows first into enCaption, which generates real-time captions using speech-to-text technology. Those captions pass into enTranslate for the actual language conversion, and enSpeak then renders the result as expressive spoken audio rather than robotic synthesis. The whole chain can run on a company's own servers, in the cloud, or in a hybrid configuration, a flexibility that matters for organizations with varying infrastructure and privacy requirements.
ENCO President Ken Frommert frames the value proposition simply: modern audiences are multilingual, and organizations want to reach them without scheduling and paying for human translators at every event. The potential applications span classrooms delivering simultaneous multilingual instruction, corporate town halls bridging regional offices, conference centers hosting international attendees, and public venues from courtrooms to museums where clear cross-language communication carries real consequence.
What gives the moment its weight is the convergence of two trends: AI speech synthesis has matured to the point where translated audio sounds genuinely natural, and the economics of multilingual communication are shifting as a result. If enSpeak performs as demonstrated, language inclusion may be on its way from being a premium accommodation to a standard feature of professional AV environments.
ENCO is bringing a new tool to the professional AV world this week at InfoComm 2026, one that could reshape how organizations handle the practical problem of reaching audiences who speak different languages. The company is unveiling enSpeak, a real-time voice translation system that takes live speech, translates it, and converts it back into natural-sounding audio—all without requiring a human translator standing in the wings.
The technology builds on work ENCO first showed at the NAB Show in April. What makes enSpeak different from existing translation tools is that it doesn't just convert words on a screen; it produces actual spoken language that audiences can hear through their phones, hearing aids, or the venue's audio system. A presenter speaking English can be heard in Spanish, Mandarin, or a dozen other languages simultaneously, with captions appearing alongside the translated speech.
The workflow starts with ENCO's enCaption platform, which listens to live audio and video and generates accurate captions in real time using speech-to-text technology. Those captions then flow into enTranslate, which handles the actual translation work—either on a company's own servers or in the cloud. The new enSpeak layer takes that translated text and turns it into speech that sounds natural and expressive, not robotic. Audiences can toggle between text and audio translations on their phones or see everything displayed on screens throughout the venue.
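The three-stage chain described above can be sketched roughly as follows. This is an illustration only: the announcement does not describe ENCO's actual interfaces, so every name here (`run_pipeline`, `TranslatedSegment`, and the three stub functions) is hypothetical, and the stubs merely stand in for real speech-to-text, translation, and speech-synthesis engines.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TranslatedSegment:
    """One chunk of speech rendered for a single target language."""
    language: str
    caption: str   # translated text, usable as an on-screen caption
    audio: bytes   # synthesized speech for the same text


def run_pipeline(
    audio_chunk: bytes,
    target_languages: List[str],
    transcribe: Callable[[bytes], str],
    translate: Callable[[str, str], str],
    synthesize: Callable[[str, str], bytes],
) -> List[TranslatedSegment]:
    """Caption once, then translate and voice the caption per language."""
    source_caption = transcribe(audio_chunk)  # stage 1: speech-to-text
    segments = []
    for lang in target_languages:
        text = translate(source_caption, lang)   # stage 2: text translation
        speech = synthesize(text, lang)          # stage 3: text-to-speech
        segments.append(TranslatedSegment(lang, text, speech))
    return segments


# Hypothetical stand-ins so the sketch runs without any real engine.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


PHRASES = {("welcome", "es"): "bienvenidos", ("welcome", "fr"): "bienvenue"}


def fake_translate(text: str, lang: str) -> str:
    # Falls back to the untranslated text when no phrase is known.
    return PHRASES.get((text, lang), text)


def fake_synthesize(text: str, lang: str) -> bytes:
    return f"[{lang} audio] {text}".encode("utf-8")


if __name__ == "__main__":
    for seg in run_pipeline(b"welcome", ["es", "fr"],
                            fake_transcribe, fake_translate, fake_synthesize):
        print(seg.language, seg.caption, seg.audio)
```

Captioning happens once per chunk while translation and synthesis repeat per language, which is one plausible way a single presenter could be heard in several languages at the same time with captions available alongside the audio.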
Ken Frommert, who leads ENCO, frames the problem plainly: modern audiences are increasingly multilingual, and organizations want to serve them without the expense and logistical headache of hiring professional translators for every event. "enSpeak gives AV integrators and end users an on-premises solution," he said, emphasizing that companies can run this technology on their own infrastructure rather than relying entirely on cloud services.
The flexibility matters. A classroom can use enSpeak to deliver instruction in multiple languages simultaneously. A corporate town hall can reach employees across different regions. A conference center can serve international attendees. Transportation hubs, courtrooms, houses of worship, museums, and government facilities all represent potential use cases—anywhere that clear communication across language barriers matters. The system can work as part of ENCO's larger ecosystem or standalone, converting one language to another without the surrounding infrastructure.
At InfoComm, ENCO will demonstrate how enSpeak integrates with enTranslate Mobile, the smartphone app that lets users switch between text and voice translations on the fly. The company is positioning this as a complete solution: live captions, real-time text translation, and now spoken language translation, all woven together into what it calls a seamless workflow for modern AV environments.
What's notable is the timing. Organizations increasingly serve diverse audiences, and AI speech synthesis has matured enough to sound natural rather than uncanny, so the economics of real-time translation are shifting. The high cost and scheduling burden of human translators, the friction that has kept multilingual communication a premium service, may be about to ease. If enSpeak works as advertised, accessibility and language inclusion could become standard features rather than special accommodations.
Notable Quotes
"Organizations are looking for better ways to make communication more accessible and engaging for everyone in the room, while avoiding the high costs and scheduling burden of human translators."
Ken Frommert, President, ENCO
The Hearth Conversation: Another perspective on the story
What problem is this actually solving that didn't exist five years ago?
The problem existed, but the technology wasn't ready. You could caption live speech, you could translate text, but converting translated text into natural-sounding speech in real time—that required AI to get good enough that it didn't sound like a robot reading a grocery list. Now it does.
So it's not just translation. It's translation that sounds human.
Exactly. A presenter speaks English, and someone in the audience hears natural Spanish coming through their earpiece. Not word-for-word, not stilted. Actual speech.
Who benefits most from this?
Anyone running events with mixed-language audiences who can't afford professional translators. Schools with immigrant students. International conferences. Corporate offices with distributed teams. But also smaller venues—a museum, a town hall, a courtroom—places that never had the budget for translation before.
What's the catch?
The technology has to be good enough that people actually trust it. And it has to integrate smoothly with existing AV systems without breaking workflows. ENCO is betting they've solved both.
Is this the end of human translators?
No. High-stakes situations—legal proceedings, diplomatic negotiations, sensitive medical conversations—will still need humans. But for the 80 percent of communication that doesn't require that level of precision? This changes the equation entirely.