Captioning: What Venues Need to Know

Bill Bennett, ENCO Systems

So, you're designing a new venue intended to be state-of-the-art in every way (or perhaps upgrading an existing facility). Have you thought about the accessibility of your media? More specifically, have you thought about captions and translations?

Captions are no longer only for people who are deaf or hard of hearing. For attendees whose primary language may not be the one spoken over a PA system or in a video, captions can help them better understand what is said. And approximately 48 million Americans have some degree of hearing loss, whether from sustained exposure to loud noise or music, tinnitus, or other causes; your venue's events will want to ensure everyone can enjoy what they came for.

There is much to consider when designing a media production and distribution system that fully supports all possible variations of captions associated with different forms of live and pre-recorded content.

Captions are so important that some have called them the “third rail” of content, following video and audio. In a venue, you can easily have a spectator wearing earbuds to listen to a play-by-play on the radio and not hear the PA announcements. They can still appreciate what is being said on the PA when it appears as captions on a video display in the venue. Expect this level of “multitasking” to increase.

A growing number of municipal governments are also passing legislation requiring captions for any video content shown on screens larger than a small desktop monitor, and some want the PA system announcements captioned, too. (Imagine the life safety implications if deaf attendees cannot read essential announcements they cannot hear.) The Americans with Disabilities Act (ADA) also has a major impact on when and where captions must be available.

Technology Choices

Most captioning still relies on the same stenographer keyboard, process, and technology that originated in courtrooms more than 100 years ago. However, the technology behind captions has evolved considerably, with AI-based automatic captioning systems growing in popularity.

[Helping the Hard of Hearing]

The process of delivering captions has also evolved, from the manual insertion of lower-third open captions on edited videos in 1972 by PBS (where each line was manually typed and keyed in post-production), to the clever analog video hack of sending caption data through vertical blanking interval (VBI) line 21 to caption live TV, to embedding in SDI and IP transmission paths today.

Captioning's usage has evolved in many ways, too, including augmenting live audio announcements via digital signage, generating meeting and event transcripts, enabling search engines to quickly find words in massive audio and video archives, and delivering live captions to smartphones, tablets, and even websites. However, as with any technology, one broken link in a signal chain or workflow process can completely break an otherwise tight captioning ecosystem.

Venue AV designers, sports presentation teams, and outside broadcasters need to be aware of captions from a variety of sources and destinations, and as with so many technical elements, there are different formats and standards to interface with along the way. There are several aspects to consider, including infrastructure requirements, captioning for live feeds vs. recorded content, and choosing between a remote captioning service and an on-premises AI captioning solution.

ENCO captioning brought the "third rail" of content to life at the 2022 NAB Show. (Image credit: ENCO Systems)

Are the cabling, routers/switchers, DAs, video monitors, signage displays, and related systems all able to see, decode, or pass through closed captions from video, or caption data for direct display? There is a lot to unpack there. Newer systems, for example, tend to have firmware that can accept and process closed captions (embedded CEA-608/708 captions) or the caption data embedded in SDI's vertical ancillary (VANC) space.
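
As an illustration, SMPTE ST 334-1 assigns caption data specific DID/SDID identifiers within VANC, and that is what downstream gear must recognize and preserve. Below is a minimal sketch of how a monitoring tool might classify VANC packets by those identifiers; the parsing is simplified for illustration and does not represent any particular product's API.

```python
# A sketch of classifying VANC packets by DID/SDID, the identifiers that
# SMPTE ST 334-1 assigns to caption data. Packet parsing is simplified;
# this is illustrative, not any particular product's API.

CAPTION_IDS = {
    (0x61, 0x01): "CEA-708 caption distribution packet (CDP)",
    (0x61, 0x02): "CEA-608 caption data",
}

def describe_vanc_packet(did: int, sdid: int) -> str:
    """Label a VANC packet if it carries caption data."""
    return CAPTION_IDS.get((did, sdid), "not caption data")

# A router or DA that strips DID 0x61 packets silently drops closed
# captions, even though audio and video pass through untouched.
print(describe_vanc_packet(0x61, 0x01))  # CEA-708 caption distribution packet (CDP)
```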

But if you have a router that passes SDI audio and video but no VANC data, that is a blocker. HDMI is another: the specification does not support closed captions, so the devices that decode IP streams or over-the-air (OTA) signals must render the captions onto the video themselves. Legacy coax MATV distribution systems and modern IP video distribution systems also need to pass along embedded caption data, and, of course, video display endpoints need to be properly equipped, with captions enabled. The same goes for streaming encoders; some may not support captions today but are only a firmware update away from doing so, so check the documentation.
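
One practical way to verify that captions actually survived a signal chain is to record the output and probe the file. The sketch below, assuming a recording named board_feed_recording.ts, uses ffprobe from the free FFmpeg toolkit; field names can vary across FFmpeg versions, so treat this as a starting point.

```python
# Spot-check whether a recorded feed carries embedded CEA-608/708 captions
# using ffprobe (part of FFmpeg). The filename is illustrative.
import json
import subprocess

def has_embedded_captions(path: str) -> bool:
    """Return True if ffprobe reports closed captions in any video stream."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_streams", "-of", "json", path],
        capture_output=True, text=True, check=True,
    ).stdout
    streams = json.loads(out).get("streams", [])
    return any(
        s.get("codec_type") == "video" and s.get("closed_captions")
        for s in streams
    )

if has_embedded_captions("board_feed_recording.ts"):
    print("Captions present: safe to route to signage and displays")
else:
    print("No captions found: something in the chain is stripping them")
```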

Newer transport protocols help here: SMPTE ST 2110 was designed to carry embedded closed captions from the start, with ST 2110-40 giving ancillary data, including captions, its own essence stream, which will help with more advanced buildouts (it is still crucial to check the technical specs before making the switch). Some venues dedicate particular LED ribbons or video displays to captioning everything coming through the PA, a service historically called CART (Communication Access Realtime Translation). Other sites key that caption feed onto a section of their center-hung videoboard, with the same benefit. Some display engines can accept a caption data feed directly, which gives a designer more options on where to display captions.
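
For ST 2110 specifically, captions ride in the ST 2110-40 ancillary-data stream (SMPTE ST 291 packets over RTP, per RFC 8331), and a sender advertises that stream in its SDP with the smpte291 payload name. A simplified sketch of checking for it follows; the sample SDP is illustrative, and real SDP handling should use a proper parser.

```python
# Check a sender's SDP for an ST 2110-40 ancillary-data stream, which is
# what carries captions. The "smpte291" encoding name comes from RFC 8331.
SAMPLE_SDP = """\
m=video 50000 RTP/AVP 96
a=rtpmap:96 raw/90000
m=video 50010 RTP/AVP 100
a=rtpmap:100 smpte291/90000
"""

def offers_anc_stream(sdp: str) -> bool:
    """True if any rtpmap line advertises the smpte291 (ancillary) payload."""
    return any(
        "smpte291" in line
        for line in sdp.splitlines()
        if line.startswith("a=rtpmap")
    )

print(offers_anc_stream(SAMPLE_SDP))  # True: a caption-capable sender
```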

Live and File-Based Content

For file-based content delivered to the venue, what if it's not captioned? To stay ahead of ADA compliance regulations, you'll likely need to either send it to a traditional captioning service for human transcription or run it through an automated, on-prem AI-based captioning platform. Human transcription can take a while and be costly; "sidecar" caption text files from an AI system take roughly half the file's running time to produce, but those captions must then be merged back into the video, either in a desktop editor or during playout from a media management system.
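
As a concrete example of that merge step, the sketch below uses FFmpeg to mux an AI-produced SRT sidecar into an MP4 as a selectable mov_text caption track without re-encoding; the filenames are illustrative.

```python
# Mux a sidecar SRT caption file into an MP4 using FFmpeg, keeping the
# audio and video untouched. Filenames are illustrative.
import subprocess

def mux_sidecar(video: str, srt: str, out: str) -> None:
    """Embed the SRT as an MP4 caption track without re-encoding."""
    subprocess.run(
        ["ffmpeg", "-i", video, "-i", srt,
         "-c", "copy",            # copy audio/video as-is
         "-c:s", "mov_text",      # MP4-compatible subtitle codec
         "-metadata:s:s:0", "language=eng",
         out],
        check=True,
    )

mux_sidecar("highlight_reel.mp4", "highlight_reel.srt", "highlight_reel_cc.mp4")
```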

Whether for an inbound live feed or something originating in the venue, captioning live content is a very different process from captioning file-based material. To caption everything heard over a PA system, most legacy workflows require sending that audio to a stenographer outside the venue, who types the captions manually and sends them back to the venue as a data feed for display. An AI-based, on-prem captioning system is simpler, faster, more reliable, and very accurate.
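
To make the AI path concrete, here is a minimal sketch of on-prem speech-to-text using the open-source Vosk engine as a stand-in for a commercial captioning system (this is not ENCO's implementation). It assumes a 16 kHz mono PCM recording of a PA feed, pa_feed.wav, and an already-downloaded Vosk model; a production system would read from a live audio interface instead.

```python
# On-prem live captioning sketch using the open-source Vosk engine
# (pip install vosk). The WAV file and model name are assumptions; a real
# deployment would feed live PA audio and route text to displays/encoders.
import json
import wave
from vosk import Model, KaldiRecognizer

wf = wave.open("pa_feed.wav", "rb")  # 16 kHz, 16-bit, mono PCM assumed
rec = KaldiRecognizer(Model("vosk-model-small-en-us-0.15"), wf.getframerate())

while True:
    data = wf.readframes(4000)       # roughly 0.25 s of audio per chunk
    if not data:
        break
    if rec.AcceptWaveform(data):     # True when a phrase is finalized
        caption = json.loads(rec.Result()).get("text", "")
        if caption:
            print(caption)           # route to caption display here

print(json.loads(rec.FinalResult()).get("text", ""))
```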

[Here's How ENCO Boosts Productivity for U.S. Department of Veterans Affairs]

Things get even more interesting with translation, which converts captioned text from one language into another and makes the translated words available to the audience. Many cloud-based options exist for offline (file-based) projects, but live, real-time translation across different languages is often more technically challenging. Thankfully, AI technology helps with this, too, and for some languages you can even do it entirely on-prem.
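
As a small illustration of on-prem translation, the sketch below uses the open-source Argos Translate library as a stand-in for a commercial engine; it assumes the English-to-Spanish language package has already been installed via argostranslate.package.

```python
# On-prem caption translation sketch using the open-source Argos Translate
# library (pip install argostranslate), standing in for a commercial engine.
# Assumes the English-to-Spanish model package is already installed.
import argostranslate.translate

def translate_caption(text: str, from_code: str = "en", to_code: str = "es") -> str:
    """Translate a finalized caption line entirely offline."""
    return argostranslate.translate.translate(text, from_code, to_code)

# Each finalized line from the live captioner can be translated and routed
# to a second display or caption stream.
print(translate_caption("Please proceed to the nearest exit."))
```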

As the population ages and hearing loss becomes more common, and as multicultural and multilingual events grow in popularity, it is essential to help everyone understand spoken words they may not be able to hear or that may not be in their native language. With some homework, you can design and build an entire technical plant and production workflow that embraces these factors now and into the future.

Bill Bennett

Bill Bennett is a media solutions and accounts manager for ENCO Systems.