Do You See What I Hear?

Demands on our attention are constant, and staying focused on any one activity or task can be challenging. We may attend a concert, worship service, or business convention intentionally, fully intent on paying attention, only to find that the audio system fails to promote engagement with the content. Worse, it may distract us or get in the way of our focus. There has to be a better way.

How does an audio system detract from the listening experience? I’ll start with the obvious: Speakers generally aren’t pretty. They’re often large, obtrusive boxes on sticks. This is the easier distraction to remove: use loudspeakers that are visually pleasing and blend in with the architecture.

The more challenging part is increasing listener engagement after making the loudspeaker visually unobtrusive. Our brains are stimulated via sight and hearing during a concert, service, or convention. Authentic engagement can be increased with less effort when what you hear lines up with what you see. The goal is to place the primary audio source so that the sound seems to come from where it looks like it’s supposed to come from.

Church Audio — Churches can be particularly challenging for audio design. (Image credit: Evan Landry)

For example, if the person speaking is center stage and the loudspeakers are on the sides, the listener's attention is being pulled in two different directions: visually straight ahead and aurally to the side. This increases cognitive load, as the brain is required to process information from two different places—and the increase in cognitive load decreases engagement.

Ideally, the loudspeaker is placed near the action. This approach is known as co-locating the audio source (loudspeaker) with the visual source (the person speaking or video display). It significantly improves comprehension, reduces cognitive load, and lowers the barrier to active listening.

True Engagement

I once had the pleasure of attending a worship service at a cathedral in Utah. The audio system had just been replaced, and speech reproduction was precise. I arrived a few minutes late and sat in the back row. The audio system installer had done a great job—the main speaker was visually unobtrusive and color-matched to its surroundings. The speech was unmistakable.

But there was one glaring oversight. While the podium and gooseneck mic were installed for lectors on the right side of the church, the main line array speaker was installed on the left. The origins of sight and sound were separated by more than 40 feet.

My face hit my palm. I assumed they took this approach to avoid potential feedback caused by the speaker being picked up by the gooseneck microphone in front of it and creating a terrible squeal. (There are other ways to prevent this.)

I stared at the line array speaker. I glanced at the podium, ornately painted and adorned with gilded wood carvings, and I attempted to engage with the message being delivered. I failed—and found my attention still being pulled to the left. I continued to stare at the main loudspeaker and wondered, “Why did they put it there?”

The co-location of the main loudspeaker with the message being delivered is called the acoustic anchor approach. The loudspeaker "anchors" your attention to the priest, professor, musician, or keynote speaker, and results in increased engagement from the audience.

Articulation Extension

What if the space is too large for just one loudspeaker? Large rooms are typically afflicted with a drop-off in high-frequency energy, which causes clarity and articulation to decrease as the distance from the speaker increases.

This drop-off in articulation in a large room can be mitigated by adding smaller loudspeakers for articulation extension, while still letting the listener engage with and focus on the audio from the main loudspeaker. High frequencies (which account for the clarity and intelligibility of speech) drop off before any other part of the signal, so they are what the additional loudspeakers need to support.

To avoid interfering with the listener’s attention to the acoustic anchor, the articulation extension of an audio system (adding smaller support speakers with shorter distances between them) can be accomplished with the help of three audio processing tools:

• Frequency shading: only allow the frequencies lost to acoustic attenuation over distance to pass through to the support speakers.

• Amplitude shading: only provide enough volume to "support" the acoustic signal from the main acoustic anchor line array. The goal is not to make the sound louder than it would otherwise have been, as doing so would draw the listener’s attention to the support speaker instead of the acoustic anchor.

• Delay: each support speaker is delayed by approximately 1 ms per foot of distance from the main line array (sound travels at roughly 1,125 feet per second), so the sound coming out of each support speaker lines up with the sound arriving from the main line array. Without delay to time-align the supporting loudspeakers, clarity and articulation decline sharply.
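
The delay rule of thumb can be checked with a little arithmetic. The Python sketch below assumes a speed of sound of about 1,125 feet per second (dry air near 68°F). The function name and the example distances are hypothetical and are not taken from any real installation.

```python
# Assumed speed of sound in dry air near 68 °F (20 °C); it varies with temperature.
SPEED_OF_SOUND_FT_PER_S = 1125.0


def support_delay_ms(distance_ft: float,
                     speed_ft_per_s: float = SPEED_OF_SOUND_FT_PER_S) -> float:
    """Time (ms) for sound from the main array to travel distance_ft.

    Delaying a support speaker by this amount lines its output up with
    the arrival of the main array's sound at the support speaker's position.
    """
    if distance_ft < 0:
        raise ValueError("distance_ft must be non-negative")
    return distance_ft / speed_ft_per_s * 1000.0


# Hypothetical distances from the main line array to each support speaker.
support_positions_ft = {"support_1": 40.0, "support_2": 80.0}

delays_ms = {
    name: round(support_delay_ms(distance), 1)
    for name, distance in support_positions_ft.items()
}

for name, delay in delays_ms.items():
    print(f"{name}: {delay} ms")
```

At the assumed speed, the result comes out a little under 1 ms per foot, which is why 1 ms per foot works as a quick approximation. Treat the computed values as a starting point to verify on site.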

In a large room, whether a theater, auditorium, or cathedral, the sound attenuates as the listener gets farther from the sound source, which calls for supporting speakers (aka delays, a term that I tend to avoid because it doesn't give the full picture). The approach here is to replace only what has been lost to acoustic attenuation over the distance traveled within the space. Letting the acoustic anchor reproduce most of the frequencies, and extending only the high frequencies, keeps the listener’s attention directed toward the acoustic anchor.
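
As a rough illustration of frequency shading, the Python sketch below runs two test tones through a simple first-order high-pass filter, the kind of processing that passes high frequencies to a support speaker while holding back the lows. The 2 kHz cutoff, the 48 kHz sample rate, and the helper names are assumptions chosen for illustration. A real system would set its filter from measurements of the room, most likely with steeper filters in a DSP.

```python
import math

SAMPLE_RATE_HZ = 48_000   # assumed sample rate
CUTOFF_HZ = 2_000.0       # hypothetical cutoff; a real value comes from measurement


def high_pass(samples, cutoff_hz, sample_rate_hz):
    """First-order (6 dB per octave) high-pass filter."""
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for i in range(1, len(samples)):
        # Each output follows changes in the input, so steady or slowly
        # varying (low-frequency) content is attenuated.
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out


def tone(freq_hz, n_samples=4800):
    """Unit-amplitude sine test tone."""
    return [math.sin(2.0 * math.pi * freq_hz * i / SAMPLE_RATE_HZ)
            for i in range(n_samples)]


def settled_peak(samples):
    """Peak level over the second half, after the filter has settled."""
    return max(abs(s) for s in samples[len(samples) // 2:])


low_level = settled_peak(high_pass(tone(100.0), CUTOFF_HZ, SAMPLE_RATE_HZ))
high_level = settled_peak(high_pass(tone(8000.0), CUTOFF_HZ, SAMPLE_RATE_HZ))

print(f"100 Hz tone level after filter: {low_level:.3f}")
print(f"8 kHz tone level after filter:  {high_level:.3f}")
```

The 100 Hz tone comes out heavily attenuated while the 8 kHz tone passes mostly intact, which is the behavior frequency shading relies on: the support speaker adds back the highs and leaves the rest to the acoustic anchor.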

Transient Response

Clarity and speech articulation are highly reliant on the audio system’s ability to effectively and efficiently reproduce the transition markers of speech, such as "t" in "toy" or "b" in "boy." This matters because 50% of the intelligibility of speech is attributed to only 2% of its energy. To reproduce these markers and increase clarity and intelligibility, the loudspeaker needs a high-frequency driver (tweeter) with a better transient response. The best way to achieve this is to use a ribbon tweeter or AMT (air motion transformer) to reproduce high frequencies.

A tweeter with a better transient response reacts more quickly to sudden changes in the signal. The low mass of a ribbon tweeter or AMT, compared to a typical loudspeaker’s soft-dome tweeter, enables it to respond faster, which improves the loudspeaker's ability to reproduce transition markers in speech and provides unmatched clarity and articulation.

The acoustic anchor approach to audio system design starts with a line array loudspeaker co-located with the visual source. This increases listener engagement and reduces the effort required from the listener, lowering cognitive load. From there, we add supporting loudspeakers for larger spaces and process their signal with frequency shading, amplitude shading, and delay. Finally, choosing a loudspeaker with superior transient response will provide an accurate rendering of transition markers, yielding unmatched clarity and articulation of speech.

Evan Landry is the CTO for CommLink Integration.