Does the Future of AI Depend on Audio?

In the last year, the interfaces getting the most attention seem to be “conversational interfaces” that combine voice recognition, text-to-speech, and AI engines. Amazon’s Echo line is now among the most popular speakers in the world, and competitors are following suit. Few audio products on the drawing board today are not being considered for some kind of Alexa-style voice interaction. What are the implications of voice and audio for the future of audiovisual system interaction, and what design considerations and trends might affect the use of this popular technology in professional audiovisual applications?

Conversational Systems

Because these systems must always be listening, spaces need microphones embedded throughout, standing by to capture the commands of the people in them. Having to walk to a special spot where a microphone is located is not a natural experience. This has several implications for the technology in the built environment, including microphone system design and security. Any space with a complex conference audio system may adapt well to voice commands, but spaces with simple audio conferencing, or none at all, may be challenging to adapt. Then there is the primary question of how to secure a system that listens to you wherever you go. Constant audio “surveillance” (if you will) for “wake words” such as “OK Google” or “Hey Siri” leaves ordinary conversations exposed. Could this concern be enough to keep the trend out of enterprise installations in the future?
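
A design often described for smart speakers limits this exposure by detecting the wake word on the device and discarding audio unless the detector fires, so only a short rolling buffer exists while the system is idle. The sketch below models that gate under stated assumptions: the detector `is_wake_word` is a hypothetical placeholder, and the frame counts `pre_roll_frames` and `session_frames` are illustrative stand-ins for time. It shows where the security boundary sits, not how any particular product works.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List


def gate_audio(
    frames: Iterable[bytes],
    is_wake_word: Callable[[List[bytes]], bool],
    pre_roll_frames: int = 50,
    session_frames: int = 200,
) -> Iterator[List[bytes]]:
    """Yield audio sessions that begin with a detected wake word.

    is_wake_word is a hypothetical on-device detector that inspects the
    frames currently in the rolling buffer. Frames that are not part of a
    detection or the session after it are dropped and never yielded.
    """
    if pre_roll_frames < 1 or session_frames < 1:
        raise ValueError("frame counts must be positive")
    ring: Deque[bytes] = deque(maxlen=pre_roll_frames)  # oldest frames fall off
    session: List[bytes] = []
    remaining = 0  # frames still to capture after a detection
    for frame in frames:
        if remaining > 0:
            session.append(frame)
            remaining -= 1
            if remaining == 0:
                yield session  # hand off one utterance, e.g. to a cloud service
                session = []
            continue
        ring.append(frame)
        if is_wake_word(list(ring)):
            session = list(ring)  # keep the wake word itself as pre-roll
            ring.clear()
            remaining = session_frames
    if session:
        yield session  # stream ended mid-utterance
```

The design choice that matters is the bounded `deque`: while idle, nothing older than the pre-roll window is retained, so the security question shifts to how far the detector and the post-wake stream can be trusted.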

Smart Headphones

In the long term, headphones may be a better option for implementing conversational systems. Headphones deliver clear audio, cancel some noise, hear you speak, listen to your environment, and can even take your vital signs. The intimate relationship we’re developing with our headphones has come into focus in the last few years. Form factors such as Apple’s AirPods and features such as ambient sound mixing and heart rate monitoring (which I wrote about here back in 2014) are going to change our relationship to wearing headphones, and may lead to a time when headphones are as essential to our daily wardrobe as glasses or shoes, both of which were also interface technologies and innovations of past generations.

3D Sound

Spatial sound isn’t new. We’ve been experiencing surround sound in movie theaters for decades, but the rise of VR has given users more opportunity to become familiar with responsive 3D sound experiences. As more people have the opportunity to experience this, demand will grow in all environments, including corporate and educational spaces. What does this mean for AV? It may mean that we will need better processing tools to implement interactive spatial audio in even the most ordinary places. Demand for 3D sound in conventional spaces with simple ceiling speakers, for example, will mean that we need tools to map 3D sound to arbitrary microphone and loudspeaker arrangements, as well as decoders that can support such ad hoc implementations. Imagine being able to reproduce the voices and locations of people at the far end with spatial accuracy in your room.
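
One established technique for placing a virtual source on an arbitrary loudspeaker layout is vector base amplitude panning (VBAP), published by Ville Pulkki in 1997. It finds the loudspeaker pair (or, in 3D, triplet) that brackets the source direction and solves for gains that point the summed vector at the source. The sketch below is a minimal horizontal-only version, assuming azimuths in degrees and a layout with no gap wider than 180 degrees; it is not the decoder of any particular product.

```python
import math
from typing import List


def vbap_2d(source_deg: float, speaker_degs: List[float]) -> List[float]:
    """Constant-power gains for a virtual source on a horizontal speaker ring.

    Pairwise 2-D vector base amplitude panning (VBAP). Angles are azimuths in
    degrees, counterclockwise. Assumes at least two speakers and that the
    source lies between some adjacent pair less than 180 degrees apart.
    """
    n = len(speaker_degs)
    if n < 2:
        raise ValueError("need at least two loudspeakers")
    # Walk the ring in angular order so each pair is a pair of neighbours.
    order = sorted(range(n), key=lambda i: speaker_degs[i] % 360.0)
    sx, sy = math.cos(math.radians(source_deg)), math.sin(math.radians(source_deg))
    gains = [0.0] * n
    for k in range(n):
        i, j = order[k], order[(k + 1) % n]
        ax, ay = math.cos(math.radians(speaker_degs[i])), math.sin(math.radians(speaker_degs[i]))
        bx, by = math.cos(math.radians(speaker_degs[j])), math.sin(math.radians(speaker_degs[j]))
        det = ax * by - bx * ay
        if abs(det) < 1e-12:
            continue  # coincident or opposite speakers cannot form a base
        # Solve source = g1 * a + g2 * b with Cramer's rule.
        g1 = (sx * by - bx * sy) / det
        g2 = (ax * sy - sx * ay) / det
        if g1 >= -1e-9 and g2 >= -1e-9:  # source lies between a and b
            g1, g2 = max(g1, 0.0), max(g2, 0.0)
            norm = math.hypot(g1, g2)  # so that g1**2 + g2**2 == 1
            gains[i], gains[j] = g1 / norm, g2 / norm
            return gains
    raise ValueError("source is not between any adjacent loudspeaker pair")
```

With a hypothetical four-speaker layout at +30, -30, +110, and -110 degrees, a source at 0 degrees gets about 0.707 (-3 dB) on each front speaker and nothing elsewhere, so total power stays constant as the source moves.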

Algorithmic Acoustical Design

I can’t talk about the future of acoustics without recognizing one of the giants of the science, whom we lost in 2016. Leo Beranek virtually created the profession of acoustical consulting when he and his partners started Bolt Beranek and Newman (BBN) in 1948. His autobiography, Riding the Waves, is recommended reading. BBN also played a major role in the development of computing, which today’s acoustical practice leverages in many ways.

Recently, algorithms helped design the Elbphilharmonie (Elbe Philharmonic Hall) in Hamburg, Germany. Architects Herzog & de Meuron collaborated with renowned acoustician Yasuhisa Toyota and the fabricators One to One to develop an acoustical treatment that may represent the future of acoustical design. As reported recently in Wired, the hall’s designers used an algorithm to design the reflective panels and give each panel a unique reflective quality: panels at the rear of the room are very different from ceiling panels, and no two panels are alike. By combining 3D design tools like Revit with machine learning, we may be approaching a time when architects will know whether a space will work acoustically as soon as they create it virtually. Great design may always require a human eye and ear, but the gross acoustical design errors we experience every day may soon be a thing of the past. If we are going to rely on audio systems to converse with our building systems, acoustics will be an essential part of making this technology work.
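
The kind of instant check imagined above already exists in its simplest form: Sabine’s reverberation formula, RT60 ≈ 0.161 V / A, where V is the room volume in cubic meters and A is the total absorption (each surface area times its absorption coefficient). The sketch below is that back-of-the-envelope estimate, not the algorithmic method used at the Elbphilharmonie; the room dimensions and absorption coefficients are hypothetical.

```python
from typing import Iterable, Tuple

SABINE_METRIC = 0.161  # s/m, constant in RT60 = 0.161 * V / A (SI units)


def sabine_rt60(volume_m3: float, surfaces: Iterable[Tuple[float, float]]) -> float:
    """Estimate reverberation time (RT60, seconds) with Sabine's formula.

    surfaces holds (area in m^2, absorption coefficient 0..1) pairs for a
    single frequency band. Sabine's model assumes a diffuse sound field and
    overestimates RT60 in very absorptive rooms.
    """
    absorption = sum(area * alpha for area, alpha in surfaces)  # m^2 sabins
    if absorption <= 0.0:
        raise ValueError("total absorption must be positive")
    return SABINE_METRIC * volume_m3 / absorption


# Hypothetical 10 m x 8 m x 3 m meeting room; coefficients are illustrative.
treated = [(80.0, 0.30), (80.0, 0.70), (108.0, 0.05)]    # floor, ceiling, walls
untreated = [(80.0, 0.30), (80.0, 0.05), (108.0, 0.05)]  # hard ceiling instead
print(f"treated:   {sabine_rt60(240.0, treated):.2f} s")    # 0.45 s
print(f"untreated: {sabine_rt60(240.0, untreated):.2f} s")  # 1.16 s
```

Swapping one surface treatment changes the estimate by a factor of about 2.5, which is the sort of feedback a design tool could surface the moment a virtual room is drawn.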

Paul Chavez (pchavez@harman.com) is the director of systems applications for Harman Pro Group. He is a usability evangelist and a futurologist. Chavez has designed a variety of audiovisual systems ranging from themed attractions to super yachts. He has also taught and written on the topics of interaction design, audiovisual design, and networking.

The Challenges of Audio

There are good reasons why audio can be such a challenge to get right. Here are a few of them:

Frequency Response: We all know that human hearing spans roughly 20 Hz to 20,000 Hz (at least in infancy). What we don’t always appreciate is how hard it is to accurately reproduce a frequency range spanning three orders of magnitude, and frequency is only one dimension of a complex sound wave.
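
To put numbers on that span: 20,000 Hz divided by 20 Hz is 1,000, which is three decades or almost ten octaves, and at a speed of sound of about 343 m/s the corresponding wavelengths run from roughly 17 m down to under 2 cm. The sketch below does that arithmetic; the speed of sound is an assumed room-temperature value.

```python
import math
from dataclasses import dataclass

SPEED_OF_SOUND = 343.0  # m/s, dry air at roughly 20 degrees C (assumed)


@dataclass
class BandSpan:
    ratio: float                  # highest / lowest frequency
    decades: float                # orders of magnitude
    octaves: float                # frequency doublings
    longest_wavelength_m: float
    shortest_wavelength_m: float


def band_span(f_low_hz: float, f_high_hz: float) -> BandSpan:
    """Express an audio band as a ratio, in decades and octaves, and in wavelengths."""
    ratio = f_high_hz / f_low_hz
    return BandSpan(
        ratio=ratio,
        decades=math.log10(ratio),
        octaves=math.log2(ratio),
        longest_wavelength_m=SPEED_OF_SOUND / f_low_hz,
        shortest_wavelength_m=SPEED_OF_SOUND / f_high_hz,
    )


hearing = band_span(20.0, 20_000.0)
print(f"{hearing.decades:.0f} decades, {hearing.octaves:.1f} octaves")
print(f"wavelengths: {hearing.longest_wavelength_m:.2f} m "
      f"to {hearing.shortest_wavelength_m * 100:.1f} cm")
```

A thousand-to-one spread in wavelength is one reason a single loudspeaker driver rarely covers the whole band well, and why multi-way designs split it between drivers.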

Dynamic Range: We usually describe the range of our hearing sensitivity in dB SPL, but that scale is a logarithmic representation of sound pressure. On a linear scale, our ears respond to pressures from about 20 micropascals (0.00002 Pa) at the threshold of hearing to roughly 20 to 200 Pa at the threshold of pain (about 120 to 140 dB SPL), a span of six to seven orders of magnitude.
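
The dB SPL scale is defined as L = 20 log10(p / p0), with the reference pressure p0 = 20 µPa, so every 20 dB is a factor of ten in pressure. The sketch below converts between the two scales; the 120 to 140 dB range used for the threshold of pain is a commonly cited approximation, not a fixed value.

```python
import math

P_REF = 20e-6  # Pa; the 20 micropascal reference for dB SPL in air


def pa_to_db_spl(pressure_pa: float) -> float:
    """Convert RMS sound pressure (Pa) to sound pressure level (dB SPL)."""
    return 20.0 * math.log10(pressure_pa / P_REF)


def db_spl_to_pa(level_db: float) -> float:
    """Convert sound pressure level (dB SPL) back to RMS pressure (Pa)."""
    return P_REF * 10.0 ** (level_db / 20.0)


for label, level in [("threshold of hearing", 0.0),
                     ("threshold of pain, low estimate", 120.0),
                     ("threshold of pain, high estimate", 140.0)]:
    print(f"{label}: {level:.0f} dB SPL = {db_spl_to_pa(level):g} Pa")

# Linear span from the hearing threshold to 140 dB SPL, in orders of magnitude.
print(f"{math.log10(db_spl_to_pa(140.0) / P_REF):.1f} decades")
```

At 140 dB SPL the pressure is 200 Pa, ten million times the reference, which is why a logarithmic scale is the practical way to talk about hearing.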

Capture and Reproduction: This is the most difficult audio challenge of all. Capturing a three-dimensional wave that exists in space and interacts with every surface near it, and then reproducing it in three dimensions (often in an entirely different environment), is an enormous undertaking. Stereo capture and reproduction seems primitive by comparison, with one possible exception: capturing sound at the ear canals and reproducing it at the same point. Binaural recording and reproduction has yet to gain wide adoption, but in a future world of nearly permanent headphone use, it may be a very good way to capture and recreate sound.
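
Part of what makes binaural playback work is that headphones deliver a separate, precisely timed signal to each ear. One of the cues involved is the interaural time difference (ITD), the extra time sound takes to reach the far ear. The sketch below uses Woodworth’s spherical-head approximation, ITD ≈ (a / c)(θ + sin θ), and applies it as a whole-sample delay. The head radius is an assumed average, and the model ignores level differences and the spectral filtering of the outer ear that real HRTFs and binaural recordings capture.

```python
import math
from typing import List, Tuple

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value
HEAD_RADIUS = 0.0875    # m, an assumed average adult head radius


def woodworth_itd(azimuth_deg: float) -> float:
    """Interaural time difference (s) for a rigid spherical head.

    Woodworth's approximation ITD = (a / c) * (theta + sin(theta)), intended
    for azimuths between -90 and +90 degrees (positive = source to the right).
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))


def render_itd(mono: List[float], azimuth_deg: float,
               sample_rate: int = 48_000) -> Tuple[List[float], List[float]]:
    """Return (left, right) with the far ear delayed by the ITD in whole samples.

    A deliberately crude binaural cue: no level difference and no HRTF filtering.
    """
    delay = round(abs(woodworth_itd(azimuth_deg)) * sample_rate)
    late = [0.0] * delay + mono    # far ear hears the sound later
    early = mono + [0.0] * delay   # near ear, padded to the same length
    if azimuth_deg >= 0:           # source on the right: left ear is the far ear
        return late, early
    return early, late


print(f"ITD at 90 degrees: {woodworth_itd(90.0) * 1e6:.0f} microseconds")
```

For a source directly to one side, the model gives roughly 0.66 ms, a tiny interval that loudspeakers in a room smear but headphones can deliver intact.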