Videoconferencing and telepresence systems continue to make advances in recreating the sense of “being there,” but, for the most part, these advances have been limited to the visual sense -- improved eye contact, high-definition resolution, life-size images, and PTZ cameras. Improvements in the audio realm have included wide-band audio, such as Polycom’s Siren 22 algorithm, which claims a frequency response out to 22 kHz. Acoustic echo cancellation technologies, which are critical to any high-quality conferencing experience, have improved to the point where talkers at both the near and far ends can easily speak simultaneously without the disturbing effects of loudspeaker leakage into microphones that plagued early conferencing systems. Even with such improvements, audio conferencing still lacks that certain “something” that prevents the illusion of being there from being completely convincing. Bill Gardner, president of Boston, MA-based Wave Arts, thinks that what’s missing might be the sense of a realistic three-dimensional acoustic environment.
Wave Arts is developing a system called vSpace, which is software for audio teleconferencing over the internet. It combines voice over internet protocol (VoIP) with proprietary 3D audio processing to bring fluid, more life-like discourse to remote communications.
“We reproduce the way sound arrives at the ears of a human listener,” says Gardner. “A sound to the left of a human creates a louder left ear signal than right ear, just as in amplitude panning, but the sound at the right ear is also delayed by a fraction of a millisecond. In addition, the sound is diffracted by the head and external ear, which causes the tonal response to be altered. So unlike conventional stereo where position affects only the amplitude, we alter the amplitude, delay, and the equalization of the sound, just as in real life hearing. The result is a much more realistic positioning of the sound with the perception that the sound is outside the head.”
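The three cues Gardner describes -- interaural level difference, interaural time difference, and head-shadow filtering -- can be illustrated with a deliberately simplified sketch. This is not Wave Arts’ processing (vSpace presumably uses measured head-related transfer functions); the head radius, sine-law panning, and one-pole shadow filter below are textbook approximations chosen for clarity.

```python
import numpy as np

FS = 16000  # sample rate in Hz (assumed for illustration)

def spatialize(mono, azimuth_deg, fs=FS):
    """Crude binaural rendering of a mono source at a given azimuth:
    level difference, a sub-millisecond time difference, and a lowpass
    on the shadowed ear. Real systems use measured HRTFs."""
    az = np.radians(azimuth_deg)          # +90 = hard right, -90 = hard left
    head_radius = 0.0875                  # metres, average adult head
    c = 343.0                             # speed of sound, m/s

    # Interaural time difference (Woodworth approximation), at most ~0.7 ms
    itd = head_radius / c * (az + np.sin(az))
    delay = int(round(abs(itd) * fs))

    # Interaural level difference via a simple sine panning law
    g_near = np.sqrt(0.5 * (1 + abs(np.sin(az))))
    g_far = np.sqrt(0.5 * (1 - abs(np.sin(az))))

    # Head shadow: one-pole lowpass on the far ear, stronger at wide angles
    far = np.copy(np.asarray(mono, dtype=float))
    alpha = 0.3 + 0.5 * abs(np.sin(az))
    for i in range(1, len(far)):
        far[i] = (1 - alpha) * far[i] + alpha * far[i - 1]

    near = g_near * np.asarray(mono, dtype=float)
    far = g_far * np.concatenate([np.zeros(delay), far])[:len(near)]

    if azimuth_deg >= 0:                  # source on the right
        return far, near                  # (left, right)
    return near, far
```

For a source at +60 degrees, the right-ear signal comes out louder, earlier, and brighter than the left -- the combination Gardner contrasts with amplitude-only stereo panning.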
In its current version, vSpace users wear stereo headsets and connect to vSpace over the internet using a custom software phone. They hear other participants localized around their head as in a face-to-face conversation. The result: a new communications tool that resolves the problems of talker-confusion and turn-taking.
“Realistic localization of talkers in a video or teleconference requires much more than just being able to tell if the talker is positioned to the left or right in a conference room,” says Gardner. “There’s that other element that defines a true 3D experience -- depth. A sense of depth is created by adding room reverberation and adjusting the level of the sound with respect to the level of the reverberation. Distant sounds are fainter and more reverberant, near sounds are louder with less reverberation.”
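The depth cue Gardner describes is the direct-to-reverberant ratio: as a source moves away, the direct sound falls off while the room’s reverberant level stays roughly constant. A minimal sketch, assuming a placeholder impulse response stands in for a real room reverb:

```python
import numpy as np

def with_distance(dry, reverb_tail, distance_m, fs=16000):
    """Cue distance by attenuating and delaying the direct sound ~1/d
    while mixing in reverberation at a fixed level (illustrative only;
    reverb_tail is any room impulse response, e.g. decaying noise)."""
    direct_gain = 1.0 / max(distance_m, 1.0)        # inverse-distance law
    delay = int(distance_m / 343.0 * fs)            # propagation delay
    direct = np.concatenate([np.zeros(delay), dry * direct_gain])
    wet = np.convolve(dry, reverb_tail)             # distance-independent reverb
    out = np.zeros(max(len(direct), len(wet)))
    out[:len(direct)] += direct
    out[:len(wet)] += wet
    return out
```

Rendering the same voice at one metre and at eight metres yields the effect he describes: the distant version is quieter overall but proportionally more reverberant.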
Another requirement of achieving a realistic sonic experience is what’s becoming known as “high definition audio” for audio conferencing, which extends frequency response out to at least 7 kHz, compared to the 3.5 kHz limit of standard telephony. Gardner says that the importance of using high definition audio is not only that it sounds better and more realistic, but also that speech is far more intelligible. “It’s hard to tell the difference between, for example, the spoken words ‘found’ and ‘sound’ with standard telephony-grade audio. We rely on our understanding and familiarity with the language to guide our word recognition.”
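The found/sound confusion arises because the fricative energy of /s/ sits largely above the telephone band. A quick way to see the two band limits Gardner compares is to lowpass a high-frequency component at 3.5 kHz versus 7 kHz; the windowed-sinc filter below is a generic stand-in for a codec’s band limit, not any particular codec:

```python
import numpy as np

def lowpass(signal, cutoff_hz, fs=16000, taps=101):
    """Windowed-sinc FIR lowpass -- a crude stand-in for a codec's
    band limit (Hamming window, unity DC gain)."""
    n = np.arange(taps) - (taps - 1) / 2
    h = np.sinc(2 * cutoff_hz / fs * n) * np.hamming(taps)
    h /= h.sum()
    return np.convolve(signal, h, mode="same")
```

Energy near 5 kHz (roughly where /s/ lives) passes a 7 kHz “high definition” band limit almost untouched but is nearly eliminated by a 3.5 kHz telephone band limit, which is why the listener must fall back on context to pick the word.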
Currently, vSpace is experienced through a headset, though the company says a traditional speakerphone can also be used -- and Wave Arts is working on a “spatial speakerphone.” Wave Arts is planning to release a free vSpace service in the second quarter of this year. The company says that the initial service will be internet only (no PSTN gateway for telephone dial-in).