While I was at a dinner not long ago, someone asked what I do for money. I thought I did a pretty great job of describing the Pro AV industry and the work the Alliance for IP Media Solutions (AIMS) has been doing to develop and promote the IPMX open standard for AV over IP. But when I was done answering, it was clear that I had failed, because the next question was, “So how is that different from videos on YouTube?”
That was a humbling question, mainly because I didn’t have a quick answer. How is the IP audio and video that we work with different from the video systems we all use every day over the internet? Is it quality? Is it that we use it at work? Is it that our systems are typically installed by professionals? These seemed like answers that would become irrelevant or obsolete by the time our dessert came.
There are many things that separate the Pro AV industry from the wider world of media on the web, but after thinking about it, one thing that stands out to me is time. Unlike the web, which deals in content delivered over the internet, people in the broadcast and Pro AV worlds concern themselves with content as it changes over time, content that may also happen to traverse the internet.
Whereas the web measures latency in seconds, we need milliseconds to measure the acceptable delay between a video source and the display. When we blow a 4K image up to the size of a five-story building, slight variations in latency can cause tearing between tiles that will look ugly at a massive scale. When an audience hears auditorium speakers deliver a live performance, that audio must be aligned to mere nanoseconds for each of those speakers or it will sound terrible.
In broadcast with SMPTE ST 2110, every source is synchronous and locked to a PTP grandmaster so that the system’s latency can be reasoned about and minimized. These live productions require that every camera is showing the same slice of reality at precisely the same time so that production switchers can mix sources seamlessly. These constraints also make it possible for a director to see results in real time as they direct with their production switcher, without needing to mentally compensate for delay as they switch cameras to the beat of the music or the flow of a conversation.
The goal of IPMX is to deliver the performance that the Pro AV world needs for any application of audio and video, including live production and presentation workflows, and to do it in the real world where training, equipment, and budgets are not always optimized for success. We need video to achieve subframe latency, even when we don’t happen to have a PTP grandmaster. We need to plug in our laptops, even though they can’t handle reference timing.
To bring these requirements to reality, IPMX includes additional specifications to support a comprehensive range of system timing scenarios:
1: Synchronous sources locked with PTP: Subframe latency, seamless switching, extra buffering and processing at the source if the device cannot synchronize itself to an external clock. This also describes SMPTE ST 2110.
2: Asynchronous sources with PTP timing as a reference: Subframe latency, no seamless switching, but devices can understand when each frame appeared in time relative to each other so that they can be re-aligned at the receiver.
3: Asynchronous sources with no PTP timing: Subframe latency and quick recovery when switching between sources, but no alignment is possible.
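The trade-offs among these three scenarios can be sketched in code. This is a hypothetical illustration, not anything from the IPMX specifications; the type and field names are my own.

```python
from dataclasses import dataclass

# Hypothetical sketch of the three timing scenarios described above.
# Names are illustrative only, not drawn from the IPMX specifications.
@dataclass(frozen=True)
class TimingCapabilities:
    subframe_latency: bool     # achievable in every scenario
    seamless_switching: bool   # requires sources locked to PTP
    receiver_alignment: bool   # requires PTP at least as a reference

def capabilities(has_ptp: bool, source_locked_to_ptp: bool) -> TimingCapabilities:
    if has_ptp and source_locked_to_ptp:
        # Scenario 1: synchronous sources locked to PTP (as in ST 2110).
        return TimingCapabilities(True, True, True)
    if has_ptp:
        # Scenario 2: asynchronous sources with PTP as a reference;
        # frames can still be re-aligned at the receiver.
        return TimingCapabilities(True, False, True)
    # Scenario 3: no PTP; quick clock recovery, but no alignment.
    return TimingCapabilities(True, False, False)
```

Notice that subframe latency survives in all three cases; what degrades gracefully is switching and alignment.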
How does IPMX accomplish such magic? By establishing rules for devices and providing the system’s controller with what it needs to reason about the latency between senders and receivers. In IPMX, each sender (source of content in the system) transmits an RTCP sender report (a control packet for RTP streams) for every frame of video and for every 10 milliseconds of audio. This report contains the relationship between the media clock, which always follows the precise rate of the content, and a second reference clock. If PTP is available, then each device must use it as the second clock, regardless of whether or not the video can be synchronized. Otherwise, the RTCP packets reference a clock that is internal to the sending device.
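To make the media-clock/reference-clock relationship concrete, here is a minimal sketch, assuming the standard RTCP sender-report mechanism from RFC 3550, where each report pairs a reference-clock (NTP-format) timestamp with the RTP media timestamp in effect at that instant. This is not an IPMX implementation; the function and parameter names are my own.

```python
# Illustrative sketch: how a receiver can use the (reference time, RTP
# timestamp) pair carried in an RTCP sender report to place any later
# media timestamp onto the sender's reference clock.

VIDEO_CLOCK_HZ = 90_000  # standard RTP media clock rate for video

def rtp_to_reference_seconds(rtp_ts: int,
                             sr_rtp_ts: int,
                             sr_ref_seconds: float,
                             clock_rate: int = VIDEO_CLOCK_HZ) -> float:
    """Translate an RTP timestamp into reference-clock seconds using the
    most recent sender report's correspondence point.
    (32-bit timestamp wraparound is ignored here for brevity.)"""
    elapsed_ticks = rtp_ts - sr_rtp_ts
    return sr_ref_seconds + elapsed_ticks / clock_rate

# A frame stamped 3,000 ticks after the last report lands 3000/90000 s
# (about one frame time at ~30 fps) later on the reference clock:
t = rtp_to_reference_seconds(rtp_ts=93_000, sr_rtp_ts=90_000,
                             sr_ref_seconds=100.0)
```

With a sender report arriving for every video frame, the correspondence point stays fresh, which is what lets a controller reason about latency even for asynchronous sources.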
When media is synchronized with PTP, the media and PTP reference clocks will progress at the same rate. When they are not, the receiver can use this information to reason about when different frames from different sources were generated. When there is no PTP, nothing can be said about alignment, but receivers can still use this information to quickly regenerate the clock at the receiver, shortening recovery time at the display when sources are switched.
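Once each source's frames have been mapped onto the shared reference clock, re-alignment at the receiver reduces to computing a buffering offset. A minimal sketch, with a hypothetical helper of my own naming:

```python
# Hedged sketch: given two streams' latest frame times expressed on the
# shared (PTP) reference clock, compute how much the earlier stream must
# be buffered so frames captured at the same instant present together.

def realignment_delay_ms(latest_frame_a: float, latest_frame_b: float) -> float:
    """Return the offset in milliseconds between the two streams' latest
    frames on the reference clock. Positive: delay stream B by this much;
    negative: delay stream A instead."""
    return (latest_frame_a - latest_frame_b) * 1000.0

# Stream B's latest frame sits 5 ms earlier on the reference clock than
# stream A's, so stream B needs ~5 ms of extra buffering:
delay = realignment_delay_ms(100.0333, 100.0283)
```

This is the buffering-and-processing cost mentioned below: re-alignment is possible whenever the reference clock is shared, but some device has to hold the earlier stream's frames.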
As a result, subframe latency can be achieved without a PTP clock, and synchronous content can be mixed with asynchronous content, provided the system’s devices can handle the buffering and processing required for re-alignment. In short, expensive, cheap, well-designed, and ad-hoc AV systems can each operate at their peak potential regardless of whether they are producing content or presenting it.
Making something simple and flexible is hard to do, but the engineers working on IPMX have done just that. They’ve made a single compatibility technology capable of supporting AV transport for a wide range of devices and applications so that we’ll finally have an AV over IP open standard capable of replacing the old-school baseband transport that has defined our industry since the beginning.
And that is definitely something to chew on.