The ABCs of AVoIP, Part 1


It may be hard to believe, but it’s been almost a quarter-century since the AV industry (along with broadcast and TV/film production) began thinking about routing audio, video, and control signals over the internet. At the NAB 1999 show, there was even an entire section of the Las Vegas Convention Center devoted to “streaming media.”

While those baby steps (remember the “dancing postage stamps”?) failed to materialize into anything immediately practical and useful, they did show where these industries were headed. Today, AV signal distribution over fast IT networks is a matter of fact. Numerous improvements in video codecs and broadband speeds, along with the widespread acceptance of video streaming to the home, also helped spur this ongoing transition.

The phrase “AV-over-IP” has become ubiquitous, but it’s not entirely accurate. “AV-over-IT” would be a better choice, as “IP” stands for a protocol, whereas “IT” is a type of network. In fact, many people remain a bit confused about how audio and video transport over a network actually works, and why you’d want to use that method of signal distribution, as opposed to more traditional methods like HDMI or HDBaseT.


To help clear things up, SCN proudly presents this primer on AVoIP. In Part 1, we'll cover how AVoIP works, how it differs from traditional signal distribution methods, and what codecs do. We've even included a healthy dose of acronyms to help you keep track of the terminology. Read on!

Not Really Real Time

The first thing to understand about IT networks is that, at their heart, they are not real-time transport systems. Data sent over IT networks is broken into smaller packets that can travel through multiple servers before they arrive at their destination and are re-assembled into documents, spreadsheets, photos, and emails.

That’s a big no-no for video and audio streams, which must travel in sequenced packets and arrive in the same order in which they started (and in sync). This non-linearity created all kinds of problems with the crude attempts at streaming we saw back in the late 1990s. It’s also why manufacturers of AV gear steered clear of IT transport, preferring to use hard-wired connections to move audio and video around.
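To make the ordering problem concrete, here is a minimal Python sketch. It is an illustration only, not how any particular AV product works: the function names, the 1,500-byte chunk size, and the sample data are all assumptions. It splits some data into numbered packets, shuffles them to simulate out-of-order arrival, and puts them back together by sequence number.

```python
import random

def packetize(data: bytes, payload_size: int = 1500):
    """Split data into (sequence_number, chunk) pairs (hypothetical helper)."""
    return [(seq, data[start:start + payload_size])
            for seq, start in enumerate(range(0, len(data), payload_size))]

def reassemble(packets):
    """Sort packets by sequence number and join their payloads."""
    return b"".join(chunk for _, chunk in sorted(packets, key=lambda p: p[0]))

message = bytes(range(256)) * 20      # 5,120 bytes of sample data
packets = packetize(message)
random.Random(7).shuffle(packets)     # simulate packets arriving in scrambled order
restored = reassemble(packets)

print(len(packets), restored == message)
```

A document can wait until every packet has shown up before it is reassembled this way. A live video stream cannot, which is why ordering and timing matter so much more for AV.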


Solutions like NETGEAR switches (pictured) help simplify AV-over-IP transport. (Image credit: Netgear)

The second thing to understand is that IT networks are shared bandwidth structures. The network bandwidth is fixed—for example, 10 gigabits per second (Gbps)—and can’t be increased. If only a few users are present, that’s not a problem. But if more and more users are logged on and sending/streaming large files, they can and will slow down the network.

Think of a hotel with 200 rooms and one hot water system. If a few people are taking showers, there’s plenty of hot water for everyone. But if all 200 guests jump into the shower at the same time, the hot water will run out pretty quickly!
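The hot water analogy can be put into rough numbers. The sketch below assumes the 10 Gbps link from the example above is split evenly among active users, which real networks don't do exactly; the function name and user counts are illustrative only.

```python
def per_user_share_mbps(link_gbps: float, active_users: int) -> float:
    """Even split of a fixed-capacity link, in megabits per second (simplified model)."""
    if active_users < 1:
        raise ValueError("need at least one active user")
    return link_gbps * 1000 / active_users

for users in (5, 50, 200):
    print(users, "users:", per_user_share_mbps(10, users), "Mbps each")
```

Under that simplified model, going from 5 to 200 active users cuts each user's share from 2,000 Mbps to 50 Mbps, even though the link itself hasn't changed.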

How then can we expect to send video and audio packets from point A to point B and ensure both arrive intact and in sync? We do this by employing two things in our toolbox: a set of protocols attached to each packet, and a (mostly) lossless method of compressing and decompressing the signals to minimize their bandwidth, speeding their travel through the network.

Internet Protocols

Internet traffic travels in packets (think of envelopes) that carry headers defined by protocols, which specify the type of packet and how it is to be handled. The two most widely used protocols are Transmission Control Protocol (TCP) and Internet Protocol (IP). Most of what travels over the Internet carries these two headers (TCP/IP).


Internet packets typically carry up to 1,500 bytes, the maximum payload of a standard Ethernet frame under IEEE standards. And they have a destination, or IP address, written in octets. All these elements ensure that packets can travel through a network or networks and arrive (not always in order) at the intended destination. If a packet is dropped, the receiver can request that it be re-sent until all the packets are received and the original file is reassembled.
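As a rough illustration of the 1,500-byte figure and the octet notation, the sketch below estimates how many packets a file would need and breaks an address into its four octets. It ignores header overhead, the file size is made up, and the address is a placeholder from the range reserved for documentation, not one taken from this article.

```python
import ipaddress
import math

MAX_PAYLOAD = 1500  # bytes per packet, the figure cited in the text

def packets_needed(file_size_bytes: int, payload: int = MAX_PAYLOAD) -> int:
    """Estimate the packet count for a file, ignoring header overhead."""
    return math.ceil(file_size_bytes / payload)

# Hypothetical destination address (reserved documentation range).
addr = ipaddress.IPv4Address("192.0.2.10")
octets = list(addr.packed)   # each octet is one byte, 0 to 255

print(packets_needed(3_000_000))   # a 3 MB file (illustrative size)
print(octets)
```

Here a 3 MB file works out to 2,000 packets, and every one of them has to arrive before the file is complete.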

For the majority of files sent across IT networks, this system works quite well, assuming real-time reception isn’t required. But to handle video and audio, we need to add a few more protocols (instructions on the envelopes) that specify the precise order of packets and what to do with them.


A common protocol for media is User Datagram Protocol (UDP). We can also add another protocol for more detailed instructions on those envelopes, such as Real Time Messaging Protocol (RTMP) or Real Time Streaming Protocol (RTSP). Yet another protocol, Internet Group Management Protocol (IGMP), is used for multicast, in which a single video stream is delivered to every viewer that has joined a group, rather than the server generating an individual stream for each viewer. Without these protocols, our video and audio files would arrive as a useless jumble of jigsaw pieces, with no instructions on how to reassemble them.
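UDP itself does not number packets or re-send lost ones, so the extra "instructions on the envelope" have to travel with the payload. The sketch below shows the general idea with a made-up 6-byte header (a sequence number and a timestamp). It is not the actual header layout of RTMP, RTSP, or any other real protocol, and all names are hypothetical.

```python
import struct

# Hypothetical header: 2-byte sequence number + 4-byte timestamp in ms,
# both in network byte order. Not the layout of any real streaming protocol.
HEADER = struct.Struct("!HI")

def build_datagram(seq: int, timestamp_ms: int, payload: bytes) -> bytes:
    """Prepend the illustrative header to a media payload."""
    return HEADER.pack(seq, timestamp_ms) + payload

def parse_datagram(datagram: bytes):
    """Return (sequence_number, timestamp_ms, payload)."""
    seq, timestamp_ms = HEADER.unpack_from(datagram)
    return seq, timestamp_ms, datagram[HEADER.size:]

def missing_sequences(seqs):
    """List the sequence numbers absent between the lowest and highest seen."""
    return sorted(set(range(min(seqs), max(seqs) + 1)) - set(seqs))

# Pretend datagram 2 was lost in transit and 3 arrived before 1.
arrived = [build_datagram(s, s * 33, b"frame-%d" % s) for s in (0, 3, 1)]
seqs = [parse_datagram(d)[0] for d in arrived]

print(sorted(seqs), missing_sequences(seqs))
```

With the sequence numbers in hand, a receiver can put frames back in order and knows that one is missing, so it can conceal the gap or skip ahead rather than stall the stream.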

Codec Choices

Where do all these packet headers come from? They’re generated in a codec, a software/hardware system that analyzes the video and audio and breaks it into compressed, smaller packets. This process is extremely fast and requires a ton of computing power and speed. (Fortunately, both are inexpensive and abundant.)

We simply connect a video source—say, an HDMI connection—and play the file. The codec ingests video and audio, performs a ton of mathematical calculations, and out comes a stream of packets. Sounds easy, right? Admittedly, that description is a bit simplistic. There are several codecs available to us, but selecting the right codec depends on your application and a quality of service (QoS) choice you need to make at the start.

Codecs for video can be separated into two categories. The first type is very efficient at traveling over networks, but introduces latency, which is the delay between when a video frame is played at the source and when it’s eventually seen at the receiving end. The second type of codec isn’t nearly as efficient in using available bandwidth but exhibits very low latency.
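Some back-of-the-envelope arithmetic shows why this trade-off exists. The sketch below assumes a 1080p source at 60 frames per second and 24 bits per pixel; the compression ratios and frame-buffer counts are purely illustrative and do not describe any specific codec. It also assumes, as a simplification, that latency comes only from the frames a codec holds before sending.

```python
def raw_bitrate_gbps(width: int, height: int, fps: int, bits_per_pixel: int = 24) -> float:
    """Uncompressed video bitrate in gigabits per second."""
    return width * height * fps * bits_per_pixel / 1e9

def compressed_bitrate_mbps(raw_gbps: float, ratio: float) -> float:
    """Bitrate in megabits per second after compressing by ratio:1."""
    return raw_gbps * 1000 / ratio

def buffering_latency_ms(frames_buffered: int, fps: int) -> float:
    """Delay added by holding frames before sending (simplified model)."""
    return frames_buffered * 1000 / fps

raw = raw_bitrate_gbps(1920, 1080, 60)

# Illustrative profiles only: (compression ratio, frames buffered).
profiles = {"bandwidth-efficient": (200, 30), "low-latency": (4, 1)}
for name, (ratio, frames) in profiles.items():
    print(name,
          round(compressed_bitrate_mbps(raw, ratio), 1), "Mbps,",
          round(buffering_latency_ms(frames, 60), 1), "ms")
```

Uncompressed, this source would need just under 3 Gbps. Under these assumed numbers, the efficient profile fits easily on a shared network but adds half a second of delay, while the low-latency profile adds only about one frame of delay and consumes most of a 1 Gbps link.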


Your QoS decision determines what’s more important: efficient transport over networks to conserve bandwidth or near real-time delivery of high-quality video and audio, regardless of bandwidth. In Part 2, we'll discuss both types of codecs, plus codec enhancements, Wi-Fi streaming, and more.

Pete Putman

Pete Putman, CTS, KT2B, is the president of ROAM Consulting.