Byte-Sized Lesson in AV/IP: Why ABR Video Has Latency

Our industry seems to be constantly talking about latency. SDVoE proponents claim glass-to-glass latency of less than one frame time. SRT Alliance members report latency under 300 ms. Meanwhile, users of adaptive bit rate (ABR) video typically experience latency of several seconds. We ought to ask, “Why can’t ABR video vendors reduce their latency?”

The answer is a consequence of several factors:

  • ABR uses HTTP/TCP.
  • ABR video is delivered in chunks.
  • Internet routers have very large buffers that preserve throughput but increase latency.

Since ABR uses HTTP and TCP, it can add several hundred milliseconds of latency. A TCP packet is usually called a segment in the IT industry. When a segment in a stream is dropped by the network, the receiver signals the gap by repeating its last acknowledgment; only after three such duplicate ACKs (four identical ACKs in all) does the sender perform a fast retransmit. If those acknowledgments sit at the back of the queues in large buffers, the retransmission can take several hundred milliseconds.
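To get a feel for those numbers, here is a back-of-the-envelope sketch in Python. The link rate, round-trip time, and queue depth are illustrative assumptions, not figures from any particular network.

```python
# Rough estimate of TCP retransmission delay when the duplicate ACKs and
# the retransmitted segment both queue behind large router buffers.
# All values are illustrative assumptions.

RTT_S = 0.050             # assumed base round-trip time: 50 ms
LINK_RATE_BPS = 20e6      # assumed bottleneck link rate: 20 Mb/s
QUEUED_BYTES = 1_000_000  # assumed data already sitting in the buffer: 1 MB

# Time for the data queued ahead of a packet to drain through the link.
queue_delay_s = QUEUED_BYTES * 8 / LINK_RATE_BPS

# Roughly one RTT to collect the duplicate ACKs that trigger the fast
# retransmit, plus queueing delay in each direction.
retransmit_delay_s = RTT_S + 2 * queue_delay_s

print(f"queueing delay per direction: {queue_delay_s * 1000:.0f} ms")        # 400 ms
print(f"estimated retransmission delay: {retransmit_delay_s * 1000:.0f} ms") # 850 ms
```

Even with these modest assumptions, a single lost segment costs most of a second, which is why HTTP/TCP delivery alone pushes ABR well past the latency SDVoE and SRT advertise.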

Because ABR video is delivered in chunks, it is really a series of small file transfers. And because delivery is based on TCP, each chunk is sent as aggressively as congestion and network load will allow, so the chunks arrive in a bursty manner. This requires the receiver to keep a buffer that can hold multiple chunks; otherwise there might be periods in which the buffer emptied and there was nothing to play out.
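A minimal sketch of that buffering requirement, with made-up chunk arrival times: the first two chunks arrive in a quick burst, the third is delayed (perhaps by retransmissions), and playback stalls unless the buffer already held multiple chunks.

```python
# Toy playout-buffer model: chunks arrive in bursts while playback drains
# the buffer at a steady 1x rate. Chunk length and arrival times are
# hypothetical, chosen only to show the stall.

CHUNK_SECONDS = 10.0         # each chunk holds 10 s of video
ARRIVALS_S = [0, 2, 25, 27]  # assumed arrival times of chunks 1-4

buffered_s = 0.0  # seconds of video currently in the receive buffer
clock_s = 0.0

for chunk, arrival in enumerate(ARRIVALS_S, start=1):
    # Playback drains the buffer while we wait for this chunk.
    buffered_s = max(0.0, buffered_s - (arrival - clock_s))
    if chunk > 1 and buffered_s == 0.0:
        print(f"t={arrival:>2}s: buffer ran dry waiting for chunk {chunk} -- stall")
    buffered_s += CHUNK_SECONDS
    clock_s = arrival
    print(f"t={arrival:>2}s: chunk {chunk} arrives, buffer holds {buffered_s:.0f}s")
```

In this toy run the receiver holds 18 seconds of video at t=2, yet still runs dry at t=20 while waiting for the delayed third chunk; buffering one more chunk would have covered the gap, at the cost of that much more latency.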

A chunk is usually 4 to 10 seconds of video. Let’s say the chunk is to play for 10 seconds at 4 Mb/s. Such a chunk is 40,000,000 bits, or 5,000,000 bytes. TCP segments carrying video very often hold about 1,400 bytes each, so the chunk is transmitted in roughly 3,600 segments (packets). However, modern internet routers often have buffers that can hold more than a chunk. So, if a retransmitted packet sits at the back of an overloaded buffer, the receiver must either cope with lost data or keep quite a lot of video in its receive buffer. That’s the trade-off: the receiver either maintains a large buffer that introduces high latency, or it occasionally plays video in a degraded state.
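The arithmetic in that example, worked in code; the 1,400-byte payload per segment is the typical figure cited above.

```python
# The chunk arithmetic from the paragraph above.

CHUNK_SECONDS = 10
BITRATE_BPS = 4_000_000            # 4 Mb/s
PAYLOAD_BYTES_PER_SEGMENT = 1_400  # typical TCP payload per packet

chunk_bits = CHUNK_SECONDS * BITRATE_BPS  # 40,000,000 bits
chunk_bytes = chunk_bits // 8             # 5,000,000 bytes
segments = chunk_bytes / PAYLOAD_BYTES_PER_SEGMENT

print(f"{chunk_bits:,} bits = {chunk_bytes:,} bytes "
      f"= about {segments:,.0f} segments")  # ~3,571, i.e., roughly 3,600
```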

These problems introduced by TCP are the primary reason for the development of competing solutions such as Google’s QUIC (Quick UDP Internet Connections) and SRT (Secure Reliable Transport). Both methods retain some of the positive attributes of HTTP/TCP transport while avoiding the undesirable consequences. There has been a significant amount of research on the relationship between large network buffers and TCP’s problems, but the same analysis has not yet been published for these two newer protocols. When that testing is done, we may find that they, too, have problems dealing with large buffers.

Phil Hippensteel