Scalable Video Coding Demystified

Parsing the Finer Points of the Standard

Video applications, over the public Internet or corporate networks, have gained popularity in recent years. These and other applications, such as visual communications, surveillance and monitoring, telemedicine and eLearning, are based on the transmission of video to many recipients, located in different locations, using different equipment and connected to different networks.

Video coding standards have tried to address the inherent challenges of these use-cases by introducing scalability features, which allow recipients, as well as other network elements, to adapt the video stream according to their capabilities: CPU, display size, software support on the client side, available bandwidth, bandwidth variability on the network side and network conditions (packet loss).

THE SCALABILITY TYPES OF SVC
Scalability can be implemented along several different dimensions of video coding, such as resolution (frame size), frame rate (frames per second), and quality (bit rate). A different scalability type handles each of these dimensions.

A video stream is considered “scalable” when parts of the stream can be dropped, via a process known as “layer thinning”, leaving a sub-stream which represents a different, yet equally valid, video stream.

A scalable video stream comprises a base layer and a set of enhancement layers, each of which, when combined with the base layer and any lower layers, yields a valid stream with improved characteristics. These layers can be thinned away as needed. A stream that does not provide this property is referred to as a single-layer stream.
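The layering model described above can be sketched in a few lines of Python. The `Layer` type and `thin_stream` helper are illustrative names invented for this sketch, not part of any SVC API:

```python
from dataclasses import dataclass

@dataclass
class Layer:
    layer_id: int   # 0 = base layer; 1, 2, ... = enhancement layers
    payload: bytes  # coded data for this layer (placeholder)

def thin_stream(layers, max_layer_id):
    """Layer thinning: drop every enhancement layer above max_layer_id.

    The remaining layers still form a valid, decodable sub-stream,
    since each layer depends only on the layers below it.
    """
    return [layer for layer in layers if layer.layer_id <= max_layer_id]

stream = [Layer(0, b"base"), Layer(1, b"enh1"), Layer(2, b"enh2")]
print([layer.layer_id for layer in thin_stream(stream, 1)])  # -> [0, 1]
```

Dropping layer 2 leaves the base layer plus the first enhancement layer, which is itself a complete, decodable stream.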

Spatial scalability allows sending a single video stream encapsulating several different resolutions. The stream may include various sub-streams, each representing a different resolution. For instance, a base layer may represent a QCIF (176x144) video, while enhancement layers may increase the resolution to CIF (352x288) and 4CIF (704x576).

Temporal scalability allows sending a single video stream encapsulating several different frame rates. The stream may include various sub-streams, each representing a different frame rate. For instance, a base layer may represent a 7.5 frames per second (fps) video, while enhancement layers may increase the frame rate to 15fps and 30fps.
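One common way to realize this is a dyadic frame hierarchy: the base layer holds every fourth frame, and each temporal enhancement layer doubles the frame rate. The Python sketch below illustrates the idea; the function names are invented for illustration, and real encoders signal each frame's temporal layer in the bit stream rather than deriving it from the frame index:

```python
def temporal_id(frame_index, num_layers=3):
    """Dyadic temporal layer id: layer T0 holds every
    2**(num_layers-1)-th frame; each higher layer doubles the rate."""
    for t in range(num_layers):
        if frame_index % (2 ** (num_layers - 1 - t)) == 0:
            return t
    return num_layers - 1

def thin_to_rate(frame_indices, max_tid, num_layers=3):
    """Keep only the frames whose temporal id is <= max_tid."""
    return [i for i in frame_indices
            if temporal_id(i, num_layers) <= max_tid]

frames = list(range(8))          # 8 frames of a 30 fps stream
print(thin_to_rate(frames, 0))   # -> [0, 4]          (7.5 fps)
print(thin_to_rate(frames, 1))   # -> [0, 2, 4, 6]    (15 fps)
print(thin_to_rate(frames, 2))   # -> [0, 1, ..., 7]  (30 fps)
```

Because frames in higher temporal layers are never used as references for lower layers, dropping them leaves the remaining frames fully decodable.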

Quality scalability, also known as bit-rate or signal-to-noise ratio (SNR) scalability, allows sending a single video stream encapsulating several different quality levels.

Quality is expressed here as the bit rate needed to encode a given stream, on the assumption that a higher bit rate yields better quality. The stream may include various sub-streams, each representing a different quality level. For instance, a base layer may represent a 128 Kbps (kilobits per second) video, while enhancement layers may increase the bit rate to 256 Kbps and 512 Kbps.
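A receiver or network element can pick the best-fitting quality sub-stream with a simple cumulative-rate check. The hypothetical helper below assumes the 128/256/512 Kbps layering from the example above, with each list entry giving the incremental cost of one more quality layer:

```python
def select_quality(layer_rates_kbps, available_kbps):
    """Return the highest quality layer whose cumulative bit rate fits.

    layer_rates_kbps[0] is the base layer; later entries are the
    incremental cost of each quality enhancement layer.
    Returns -1 if even the base layer does not fit.
    """
    total, best = 0, -1
    for q, rate in enumerate(layer_rates_kbps):
        total += rate
        if total > available_kbps:
            break
        best = q
    return best

# Base layer at 128 Kbps; enhancements add 128 and 256 Kbps,
# giving the 128/256/512 Kbps sub-streams from the text.
rates = [128, 128, 256]
print(select_quality(rates, 300))  # -> 1 (the 256 Kbps sub-stream fits)
print(select_quality(rates, 600))  # -> 2 (the full 512 Kbps stream fits)
```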

The different scalability types can also be combined, so many representations with different spatial-temporal-quality levels can be supported within a single scalable bit stream.
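Combined scalability can be sketched as filtering coded packets against a target operation point. The sketch below is a simplification, but it mirrors how H.264/SVC adapters work: each SVC NAL unit header carries dependency_id, temporal_id, and quality_id fields that are inspected in much the same way (the class and function names are invented for illustration):

```python
from dataclasses import dataclass

@dataclass
class Packet:
    d: int          # spatial (dependency) layer id
    t: int          # temporal layer id
    q: int          # quality layer id
    payload: bytes  # coded data (placeholder)

def extract(packets, d, t, q):
    """Keep only the packets needed for the (d, t, q) operation point.

    Simplified dependency rule: a packet is needed when its spatial,
    temporal, and quality ids are all at or below the target layers.
    """
    return [p for p in packets if p.d <= d and p.t <= t and p.q <= q]

stream = [Packet(0, 0, 0, b""), Packet(0, 1, 0, b""),
          Packet(1, 0, 0, b""), Packet(0, 0, 1, b"")]
# Base resolution and quality at the higher frame rate:
print(len(extract(stream, d=0, t=1, q=0)))  # -> 2
```

Because the filter is a per-packet comparison of header fields, a network element can perform this adaptation without decoding the video.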

THE PATH TO STANDARDIZATION
International video coding standards have played an important role in the success of digital video applications, past and present. As scalability is key to the successful mass deployment of any application, scalability features have existed in all major video coding standards, including MPEG-2 Video, H.263, and MPEG-4 Visual, in the form of scalable profiles.

However, providing spatial and quality scalability with these standards added significant complexity and markedly reduced coding efficiency, defined as the bit rate needed to maintain a given quality level, or the quality achievable at a given bit rate, compared to the corresponding non-scalable profiles.

These drawbacks, together with the lack of related signaling and transport standardization, made the scalable profiles practically unusable. Scalability therefore became the focus of the recent SVC amendment to the H.264/AVC standard, known as H.264/SVC.

THE H.264/SVC STANDARD
The H.264/MPEG-4 Advanced Video Coding (AVC) standard (known as H.264/AVC) represents the state of the art in video coding. Compared to prior video coding standards, it offers superior coding efficiency, significantly reducing the bit rate needed for a given quality and delivering higher quality at a given bit rate. It also offers a very high level of implementation flexibility. H.264/SVC offers essential advantages that may help the standard succeed where its predecessors have failed: mass commercial deployment.

These advantages include:
  • Low impact on coding efficiency compared to single-layer coding of each sub-stream of the scalable bit stream
  • Minimal increase in decoding complexity compared to single-layer decoding, due to the introduction of single-loop decoding
  • Support for spatial, temporal, and quality scalability
  • Backward compatibility with H.264/AVC, as the H.264/SVC base layer can be decoded by any standard H.264/AVC decoder
  • Support for simple bit stream adaptation after the stream has been encoded