1. Describing ASCII text files

ASCII.zip
This archive contains the ASCII text file example which is discussed in the user manual.

2. Describing bitstreams compliant with H.264/MPEG-4 AVC

H264_AVC.zip
An example of using gBFlavor to describe bitstreams compliant with H.264/MPEG-4 AVC is located in this archive.

2.1 Description of the high-level structure of an H.264/MPEG-4 AVC-compliant bitstream

Our example gBFlavor code contains the high-level syntax structure of the first version of H.264/MPEG-4 AVC. Syntax elements up to and including the slice header are present. The high-level structure of the H.264/MPEG-4 AVC coding format is illustrated in the following figure:

An H.264/MPEG-4 AVC-compliant bitstream consists of a sequence of Network Abstraction Layer Units (NAL Units or NALUs). Examples of NALUs are Sequence Parameter Set (SPS), Picture Parameter Set (PPS), Supplemental Enhancement Information (SEI), and slice. The SPS and PPS NALUs contain information needed for the correct decoding of a sequence of pictures. An SEI NALU contains information that is not needed by the decoding process; it can be seen as embedded metadata. Pictures consist of one or more slices. Each slice contains a slice header and slice data.
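To make the NALU structure concrete, the following Python sketch splits a bitstream into NALUs and reads the one-byte NALU header. It is an illustration only and not part of the gBFlavor example. It assumes the bitstream is stored in the Annex B byte stream format (NALUs separated by 0x000001 start codes), which this page does not state; the function names are hypothetical.

```python
from typing import Iterator, Tuple

# nal_unit_type values defined by H.264/MPEG-4 AVC for the NALUs named above.
NAL_TYPE_NAMES = {
    1: "slice (non-IDR picture)",
    5: "slice (IDR picture)",
    6: "SEI",
    7: "SPS",
    8: "PPS",
}


def split_nal_units(stream: bytes) -> Iterator[bytes]:
    """Yield each NALU (header byte included) from an Annex B byte stream.

    Assumption: NALUs are delimited by 0x000001 start codes.
    """
    pos = stream.find(b"\x00\x00\x01")
    while pos != -1:
        start = pos + 3
        nxt = stream.find(b"\x00\x00\x01", start)
        end = len(stream) if nxt == -1 else nxt
        # Zero bytes in front of the next start code belong to that start
        # code (4-byte form) or are padding; a NALU never ends in 0x00.
        nalu = stream[start:end].rstrip(b"\x00")
        if nalu:
            yield nalu
        pos = nxt


def parse_nal_header(nalu: bytes) -> Tuple[int, int]:
    """Return (nal_ref_idc, nal_unit_type) from the first NALU byte."""
    header = nalu[0]
    nal_ref_idc = (header >> 5) & 0x03   # 0 means "not used for reference"
    nal_unit_type = header & 0x1F
    return nal_ref_idc, nal_unit_type


def describe(stream: bytes) -> list:
    """List a readable name for every NALU in the stream."""
    names = []
    for nalu in split_nal_units(stream):
        _, nal_unit_type = parse_nal_header(nalu)
        names.append(NAL_TYPE_NAMES.get(nal_unit_type, "other"))
    return names
```

A tool that only needs to tell SPS, PPS, SEI, and slice NALUs apart can stop after this header byte; reading syntax elements such as frame_num requires parsing further into the slice header.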

2.2 Exploitation of temporal scalability in H.264/MPEG-4 AVC

In video codecs, temporal scalability is generally exploited by dropping B pictures. Discarding B pictures does not affect the decoding of other pictures, since B pictures are not used as reference pictures.
However, the H.264/MPEG-4 AVC specification allows an encoder to use B slices as reference for other slices. Therefore, simply discarding B slices will cause problems for the decoding process. These problems can be solved by using hierarchical B pictures.

2.2.1 Hierarchical B-pictures and sub-sequences

H.264/MPEG-4 AVC enables the use of sub-sequences to support the exploitation of temporal scalability. A sub-sequence represents a number of inter-dependent pictures that can be discarded without affecting the decoding of any other sub-sequence in the same sub-sequence layer or any sub-sequence in any lower sub-sequence layer. Coded pictures in a bitstream can be assigned to sub-sequences and sub-sequence layers in multiple ways, provided that the structure fulfills the requirements for dependencies between sub-sequences. Typically, each picture belongs to exactly one sub-sequence, and each sub-sequence belongs to exactly one sub-sequence layer in any sub-sequence structure. In short, a sub-sequence layer contains a subset of the coded pictures in a sequence, while a sub-sequence is a set of coded pictures within a sub-sequence layer.

Encoding a bitstream with hierarchical B-pictures can be combined with the concept of sub-sequences. An example of such a coding structure is provided in the following figure:

Hierarchical B pictures introduce different temporal layers within an H.264/MPEG-4 AVC bitstream. Note that each temporal layer corresponds to a sub-sequence layer. As illustrated in the figure, this coding structure enables scaling of the temporal resolution by a factor of two (when one layer is discarded). Note also that the scalable extension of H.264/AVC (i.e., H.264/AVC Scalable Video Coding (SVC)) uses hierarchical B pictures to implement the temporal scalability axis.
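The factor-of-two behaviour can be checked with a small Python sketch. It assumes a dyadic hierarchy (a group of pictures whose size is a power of two, with key pictures in layer 0), which may differ from the structure in the figure; the function names and the display-order indexing are illustrative choices, not taken from the gBFlavor example.

```python
def temporal_layer(position: int, gop_size: int) -> int:
    """Temporal layer of a picture in a dyadic hierarchical-B structure.

    position is the picture index in display order and gop_size must be a
    power of two. Key pictures (multiples of gop_size) are in layer 0.
    """
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    levels = gop_size.bit_length() - 1            # log2(gop_size)
    offset = position % gop_size
    if offset == 0:
        return 0
    # The number of trailing zero bits tells how "central" the picture is.
    trailing_zeros = (offset & -offset).bit_length() - 1
    return levels - trailing_zeros


def keep_layers(num_pictures: int, gop_size: int, max_layer: int) -> list:
    """Display-order indices that remain after dropping layers > max_layer."""
    return [p for p in range(num_pictures)
            if temporal_layer(p, gop_size) <= max_layer]
```

With gop_size set to 8, pictures 0 to 8 fall into layers 0, 3, 2, 3, 1, 3, 2, 3, 0. Dropping layer 3 leaves every second picture, and dropping layer 2 as well leaves every fourth.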

2.2.2 Sub-sequence information embedded in SEI messages

The value of the frame_num syntax element (which is located in the slice header) can be used to calculate the layer in which a particular slice is located. This is the approach used in our example gBFlavor code. However, H.264/MPEG-4 AVC also provides an SEI NALU that contains sub-sequence information. For instance, this information contains the number of the sub-sequence layer and of the sub-sequence in which the slice is located. Hence, adaptation tools can discard temporal layers by using the information provided by the sub-sequence information SEI NALU. Note that discarding B slices based on the sub-sequence information is much more efficient than the calculation based on the frame_num syntax element, because the bitstream has to be parsed in far less detail (i.e., up to the NALU header level in case of the sub-sequence information, and up to the slice header level in case of the frame_num syntax element).
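The SEI-driven adaptation described above can be sketched in Python as follows. This is a simplified model, not the gBFlavor example: the Nalu record and drop_layers function are hypothetical, the layer number is assumed to have been extracted from the SEI message already, and every coded picture is assumed to be preceded by exactly one sub-sequence information SEI NALU.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# nal_unit_type values defined by H.264/MPEG-4 AVC.
SLICE_NON_IDR, SLICE_IDR, SEI = 1, 5, 6


@dataclass
class Nalu:
    nal_unit_type: int
    # Layer number carried by a sub-sequence information SEI message;
    # None for every other NALU. Assumed to be parsed beforehand.
    sub_seq_layer_num: Optional[int] = None


def drop_layers(nalus: Iterable[Nalu], max_layer: int) -> List[Nalu]:
    """Keep only pictures whose sub-sequence layer is <= max_layer."""
    kept = []
    current_layer = 0  # assumption: data before the first SEI is layer 0
    for nalu in nalus:
        if nalu.nal_unit_type == SEI and nalu.sub_seq_layer_num is not None:
            # The SEI announces the layer of the slices that follow it.
            current_layer = nalu.sub_seq_layer_num
        is_picture_data = nalu.nal_unit_type in (SEI, SLICE_NON_IDR, SLICE_IDR)
        if is_picture_data and current_layer > max_layer:
            continue  # discard the SEI and the slices of this picture
        kept.append(nalu)  # parameter sets and lower layers pass through
    return kept
```

Note that the slices themselves are never inspected beyond their NALU type; the decision to keep or discard them rests entirely on the preceding SEI NALU.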

More information on the exploitation of temporal scalability in H.264/MPEG-4 AVC can be found in this publication.
