1 Description-driven adaptation of (scalable) media resources
1.1 Overall approach
1.2 Benefits
2 Related work
2.1 XFlavor
2.2 BSDL
2.3 MPEG-21 gBS Schema
3 BFlavor
3.1 History
3.2 Functioning of a BFlavor-enabled media resource adaptation framework
3.3 Features of BFlavor
3.4 Performance results
3.4.1 MPEG-1 Video and H.264/AVC
3.4.2 H.264/AVC SVC
4 gBFlavor
1 Description-driven adaptation of (scalable) media resources
One way to realize a format-agnostic adaptation engine is to rely on automatically generated textual descriptions. These descriptions, further referred to as Bitstream Syntax Descriptions (BSDs), contain information about the high-level structure of a scalable bitstream.
1.1 Overall approach
A description-driven adaptation process typically consists of three main steps, which are illustrated in Figure 1:
- BSD generation: given a scalable bitstream, a BSD is generated containing information about the high-level structure of the bitstream (structural metadata). In particular, a BSD describes how the bitstream is organized in layers or packets of data. Note that a BSD is not meant to replace the original binary data; it rather acts as an additional layer, similar to metadata. Also, a BSD is typically expressed by making use of the eXtensible Markup Language (XML).
- BSD transformation: the actual adaptation process takes place during the BSD transformation. The BSD is transformed (e.g., by dropping layers or packets) according to the constraints of a given usage environment (e.g., available bandwidth). Two important possibilities exist for transforming XML documents. A first option is to use a procedural programming language in order to write a program consisting of an XML parser (e.g., based on Document Object Model (DOM) or Simple API for XML (SAX)) and additional transformation logic. A second option is to use a format-agnostic transformation engine that is able to interpret different transformation stylesheets. These stylesheets use a standardized (XML-based) language to describe the transformation logic. Examples of such standardized XML-based languages are eXtensible Stylesheet Language Transformations (XSLT) and Streaming Transformations for XML (STX).
- Bitstream generation: the last step is the generation of the adapted bitstream. The transformed BSD is used to steer the bitstream generation process. The adapted bitstream is then suited for playback in a given usage environment.
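The three steps above can be sketched on a toy format. This is a minimal illustration under stated assumptions, not the behavior of any of the tools discussed on this page: the packet layout (one byte for the layer, one byte for the payload length, then the payload), the element and attribute names, and the `max_layer` constraint are all hypothetical.

```python
import xml.etree.ElementTree as ET


def generate_bsd(bitstream: bytes) -> ET.Element:
    """Step 1: describe each packet by layer, offset, and length (no payload copy)."""
    root = ET.Element("bitstream")
    pos = 0
    while pos < len(bitstream):
        layer, size = bitstream[pos], bitstream[pos + 1]
        ET.SubElement(root, "packet", layer=str(layer),
                      start=str(pos), length=str(2 + size))
        pos += 2 + size
    return root


def transform_bsd(bsd: ET.Element, max_layer: int) -> ET.Element:
    """Step 2: drop every packet above the highest layer the usage environment allows."""
    adapted = ET.Element("bitstream")
    for packet in bsd.findall("packet"):
        if int(packet.get("layer")) <= max_layer:
            adapted.append(packet)
    return adapted


def generate_bitstream(bsd: ET.Element, original: bytes) -> bytes:
    """Step 3: copy the byte ranges that survive in the transformed BSD."""
    out = bytearray()
    for packet in bsd.findall("packet"):
        start, length = int(packet.get("start")), int(packet.get("length"))
        out += original[start:start + length]
    return bytes(out)


# Toy input: a layer-0 packet, a layer-1 packet, and another layer-0 packet.
source = bytes([0, 2, 0xAA, 0xBB, 1, 1, 0xCC, 0, 1, 0xDD])
bsd = generate_bsd(source)
adapted = generate_bitstream(transform_bsd(bsd, max_layer=0), source)
```

Only `generate_bsd` knows the packet layout. The transformation and the bitstream generation operate on the description alone, which is the property that makes those two steps format-agnostic.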
1.2 Benefits
Using BSDs for adapting scalable bitstreams enables the use of format-agnostic software within the adaptation framework. Indeed, the software operates on the BSD level, and is not aware of the underlying coding format. Thanks to the use of XML-based BSDs, many already existing tools for manipulating XML documents can be used, like editors and transformation engines. XML-based BSDs also allow the integration with other metadata standards, such as the MPEG-7 specification. Hence, description-driven media resource adaptation can be used to realize so-called 'intelligent' adaptations (e.g., automatic video summarization and scene selection).
2 Related work
In recent years, a number of bitstream syntax description languages have been defined with a view to XML-driven manipulation of digital media resources, such as the Formal Language for Audio-Visual Object Representation, extended with XML features (XFlavor), the MPEG Video Markup Language (MPML), the Bitstream Syntax Description Language (BSDL), and MPEG-21 generic Bitstream Syntax Schema (MPEG-21 gBS Schema). MPML is an XML-based language that is specifically designed for describing the syntax of bitstreams compliant with MPEG-4 Visual. This language will not be discussed in further detail due to its very specific nature.
2.1 XFlavor
Flavor provides a formal way to specify how data are laid out in a bitstream. It was initially designed as a declarative language with a C++-like syntax to describe the bitstream syntax on a bit-per-bit basis. Its aim is to simplify the development of software that processes audiovisual bitstreams by automatically generating the required C++ or Java code to parse the data, allowing a developer to concentrate on the processing part of the software. Unlike the BSD approach, a Flavor-based parser does not generate a persistent description of the parsed data, but only an in-memory representation in the form of a collection of C++ or Java class objects. As a result, XFlavor was developed to provide tools to generate an XML description of the bitstream syntax and for regenerating an (adapted) bitstream. Figure 2 summarizes the overall method for XML-driven media content manipulation by relying on XFlavor. Explanatory notes for Figure 2 are given below:
- the syntax of a particular media format is described using Flavor;
- the description in Flavor is translated by the Flavorc engine into an XML Schema (for validation purposes) and a set of Java or C++ source classes;
- the source classes, together with a separate main() method, are compiled to a media format-specific parser;
- a BSD is created by the format-specific parser, taking as input a particular bitstream;
- a BSD is transformed to meet the constraints of a certain usage environment;
- Bitgen creates an adapted bitstream, only guided by a transformed BSD.
2.2 BSDL
BSDL provides means for translating the syntax of a particular media resource into an XML description. The language is built on top of W3C XML Schema and falls under part 5 of the MPEG-B standard. The primary motivation behind the development of BSDL is to assist in the adaptation of scalable bitstreams such that the resulting bitstreams meet the constraints of a certain usage environment. The generic character of BSDL, and hence its merit, lies in the media format-independent nature of the different software modules that are responsible for the creation of the BSDs and for the generation of the adapted bitstreams. The BSD generator and bitstream generator are named BintoBSD Parser and BSDtoBin Parser, respectively. Figure 3 summarizes the overall method for adapting a scalable bitstream using BSDL, illustrating the removal of particular bidirectionally coded pictures to create a tailored bitstream suited for a constrained usage environment. Explanatory notes for this figure are provided below:
- an MPEG-21 bitstream syntax schema (BS Schema) contains a description of the high-level syntax of a particular media format;
- a BSD is created by a format-independent BintoBSD Parser, taking as input a particular bitstream and a corresponding BS Schema;
- a BSD is transformed to meet the constraints of a usage environment;
- a format-independent BSDtoBin Parser creates an adapted bitstream, using the transformed BSD and the BS Schema, optionally taking the original bitstream as an additional input (denoted by the dashed arrow).
Note that with XFlavor, the complete bitstream data are actually embedded in the BSD, resulting in potentially verbose descriptions, while BSDL uses a specific datatype to point to a data range in the original bitstream when it is too verbose to be included in the BSD. This is why, unlike XFlavor, BSDL is a description language, rather than a representation language, and can describe the bitstream at a high syntactical level instead of at a low-level, bit-per-bit basis.
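The difference between embedding the data and pointing to it can be made concrete with a small sketch. The element names and the "start length" text encoding below are hypothetical simplifications chosen for illustration; they are not the normative XFlavor output or BSDL datatype syntax.

```python
import xml.etree.ElementTree as ET


def describe_embedded(header: int, payload: bytes) -> ET.Element:
    """Representation style: every payload byte is written into the description."""
    unit = ET.Element("unit")
    ET.SubElement(unit, "header").text = str(header)
    ET.SubElement(unit, "payload").text = payload.hex()
    return unit


def describe_by_reference(header: int, start: int, length: int) -> ET.Element:
    """Description style: the payload is replaced by a 'start length' pointer."""
    unit = ET.Element("unit")
    ET.SubElement(unit, "header").text = str(header)
    ET.SubElement(unit, "payload").text = f"{start} {length}"
    return unit


def resolve(unit: ET.Element, original: bytes) -> bytes:
    """Fetch the referenced payload bytes from the original bitstream."""
    start, length = map(int, unit.findtext("payload").split())
    return original[start:start + length]


original = bytes([7]) + bytes(range(200))   # 1 header byte + 200 payload bytes
embedded = describe_embedded(original[0], original[1:])
referenced = describe_by_reference(original[0], start=1, length=200)

# Serialized sizes of the two descriptions of the same unit.
embedded_size = len(ET.tostring(embedded))
referenced_size = len(ET.tostring(referenced))
```

The referenced description stays small regardless of the payload size, but it can only be turned back into bytes when the original bitstream is available, which corresponds to the optional input of the BSDtoBin Parser mentioned above.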
2.3 MPEG-21 gBS Schema
MPEG-21 gBS Schema is a tool of Digital Item Adaptation (DIA), which is part 7 of the MPEG-21 Multimedia Framework. The MPEG-21 Multimedia Framework aims at realizing the 'big picture' in the multimedia production, delivery, and consumption chain. In contrast to BSDL and XFlavor, gBS Schema enables the creation of format-independent descriptions, which are called generic Bitstream Syntax Descriptions (gBSDs). The functioning of a gBS Schema-based adaptation framework is illustrated in Figure 4. The first step is the generation of a gBSD. This process is not described in the DIA specification, which implies that a gBSD may be generated in any proprietary way. The gBSD is subsequently transformed by using common XML transformation technologies such as XSLT or STX. After the transformation of the gBSD, an adapted bitstream is obtained using the gBSDtoBin parser. This parser takes as input the transformed gBSD and relies on the gBS Schema to steer the generation of the adapted bitstream. The gBS Schema is expressed by making use of W3C's XML Schema Language and has the following properties:
- it acts as an XML Schema for the gBSDs;
- it is independent of the underlying coding format, which implies that gBSDs are format-agnostic;
- it enables the semantically meaningful marking of syntactical elements;
- it provides support for hierarchical adaptations by describing hierarchies of syntactical units;
- it contains an enhanced addressing scheme to support efficient bitstream access.
The automatic generation of gBSDs is a well-known problem. Although the gBS Schema is generic (format-independent), the generation process is not, since it requires specific software that is able to parse the format in question. Because the generation process is both application-dependent and format-dependent, it is difficult to create a format-agnostic parser that is able to produce application-specific gBSDs.
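The role of semantic markers and unit hierarchies in a gBSD transformation can be sketched as follows. The element name `unit`, the marker values, and the helper functions are simplified stand-ins chosen for this illustration; they do not reproduce the normative gBS Schema vocabulary or the gBSDtoBin process.

```python
import xml.etree.ElementTree as ET

GBSD = """
<gBSD>
  <unit start="0" length="4" marker="header"/>
  <unit start="4" length="12" marker="T0">
    <unit start="4" length="6" marker="Q0"/>
    <unit start="10" length="6" marker="Q1"/>
  </unit>
  <unit start="16" length="8" marker="T1">
    <unit start="16" length="8" marker="Q0"/>
  </unit>
</gBSD>
"""


def prune(parent: ET.Element, unwanted: set) -> None:
    """Remove every unit whose marker is unwanted, including all of its children."""
    for child in list(parent):
        if child.get("marker") in unwanted:
            parent.remove(child)
        else:
            prune(child, unwanted)


def leaf_ranges(parent: ET.Element) -> list:
    """Collect (start, length) of the remaining leaf units in document order."""
    ranges = []
    for child in parent:
        if len(child):
            ranges.extend(leaf_ranges(child))
        else:
            ranges.append((int(child.get("start")), int(child.get("length"))))
    return ranges


root = ET.fromstring(GBSD)
prune(root, {"T1", "Q1"})   # drop one temporal level and one quality level
kept = leaf_ranges(root)
```

The transformation inspects only marker values and the nesting of units, never the syntax of the coding format, which is why it can stay format-agnostic. Producing the marked-up description in the first place is the format-dependent and application-dependent part discussed above.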
3 BFlavor
Both BSDL and XFlavor can be applied independently. However, both solutions are characterized by several complementary properties. In particular, the processing efficiency and flow-control flexibility provided by XFlavor, and the ability to create high-level BSDs using BSDL, were our key motives for the development of a harmonized BSD-based content adaptation framework.
3.1 History
The initial version of the MPEG-21 BSDL specification is characterized by a fundamental performance problem pertaining to the automatic generation of BSDs for elementary video bitstreams. Indeed, real-life implementations of the format-agnostic BintoBSD process are required to keep the entire BSD in system memory. This allows the evaluation of a set of arbitrary XPath 1.0 expressions at run time, i.e., while parsing the bitstream and progressively generating its BSD. Consequently, memory usage increases and processing speed decreases while a BintoBSD Parser generates an XML description of the high-level structure of a binary media resource.
Several solutions can be proposed to improve the performance behavior of BSDL's BintoBSD process. We proposed BFlavor, an alternative to the use of BSDL's format-agnostic BintoBSD Parser (see Figure 5). Note that the developers of BSDL also worked on an improved version of BSDL. This resulted in an optimized BintoBSD Parser (v1.3.1), which is characterized by a constant execution time and a constant memory usage. However, the performance results in Section 3.4.2 show that BFlavor still outperforms this optimized version of BSDL in terms of execution times.
3.2 Functioning of a BFlavor-enabled media resource adaptation framework
BFlavor allows the automatic generation of a set of source classes for a media format-specific parser, as well as the automatic generation of a BS Schema. The intended content adaptation architecture, bridging the gap between XFlavor and BSDL, is depicted in Figure 6. Explanatory notes are provided below:
- the syntax of a particular media format is described using BFlavor;
- generation of a BS Schema and a set of Java source classes by the bflavorc translator (the modified flavorc translator), constituting the core of a parser able to produce XML output that is in line with the generated BS Schema;
- the generated parser (obtained in step 2) is used for the generation of a BSD, given a (scalable) media resource;
- further processing of the BSD by a BSD transformation engine;
- a standard BSDtoBin Parser uses the automatically created BS Schema and the transformed BSD for the generation of a tailored media resource.
Note that a BS Schema is not needed by a BFlavor-based parser to generate a BSD. This is due to the format-specific nature of the parser. A BS Schema is only required by BSDtoBin to create an adapted bitstream. This allows the automatic creation of BS Schemata that contain a minimal amount of information such that BSDtoBin can convert each element value in a BSD to a bit-level representation. Such functionality can already be provided by an XML Schema using BSDL-1 datatypes, as BSDL-2 is specific for BintoBSD and not relevant for BSDtoBin. Thus, BSDtoBin may still be used for generating an adapted bitstream without requiring BFlavor to support BSDL-2.
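The kind of minimal information needed to turn element values back into bits can be illustrated with a short sketch. The dictionary-based "schema" and the function `pack_fields` are hypothetical simplifications, not the BSDL-1 datatype mechanism or the BSDtoBin implementation; the three field names and widths follow the one-byte H.264/AVC NAL unit header.

```python
def pack_fields(values: dict, bit_widths: dict) -> bytes:
    """Write each value with the bit width declared for its element, MSB first."""
    bits = ""
    for name, value in values.items():
        width = bit_widths[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        bits += format(value, f"0{width}b")
    bits += "0" * (-len(bits) % 8)   # pad to a whole number of bytes
    return int(bits, 2).to_bytes(len(bits) // 8, "big")


# Hypothetical minimal schema: element name -> width in bits.
schema = {"forbidden_zero_bit": 1, "nal_ref_idc": 2, "nal_unit_type": 5}

# Element values as they might appear in a (transformed) description.
header = {"forbidden_zero_bit": 0, "nal_ref_idc": 3, "nal_unit_type": 5}
encoded = pack_fields(header, schema)
```

In this sketch, the encoder needs nothing beyond a width per element. Conditions, loops, and context lookups matter only while parsing, which is consistent with the observation above that the parsing-specific part of the schema language is not relevant for bitstream generation.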
3.3 Features of BFlavor
Table 1 highlights the major differences and similarities between the normative tools of BSDL (i.e., the tools as available in BSDL-1 and BSDL-2), XFlavor, and BFlavor. The criteria were selected based on their relevance to this research and are organized in four main categories: language-specific criteria, BSD-specific criteria, criteria specific to BSD generation, and criteria specific to customized bitstream generation.
In short, a BFlavor-based adaptation chain has, among others, the benefit that it not only eliminates an inherent disadvantage of the BSDL tool chain (the inefficient performance behavior of the BintoBSD Parser), but also removes a major shortcoming of the XFlavor tool chain (huge bitstream structure descriptions). This makes our harmonized approach an elegant alternative to a separate optimization of both technologies.
Criterion | BSDL | XFlavor | BFlavor |
C1. Language | |||
Developers | Philips Research | Columbia University | Ghent University |
Foundation | W3C XML Schema (restrictions, extensions) | C++/Java (restrictions, extensions) | XFlavor (restrictions, extensions) |
Community | metadata | developers | metadata and developers |
Format-agnostic | yes | yes | yes |
Purpose | to abstract multimedia content adaptation | to abstract bitstream parsing | to abstract multimedia content adaptation |
Schema-dependent | yes (e.g., binary encoding) | no (only validation) | yes (e.g., binary encoding) |
Flow control | BSDL-specific attributes (bs2:if, bs2:ifNext, bs2:nOccurs, bs2:ifUnion) and an XML Schema element (xsd:choice) | C++/Java-based flow control (if-else, (do-)while, for, switch-case) | C++/Java-based flow control (if-else, (do-)while, for, switch-case) |
Context access | XPath (BSDL-2 variables) | class parameters | class parameters and context classes |
Variable-length coding | weak | strong | weak |
Bitstream validation | possible (xsd:fixed) | possible (= operator) | possible (= operator) |
C2. BSD | |||
Structuring | XML | XML | XML |
Granularity | high-level | low-level | high-level |
C3. BSD generation | |||
Tool used | BintoBSD Parser v1.1.3 | parser generated by flavorc | parser generated by bflavorc |
Processing speed | slow | fast | fast |
Memory consumption | high | low | low |
C4. Customized bitstream generation | |||
Tool used | BSDtoBin Parser | Bitgen | BSDtoBin Parser |
Processing speed | fast | fast | fast |
Memory consumption | low | low | low |
BFlavor is able to describe the high-level structure of the following coding formats: MPEG-{1,2} Video/Systems, H.263+, MPEG-4 Visual, H.264/AVC, H.264/AVC Scalable Video Coding (SVC), Video Codec-1 (VC-1), JPEG2000, GIF87a, MPEG-1 Layer 3 (MP3), and MPEG-4 Advanced Audio Coding (AAC). These coding formats are tabulated in Table 2, which gives an overview of the granularity of the BFlavor codes together with their inherent applications with regard to XML-driven adaptation. Most of these applications are related to the exploitation of scalability. However, BSDs also enable other applications such as bitstream syntax validation, (de)multiplexing, automatic video summarization, metadata insertion, and scene selection.
Coding format | Granularity | Inherent applications |
MPEG-1/2 Video | picture header | frame dropping |
MPEG-1 Audio Layer 3 | frame header | cutting |
MPEG-1/2 Systems | PES packet | demultiplexing |
H.263+ | picture header | frame dropping, resolution scaling, quality scaling |
MPEG-4 Visual | video object plane | frame dropping |
H.264/MPEG-4 AVC | slice header | frame dropping, ROI scaling |
H.264/MPEG-4 AVC SVC | slice header | frame dropping, resolution scaling, quality scaling, ROI scaling |
MPEG-4 AAC | frame header | cutting |
VC-1 | slice | frame dropping |
JPEG2000 | packet | resolution scaling, quality scaling, colour scaling, ROI scaling |
MP4 file format | box | demultiplexing |
3.4 Performance results
3.4.1 MPEG-1 Video and H.264/AVC
Table 3 shows performance measurements for the MPEG-1 Video and H.264/AVC coding formats. BSDL, XFlavor, and BFlavor are compared in this table. Note that v1.1.3 of the BintoBSD parser was used in these experiments. For MPEG-1 Video, a parse unit corresponds to a Picture, while a Network Abstraction Layer Unit (NALU) is the parse unit in H.264/AVC. Figure 7 shows the compressed BSD sizes for the H.264/AVC coding format. From the experiments, it is clear that BFlavor is able to produce high-level BSDs at a constant execution speed while maintaining a low memory usage. As such, BFlavor combines the strengths of BSDL and XFlavor and eliminates their weaknesses.
Table 3: Performance measurements for BSDL, XFlavor, and BFlavor in terms of execution times and memory consumption needed for the generation of a BSD
Note that the performance results for MPEG-1 Video and H.264/AVC are described in more detail in the following publication:
De Neve, W.; Van Deursen, D.; De Schrijver, D.; De Wolf, K.; Lerouge, S.; Van de Walle, R. "BFlavor: a Harmonized Approach to Media Resource Adaptation, inspired by MPEG-21 BSDL and XFlavor." Signal Processing: Image Communication. Vol. 21, nr. 10. pp. 862-889.
3.4.2 H.264/AVC SVC
The second series of experiments compares BFlavor with the second version of BSDL (v1.3.1). As discussed earlier, this version of BSDL is characterized by a low and constant memory usage and a constant execution speed. Exploitation along the three scalability axes of H.264/AVC SVC was the target application of these experiments. Table 4 tabulates the execution times of the BFlavor- and BSDL-based adaptation chains. Note that we used two different forms of granularity to describe the H.264/AVC SVC bitstreams: up to and including the NALU header (limited) and up to and including the slice header. The latter case is the most challenging one because much context information is needed from the Sequence Parameter Sets (SPSs) and Picture Parameter Sets (PPSs) during the parsing process. The execution times of the adaptation chains (for the limited case) are compared with the length of the video sequence in Figure 8.
It is clear from Figure 8 that the BFlavor-based adaptation chain is executed in real time (in contrast to the MPEG-21 BSDL-based adaptation chain). Note that in this figure the resolution of the video sequence was rescaled to 640x256 pixels. Both technologies have a linear behavior in terms of execution time, a constant memory consumption (circa 3 MB for BSD generation), and a relatively compact BSD (26 MB uncompressed or 317 KB compressed with WinRAR 3.0’s default text compression algorithm when the size of the bitstream is 176 MB).
Although the optimized version of MPEG-21 BSDL’s BintoBSD parser shows the same characteristics in the performance measurements, there is a remarkable difference when we look at the BSD generation time (see Table 4). When parsing the H.264/AVC SVC bitstream up to and including the slice header, a lot of information has to be retrieved from the active SPS and PPS. This is not the case when parsing up to and including the NALU header. The BFlavor-based adaptation chain can adapt H.264/AVC SVC bitstreams in real time, even when parsing up to and including the slice header. Looking at the MPEG-21 BSDL-based adaptation chain, we see a significant loss of performance when parsing up to and including the slice header. It is clear that, for the retrieval of context information, the use of context classes by BFlavor performs much better than the XPath evaluation mechanism used in MPEG-21 BSDL.
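The difference between the two ways of retrieving context information can be illustrated with a small sketch. It is only a cost model under stated assumptions: the class `ParameterSetContext` and the function `lookup_by_scan` are hypothetical, and they do not show how BFlavor's context classes or BSDL's XPath evaluation are actually implemented.

```python
class ParameterSetContext:
    """Keeps only the latest parameter set per identifier (constant-size lookups)."""

    def __init__(self):
        self._sets = {}

    def store(self, set_id: int, fields: dict) -> None:
        self._sets[set_id] = fields

    def lookup(self, set_id: int) -> dict:
        return self._sets[set_id]


def lookup_by_scan(description: list, set_id: int) -> dict:
    """Stand-in for a query over the whole description: walk back to the last match."""
    for unit in reversed(description):
        if unit["type"] == "pps" and unit["id"] == set_id:
            return unit["fields"]
    raise KeyError(set_id)


context = ParameterSetContext()
description = []   # grows with every parsed unit

# One PPS followed by many slices that each refer to it.
pps = {"type": "pps", "id": 0, "fields": {"entropy_coding_mode_flag": 1}}
description.append(pps)
context.store(pps["id"], pps["fields"])
for n in range(1000):
    description.append({"type": "slice", "id": n, "pps_id": 0})

from_context = context.lookup(0)
from_scan = lookup_by_scan(description, 0)
```

Both lookups return the same fields, but the scan has to walk past every slice parsed since the parameter set, whereas the dedicated context store answers directly and does not need the description to be kept in memory.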
Note that the performance results for H.264/AVC SVC are described in more detail in the following publication:
Van Deursen, D.; De Schrijver, D.; De Neve, W.; Van de Walle, R. "A Real-Time XML-Based Adaptation System for Scalable Video Formats." Lecture Notes in Computer Science. Vol. 4261. 2006. pp. 339-348.
Figure 9 gives an overview of the performance results for each component in a BFlavor-based adaptation chain. These results were obtained by customizing an H.264/AVC SVC-compliant bitstream along the spatial scalability axis. It is clear that the bitstream is adapted in real time. Also, the produced BSDs introduce only a limited overhead in comparison to the size of the original bitstream.