BFlavor

Detailed Overview


1 Description-driven adaptation of (scalable) media resources

One way to realize a format-agnostic adaptation engine is to rely on automatically generated textual descriptions. These descriptions, further referred to as Bitstream Syntax Descriptions (BSDs), contain information about the high-level structure of a scalable bitstream.

1.1 Overall approach

A description-driven adaptation process typically consists of three main steps, which are illustrated in Figure 1:
  • BSD generation: given a scalable bitstream, a BSD is generated containing information about the high-level structure of the bitstream (structural metadata). In particular, a BSD describes how the bitstream is organized in layers or packets of data. Note that a BSD is not meant to replace the original binary data; it rather acts as an additional layer, similar to metadata. Also, a BSD is typically expressed by making use of the eXtensible Markup Language (XML).
  • BSD transformation: the actual adaptation process takes place during the BSD transformation. The BSD is transformed (e.g., by dropping layers or packets) according to the constraints of a given usage environment (e.g., available bandwidth). Two important possibilities exist for transforming XML documents. A first option is to use a procedural programming language in order to write a program consisting of an XML parser (e.g., based on Document Object Model (DOM) or Simple API for XML (SAX)) and additional transformation logic. A second option is to use a format-agnostic transformation engine that is able to interpret different transformation stylesheets. These stylesheets use a standardized (XML-based) language to describe the transformation logic. Examples of such standardized XML-based languages are eXtensible Stylesheet Language Transformations (XSLT) and Streaming Transformations for XML (STX).
  • Bitstream generation: the last step is the generation of the adapted bitstream. The transformed BSD is used to steer the bitstream generation process. The adapted bitstream is then suited for playback in a given usage environment.
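
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy packet layout (one byte for the layer identifier, one byte for the payload length), the element and attribute names, and all function names are hypothetical and do not correspond to any of the tools discussed on this page.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy format: [layer id: 1 byte][payload length: 1 byte][payload].
def make_packet(layer: int, payload: bytes) -> bytes:
    return bytes([layer, len(payload)]) + payload

def generate_bsd(bitstream: bytes) -> ET.Element:
    """Step 1: describe the packet structure without copying the payloads."""
    root = ET.Element("bitstream")
    pos = 0
    while pos < len(bitstream):
        layer, size = bitstream[pos], bitstream[pos + 1]
        ET.SubElement(root, "packet", layer=str(layer),
                      start=str(pos), length=str(2 + size))
        pos += 2 + size
    return root

def transform_bsd(bsd: ET.Element, max_layer: int) -> ET.Element:
    """Step 2: keep only the packets the usage environment can handle."""
    adapted = ET.Element("bitstream")
    for packet in bsd.findall("packet"):
        if int(packet.get("layer")) <= max_layer:
            ET.SubElement(adapted, "packet", dict(packet.attrib))
    return adapted

def generate_bitstream(original: bytes, bsd: ET.Element) -> bytes:
    """Step 3: copy the byte ranges that survived the transformation."""
    out = bytearray()
    for packet in bsd.findall("packet"):
        start, length = int(packet.get("start")), int(packet.get("length"))
        out += original[start:start + length]
    return bytes(out)

original = (make_packet(0, b"base") + make_packet(1, b"enh1")
            + make_packet(2, b"enh2") + make_packet(0, b"base"))
bsd = generate_bsd(original)
adapted_bsd = transform_bsd(bsd, max_layer=1)
adapted = generate_bitstream(original, adapted_bsd)
```

Only generate_bsd needs to know the packet layout in this sketch; the transformation and the bitstream generation operate purely on the description.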

1.2 Benefits

Using BSDs for adapting scalable bitstreams enables the use of format-agnostic software within the adaptation framework. Indeed, the software operates on the BSD level, and is not aware of the underlying coding format. Thanks to the use of XML-based BSDs, many already existing tools for manipulating XML documents can be used, like editors and transformation engines. XML-based BSDs also allow the integration with other metadata standards, such as the MPEG-7 specification. Hence, description-driven media resource adaptation can be used to realize so-called 'intelligent' adaptations (e.g., automatic video summarization and scene selection).

2 Related work

In recent years, a number of bitstream syntax description languages have been defined with a view to XML-driven manipulation of digital media resources, such as the Formal Language for Audio-Visual Object Representation, extended with XML features (XFlavor), the MPEG Video Markup Language (MPML), the Bitstream Syntax Description Language (BSDL), and MPEG-21 generic Bitstream Syntax Schema (MPEG-21 gBS Schema). MPML is an XML-based language that is specifically designed for describing the syntax of bitstreams compliant with MPEG-4 Visual. This language will not be discussed in further detail due to its very specific nature.

2.1 XFlavor

Flavor provides a formal way to specify how data are laid out in a bitstream. It was initially designed as a declarative language with a C++-like syntax to describe the bitstream syntax on a bit-per-bit basis. Its aim is to simplify the development of software that processes audiovisual bitstreams by automatically generating the required C++ or Java code to parse the data, allowing a developer to concentrate on the processing part of the software. Unlike the BSD approach, a Flavor-based parser does not generate a persistent description of the parsed data, but only an in-memory representation in the form of a collection of C++ or Java class objects. As a result, XFlavor was developed to provide tools to generate an XML description of the bitstream syntax and for regenerating an (adapted) bitstream. Figure 2 summarizes the overall method for XML-driven media content manipulation by relying on XFlavor. Explanatory notes for Figure 2 are given below:
  1. the syntax of a particular media format is described using Flavor;
  2. the description in Flavor is translated by the Flavorc engine into an XML Schema (for validation purposes) and a set of Java or C++ source classes;
  3. the source classes, together with a separate main() method, are compiled to a media format-specific parser;
  4. the format-specific parser creates a BSD, taking as input a particular bitstream;
  5. the BSD is transformed to meet the constraints of a certain usage environment;
  6. Bitgen creates an adapted bitstream, guided only by the transformed BSD.

2.2 BSDL

BSDL provides means for translating the syntax of a particular media resource into an XML description. The language is built on top of W3C XML Schema and falls under part 5 of the MPEG-B standard. The primary motivation behind the development of BSDL is to assist in the adaptation of scalable bitstreams such that the resulting bitstreams meet the constraints of a certain usage environment. The generic character of BSDL, and hence its merit, lies in the media format-independent nature of the different software modules that are responsible for the creation of the BSDs and for the generation of the adapted bitstreams. The BSD generator and bitstream generator are named BintoBSD Parser and BSDtoBin Parser, respectively. Figure 3 summarizes the overall method for adapting a scalable bitstream using BSDL, illustrating the removal of particular bidirectionally coded pictures to create a tailored bitstream suited for a constrained usage environment. Explanatory notes for this figure are provided below:
  1. an MPEG-21 bitstream syntax schema (BS Schema) contains a description of the high-level syntax of a particular media format;
  2. a BSD is created by a format-independent BintoBSD Parser, taking as input a particular bitstream and a corresponding BS Schema;
  3. a BSD is transformed to meet the constraints of a usage environment;
  4. a format-independent BSDtoBin Parser creates an adapted bitstream, using the transformed BSD and the BS Schema, optionally taking the original bitstream as an additional input (denoted by the dashed arrow).

Note that with XFlavor, the complete bitstream data are actually embedded in the BSD, resulting in potentially verbose descriptions, while BSDL uses a specific datatype to point to a data range in the original bitstream when it is too verbose to be included in the BSD. This is why, unlike XFlavor, BSDL is a description language, rather than a representation language, and can describe the bitstream at a high syntactical level instead of at a low-level, bit-per-bit basis.
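
The difference between embedding the data and pointing to it can be made concrete with a small Python sketch. The element names, the hexadecimal encoding of the embedded payload, and the "start length" pointer notation are assumptions chosen for illustration; they are not the actual output of XFlavor or BSDL.

```python
import xml.etree.ElementTree as ET

def describe_embedded(header: int, payload: bytes) -> ET.Element:
    """Representation style: every payload byte is written into the description."""
    unit = ET.Element("unit")
    ET.SubElement(unit, "header").text = str(header)
    ET.SubElement(unit, "payload").text = payload.hex()
    return unit

def describe_by_reference(header: int, start: int, length: int) -> ET.Element:
    """Description style: the payload is replaced by a 'start length' pointer."""
    unit = ET.Element("unit")
    ET.SubElement(unit, "header").text = str(header)
    ET.SubElement(unit, "payload").text = f"{start} {length}"
    return unit

payload = bytes(range(256)) * 16  # 4096 bytes standing in for coded data
embedded = ET.tostring(describe_embedded(7, payload))
referenced = ET.tostring(describe_by_reference(7, start=1, length=len(payload)))

# The embedded form grows with the payload; the pointer form stays small.
print(len(embedded), len(referenced))
```

The pointer form only works if the original bitstream is still available when the adapted bitstream is generated, which matches the optional input of the original bitstream to the BSDtoBin Parser described above.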

2.3 MPEG-21 gBS Schema

MPEG-21 gBS Schema is a tool of Digital Item Adaptation (DIA), which is part 7 of the MPEG-21 Multimedia Framework. The MPEG-21 Multimedia Framework aims at realizing the 'big picture' in the multimedia production, delivery, and consumption chain. In contrast to BSDL and XFlavor, gBS Schema enables the creation of format-independent descriptions, which are called generic Bitstream Syntax Descriptions (gBSDs). The functioning of a gBS Schema-based adaptation framework is illustrated in Figure 4. The first step is the generation of a gBSD. This process is not described in the DIA specification, which implies that a gBSD may be generated in any proprietary way. The gBSD is subsequently transformed by using common XML transformation technologies such as XSLT or STX. After the transformation of the gBSD, an adapted bitstream is obtained using the gBSDtoBin parser. This parser takes as input the transformed gBSD and relies on the gBS Schema to steer the generation of the adapted bitstream. The gBS Schema is expressed by making use of W3C's XML Schema Language and has the following properties:
  • it acts as an XML Schema for the gBSDs;
  • it is independent of the underlying coding format, which implies that gBSDs are format-agnostic;
  • it enables the semantically meaningful marking of syntactical elements;
  • it provides support for hierarchical adaptations by describing hierarchies of syntactical units;
  • it contains an enhanced addressing scheme to support efficient bitstream access.
Since all the necessary information to generate the adapted bitstream is included in the gBSD and the semantics of the gBS Schema elements, the gBSDtoBin process does not have to be aware of the underlying coding format.
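
A minimal Python sketch of this idea is given below. The element name and the start, length, and marker attributes are modelled loosely on the concepts listed above (semantic marking and addressing); their exact spelling and semantics are assumptions that should be checked against the DIA specification, and the functions are hypothetical stand-ins for a transformation and for the gBSDtoBin process.

```python
import xml.etree.ElementTree as ET

# Illustrative description: each unit addresses a byte range and carries a marker.
GBSD = """
<gBSD>
  <gBSDUnit start="0" length="4" marker="header"/>
  <gBSDUnit start="4" length="6" marker="temporal-0"/>
  <gBSDUnit start="10" length="6" marker="temporal-1"/>
  <gBSDUnit start="16" length="6" marker="temporal-0"/>
</gBSD>
""".strip()

def drop_marked_units(gbsd_xml: str, marker: str) -> ET.Element:
    """Transformation: remove every unit that carries the given marker."""
    root = ET.fromstring(gbsd_xml)
    for unit in list(root):
        if unit.get("marker") == marker:
            root.remove(unit)
    return root

def to_binary(original: bytes, gbsd: ET.Element) -> bytes:
    """Bitstream generation: copy the addressed ranges, knowing nothing of the format."""
    out = bytearray()
    for unit in gbsd.iter("gBSDUnit"):
        start = int(unit.get("start"))
        length = int(unit.get("length"))
        out += original[start:start + length]
    return bytes(out)

original = b"HEAD" + b"AAAAAA" + b"BBBBBB" + b"CCCCCC"
adapted = to_binary(original, drop_marked_units(GBSD, "temporal-1"))
```

Neither function inspects the bytes themselves; all format knowledge was spent when the description and its markers were created.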

The automatic generation of gBSDs is a well-known problem. Although gBS Schema is generic (format-independent), the generation process is not, since it requires specific software that is able to parse the format in question. Because the generation process is both application-dependent and format-dependent, it is difficult to create a format-agnostic parser that is able to produce application-specific gBSDs.

3 BFlavor

Both BSDL and XFlavor can be applied independently. However, both solutions are characterized by several complementary properties. In particular, the processing efficiency and flow-control flexibility provided by XFlavor, and the ability to create high-level BSDs using BSDL, were our key motives for the development of a harmonized BSD-based content adaptation framework.

3.1 History

The initial version of the MPEG-21 BSDL specification is characterized by a fundamental performance problem pertaining to the automatic generation of BSDs for elementary video bitstreams. Indeed, real-life implementations of the format-agnostic BintoBSD process are required to keep the entire BSD in the system memory. This allows a set of arbitrary XPath 1.0 expressions to be evaluated at run time, i.e., while parsing the bitstream and progressively generating its BSD. Consequently, the memory usage of a BintoBSD Parser increases and its processing speed decreases during the generation of an XML description for the high-level structure of a binary media resource.

Several solutions can be proposed to improve the performance of BSDL's BintoBSD process. We proposed BFlavor, an alternative to the use of BSDL's format-agnostic BintoBSD Parser (see Figure 5). Note that the developers of BSDL also worked on an improved version of BSDL. This resulted in an optimized BintoBSD Parser (v1.3.1), which is characterized by a constant execution speed and a constant memory usage. However, the performance results in Section 3.4 show that BFlavor still outperforms this optimized version of BSDL in terms of execution times.


3.2 Functioning of a BFlavor-enabled media resource adaptation framework

BFlavor allows the automatic generation of a set of source classes for a media format-specific parser, as well as the automatic generation of a BS Schema. The intended content adaptation architecture, bridging the gap between XFlavor and BSDL, is depicted in Figure 6. Explanatory notes are provided below:
  1. the syntax of a particular media format is described using BFlavor;
  2. the bflavorc translator (the modified flavorc translator) generates a BS Schema and a set of Java source classes, which constitute the core of a parser able to produce XML output that is in line with the generated BS Schema;
  3. the generated parser (obtained in step 2) is used for the generation of a BSD, given a (scalable) media resource;
  4. further processing of the BSD by a BSD transformation engine;
  5. a standard BSDtoBin Parser uses the automatically created BS Schema and the transformed BSD for the generation of a tailored media resource.
The XML output of the BFlavor-based parser is equivalent to that produced by BintoBSD, and hence processable by BSDtoBin. As such, the use of BFlavor-driven parsers, which are format-specific but generated automatically by a format-independent process, is an efficient alternative to the use of a format-neutral BintoBSD Parser.

Note that a BS Schema is not needed by a BFlavor-based parser to generate a BSD. This is due to the format-specific nature of the parser. A BS Schema is only required by BSDtoBin to create an adapted bitstream. This allows the automatic creation of BS Schemata that contain a minimal amount of information such that BSDtoBin can convert each element value in a BSD to a bit-level representation. Such functionality can already be provided by an XML Schema using BSDL-1 datatypes, as BSDL-2 is specific for BintoBSD and not relevant for BSDtoBin. Thus, BSDtoBin may still be used for generating an adapted bitstream without requiring BFlavor to support BSDL-2.
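
The role of such a minimal schema can be sketched as follows: for every element in the BSD, the schema only has to tell the bitstream generator how many bits the value occupies. The dictionary below is a hypothetical stand-in for a BS Schema, and the element names and bit widths are invented for the example; this is not the BSDtoBin implementation.

```python
# Hypothetical stand-in for a minimal BS Schema: element name -> width in bits.
SCHEMA = {"start_code": 8, "layer_id": 3, "discardable": 1, "frame_num": 4}

def to_bits(bsd_elements, schema) -> bytes:
    """Pack (name, value) pairs into bytes, most significant bit first."""
    accumulator = 0
    bit_count = 0
    for name, value in bsd_elements:
        width = schema[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        accumulator = (accumulator << width) | value
        bit_count += width
    if bit_count % 8:
        raise ValueError("description does not end on a byte boundary")
    return accumulator.to_bytes(bit_count // 8, "big")

# Element values as they might appear in a (transformed) BSD.
bsd_elements = [("start_code", 0x01), ("layer_id", 5),
                ("discardable", 1), ("frame_num", 9)]
encoded = to_bits(bsd_elements, SCHEMA)
```

Nothing in to_bits depends on how the values were obtained, which is why the parser that produced the BSD does not need the schema itself.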

3.3 Features of BFlavor

Table 1 summarizes the major differences and similarities between the normative tools of BSDL (i.e., the tools as available in BSDL-1 and BSDL-2), XFlavor, and BFlavor. The criteria were selected based on their relevance to this research and are organized in four main categories: language-specific criteria, BSD-specific criteria, criteria specific to BSD generation, and criteria specific to customized bitstream generation.
In short, a BFlavor-based adaptation chain not only eliminates an inherent disadvantage of the BSDL tool chain (the inefficient BintoBSD Parser), but also removes a major shortcoming of the XFlavor tool chain (huge bitstream structure descriptions). This makes our harmonized approach an elegant alternative to a separate optimization of both technologies.

Table 1: Major differences and similarities between BSDL, XFlavor, and BFlavor

Criterion | BSDL | XFlavor | BFlavor

C1. Language
Developers | Philips Research | Columbia University | Ghent University
Foundation | W3C XML Schema (restrictions, extensions) | C++/Java (restrictions, extensions) | XFlavor (restrictions, extensions)
Community | metadata | developers | metadata and developers
Format-agnostic | yes | yes | yes
Purpose | to abstract multimedia content adaptation | to abstract bitstream parsing | to abstract multimedia content adaptation
Schema-dependent | yes (e.g., binary encoding) | no (only validation) | yes (e.g., binary encoding)
Flow control | BSDL-specific attributes (bs2:if, bs2:ifNext, bs2:nOccurs, bs2:ifUnion) and an XML Schema element (xsd:choice) | C++/Java-based flow control (if-else, (do-)while, for, switch-case) | C++/Java-based flow control (if-else, (do-)while, for, switch-case)
Context access | XPath (BSDL-2 variables) | class parameters | class parameters and context classes
Variable-length coding | weak | strong | weak
Bitstream validation | possible (xsd:fixed) | possible (= operator) | possible (= operator)

C2. BSD
Structuring | XML | XML | XML
Granularity | high-level | low-level | high-level

C3. BSD generation
Tool used | BintoBSD Parser v1.1.3 | parser generated by flavorc | parser generated by bflavorc
Processing speed | slow | fast | fast
Memory consumption | high | low | low

C4. Customized bitstream generation
Tool used | BSDtoBin Parser | Bitgen | BSDtoBin Parser
Processing speed | fast | fast | fast
Memory consumption | low | low | low

BFlavor is able to describe the high-level structure of the following coding formats: MPEG-{1,2} Video/Systems, H.263+, MPEG-4 Visual, H.264/AVC, H.264/AVC Scalable Video Coding (SVC), Video Codec-1 (VC-1), JPEG2000, GIF87a, MPEG-1 Layer 3 (MP3), and MPEG-4 Advanced Audio Coding (AAC). These coding formats are tabulated in Table 2, which gives an overview of the granularity of the BFlavor codes together with their inherent applications with regard to XML-driven adaptation. Most of these applications are related to the exploitation of scalability. However, BSDs also enable other applications such as bitstream syntax validation, (de)multiplexing, automatic video summarization, metadata insertion, and scene selection.

Table 2: Granularity of the BFlavor codes and their inherent applications

Coding format | Granularity | Inherent applications
MPEG-1/2 Video | picture header | frame dropping
MPEG-1 Audio Layer 3 | frame header | cutting
MPEG-1/2 Systems | PES packet | demultiplexing
H.263+ | picture header | frame dropping, resolution scaling, quality scaling
MPEG-4 Visual | video object plane | frame dropping
H.264/MPEG-4 AVC | slice header | frame dropping, ROI scaling
H.264/MPEG-4 AVC SVC | slice header | frame dropping, resolution scaling, quality scaling, ROI scaling
MPEG-4 AAC | frame header | cutting
VC-1 | slice | frame dropping
JPEG2000 | packet | resolution scaling, quality scaling, colour scaling, ROI scaling
MP4 file format | box | demultiplexing

3.4 Performance results

3.4.1 MPEG-1 Video and H.264/AVC

Table 3 shows performance measurements for the MPEG-1 Video and H.264/AVC coding formats. BSDL, XFlavor, and BFlavor are compared in this table. Note that v1.1.3 of the BintoBSD parser was used in these experiments. For MPEG-1 Video, a parse unit corresponds to a Picture, while a Network Abstraction Layer Unit (NALU) is the parse unit in H.264/AVC. Figure 7 shows the compressed BSD sizes for the H.264/AVC coding format. From the experiments, it is clear that BFlavor is able to produce high-level BSDs at a constant execution speed while maintaining a low memory usage. As such, BFlavor combines the strengths of BSDL and XFlavor and eliminates their weaknesses.

Table 3: Performance measurements for BSDL, XFlavor, and BFlavor in terms of execution times and memory consumption needed for the generation of a BSD



Note that the performance results for MPEG-1 Video and H.264/AVC are described in more detail in the following publication:
De Neve, W.; Van Deursen, D.; De Schrijver, D.; De Wolf, K.; Lerouge, S.; Van de Walle, R. "BFlavor: a Harmonized Approach to Media Resource Adaptation, inspired by MPEG-21 BSDL and XFlavor." Signal Processing: Image Communication. Vol. 21, no. 10. pp. 862-889.

3.4.2 H.264/AVC SVC

The second series of experiments compares BFlavor with the second version of BSDL (v1.3.1). As discussed earlier, this version of BSDL is characterized by a low and constant memory usage and a constant execution speed. Exploitation along the three scalability axes of H.264/AVC SVC was the target application of these experiments. Table 4 tabulates the execution times of the BFlavor- and BSDL-based adaptation chains. Note that we used two different forms of granularity to describe the H.264/AVC SVC bitstreams: up to and including the NALU header (limited) and up to and including the slice header. The latter case is the most challenging one because much context information is needed from the Sequence Parameter Sets (SPSs) and Picture Parameter Sets (PPSs) during the parsing process. The execution times of the adaptation chains (for the limited case) are compared with the length of the video sequence in Figure 8.

It is clear from Figure 8 that the BFlavor-based adaptation chain is executed in real time (in contrast to the MPEG-21 BSDL-based adaptation chain). Note that in this figure the resolution of the video sequence was rescaled to 640x256 pixels. Both technologies show a linear behavior in terms of execution time, a constant memory consumption (circa 3 MB for BSD generation), and a relatively compact BSD (26 MB uncompressed or 317 KB compressed with WinRAR 3.0's default text compression algorithm when the size of the bitstream is 176 MB).
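
The relative size of the BSD follows directly from the figures quoted above. The short calculation below uses only those three numbers and assumes that 1 MB equals 1024 KB; the variable names are chosen for this example.

```python
# Figures quoted in the text above.
bitstream_mb = 176
bsd_uncompressed_mb = 26
bsd_compressed_kb = 317

# Assumption: 1 MB = 1024 KB.
uncompressed_overhead = bsd_uncompressed_mb / bitstream_mb
compressed_overhead = bsd_compressed_kb / (bitstream_mb * 1024)

print(f"uncompressed BSD: {uncompressed_overhead:.1%} of the bitstream size")
print(f"compressed BSD:   {compressed_overhead:.2%} of the bitstream size")
```

Under these assumptions the uncompressed BSD amounts to roughly 15% of the bitstream size, and the compressed BSD to well under 1%.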

Although the optimized version of MPEG-21 BSDL's BintoBSD Parser shows the same characteristics in the performance measurements, there is a remarkable difference when we look at the BSD generation time (see Table 4). When parsing the H.264/AVC SVC bitstream up to and including the slice header, a lot of information has to be retrieved from the active SPS and PPS. This is not the case when parsing up to and including the NALU header. The BFlavor-based adaptation chain can adapt H.264/AVC SVC bitstreams in real time, even when parsing up to and including the slice header. For the MPEG-21 BSDL-based adaptation chain, we see a significant loss of performance when parsing up to and including the slice header. It is clear that the use of context classes by BFlavor performs much better than the XPath evaluation mechanism used in MPEG-21 BSDL for the retrieval of context information.
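
The following Python sketch illustrates why an indexed context can be cheaper than a query over the description generated so far. It is not a model of BFlavor's context classes or of an XPath engine: the ParameterSetContext class, the dictionary-based units, and the linear scan are hypothetical stand-ins, and the step counter only counts inspected units.

```python
from dataclasses import dataclass, field

@dataclass
class ParameterSetContext:
    """Hypothetical stand-in for a context class: parameter sets indexed by id."""
    sps: dict = field(default_factory=dict)
    pps: dict = field(default_factory=dict)

    def store(self, unit: dict) -> None:
        table = self.sps if unit["type"] == "sps" else self.pps
        table[unit["id"]] = unit

    def active_sps(self, pps_id: int) -> dict:
        # Two dictionary lookups, independent of the stream length.
        return self.sps[self.pps[pps_id]["sps_id"]]

def active_sps_by_scan(description: list, pps_id: int):
    """Stand-in for a query that walks back through the whole description."""
    steps = 0
    pps = None
    for unit in reversed(description):
        steps += 1
        if unit["type"] == "pps" and unit["id"] == pps_id:
            pps = unit
            break
    for unit in reversed(description):
        steps += 1
        if unit["type"] == "sps" and unit["id"] == pps["sps_id"]:
            return unit, steps
    raise LookupError("no matching SPS")

units = [{"type": "sps", "id": 0, "width": 640},
         {"type": "pps", "id": 0, "sps_id": 0}]
units += [{"type": "slice", "pps_id": 0} for _ in range(1000)]

context = ParameterSetContext()
description = []
scan_steps = []
for unit in units:
    if unit["type"] == "slice":
        sps_scan, steps = active_sps_by_scan(description, unit["pps_id"])
        assert sps_scan is context.active_sps(unit["pps_id"])
        scan_steps.append(steps)
    else:
        context.store(unit)
    description.append(unit)
```

In this sketch the scan needs 3 steps for the first slice and 2001 steps for the thousandth, whereas the indexed lookup costs the same for every slice.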



Note that the performance results for H.264/AVC SVC are described in more detail in the following publication:
Van Deursen, D.; De Schrijver, D.; De Neve, W.; Van de Walle, R. "A Real-Time XML-Based Adaptation System for Scalable Video Formats." Lecture Notes in Computer Science. Vol. 4261. 2006. pp. 339-348.

Figure 9 gives an overview of the performance results for each component in a BFlavor-based adaptation chain. These results were obtained by customizing an H.264/AVC SVC-compliant bitstream along the spatial scalability axis. It is clear that the bitstream is adapted in real time. Also, the produced BSDs introduce only a limited overhead in comparison with the size of the original bitstream.


4 gBFlavor

We are currently developing an extension to BFlavor, called gBFlavor. It enables the automatic generation of a format-specific parser that is able to produce generic Bitstream Syntax Descriptions (gBSDs). gBSDs are used in an MPEG-21 gBS Schema-enabled adaptation framework. More detailed information regarding the functioning of gBFlavor will be provided soon.