consuming the content. This results in a set of requirements for MPEG Systems on
streaming, synchronization and stream management, further described below.</P>
<UL>
<LI><B>Streaming</B>: The audio-visual information is to be delivered in a
streaming manner, suitable for live broadcast of such content. In other words,
the audio-visual data is to be transmitted piece by piece, in order to match
the delivery of the content to clients with limited network and terminal
capabilities. This is in stark contrast to some existing scenarios, such as the
World Wide Web, in which the audio-visual information is completely downloaded
onto the client terminal and then played back. Such scenarios were thought to
require too much storage on the client terminals for the applications
envisaged by MPEG.
<LI><B>Synchronization</B>: Typically, the different components of an
audio-visual presentation are closely related in time. For most applications,
audio samples with associated video frames have to be presented together to
the user at precise instants in time. The MPEG representation needs to allow a
precise definition of the notion of time, so that data received in a streaming
manner can be processed and presented at the right instants in time and be
temporally synchronized with each other (see the sketch after this list).
<LI><B>Stream Management</B>: Finally, the complete management of streams of
audio-visual information implies the need for certain mechanisms to allow an
application to consume the content. These include mechanisms for unambiguous
location of the content, identification of the content type, description of
the dependencies between content elements, access to the intellectual property
information associated with the data, and so on. </LI></UL>
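<P align=justify>To make the timing requirement concrete, here is a minimal
sketch in C with purely hypothetical names (it is not taken from the standard
text): a terminal decides whether a received access unit is due for
presentation by comparing the unit's time stamp against a local clock
recovered from the stream.</P>
<PRE>
#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;
#include &lt;stdbool.h&gt;

/* Hypothetical access unit carrying a composition time stamp,
 * expressed in ticks of the time base recovered from the stream. */
typedef struct {
    uint64_t cts;            /* composition time stamp, in clock ticks */
    const uint8_t *payload;  /* coded media data                       */
    size_t length;
} AccessUnit;

/* Present the unit only once the recovered time base has reached its
 * composition time stamp; until then, the unit waits in a buffer. */
bool due_for_presentation(const AccessUnit *au, uint64_t clock_ticks)
{
    return clock_ticks >= au->cts;
}
</PRE>
<P align=justify>Units from different streams (e.g., an audio sample and its
associated video frame) are then synchronized simply by stamping them with
times taken from the same time base.</P>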
<P align=justify>In the previous MPEG-1 and MPEG-2 standards, these requirements
led to the definition of the following tools:</P>
<OL>
<LI><B>Systems Target Decoder (STD)</B>: The Systems Target Decoder is an
abstract model of an MPEG decoding terminal that describes the idealized
decoder architecture and defines the behavior of its architectural elements.
The STD provides for a precise definition of time and recovery of timing
information from information encoded within the streams themselves, as well as
mechanisms to synchronize streams with each other. It also allows for the
management of the decoder's buffers (see the sketch after this list).
<LI><B>Packetization of Streams</B>: This set of tools defines the
organization of the various audio-visual data into streams. First, the
structure of individual streams containing data of a single type (e.g., a
video stream) is defined; the different individual streams are then
multiplexed for transport over a network or storage in disk files. At
each level, additional information is included to allow for the complete
management of the streams (synchronization, intellectual property rights,
etc.). </LI></OL>
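<P align=justify>To illustrate the buffer-management role of the STD, here is
a minimal sketch, assuming a simple leaky-bucket style model; the type and
function names, and the idea of reporting violations as boolean results, are
illustrative rather than the normative STD definitions.</P>
<PRE>
#include &lt;stdbool.h&gt;
#include &lt;stdint.h&gt;

/* Illustrative decoder buffer: bits arrive from the channel and are
 * drained when the decoder consumes an access unit. */
typedef struct {
    uint64_t capacity_bits;   /* buffer size assigned to this stream */
    uint64_t occupancy_bits;  /* current fullness                    */
} DecoderBuffer;

/* Data arriving from the channel; a compliant stream never overflows. */
bool buffer_receive(DecoderBuffer *b, uint64_t arriving_bits)
{
    if (b->occupancy_bits + arriving_bits > b->capacity_bits)
        return false;                /* model violated: overflow  */
    b->occupancy_bits += arriving_bits;
    return true;
}

/* The idealized decoder removes an access unit instantaneously at its
 * decoding time; it must never find too few bits in the buffer. */
bool buffer_decode(DecoderBuffer *b, uint64_t access_unit_bits)
{
    if (access_unit_bits > b->occupancy_bits)
        return false;                /* model violated: underflow */
    b->occupancy_bits -= access_unit_bits;
    return true;
}
</PRE>
<P align=justify>An encoder or multiplexer that respects such a model can
guarantee that any real decoder providing at least the stated buffer size
never loses data, whatever its exact implementation.</P>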
<P align=justify>All these requirements are still relevant for MPEG-4. However,
the existing tools needed to be extended and adapted for the MPEG-4 context. In
some cases, these requirements led to the creation of new tools. More
specifically:</P>
<OL>
<LI><B>Systems Decoder Model (SDM)</B>: The nature of MPEG-4 streams can be
different from the ones dealt with in the traditional MPEG-1 and MPEG-2
decoder models. For example, MPEG-4 streams may have a bursty data delivery
schedule. They may be downloaded and cached before their actual presentation
to the user. Moreover, to implement the MPEG-4 principle of "create once,
access everywhere", the transport of content does not need to be (indeed,
should not be) integrated into the overall architecture. These new aspects,
therefore, led to a modification of the MPEG-1 and MPEG-2 models that resulted
in the <I>MPEG-4 Systems Decoder Model</I>.
<LI><B>Synchronization</B>: The MPEG-4 principle of "create once, access
everywhere" is easier to achieve when all of the content-related information
forms part of the encoded representation of the multimedia content. This
content-related information includes the synchronization information. The
observation that the range of bit rates addressed by MPEG-4 is broader than in
MPEG-1 and MPEG-2, ranging from a few kbit/s up to several Mbit/s, led to the
definition of a flexible tool to encode the synchronization information: the
<I>Sync Layer (Synchronization Layer)</I>.
<LI><B>Packetization of Streams</B>: On the delivery side, most of the
existing networks provide ways for the packetization and transport of streams.
Therefore, beyond defining how MPEG-4 content is mapped onto the existing
infrastructures, MPEG-4 Systems saw no need to develop new tools for this
purpose. However, because of the possibly unpredictable temporal behavior of
MPEG-4 data streams, as well as the possibly large number of such streams in
MPEG-4 applications, MPEG-4 Systems developed a simple and efficient
multiplexing tool to enhance the transport of MPEG-4 data: the <I>FlexMux
(Flexible Multiplex)</I> tool, sketched after this list. </LI></OL>
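<P align=justify>As an illustration of the FlexMux idea, the sketch below
parses packets in the tool's simple mode, where each packet carries a one-byte
channel index and a one-byte payload length; the type and function names are
hypothetical, and error handling is reduced to a minimum.</P>
<PRE>
#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;

/* One FlexMux packet: a channel index, a payload length in bytes,
 * and the payload itself (pointing into the multiplexed buffer). */
typedef struct {
    uint8_t index;           /* FlexMux channel number       */
    uint8_t length;          /* payload size in bytes        */
    const uint8_t *payload;  /* points into the input buffer */
} FlexMuxPacket;

/* Parse one packet starting at buf; returns bytes consumed, 0 on error. */
size_t flexmux_parse(const uint8_t *buf, size_t avail, FlexMuxPacket *out)
{
    if (avail &lt; 2)
        return 0;                      /* header incomplete  */
    out->index  = buf[0];
    out->length = buf[1];
    if (avail &lt; (size_t)2 + out->length)
        return 0;                      /* payload incomplete */
    out->payload = buf + 2;
    return (size_t)2 + out->length;
}

int main(void)
{
    /* Two channels interleaved in one multiplexed byte stream. */
    const uint8_t mux[] = { 0x01, 3, 'a', 'b', 'c',   /* channel 1 */
                            0x02, 2, 'x', 'y' };      /* channel 2 */
    size_t pos = 0, used;
    FlexMuxPacket pkt;
    while ((used = flexmux_parse(mux + pos, sizeof mux - pos, &amp;pkt)) > 0) {
        printf("channel %u: %u bytes\n", (unsigned)pkt.index,
               (unsigned)pkt.length);
        pos += used;
    }
    return 0;
}
</PRE>
<P align=justify>The two-byte header is what makes the tool attractive for the
many low-bit-rate streams an MPEG-4 scene may contain: the multiplexing
overhead stays small even when individual packets are only a few bytes
long.</P>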
<OL>
<OL>
<LI><A name=_Ref441290795></A><B><A name=_Toc458054797>MPEG-4 Specific
Systems Requirements</A></B> </LI></OL></OL>
<P align=justify>The foundation of MPEG-4 is the coding of <B>audio-visual
objects</B>. As per MPEG-4 terminology, an audio-visual object is the
representation of a natural or synthetic object that has an audio and/or visual
manifestation. Examples of audio-visual objects include a video sequence
(perhaps with shape information), an audio track, an animated 3D face, speech
synthesized from text, or a background consisting of a still image.</P>
<P align=justify>The advantages of coding audio-visual objects can be summarized
as follows:</P>
<UL>
<LI>It allows interaction with the content. At the client side, users can be
given the possibility to access, manipulate, or activate specific parts of the
content.
<LI>It improves reusability and coding of the content. At the content creation
side, authors can easily organize and manipulate individual components and
reuse existing material. Moreover, each type of content can be coded using the
most effective algorithms. Artifacts due to joint coding of heterogeneous
objects (e.g., graphics overlaid on natural video) disappear.
<LI>It allows content-based scalability. At various stages in the
authoring/delivery/consumption process, content can be ignored or adapted to
match bandwidth, complexity, or price requirements. </LI></UL>
<P align=justify>In order to be able to use these audio-visual objects in a
presentation, additional information needs to be transmitted to the client
terminals. The individual audio-visual objects are only a part of the
presentation structure that an author wants delivered to the consumers. Indeed,
for the presentation at the client terminals, the coding of audio-visual objects
needs to be augmented by the following:</P>
<OL>
<LI>The coding of information that describes the spatio-temporal relationships
between the various audio-visual objects present in the presentation content.
In MPEG-4 terminology, this information is referred to as the <I><B>Scene
Description</B></I> information.
<LI>The coding of information that describes how time-dependent objects in the
scene description are linked to the streamed resources that actually transport
the time-dependent information (see the sketch after this list). </LI></OL>
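<P align=justify>Here is a minimal sketch of this linkage, with illustrative C
types whose field layout is hypothetical and far simpler than the actual
object descriptor syntax: scene description nodes refer to an object
descriptor by its identifier, and the object descriptor in turn lists the
elementary streams carrying the object's data.</P>
<PRE>
#include &lt;stdint.h&gt;

/* Illustrative object descriptor: the level of indirection between
 * the scene description and the streams that carry the media data. */
typedef struct {
    uint16_t object_descriptor_id;  /* referenced from the scene      */
    uint16_t es_ids[4];             /* elementary streams (by ES_ID)  */
    int      es_count;              /* e.g., base + enhancement layer */
} ObjectDescriptor;

/* Illustrative scene node: places an object in the scene and points
 * at its object descriptor rather than at any stream directly. */
typedef struct {
    float    x, y;                  /* spatial placement              */
    uint16_t object_descriptor_id;  /* which object this node renders */
} SceneNode;
</PRE>
<P align=justify>This indirection is what allows the two kinds of information
to evolve independently, as discussed in the requirements below.</P>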
<P align=justify>These considerations imply additional requirements for the
overall architectural design, which are summarized below:</P>
<UL>
<LI><B>Object Description</B>: In addition to the identification of the
location of the streams, other information may need to be attached to streamed
resources. This may include the identification of streams in alternative
formats, a scalable stream hierarchy attached to the object, or a description
of the coding format of the object.
<LI><B>Content Authoring</B>: The object description information is
conceptually different from the scene description information. The two will
therefore have different life cycles. For example, at some instant in time,
object description information may change (like the intellectual property
rights of a stream or the availability of new streams) while the scene
description information remains the same. Similarly, the structure of the
scene may change (like changing the positions of the objects), while the
streaming resources remain the same.
<LI><B>Content Consumption</B>: The consumer of the content may wish to obtain
information <I>about </I>the content (e.g., the intellectual property attached
to it or the maximum bit rate needed to access it) before actually requesting
it. The consumer then needs to receive only the object description
information, not the scene description. </LI></UL>
<P align=justify>Besides the coding of audio-visual objects organized
spatio-temporally, according to a scene description, one of the key concepts of
MPEG-4 is the idea of interactivity, that is, that the content reacts to the
actions of a user. This general idea is expressed in three specific
requirements:</P>
<OL>
<LI><B>Client side interaction</B>: The user should be able to manipulate the
scene description as well as the properties of the audio-visual objects that
the author wants to expose to interaction.
<LI><B>Audio-visual object behavior</B>: It should be possible to attach
behavior to audio-visual objects. User actions or other events, such as the
passage of time, trigger these behaviors (see the sketch after this list).
<LI><B>Client-Server interaction</B>: Finally, when a return channel from the
client to the server is available, the user should be able to send information
back to the server, which will act upon it and possibly send updates or
modifications of the content. </LI></OL>
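<P align=justify>The following sketch illustrates the kind of event binding
the second requirement implies; the names and the dispatch step are
hypothetical (in MPEG-4 itself, behaviors are expressed through the scene
description), but the pattern is the same: an author binds an action to an
object and a triggering event, and the terminal fires it when the event
occurs.</P>
<PRE>
#include &lt;stdio.h&gt;

/* Hypothetical events that may trigger an attached behavior. */
typedef enum { EVENT_USER_CLICK, EVENT_TIMER } EventKind;

/* A behavior is an action bound to an object in the scene. */
typedef void (*Behavior)(int object_id);

typedef struct {
    int       object_id;  /* which audio-visual object     */
    EventKind trigger;    /* what fires the behavior       */
    Behavior  action;     /* what happens when it fires    */
} BehaviorBinding;

static void toggle_visibility(int object_id)
{
    printf("toggling visibility of object %d\n", object_id);
}

int main(void)
{
    /* The author attaches a click-triggered behavior to object 7. */
    BehaviorBinding binding = { 7, EVENT_USER_CLICK, toggle_visibility };

    /* The terminal dispatches a user click on that object. */
    EventKind incoming = EVENT_USER_CLICK;
    if (incoming == binding.trigger)
        binding.action(binding.object_id);
    return 0;
}
</PRE>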
<OL>
<OL>
<LI><A name=_Ref443240463></A><A name=_Ref443240464></A><A
name=_Ref443240473></A><B><A name=_Toc458054798>What is MPEG-4
Systems?</A></B> </LI></OL></OL>
<P align=justify>The main concepts that were described in this section are
depicted in Figure 1. The mission, therefore, of the MPEG-4 Systems activity may
be summarized by the following sentence: <I>"Develop a coded, streamable
representation for audio-visual objects and their associated time-variant data
along with a description of how they are combined".</I></P>
<P align=center><IMG height=462
src="(2)MPEG-4 Systems Overview.files/Image9.gif" width=612></P>
<P align=center><A name=_Ref441399893>Figure 1</A>: MPEG-4 Systems
Principles</P>
<P align=justify>More precisely, in this sentence:</P>
<UL>
<LI>"<I>Coded representation</I>" should be seen in contrast to "<I>textual
representation</I>". Indeed, all the information that MPEG-4 Systems contains
(scene description, object description, synchronization information) is binary
encoded for bandwidth efficiency.
<LI>"<I>Streamable</I>" should <I>not</I> be seen in contrast to
"<I>stored</I>", since storage and transport are dealt with in a similar and
consistent way in the MPEG-4 framework. It should rather be seen in contrast
to "<I>downloaded</I>". Indeed, MPEG-4 is built on the concept of streams that
have a temporal extension, and not on the concept of files of finite size.
<LI>"<I>Elementary audio-visual sources along with a description of how they
are combined</I>" should be seen in contrast to "<I>individual audio or visual
streams</I>". MPEG-4 Systems does not deal with the encoding of audio or
visual information but only with the information related to the combination
of streams: the combination of audio-visual objects to create an interactive
audio-visual scene, the synchronization of streams, and the multiplexing of
streams for storage or transport. </LI></UL>