consuming the content. This results in a set of requirements for MPEG Systems on
streaming, synchronization and stream management, further described below.</P>
<UL>
<LI><B>Streaming</B>: The audio-visual information is to be delivered in a
streaming manner, suitable for live broadcast of such content. In other words,
the audio-visual data is to be transmitted piece by piece, in order to match
the delivery of the content to clients with limited network and terminal
capabilities. This is in stark contrast to some existing scenarios, such as the
World Wide Web, in which the audio-visual information is completely downloaded
onto the client terminal and then played back. Such scenarios were thought to
require too much storage on the client terminals for the applications
envisaged by MPEG.
<LI><B>Synchronization</B>: Typically, the different components of an
audio-visual presentation are closely related in time. For most applications,
audio samples with associated video frames have to be presented together to
the user at precise instants in time. The MPEG representation needs to allow a
precise definition of the notion of time, so that data received in a streaming
manner can be processed and presented at the right instants in time and be
temporally synchronized with each other (see the sketch after this list).
<LI><B>Stream Management</B>: Finally, the complete management of streams of
audio-visual information implies the need for certain mechanisms to allow an
application to consume the content. These include mechanisms for unambiguous
location of the content, identification of the content type, description of
the dependencies between content elements, access to the intellectual property
information associated with the data, and so on. </LI></UL>
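<P align=justify>To make the timing requirement concrete, here is a minimal
sketch in C with purely hypothetical names (it is not taken from the standard
text): a terminal decides whether a received access unit is due for
presentation by comparing the unit's time stamp against a local clock
recovered from the stream.</P>
<PRE>
#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;
#include &lt;stdbool.h&gt;

/* Hypothetical access unit carrying a composition time stamp,
 * expressed in ticks of the time base recovered from the stream. */
typedef struct {
    uint64_t cts;            /* composition time stamp, in clock ticks */
    const uint8_t *payload;  /* coded media data                       */
    size_t length;
} AccessUnit;

/* Present the unit only once the recovered time base has reached its
 * composition time stamp; until then, the unit waits in a buffer. */
bool due_for_presentation(const AccessUnit *au, uint64_t clock_ticks)
{
    return clock_ticks >= au->cts;
}
</PRE>
<P align=justify>Units from different streams (e.g., an audio sample and its
associated video frame) are then synchronized simply by stamping them with
times taken from the same time base.</P>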
<P align=justify>In the previous MPEG-1 and MPEG-2 standards, these requirements
led to the definition of the following tools:</P>
<OL>
<LI><B>Systems Target Decoder (STD)</B>: The Systems Target Decoder is an
abstract model of an MPEG decoding terminal that describes the idealized
decoder architecture and defines the behavior of its architectural elements.
The STD provides for a precise definition of time and recovery of timing
information from information encoded within the streams themselves, as well as
mechanisms to synchronize streams with each other. It also allows for the
management of the decoder's buffers (see the sketch after this list).
<LI><B>Packetization of Streams</B>: This set of tools defines the
organization of the various audio-visual data into streams. First, the
structure of individual streams containing data of a single type (e.g., a
video stream) is defined; the different individual streams are then
multiplexed for transport over a network or storage in disk files. At
each level, additional information is included to allow for the complete
management of the streams (synchronization, intellectual property rights,
etc.). </LI></OL>
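<P align=justify>To illustrate the buffer-management role of the STD, here is
a minimal sketch, assuming a simple leaky-bucket style model; the type and
function names, and the idea of reporting violations as boolean results, are
illustrative rather than the normative STD definitions.</P>
<PRE>
#include &lt;stdbool.h&gt;
#include &lt;stdint.h&gt;

/* Illustrative decoder buffer: bits arrive from the channel and are
 * drained when the decoder consumes an access unit. */
typedef struct {
    uint64_t capacity_bits;   /* buffer size assigned to this stream */
    uint64_t occupancy_bits;  /* current fullness                    */
} DecoderBuffer;

/* Data arriving from the channel; a compliant stream never overflows. */
bool buffer_receive(DecoderBuffer *b, uint64_t arriving_bits)
{
    if (b->occupancy_bits + arriving_bits > b->capacity_bits)
        return false;                /* model violated: overflow  */
    b->occupancy_bits += arriving_bits;
    return true;
}

/* The idealized decoder removes an access unit instantaneously at its
 * decoding time; it must never find too few bits in the buffer. */
bool buffer_decode(DecoderBuffer *b, uint64_t access_unit_bits)
{
    if (access_unit_bits > b->occupancy_bits)
        return false;                /* model violated: underflow */
    b->occupancy_bits -= access_unit_bits;
    return true;
}
</PRE>
<P align=justify>An encoder or multiplexer that respects such a model can
guarantee that any real decoder providing at least the stated buffer size
never loses data, whatever its exact implementation.</P>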
<P align=justify>All these requirements are still relevant for MPEG-4. However,
the existing tools needed to be extended and adapted for the MPEG-4 context. In
some cases, these requirements led to the creation of new tools. More
specifically:</P>
<OL>
<LI><B>Systems Decoder Model (SDM)</B>: The nature of MPEG-4 streams can be
different from the ones dealt with in the traditional MPEG-1 and MPEG-2
decoder models. For example, MPEG-4 streams may have a bursty data delivery
schedule. They may be downloaded and cached before their actual presentation
to the user. Moreover, to implement the MPEG-4 principle of "create once,
access everywhere", the transport of content does not need to be (indeed,
should not be) integrated into the overall architecture. These new aspects,
therefore, led to a modification of the MPEG-1 and MPEG-2 models that resulted
in the <I>MPEG-4 Systems Decoder Model</I>.
<LI><B>Synchronization</B>: The MPEG-4 principle of "create once, access
everywhere" is easier to achieve when all of the content-related information
forms part of the encoded representation of the multimedia content. This
content-related information includes the synchronization information. The
observation that the range of bit rates addressed by MPEG-4 is broader than in
MPEG-1 and MPEG-2, ranging from a few kbit/s up to several Mbit/s, led to the
definition of a flexible tool to encode the synchronization information: the
<I>Sync Layer (Synchronization Layer)</I>.
<LI><B>Packetization of Streams</B>: On the delivery side, most of the
existing networks provide ways for the packetization and transport of streams.
Therefore, beyond defining how MPEG-4 content is mapped onto the existing
infrastructures, MPEG-4 Systems saw no need to develop new tools for this
purpose. However, because of the possibly unpredictable temporal behavior of
MPEG-4 data streams, as well as the possibly large number of such streams in
MPEG-4 applications, MPEG-4 Systems developed a simple and efficient
multiplexing tool to enhance the transport of MPEG-4 data: the <I>FlexMux
(Flexible Multiplex)</I> tool, sketched after this list. </LI></OL>
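<P align=justify>As an illustration of the FlexMux idea, the sketch below
parses packets in the tool's simple mode, where each packet carries a one-byte
channel index and a one-byte payload length; the type and function names are
hypothetical, and error handling is reduced to a minimum.</P>
<PRE>
#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;

/* One FlexMux packet: a channel index, a payload length in bytes,
 * and the payload itself (pointing into the multiplexed buffer). */
typedef struct {
    uint8_t index;           /* FlexMux channel number       */
    uint8_t length;          /* payload size in bytes        */
    const uint8_t *payload;  /* points into the input buffer */
} FlexMuxPacket;

/* Parse one packet starting at buf; returns bytes consumed, 0 on error. */
size_t flexmux_parse(const uint8_t *buf, size_t avail, FlexMuxPacket *out)
{
    if (avail &lt; 2)
        return 0;                      /* header incomplete  */
    out->index  = buf[0];
    out->length = buf[1];
    if (avail &lt; (size_t)2 + out->length)
        return 0;                      /* payload incomplete */
    out->payload = buf + 2;
    return (size_t)2 + out->length;
}

int main(void)
{
    /* Two channels interleaved in one multiplexed byte stream. */
    const uint8_t mux[] = { 0x01, 3, 'a', 'b', 'c',   /* channel 1 */
                            0x02, 2, 'x', 'y' };      /* channel 2 */
    size_t pos = 0, used;
    FlexMuxPacket pkt;
    while ((used = flexmux_parse(mux + pos, sizeof mux - pos, &amp;pkt)) > 0) {
        printf("channel %u: %u bytes\n", (unsigned)pkt.index,
               (unsigned)pkt.length);
        pos += used;
    }
    return 0;
}
</PRE>
<P align=justify>The two-byte header is what makes the tool attractive for the
many low-bit-rate streams an MPEG-4 scene may contain: the multiplexing
overhead stays small even when individual packets are only a few bytes
long.</P>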
<OL>
<OL>
<LI><A name=_Ref441290795></A><B><A name=_Toc458054797>MPEG-4 Specific
Systems Requirements</A></B> </LI></OL></OL>
<P align=justify>The foundation of MPEG-4 is the coding of <B>audio-visual
objects</B>. As per MPEG-4 terminology, an audio-visual object is the
representation of a natural or synthetic object that has an audio and/or visual
manifestation. Examples of audio-visual objects include a video sequence
(perhaps with shape information), an audio track, an animated 3D face, speech
synthesized from text, or a background consisting of a still image.</P>
<P align=justify>The advantages of coding audio-visual objects can be summarized
as follows:</P>
<UL>
<LI>It allows interaction with the content. At the client side, users can be
given the possibility to access, manipulate, or activate specific parts of the
content.
<LI>It improves reusability and coding of the content. At the content creation
side, authors can easily organize and manipulate individual components and
reuse existing material. Moreover, each type of content can be coded using the
most effective algorithms. Artifacts due to joint coding of heterogeneous
objects (e.g., graphics overlaid on natural video) disappear.
<LI>It allows content-based scalability. At various stages in the
authoring/delivery/consumption process, content can be ignored or adapted to
match bandwidth, complexity, or price requirements. </LI></UL>
<P align=justify>In order to be able to use these audio-visual objects in a
presentation, additional information needs to be transmitted to the client
terminals. The individual audio-visual objects are only a part of the
presentation structure that an author wants delivered to the consumers. Indeed,
for the presentation at the client terminals, the coding of audio-visual objects
needs to be augmented by the following:</P>
<OL>
<LI>The coding of information that describes the spatio-temporal relationships
between the various audio-visual objects present in the presentation content.
In MPEG-4 terminology, this information is referred to as the <I><B>Scene
Description</B></I> information.
<LI>The coding of information that describes how time-dependent objects in the
scene description are linked to the streamed resources that actually transport
the time-dependent information (see the sketch after this list). </LI></OL>
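<P align=justify>Here is a minimal sketch of this linkage, with illustrative C
types whose field layout is hypothetical and far simpler than the actual
object descriptor syntax: scene description nodes refer to an object
descriptor by its identifier, and the object descriptor in turn lists the
elementary streams carrying the object's data.</P>
<PRE>
#include &lt;stdint.h&gt;

/* Illustrative object descriptor: the level of indirection between
 * the scene description and the streams that carry the media data. */
typedef struct {
    uint16_t object_descriptor_id;  /* referenced from the scene      */
    uint16_t es_ids[4];             /* elementary streams (by ES_ID)  */
    int      es_count;              /* e.g., base + enhancement layer */
} ObjectDescriptor;

/* Illustrative scene node: places an object in the scene and points
 * at its object descriptor rather than at any stream directly. */
typedef struct {
    float    x, y;                  /* spatial placement              */
    uint16_t object_descriptor_id;  /* which object this node renders */
} SceneNode;
</PRE>
<P align=justify>This indirection is what allows the two kinds of information
to evolve independently, as discussed in the requirements below.</P>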
<P align=justify>These considerations imply additional requirements for the
overall architectural design, which are summarized below:</P>
<UL>
<LI><B>Object Description</B>: In addition to the identification of the
location of the streams, other information may need to be attached to streamed
resources. This may include the identification of streams in alternative
formats, a scalable stream hierarchy attached to the object, or a description
of the coding format of the object.
<LI><B>Content Authoring</B>: The object description information is
conceptually different from the scene description information. The two will
therefore have different life cycles. For example, at some instant in time,
object description information may change (like the intellectual property
rights of a stream or the availability of new streams) while the scene
description information remains the same. Similarly, the structure of the
scene may change (like changing the positions of the objects), while the
streaming resources remain the same.
<LI><B>Content Consumption</B>: The consumer of the content may wish to obtain
information <I>about </I>the content (e.g., the intellectual property attached
to it or the maximum bit rate needed to access it) before actually requesting
it. The consumer then needs to receive only the object description
information, not the scene description. </LI></UL>
<P align=justify>Besides the coding of audio-visual objects organized
spatio-temporally, according to a scene description, one of the key concepts of
MPEG-4 is the idea of interactivity, that is, that the content reacts to the
actions of a user. This general idea is expressed in three specific
requirements:</P>
<OL>
<LI><B>Client side interaction</B>: The user should be able to manipulate the
scene description as well as the properties of the audio-visual objects that
the author wants to expose to interaction.
<LI><B>Audio-visual object behavior</B>: It should be possible to attach
behavior to audio-visual objects. User actions or other events, such as the
passage of time, trigger these behaviors (see the sketch after this list).
<LI><B>Client-Server interaction</B>: Finally, when a return channel from the
client to the server is available, the user should be able to send information
back to the server, which will act upon it and possibly send updates or
modifications of the content. </LI></OL>
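<P align=justify>The following sketch illustrates the kind of event binding
the second requirement implies; the names and the dispatch step are
hypothetical (in MPEG-4 itself, behaviors are expressed through the scene
description), but the pattern is the same: an author binds an action to an
object and a triggering event, and the terminal fires it when the event
occurs.</P>
<PRE>
#include &lt;stdio.h&gt;

/* Hypothetical events that may trigger an attached behavior. */
typedef enum { EVENT_USER_CLICK, EVENT_TIMER } EventKind;

/* A behavior is an action bound to an object in the scene. */
typedef void (*Behavior)(int object_id);

typedef struct {
    int       object_id;  /* which audio-visual object     */
    EventKind trigger;    /* what fires the behavior       */
    Behavior  action;     /* what happens when it fires    */
} BehaviorBinding;

static void toggle_visibility(int object_id)
{
    printf("toggling visibility of object %d\n", object_id);
}

int main(void)
{
    /* The author attaches a click-triggered behavior to object 7. */
    BehaviorBinding binding = { 7, EVENT_USER_CLICK, toggle_visibility };

    /* The terminal dispatches a user click on that object. */
    EventKind incoming = EVENT_USER_CLICK;
    if (incoming == binding.trigger)
        binding.action(binding.object_id);
    return 0;
}
</PRE>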
<OL>
<OL>
<LI><A name=_Ref443240463></A><A name=_Ref443240464></A><A
name=_Ref443240473></A><B><A name=_Toc458054798>What is MPEG-4
Systems?</A></B> </LI></OL></OL>
<P align=justify>The main concepts that were described in this section are
depicted in Figure 1. The mission, therefore, of the MPEG-4 Systems activity may
be summarized by the following sentence: <I>"Develop a coded, streamable
representation for audio-visual objects and their associated time-variant data
along with a description of how they are combined".</I></P>
<P align=center><IMG height=462
src="(2)MPEG-4 Systems Overview.files/Image9.gif" width=612></P>
<P align=center><A name=_Ref441399893>Figure 1</A>: MPEG-4 Systems
Principles</P>
<P align=justify>More precisely, in this sentence:</P>
<UL>
<LI>"<I>Coded representation</I>" should be seen in contrast to "<I>textual
representation</I>". Indeed, all the information that MPEG-4 Systems contains
(scene description, object description, synchronization information) is binary
encoded for bandwidth efficiency.
<LI>"<I>Streamable</I>" should <I>not</I> be seen in contrast to
"<I>stored</I>", since storage and transport are dealt with in a similar and
consistent way in the MPEG-4 framework. It should rather be seen in contrast
to "<I>downloaded</I>". Indeed, MPEG-4 is built on the concept of streams that
have a temporal extension, and not on the concept of files of finite size.
<LI>"<I>Elementary audio-visual sources along with a description of how they
are combined</I>" should be seen in contrast to "<I>individual audio or visual
streams</I>". MPEG-4 Systems does not deal with the encoding of audio or
visual information but only with the information related to the combination
of streams: the combination of audio-visual objects to create an interactive
audio-visual scene, the synchronization of streams, and the multiplexing of
streams for storage or transport. </LI></UL>