<pre>
typedef struct {
  ID             chunkID;
  long           chunkSize;
  unsigned char  UnshiftedNote;
  char           FineTune;
  char           Gain;
  unsigned char  LowNote;
  unsigned char  HighNote;
  unsigned char  LowVelocity;
  unsigned char  HighVelocity;
} InstrumentChunk;
</pre>
<p>The ID is always <b>inst</b>. chunkSize should always be 7 since there are no fields of
variable length. </p>
<p>The UnshiftedNote field is the same as the Sampler chunk's dwMIDIUnityNote field.</p>
<p>The FineTune field determines how much the instrument should alter the pitch of the
sound when it is played back. Units are in cents (1/100 of a semitone) and range from -50
to +50. Negative numbers mean that the pitch of the sound should be lowered, while
positive numbers mean that it should be raised. Although a different unit of measurement
is used, this field serves the same purpose as the Sampler chunk's dwMIDIPitchFraction field.</p>
<p>The Gain field is the amount by which to change the gain of the sound when it is
played. Units are decibels. For example, 0 dB means no change, 6 dB means double the value
of each sample point (ie, every additional 6 dB doubles the gain), while -6 dB means halve
the value of each sample point.</p>
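<p>For illustration, here is a minimal sketch of turning the Gain field into a linear
scale factor, assuming the doubling-per-6-dB rule stated above (the function name is
hypothetical):</p>
<pre>
#include &lt;math.h&gt;

/* Convert the Gain field (in dB) to a linear amplitude scale factor.
   Every additional 6 dB doubles the amplitude, so scale = 2^(gain/6). */
double gain_to_scale(char gain)
{
    return pow(2.0, (double)gain / 6.0);
}

/* Examples: gain_to_scale(0) == 1.0, gain_to_scale(6) == 2.0,
   gain_to_scale(-6) == 0.5 */
</pre>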
<p>The LowNote and HighNote fields specify the suggested MIDI note range on a keyboard for
playback of the waveform data. The waveform data should be played if the instrument is
requested to play a note between the low and high note numbers, inclusive. The
UnshiftedNote does not have to be within this range.</p>
<p>The LowVelocity and HighVelocity fields specify the suggested range of MIDI velocities
for playback of the waveform data. The waveform data should be played if the note-on
velocity is between the low and high velocity, inclusive. The range is 1 (lowest velocity)
through 127 (highest velocity), inclusive.</p>
<p>The Instrument chunk is optional. No more than 1 Instrument chunk can appear in one
WAVE.</p>
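<p>As a worked example, a player might combine the note and velocity ranges as follows to
decide whether this waveform should respond to a given note-on event (a minimal sketch;
the function name is illustrative):</p>
<pre>
/* Returns nonzero if this instrument's waveform should be played
   for the given MIDI note number and note-on velocity. */
int should_play(const InstrumentChunk *inst,
                unsigned char note, unsigned char velocity)
{
    return note >= inst->LowNote && note <= inst->HighNote &&
           velocity >= inst->LowVelocity && velocity <= inst->HighVelocity;
}
</pre>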
<h1>Audio Interchange File Format (AIFF)</h1>
<p>Audio Interchange File Format (or AIFF) is a file format for storing digital audio
(waveform) data. It supports a variety of bit resolutions, sample rates, and channels of
audio. This format is very popular on Apple platforms, and is widely used in
professional programs that process digital audio waveforms.</p>
<p>This format uses the Electronic Arts Interchange File Format method for storing data in
"chunks". You should read the article <em>About Interchange File Format</em>
before proceeding.</p>
<h3>Data Types</h3>
<p>A C-like language will be used to describe the data structures in the file. A few extra
data types that are not part of standard C, but which will be used in this document, are:</p>
<table border="0">
<tr>
<td><b>extended</b></td>
<td>80-bit IEEE Standard 754 floating point number (Standard Apple Numeric Environment
[SANE] data type Extended). This is a 10-byte field.</td>
</tr>
<tr>
<td><b>pstring</b></td>
<td>Pascal-style string, a one-byte count followed by that many text bytes. The total
number of bytes in this data type should be even. A pad byte can be added to the end of
the text to accomplish this. This pad byte is not reflected in the count.</td>
</tr>
<tr>
<td><b>ID</b></td>
<td>A chunk ID (ie, 4 ASCII bytes) as described in <em>About Interchange File Format</em>.</td>
</tr>
</table>
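<p>To illustrate the pstring padding rule, here is a minimal sketch of reading a pstring
from a file (the function name is hypothetical, and error handling is omitted):</p>
<pre>
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;

/* Read a pstring: a one-byte count, that many text bytes, and a pad
   byte if needed to make the total byte count (including the count
   byte) even. Returns a malloc'd, NUL-terminated copy of the text. */
char *read_pstring(FILE *f)
{
    int count = fgetc(f);
    char *text = malloc(count + 1);
    fread(text, 1, count, f);
    text[count] = '\0';
    if ((count + 1) & 1)   /* count byte + text is an odd number of bytes, */
        fgetc(f);          /* so a pad byte follows; skip it */
    return text;
}
</pre>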
<p>Also note that when you see an array with no size specification (e.g., char ckData[];),
this indicates a variable-sized array in our C-like language. This differs from standard C
arrays.</p>
<h3>Constants</h3>
<p>Decimal values are referred to as a string of digits; for example, 123, 0, 100 are all
decimal numbers. Hexadecimal values are preceded by a 0x - e.g., 0x0A, 0x1, 0x64. </p>
<h3>Data Organization</h3>
<p>All data is stored in Motorola 68000 (ie, big endian) format. The bytes of
multiple-byte values are stored with the high-order (ie, most significant) bytes first.
Data bits are as follows (ie, shown with bit numbers on top): </p>
<pre>
           7  6  5  4  3  2  1  0
         +-----------------------+
   char: | msb               lsb |
         +-----------------------+

          15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
         +-----------------------+-----------------------+
  short: | msb    Byte 0         |        Byte 1     lsb |
         +-----------------------+-----------------------+

          31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
         +-----------------------+-----------------------+-----------------------+-----------------------+
   long: | msb    Byte 0         |        Byte 1         |        Byte 2         |        Byte 3     lsb |
         +-----------------------+-----------------------+-----------------------+-----------------------+
</pre>
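<p>Since AIFF data is big endian regardless of the host CPU, a reader should assemble
multi-byte values from individual bytes rather than reading raw structs. A minimal sketch
(function names are illustrative):</p>
<pre>
#include &lt;stdio.h&gt;

/* Read a big-endian 16-bit value, high-order byte first. */
unsigned short read_u16_be(FILE *f)
{
    unsigned short hi = (unsigned short)fgetc(f);
    unsigned short lo = (unsigned short)fgetc(f);
    return (unsigned short)((hi << 8) | lo);
}

/* Read a big-endian 32-bit value, high-order byte first. */
unsigned long read_u32_be(FILE *f)
{
    unsigned long b0 = (unsigned long)fgetc(f);
    unsigned long b1 = (unsigned long)fgetc(f);
    unsigned long b2 = (unsigned long)fgetc(f);
    unsigned long b3 = (unsigned long)fgetc(f);
    return (b0 << 24) | (b1 << 16) | (b2 << 8) | b3;
}
</pre>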
<h3>File Structure</h3>
<p>An Audio IFF file is a collection of a number of different types of chunks. There is a
required Common Chunk which contains important parameters describing the waveform, such as
its length and sample rate. The Sound Data Chunk, which contains the actual waveform data,
is also required if the waveform data has a length greater than 0 (ie, there actually is
waveform data in the FORM). All other chunks are optional. Among the other optional chunks
are ones which define markers, list instrument parameters, store application-specific
information, etc. All of these chunks are described in detail in the following sections of
this document.</p>
<p>All applications that use FORM AIFF must be able to read the 2 required chunks and can
choose to selectively ignore the optional chunks. A program that copies a FORM AIFF should
copy all of the chunks in the FORM AIFF, even those it chooses not to interpret.</p>
<p>There are no restrictions upon the order of the chunks within a FORM AIFF.</p>
<p>Here is a graphical overview of an example minimal AIFF file. It consists of a single
FORM AIFF containing the 2 required chunks, a Common Chunk and a Sound Data Chunk.</p>
<pre>
  __________________________
 |       FORM AIFF Chunk    |
 |   ckID = 'FORM'          |
 |   formType = 'AIFF'      |
 |    __________________    |
 |   |   Common Chunk   |   |
 |   | ckID = 'COMM'    |   |
 |   |__________________|   |
 |    __________________    |
 |   | Sound Data Chunk |   |
 |   | ckID = 'SSND'    |   |
 |   |__________________|   |
 |__________________________|
</pre>
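<p>A reader typically walks the chunks inside the FORM sequentially, dispatching on each
ckID and skipping chunks it does not recognize. A minimal sketch, using the big-endian
helpers above, and assuming the IFF rule that an odd-sized chunk is followed by a pad
byte:</p>
<pre>
/* Walk every chunk inside a FORM AIFF whose 12-byte header
   ('FORM', the form size, and 'AIFF') has already been read. */
void walk_chunks(FILE *f, long formSize)
{
    long remaining = formSize - 4;     /* 4 bytes were used by 'AIFF' */
    while (remaining > 0) {
        char id[5] = {0};
        fread(id, 1, 4, f);
        long size = (long)read_u32_be(f);

        /* dispatch on the chunk ID here; unknown chunks are skipped */
        /* if (memcmp(id, "COMM", 4) == 0) ... */
        fseek(f, size + (size & 1), SEEK_CUR);   /* odd sizes are padded */

        remaining -= 8 + size + (size & 1);
    }
}
</pre>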
<h3>Sample Points and Sample Frames</h3>
<p>A large part of interpreting Audio IFF files revolves around the two concepts of sample
points and sample frames. </p>
<p>A sample point is a value representing a sample of a sound at a given moment in time.
Each sample point is stored as a linear, 2's-complement value which may be from 1 to 32
bits wide (as determined by the sampleSize field in the Common Chunk). For example, each
sample point of an 8-bit waveform would be an 8-bit byte (ie, a signed char).</p>
<p>Because most CPUs' read and write operations deal with 8-bit bytes, it was decided that
a sample point should be rounded up to a size which is a multiple of 8 bits when stored in
an AIFF. This makes the AIFF easier to read into memory. If your ADC produces a sample
point from 1 to 8 bits wide, a sample point should be stored in an AIFF as an 8-bit byte
(ie, signed char). If your ADC produces a sample point from 9 to 16 bits wide, a sample
point should be stored in an AIFF as a 16-bit word (ie, signed short). If your ADC
produces a sample point from 17 to 24 bits wide, a sample point should be stored in an
AIFF as three bytes. If your ADC produces a sample point from 25 to 32 bits wide, a sample
point should be stored in an AIFF as a 32-bit doubleword (ie, signed long), and so on.</p>
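<p>In code, this rounding rule reduces to a one-line computation (a sketch; the function
name is illustrative):</p>
<pre>
/* Number of bytes used to store one sample point in the file,
   given the sampleSize field (1 to 32 bits) from the Common Chunk. */
int bytes_per_sample_point(short sampleSize)
{
    return (sampleSize + 7) / 8;   /* round up to a whole number of bytes */
}

/* Examples: sampleSize 8 -> 1 byte, 12 -> 2, 24 -> 3, 32 -> 4 */
</pre>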
<p>Furthermore, the data bits should be left-justified, with any remaining (ie, pad) bits
zeroed. For example, consider the case of a 12-bit sample point. It has 12 bits, so the
sample point must be saved as a 16-bit word. Those 12 bits should be left-justified so
that they become bits 4 to 15 inclusive, and bits 0 to 3 should be set to zero. Shown
below is how a 12-bit sample point with a value of binary 101000010111 is stored
left-justified as two bytes (ie, a 16-bit word).</p>
<pre>
  ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___
 |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
 | 1   0   1   0   0   0   0   1   0   1   1   1   0   0   0   0 |
 |___|___|___|___|___|___|___|___|___|___|___|___|___|___|___|___|
 <-----------------------------------------------><-------------->
       12-bit sample point is left-justified         rightmost
                                                     4 bits are
                                                     zero padded
</pre>
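<p>For a writer, the left-justification is a simple shift. A minimal sketch for the
12-bit case above (the function name is illustrative):</p>
<pre>
/* Left-justify a 12-bit ADC sample (in the low 12 bits of 'raw')
   into a 16-bit word; the low 4 pad bits end up zero. */
short pack_12bit_sample(unsigned short raw)
{
    return (short)((raw & 0x0FFF) << 4);
}

/* Example: raw = 0xA17 (binary 101000010111) -> 0xA170,
   matching the diagram above. */
</pre>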
<p>For multichannel sounds (for example, a stereo waveform), single sample points from
each channel are interleaved. For example, assume a stereo (ie, 2 channel) waveform.
Instead of storing all of the sample points for the left channel first, and then storing
all of the sample points for the right channel next, you "mix" the two channels'
sample points together. You would store the first sample point of the left channel. Next,
you would store the first sample point of the right channel. Next, you would store the
second sample point of the left channel. Next, you would store the second sample point of
the right channel, and so on, alternating between storing the next sample point of each
channel. This is what is meant by interleaved data; you store the next sample point of
each of the channels in turn, so that the sample points that are meant to be
"played" (ie, sent to a DAC) simultaneously are stored contiguously. </p>
<p>The sample points that are meant to be "played" (ie, sent to a DAC)
simultaneously are collectively called a <b>sample frame</b>. In the example of our stereo
waveform, every two sample points makes up another sample frame. This is illustrated below
for that stereo example. </p>
<pre>
   sample      sample             sample
   frame 0     frame 1            frame N
  _____ _____ _____ _____         _____ _____
 | ch1 | ch2 | ch1 | ch2 | . . . | ch1 | ch2 |
 |_____|_____|_____|_____|       |_____|_____|

  _____
 |     |  =  one sample point
 |_____|
</pre>
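<p>As an illustration, interleaving two separate channel buffers into sample frames looks
like this (a sketch; the buffer names are hypothetical, and 16-bit sample points are
assumed):</p>
<pre>
/* Interleave separate left/right buffers (numFrames points each)
   into an output buffer of sample frames: L0 R0 L1 R1 ... */
void interleave_stereo(const short *left, const short *right,
                       short *out, unsigned long numFrames)
{
    unsigned long i;
    for (i = 0; i < numFrames; i++) {
        out[2*i]     = left[i];    /* channel 1: left  */
        out[2*i + 1] = right[i];   /* channel 2: right */
    }
}
</pre>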
<p>For a monophonic waveform, a sample frame is merely a single sample point (ie, there's
nothing to interleave). For multichannel waveforms, you should follow the conventions
shown below for which order to store channels within the sample frame. (ie, below, a
single sample frame is displayed for each example of a multichannel waveform). </p>
<pre>
  channels        1         2
              _________ _________
             |  left   |  right  |
  stereo     |         |         |
             |_________|_________|

                  1         2         3
              _________ _________ _________
             |  left   |  right  | center  |
  3 channel  |         |         |         |
             |_________|_________|_________|

                  1         2         3         4
              _________ _________ _________ _________
             |  front  |  front  |  rear   |  rear   |
  quad       |  left   |  right  |  left   |  right  |
             |_________|_________|_________|_________|

                  1         2         3         4
              _________ _________ _________ _________
             |  left   | center  |  right  |surround |
  4 channel  |         |         |         |         |
             |_________|_________|_________|_________|

                  1         2         3         4         5         6
              _________ _________ _________ _________ _________ _________
             |  left   |  left   | center  |  right  |  right  |surround |
  6 channel  |         | center  |         |         | center  |         |
             |_________|_________|_________|_________|_________|_________|
</pre>
<p>The sample points within a sample frame are packed together; there are no unused bytes
between them. Likewise, the sample frames are packed together with no pad bytes. </p>
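<p>Because everything is packed, the size of the waveform data follows directly from the
Common Chunk fields described in the next section (a sketch, reusing the
bytes_per_sample_point helper sketched earlier):</p>
<pre>
/* Total size in bytes of the packed waveform data. */
unsigned long sound_data_bytes(short numChannels,
                               unsigned long numSampleFrames,
                               short sampleSize)
{
    return numSampleFrames * (unsigned long)numChannels
                           * (unsigned long)bytes_per_sample_point(sampleSize);
}

/* Example: 2 channels, 441000 frames, 16-bit sample points
   -> 441000 * 2 * 2 = 1764000 bytes */
</pre>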
<hr>
<h3>The Common Chunk</h3>
<p>The Common Chunk describes fundamental parameters of the waveform data such as sample
rate, bit resolution, and how many channels of digital audio are stored in the FORM AIFF. </p>
<pre>
#define CommonID 'COMM'   /* chunkID for the Common Chunk */

typedef struct {
  ID             chunkID;
  long           chunkSize;
  short          numChannels;
  unsigned long  numSampleFrames;
  short          sampleSize;
  extended       sampleRate;
} CommonChunk;
</pre>
<p>The ID is always <b>COMM</b>. The chunkSize field is the number of bytes in the chunk.
This does not include the 8 bytes used by the ID and Size fields. For the Common Chunk,
chunkSize should always be 18 since there are no fields of variable length (but to maintain
compatibility with possible future extensions, if the chunkSize is > 18, you should
always treat those extra bytes as pad bytes). </p>
<p>The numChannels field contains the number of audio channels for the sound. A value of 1
means monophonic sound, 2 means stereo, 4 means four channel sound, etc. Any number of
audio channels may be represented.</p>
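<p>The sampleRate field uses the 80-bit extended type described under Data Types, which
most compilers no longer support natively. Here is a minimal sketch of converting the 10
stored bytes to a double, assuming a normal (non-NaN, non-infinite) value such as a sample
rate (the function name is illustrative):</p>
<pre>
#include &lt;math.h&gt;

/* Convert a big-endian 80-bit IEEE 754 extended (as stored in the
   file) to a double: 1 sign bit, a 15-bit exponent biased by 16383,
   and a 64-bit mantissa with an explicit integer bit. */
double extended_to_double(const unsigned char b[10])
{
    int  sign     = (b[0] >> 7) & 1;
    long exponent = ((long)(b[0] & 0x7F) << 8) | b[1];
    double hi = (double)(((unsigned long)b[2] << 24) | ((unsigned long)b[3] << 16)
                       | ((unsigned long)b[4] << 8)  |  (unsigned long)b[5]);
    double lo = (double)(((unsigned long)b[6] << 24) | ((unsigned long)b[7] << 16)
                       | ((unsigned long)b[8] << 8)  |  (unsigned long)b[9]);
    double mantissa = hi * 4294967296.0 + lo;   /* high 32 bits * 2^32 + low 32 */

    if (mantissa == 0.0)
        return 0.0;
    /* value = mantissa * 2^(exponent - bias - 63) */
    return (sign ? -1.0 : 1.0) * ldexp(mantissa, (int)(exponent - 16383 - 63));
}
</pre>
<p>For example, a 44100 Hz sample rate is stored as the bytes 0x40 0x0E 0xAC 0x44 followed
by six zero bytes; the playing time of the waveform in seconds is then numSampleFrames
divided by the converted sampleRate.</p>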