<html>

<head>
<meta http-equiv="content-type" content="text/html; charset=iso-8859-1">
<meta name="premium" content="msdn">
<meta name="ms.locale" content="en-us">
<meta name="description" content>
<meta name="generator" content="microsoft frontpage 3.0">
<title>audio interchange file format</title>
</head>

<body bgcolor="#ffffff" link="#003399" vlink="#996699">

<h1>WAVE File Format</h1>

<p>WAVE file format is a file format for storing digital audio (waveform) data. It
supports a variety of bit resolutions, sample rates, and channels of audio. This format is
very popular on IBM PC (clone) platforms, and is widely used in professional programs
that process digital audio waveforms. It takes into account some peculiarities of the
Intel CPU, such as little endian byte order.</p>

<p>This format uses Microsoft's version of the Electronic Arts Interchange File Format
method for storing data in &quot;chunks&quot;.</p>
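<p>To make the chunk idea concrete, here is a minimal sketch of the generic layout that
every chunk shares (the struct and field names here are illustrative, not taken from any
official header):</p>

<pre>
typedef struct {
  char  ckID[4];    /* chunk ID: 4 ascii bytes identifying the chunk type        */
  long  ckSize;     /* byte count of ckData; does not include the pad byte       */
  char  ckData[];   /* the data, padded with one zero byte if ckSize is odd      */
} Chunk;
</pre>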

<h3>Data Types</h3>

<p>A C-like language will be used to describe the data structures in the file. A few extra
data types that are not part of standard C, but which will be used in this document, are:</p>

<table border="0">
  <tr>
    <td><b>pstring</b></td>
    <td>Pascal-style string, a one-byte count followed by that many text bytes. The total
    number of bytes in this data type should be even. A pad byte can be added to the end of
    the text to accomplish this. This pad byte is not reflected in the count.</td>
  </tr>
  <tr>
    <td><b>ID</b></td>
    <td>A chunk ID (ie, 4 ASCII bytes).</td>
  </tr>
</table>

<p>Also note that when you see an array with no size specification (e.g., char ckData[];),
this indicates a variable-sized array in our C-like language. This differs from standard C
arrays.</p>
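<p>As an illustration of the pstring padding rule, here is a small sketch of a reader for
it. The function name and the minimal error checking are my own; only the even-total rule
comes from the definition above.</p>

<pre>
#include &lt;stdio.h&gt;

/* Read a pstring from f into buf (illustrative helper). Returns the count, or -1. */
int read_pstring(FILE *f, char *buf, int bufsize)
{
    int count = fgetc(f);                         /* the one-byte count */
    if (count == EOF || count &gt;= bufsize) return -1;
    if (fread(buf, 1, count, f) != (size_t)count) return -1;
    buf[count] = '\0';
    if (count % 2 == 0)    /* count byte + text would total an odd number... */
        fgetc(f);          /* ...so skip the pad byte that makes it even     */
    return count;
}
</pre>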

<h3>Constants</h3>

<p>Decimal values are referred to as a string of digits, for example 123, 0, 100 are all
decimal numbers. Hexadecimal values are preceded by a 0x -- e.g., 0x0A, 0x1, 0x64.</p>

<h3>Data Organization</h3>

<p>All data is stored in 8-bit bytes, arranged in Intel 80x86 (ie, little endian) format.
The bytes of multiple-byte values are stored with the low-order (ie, least significant)
bytes first. Data bits are as follows (ie, shown with bit numbers on top):</p>

<pre>
         7  6  5  4  3  2  1  0
       +-----------------------+
 char: | lsb               msb |
       +-----------------------+

         7  6  5  4  3  2  1  0 15 14 13 12 11 10  9  8
       +-----------------------+-----------------------+
short: | lsb     byte 0        |       byte 1      msb |
       +-----------------------+-----------------------+

         7  6  5  4  3  2  1  0 15 14 13 12 11 10  9  8 23 22 21 20 19 18 17 16 31 30 29 28 27 26 25 24
       +-----------------------+-----------------------+-----------------------+-----------------------+
 long: | lsb     byte 0        |       byte 1          |       byte 2          |       byte 3      msb |
       +-----------------------+-----------------------+-----------------------+-----------------------+
</pre>
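<p>In practice this means a reader cannot simply fread() a multiple-byte value on a big
endian machine. Below is a sketch of assembling little endian values portably from a byte
buffer (the function names are illustrative):</p>

<pre>
/* Assemble little endian values from raw bytes; works on any host byte order. */
unsigned short read_u16le(const unsigned char *p)
{
    return (unsigned short)(p[0] | (p[1] &lt;&lt; 8));      /* byte 0 is the lsb */
}

unsigned long read_u32le(const unsigned char *p)
{
    return  (unsigned long)p[0]        |
           ((unsigned long)p[1] &lt;&lt; 8)  |
           ((unsigned long)p[2] &lt;&lt; 16) |
           ((unsigned long)p[3] &lt;&lt; 24);               /* byte 3 is the msb */
}
</pre>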

<h3>File Structure</h3>

<p>A WAVE file is a collection of a number of different types of chunks. There is a
required Format (&quot;fmt &quot;) chunk which contains important parameters describing
the waveform, such as its sample rate. The Data chunk, which contains the actual waveform
data, is also required. All other chunks are optional. Among the other optional chunks are
ones which define cue points, list instrument parameters, store application-specific
information, etc. All of these chunks are described in detail in the following sections of
this document.</p>

<p>All applications that use WAVE must be able to read the 2 required chunks and can
choose to selectively ignore the optional chunks. A program that copies a WAVE should copy
all of the chunks in the WAVE, even those it chooses not to interpret.</p>

<p>There are no restrictions upon the order of the chunks within a WAVE file, with the
exception that the Format chunk must precede the Data chunk. Some inflexibly written
programs expect the Format chunk as the first chunk (after the RIFF header) although they
shouldn't because the specification doesn't require this.</p>

<p>Here is a graphical overview of an example, minimal WAVE file. It consists of a single
WAVE containing the 2 required chunks, a Format and a Data chunk.</p>

<pre> __________________________
| RIFF WAVE Chunk          |
|   groupID  = 'RIFF'      |
|   riffType = 'WAVE'      |
|    __________________    |
|   | Format Chunk     |   |
|   |   ckID = 'fmt '  |   |
|   |__________________|   |
|    __________________    |
|   | Sound Data Chunk |   |
|   |   ckID = 'data'  |   |
|   |__________________|   |
|__________________________|
</pre>
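<p>Since chunk order is otherwise unconstrained, a robust reader walks the chunks one at a
time, dispatching on the chunk ID and skipping anything it doesn't recognize. Here is a
sketch of such a loop; the function name and the lack of error handling are my own, and it
assumes the file position is just past the 12-byte RIFF header ('RIFF', size, 'WAVE'):</p>

<pre>
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;

/* Walk the chunks inside the RIFF WAVE chunk, up to riffEnd (illustrative). */
void walk_chunks(FILE *f, long riffEnd)
{
    unsigned char hdr[8];
    unsigned long ckSize;

    while (ftell(f) &lt; riffEnd &amp;&amp; fread(hdr, 1, 8, f) == 8) {
        ckSize =  (unsigned long)hdr[4]        |   /* chunk size is little endian */
                 ((unsigned long)hdr[5] &lt;&lt; 8)  |
                 ((unsigned long)hdr[6] &lt;&lt; 16) |
                 ((unsigned long)hdr[7] &lt;&lt; 24);

        if (memcmp(hdr, "fmt ", 4) == 0) {
            /* read the Format chunk fields here */
        } else if (memcmp(hdr, "data", 4) == 0) {
            /* read (or remember the offset of) the waveform data here */
        }
        /* skip the chunk data, plus one pad byte if ckSize is odd */
        fseek(f, (long)(ckSize + (ckSize &amp; 1)), SEEK_CUR);
    }
}
</pre>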

<h4>A Bastardized Standard</h4>

<p>The WAVE format is sort of a bastardized standard that was concocted by too many
&quot;cooks&quot; who didn't properly coordinate the addition of &quot;ingredients&quot;
to the &quot;soup&quot;. Unlike the AIFF standard, which was mostly designed by a
small, coordinated group, the WAVE format has had all manner of much-too-independent,
uncoordinated aberrations inflicted upon it. The net result is that there are far too many
chunks that may be found in a WAVE file -- many of them duplicating the same information
found in other chunks (but in an unnecessarily different way) simply because there have
been too many programmers who took too many liberties with unilaterally adding their own
additions to the WAVE format without properly coming to a consensus of what everyone else
needed (and therefore it encouraged an &quot;every man for himself&quot; attitude toward
adding things to this &quot;standard&quot;). One example is the Instrument chunk versus
the Sampler chunk. Another example is the Note versus Label chunks in an Associated Data
List. I don't even want to get into the totally irresponsible proliferation of compressed
formats. (ie, it seems like everyone and his pet dachshund has come up with some
compressed version of storing WAVE data -- like we need 100 different ways to do that).
Furthermore, there are lots of inconsistencies, for example how 8-bit data is unsigned,
but 16-bit data is signed.</p>

<p>I've attempted to document only those aspects that you're very likely to encounter in a
WAVE file. I suggest that you concentrate upon these and refuse to support the work of
programmers who feel the need to deviate from a standard with inconsistent, proprietary,
self-serving, unnecessary extensions. Please do your part to rein in half-assed programming.</p>

<h3>Sample Points and Sample Frames</h3>

<p>A large part of interpreting WAVE files revolves around the two concepts of sample
points and sample frames.</p>

<p>A sample point is a value representing a sample of a sound at a given moment in time.
For waveforms with greater than 8-bit resolution, each sample point is stored as a linear,
2's-complement value which may be from 9 to 32 bits wide (as determined by the
wBitsPerSample field in the Format chunk, assuming PCM format -- an uncompressed format).
For example, each sample point of a 16-bit waveform would be a 16-bit word (ie, two 8-bit
bytes) where 32767 (0x7FFF) is the highest value and -32768 (0x8000) is the lowest value.
For 8-bit (or less) waveforms, each sample point is a linear, unsigned byte where 255 is
the highest value and 0 is the lowest value. Obviously, this signed/unsigned sample point
discrepancy between 8-bit and larger resolution waveforms was one of those
&quot;oops&quot; scenarios where some Microsoft employee decided to change the sign
sometime after 8-bit WAVE files were common but 16-bit WAVE files hadn't yet appeared.</p>

<p>Because most CPUs' read and write operations deal with 8-bit bytes, it was decided that
a sample point should be rounded up to a size which is a multiple of 8 bits when stored in
a WAVE. This makes the WAVE easier to read into memory. If your ADC produces a sample point
from 1 to 8 bits wide, a sample point should be stored in a WAVE as an 8-bit byte (ie,
unsigned char). If your ADC produces a sample point from 9 to 16 bits wide, a sample point
should be stored in a WAVE as a 16-bit word (ie, signed short). If your ADC produces a
sample point from 17 to 24 bits wide, a sample point should be stored in a WAVE as three
bytes. If your ADC produces a sample point from 25 to 32 bits wide, a sample point should
be stored in a WAVE as a 32-bit doubleword (ie, signed long). Etc.</p>
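<p>The rule above boils down to rounding the bit width up to a whole number of bytes; a
one-line sketch (the helper name is mine):</p>

<pre>
/* Container size in bytes for an n-bit sample point: 1-8 -&gt; 1, 9-16 -&gt; 2, 17-24 -&gt; 3, ... */
int bytesPerSamplePoint(int bitsPerSample)
{
    return (bitsPerSample + 7) / 8;
}
</pre>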

<p>Furthermore, the data bits should be left-justified, with any remaining (ie, pad) bits
zeroed. For example, consider the case of a 12-bit sample point. It has 12 bits, so the
sample point must be saved as a 16-bit word. Those 12 bits should be left-justified so
that they become bits 4 to 15 inclusive, and bits 0 to 3 should be set to zero. Shown
below is how a 12-bit sample point with a value of binary 101000010111 is formatted
left-justified as a 16-bit word.</p>

<pre>
 ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___
|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
| 1   0   1   0   0   0   0   1   0   1   1   1   0   0   0   0 |
|___|___|___|___|___|___|___|___|___|___|___|___|___|___|___|___|
 <---------------------------------------------> <------------->
    12 bit sample point is left justified          rightmost
                                                  4 bits are
                                                  zero padded
</pre>

<p>But note that, because the WAVE format uses Intel little endian byte order, the LSB is
stored first in the WAVE file, like so:</p>

<pre>
 ___ ___ ___ ___ ___ ___ ___ ___    ___ ___ ___ ___ ___ ___ ___ ___
|   |   |   |   |   |   |   |   |  |   |   |   |   |   |   |   |   |
| 0   1   1   1   0   0   0   0 |  | 1   0   1   0   0   0   0   1 |
|___|___|___|___|___|___|___|___|  |___|___|___|___|___|___|___|___|
 <-------------> <------------->    <----------------------------->
   bits 0 to 3     4 pad bits                 bits 4 to 11
</pre>
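<p>Putting the two rules together, here is a sketch of left-justifying a 12-bit sample
point and writing it in little endian order (the function name is illustrative):</p>

<pre>
#include &lt;stdio.h&gt;

/* Store one 12-bit sample point as a left-justified, little endian 16-bit word. */
void writeSample12(FILE *f, int sample12)
{
    /* keep 12 bits, shift into bits 4..15; bits 0..3 become the zero pad */
    unsigned int word = ((unsigned int)sample12 &amp; 0x0FFF) &lt;&lt; 4;

    fputc(word &amp; 0xFF, f);               /* low byte first (sample bits 0-3 + pad) */
    fputc((word &gt;&gt; 8) &amp; 0xFF, f);        /* high byte second (sample bits 4-11)    */
}
</pre>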

<p>For multichannel sounds (for example, a stereo waveform), single sample points from
each channel are interleaved. For example, assume a stereo (ie, 2 channel) waveform.
Instead of storing all of the sample points for the left channel first, and then storing
all of the sample points for the right channel next, you &quot;mix&quot; the two channels'
sample points together. You would store the first sample point of the left channel, then
the first sample point of the right channel, then the second sample point of the left
channel, then the second sample point of the right channel, and so on, alternating between
storing the next sample point of each channel. This is what is meant by interleaved data;
you store the next sample point of each of the channels in turn, so that the sample points
that are meant to be &quot;played&quot; (ie, sent to a DAC) simultaneously are stored
contiguously.</p>

<p>The sample points that are meant to be &quot;played&quot; (ie, sent to a DAC)
simultaneously are collectively called a <b>sample frame</b>. In the example of our stereo
waveform, every two sample points makes up another sample frame. This is illustrated below
for that stereo example.</p>

<pre>
  sample       sample              sample
  frame 0      frame 1             frame n
 _____ _____ _____ _____         _____ _____
| ch1 | ch2 | ch1 | ch2 | . . . | ch1 | ch2 |
|_____|_____|_____|_____|       |_____|_____|
 _____
|     | = one sample point
|_____|
</pre>
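<p>A sketch of building interleaved stereo sample frames from two separate channel
buffers (16-bit samples assumed; the names are mine):</p>

<pre>
/* Interleave left/right sample points into stereo sample frames. */
void interleaveStereo(const short *left, const short *right,
                      short *frames, long numFrames)
{
    long i;
    for (i = 0; i &lt; numFrames; i++) {
        frames[2 * i]     = left[i];    /* channel 1 of frame i */
        frames[2 * i + 1] = right[i];   /* channel 2 of frame i */
    }
}
</pre>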

<p>For a monophonic waveform, a sample frame is merely a single sample point (ie, there's
nothing to interleave). For multichannel waveforms, you should follow the conventions
shown below for which order to store channels within the sample frame. (ie, below, a
single sample frame is displayed for each example of a multichannel waveform).</p>

<pre>
  channels       1         2
             _________ _________ 
            | left    | right   |
  stereo    |         |         |
            |_________|_________|


                 1         2         3
             _________ _________ _________ 
            | left    | right   | center  |
  3 channel |         |         |         |
            |_________|_________|_________|

                 1         2         3         4
             _________ _________ _________ _________ 
            | front   | front   | rear    | rear    |
  quad      | left    | right   | left    | right   |
            |_________|_________|_________|_________|

                 1         2         3         4
             _________ _________ _________ _________ 
            | left    | center  | right   | surround|
  4 channel |         |         |         |         |
            |_________|_________|_________|_________|

                 1         2         3         4         5         6
             _________ _________ _________ _________ _________ _________
            | left    | left    | center  | right   | right   |surround |
  6 channel |         | center  |         |         | center  |         |
            |_________|_________|_________|_________|_________|_________|
</pre>

<p>The sample points within a sample frame are packed together; there are no unused bytes
between them. Likewise, the sample frames are packed together with no pad bytes.</p>

<p>Note that the above discussion outlines the format of data within an uncompressed Data
chunk. There are some techniques for storing compressed data in a Data chunk. Obviously,
such data would need to be decompressed, after which it adheres to the above layout.</p>

<hr>

<h3>The Format Chunk</h3>

<p>The Format (fmt) chunk describes fundamental parameters of the waveform data such as
sample rate, bit resolution, and how many channels of digital audio are stored in the
WAVE.</p>

<pre>
#define FormatID 'fmt '   /* chunkID for Format Chunk. NOTE: There is a space at the end of this ID. */

typedef struct {
  ID             chunkID;
  long           chunkSize;

  short          wFormatTag;
  unsigned short wChannels;
  unsigned long  dwSamplesPerSec;
  unsigned long  dwAvgBytesPerSec;
  unsigned short wBlockAlign;
  unsigned short wBitsPerSample;

  /* Note: there may be additional fields here, depending upon wFormatTag. */

} FormatChunk;
</pre>
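<p>For PCM data, the two derived fields follow directly from the others. A small sketch of
those standard relationships (the helper name is mine):</p>

<pre>
/* Fill in the derived Format chunk fields for PCM data (illustrative helper). */
void fillDerivedFields(FormatChunk *fmt)
{
    /* bytes per sample frame: one container-rounded sample point per channel */
    fmt-&gt;wBlockAlign      = fmt-&gt;wChannels * ((fmt-&gt;wBitsPerSample + 7) / 8);
    /* bytes of waveform data consumed per second of playback */
    fmt-&gt;dwAvgBytesPerSec = fmt-&gt;dwSamplesPerSec * fmt-&gt;wBlockAlign;
}
</pre>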