the beginning of every track on the CD should be described with a
four bytes absolute CD-frame address per track, and not with absolute
time. This frame requires a present and valid "TRCK" frame, even if
the CD's only got one track. There may only be one "MCDI" frame in
each tag.
<Header for 'Music CD identifier', ID: "MCDI">
CD TOC <binary data>
4.6. Event timing codes
This frame allows synchronisation with key events in a song or sound.
The header is:
<Header for 'Event timing codes', ID: "ETCO">
Time stamp format $xx
Where time stamp format is:
$01 Absolute time, 32 bit sized, using MPEG [MPEG] frames as unit
$02 Absolute time, 32 bit sized, using milliseconds as unit
Absolute time means that every stamp contains the time from the
beginning of the file.
Followed by a list of key events in the following format:
Type of event $xx
Time stamp $xx (xx ...)
The 'Time stamp' is set to zero if directly at the beginning of the
sound or after the previous event. All events should be sorted in
chronological order. The type of event is as follows:
$00 padding (has no meaning)
$01 end of initial silence
$02 intro start
$03 mainpart start
$04 outro start
$05 outro end
$06 verse start
$07 refrain start
$08 interlude start
$09 theme start
$0A variation start
$0B key change
$0C time change
$0D momentary unwanted noise (Snap, Crackle & Pop)
$0E sustained noise
$0F sustained noise end
$10 intro end
$11 mainpart end
$12 verse end
$13 refrain end
$14 theme end
$15-$DF reserved for future use
$E0-$EF not predefined sync 0-F
$F0-$FC reserved for future use
$FD audio end (start of silence)
$FE audio file ends
$FF one more byte of events follows (all the following bytes with
the value $FF have the same function)
Terminating the start events such as "intro start" is not required.
The 'not predefined sync' codes ($E0-$EF) are for user events. You might
want to synchronise your music to something, like setting off an
explosion on-stage, turning on your screensaver etc.
There may only be one "ETCO" frame in each tag.
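As an illustration of the layout above, here is a minimal Python sketch of reading an "ETCO" frame body. The function name parse_etco is hypothetical, the input is assumed to be the frame data after the frame header, and the 32 bit time stamps are assumed to be stored most significant byte first. The sketch treats every event type as a single byte and does not interpret the $FF extension code.

```python
import struct


def parse_etco(body: bytes):
    """Return (time_stamp_format, [(event_type, time_stamp), ...]).

    `body` is assumed to be the frame data that follows the frame header.
    """
    if not body:
        raise ValueError("empty ETCO body")
    ts_format = body[0]
    if ts_format not in (0x01, 0x02):
        raise ValueError(f"unknown time stamp format ${ts_format:02X}")

    events = []
    pos = 1
    # Each entry is one type byte followed by a 32 bit time stamp
    # (big-endian byte order is an assumption of this sketch).
    while pos + 5 <= len(body):
        event_type = body[pos]
        (stamp,) = struct.unpack_from(">I", body, pos + 1)
        events.append((event_type, stamp))
        pos += 5
    return ts_format, events
```

For example, a body that uses milliseconds and lists "intro start" at 1000 ms and "mainpart start" at 15000 ms would come back as (2, [(2, 1000), (3, 15000)]).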
4.7. MPEG location lookup table
To increase performance and accuracy of jumps within a MPEG [MPEG]
audio file, frames with timecodes in different locations in the file
might be useful. The ID3v2 frame includes references that the
software can use to calculate positions in the file. After the frame
header is a descriptor of how much the 'frame counter' should
increase for every reference. If this value is two then the first
reference points out the second frame, the 2nd reference the 4th
frame, the 3rd reference the 6th frame etc. In a similar way the
'bytes between reference' and 'milliseconds between reference' points
out bytes and milliseconds respectively.
Each reference consists of two parts; a certain number of bits, as
defined in 'bits for bytes deviation', that describes the difference
between what is said in 'bytes between reference' and the reality and
a certain number of bits, as defined in 'bits for milliseconds
deviation', that describes the difference between what is said in
'milliseconds between reference' and the reality. The number of bits
in every reference, i.e. 'bits for bytes deviation'+'bits for
milliseconds deviation', must be a multiple of four. There may only
be one "MLLT" frame in each tag.
<Header for 'Location lookup table', ID: "MLLT">
MPEG frames between reference $xx xx
Bytes between reference $xx xx xx
Milliseconds between reference $xx xx xx
Bits for bytes deviation $xx
Bits for milliseconds dev. $xx
Then for every reference the following data is included;
Deviation in bytes %xxx....
Deviation in milliseconds %xxx....
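The bit-packed references can be unpacked as in the following Python sketch. The name parse_mllt is hypothetical, the input is assumed to be the frame data after the frame header, and both the multi-byte fields and the packed bits are assumed to be stored most significant first. The sketch returns the raw deviation values and leaves their interpretation to the caller.

```python
def parse_mllt(body: bytes):
    """Unpack the fixed fields and the deviation pairs of an MLLT body."""
    if len(body) < 10:
        raise ValueError("MLLT body too short")
    frames_between = int.from_bytes(body[0:2], "big")
    bytes_between = int.from_bytes(body[2:5], "big")
    ms_between = int.from_bytes(body[5:8], "big")
    bits_bytes = body[8]
    bits_ms = body[9]

    ref_bits = bits_bytes + bits_ms
    if ref_bits == 0 or ref_bits % 4 != 0:
        raise ValueError("bits per reference must be a non-zero multiple of four")

    data = body[10:]
    total_bits = len(data) * 8
    value = int.from_bytes(data, "big")  # treat the payload as one long bit string

    references = []
    offset = 0
    while offset + ref_bits <= total_bits:
        shift = total_bits - offset - ref_bits
        chunk = (value >> shift) & ((1 << ref_bits) - 1)
        byte_dev = chunk >> bits_ms               # first part: deviation in bytes
        ms_dev = chunk & ((1 << bits_ms) - 1)     # second part: deviation in ms
        references.append((byte_dev, ms_dev))
        offset += ref_bits

    return {
        "frames_between": frames_between,
        "bytes_between": bytes_between,
        "ms_between": ms_between,
        "references": references,
    }
```

Note that when a reference is not a whole number of bytes wide, any padding bits at the end of the payload may be read as one extra reference; a real parser would need a rule for that case.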
4.8. Synchronised tempo codes
For a more accurate description of the tempo of a musical piece this
frame might be used. After the header follows one byte describing
which time stamp format should be used. Then follows one or more
tempo codes. Each tempo code consists of one tempo part and one time
part. The tempo is in BPM described with one or two bytes. If the
first byte has the value $FF, one more byte follows, which is added
to the first giving a range from 2 - 510 BPM, since $00 and $01 are
reserved. $00 is used to describe a beat-free time period, which is
not the same as a music-free time period. $01 is used to indicate one
single beat-stroke followed by a beat-free period.
The tempo descriptor is followed by a time stamp. Every time the
tempo in the music changes, a tempo descriptor may indicate this for
the player. All tempo descriptors should be sorted in chronological
order. The first beat-stroke in a time-period is at the same time as
the beat description occurs. There may only be one "SYTC" frame in
each tag.
<Header for 'Synchronised tempo codes', ID: "SYTC">
Time stamp format $xx
Tempo data <binary data>
Where time stamp format is:
$01 Absolute time, 32 bit sized, using MPEG [MPEG] frames as unit
$02 Absolute time, 32 bit sized, using milliseconds as unit
Absolute time means that every stamp contains the time from the
beginning of the file.
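The one-or-two-byte tempo encoding can be decoded as in the following Python sketch. The name parse_sytc is hypothetical, the input is assumed to be the frame data after the frame header, and the 32 bit time stamps are assumed to be stored most significant byte first.

```python
import struct


def parse_sytc(body: bytes):
    """Return (time_stamp_format, [(tempo, time_stamp), ...]).

    A tempo of 0 marks a beat-free period and 1 marks a single beat-stroke;
    values from 2 to 510 are beats per minute.
    """
    if not body:
        raise ValueError("empty SYTC body")
    ts_format = body[0]
    pos = 1
    codes = []
    while pos < len(body):
        tempo = body[pos]
        pos += 1
        if tempo == 0xFF:
            # $FF means one more byte follows and is added to the first.
            if pos >= len(body):
                raise ValueError("truncated tempo descriptor")
            tempo += body[pos]
            pos += 1
        if pos + 4 > len(body):
            raise ValueError("truncated time stamp")
        # Big-endian byte order is an assumption of this sketch.
        (stamp,) = struct.unpack_from(">I", body, pos)
        pos += 4
        codes.append((tempo, stamp))
    return ts_format, codes
```

With this reading, the bytes $FF $2D decode to 255 + 45 = 300 BPM, and $FF $FF gives the maximum of 510 BPM.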
4.9. Unsynchronised lyrics/text transcription
This frame contains the lyrics of the song or a text transcription of
other vocal activities. The header includes an encoding descriptor and
a content descriptor. The body consists of the actual text. The
'Content descriptor' is a terminated string. If no descriptor is
entered, 'Content descriptor' is $00 (00) only. Newline characters
are allowed in the text. There may be more than one 'Unsynchronised
lyrics/text transcription' frame in each tag, but only one with the
same language and content descriptor.
<Header for 'Unsynchronised lyrics/text transcription', ID: "USLT">
Text encoding $xx
Language $xx xx xx
Content descriptor <text string according to encoding> $00 (00)
Lyrics/text <full text string according to encoding>
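A minimal Python sketch of splitting a "USLT" frame body into its fields follows. The names split_terminated and parse_uslt are hypothetical, and the input is assumed to be the frame data after the frame header. The sketch assumes encoding $00 means ISO-8859-1 with a one-byte terminator and $01 means Unicode with a two-byte terminator, and that Unicode strings begin with a byte order mark so that Python's "utf-16" codec can pick the byte order.

```python
def split_terminated(data: bytes, encoding_byte: int):
    """Split `data` at the first string terminator; return (string, rest)."""
    if encoding_byte == 0x00:
        end = data.find(b"\x00")
        if end < 0:
            raise ValueError("unterminated string")
        return data[:end], data[end + 1:]
    # Two-byte terminator: only look at even offsets so that the low byte of
    # one character and the high byte of the next are not mistaken for $00 00.
    for end in range(0, len(data) - 1, 2):
        if data[end:end + 2] == b"\x00\x00":
            return data[:end], data[end + 2:]
    raise ValueError("unterminated string")


def parse_uslt(body: bytes):
    """Return the language, content descriptor and lyrics of a USLT body."""
    if len(body) < 4:
        raise ValueError("USLT body too short")
    encoding_byte = body[0]
    if encoding_byte not in (0x00, 0x01):
        raise ValueError(f"unknown text encoding ${encoding_byte:02X}")
    codec = "latin-1" if encoding_byte == 0x00 else "utf-16"
    language = body[1:4].decode("latin-1")
    raw_descriptor, raw_text = split_terminated(body[4:], encoding_byte)
    return {
        "language": language,
        "descriptor": raw_descriptor.decode(codec),
        "text": raw_text.decode(codec),
    }
```

An empty content descriptor is simply a terminator with nothing in front of it, so the descriptor comes back as an empty string in that case.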
4.10. Synchronised lyrics/text
This is another way of incorporating the words, said or sung lyrics,
in the audio file as text, this time, however, in sync with the
audio. It might also be used to describe events e.g. occurring on a
stage or on the screen in sync with the audio. The header includes a
content descriptor, represented as a terminated text string. If no
descriptor is entered, 'Content descriptor' is $00 (00) only.
<Header for 'Synchronised lyrics/text', ID: "SYLT">
Text encoding $xx
Language $xx xx xx
Time stamp format $xx
Content type $xx
Content descriptor <text string according to encoding> $00 (00)
Encoding: $00 ISO-8859-1 [ISO-8859-1] character set is used => $00
is sync identifier.
$01 Unicode [UNICODE] character set is used => $00 00 is
sync identifier.
Content type: $00 is other
$01 is lyrics
$02 is text transcription
$03 is movement/part name (e.g. "Adagio")
$04 is events (e.g. "Don Quijote enters the stage")
$05 is chord (e.g. "Bb F Fsus")
$06 is trivia/'pop up' information
Time stamp format is:
$01 Absolute time, 32 bit sized, using MPEG [MPEG] frames as unit
$02 Absolute time, 32 bit sized, using milliseconds as unit
Absolute time means that every stamp contains the time from the
beginning of the file.
The text that follows the frame header differs from that of the
unsynchronised lyrics/text transcription in one major way. Each
syllable (or whatever size of text is considered to be convenient by
the encoder) is a null terminated string followed by a time stamp
denoting where in the sound file it belongs. Each sync thus has the
following structure:
Terminated text to be synced (typically a syllable)
Sync identifier (terminator to above string) $00 (00)
Time stamp $xx (xx ...)
The 'time stamp' is set to zero or the whole sync is omitted if
located directly at the beginning of the sound. All time stamps
should be sorted in chronological order. The sync can be considered
as a validator of the subsequent string.
Newline ($0A) characters are allowed in all "SYLT" frames and should
be used after every entry (name, event etc.) in a frame with the
content type $03 - $04.
A few considerations regarding whitespace characters: Whitespace
separating words should mark the beginning of a new word, thus
occurring in front of the first syllable of a new word. This is also
valid for new line characters. A syllable followed by a comma should
not be broken apart with a sync (both the syllable and the comma
should be before the sync).
An example: The "USLT" passage
"Strangers in the night" $0A "Exchanging glances"
would be "SYLT" encoded as:
"Strang" $00 xx xx "ers" $00 xx xx " in" $00 xx xx " the" $00 xx xx
" night" $00 xx xx $0A "Ex" $00 xx xx "chang" $00 xx xx "ing" $00 xx
xx " glan" $00 xx xx "ces" $00 xx xx
There may be more than one "SYLT" frame in each tag, but only one
with the same language and content descriptor.
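The sync structure described in this section can be read as in the following Python sketch. The names _split and parse_sylt are hypothetical, and the input is assumed to be the frame data after the frame header. The sketch assumes full 32 bit time stamps stored most significant byte first, and for Unicode text it assumes each string carries a byte order mark so that Python's "utf-16" codec can pick the byte order.

```python
import struct


def _split(data: bytes, wide: bool):
    """Split at the first sync identifier ($00, or $00 00 when `wide`)."""
    step = 2 if wide else 1
    terminator = b"\x00" * step
    for end in range(0, len(data) - step + 1, step):
        if data[end:end + step] == terminator:
            return data[:end], data[end + step:]
    raise ValueError("unterminated string")


def parse_sylt(body: bytes):
    """Return the fixed fields and the (text, time_stamp) syncs of a SYLT body."""
    if len(body) < 6:
        raise ValueError("SYLT body too short")
    encoding_byte, ts_format, content_type = body[0], body[4], body[5]
    if encoding_byte not in (0x00, 0x01):
        raise ValueError(f"unknown text encoding ${encoding_byte:02X}")
    wide = encoding_byte == 0x01
    codec = "utf-16" if wide else "latin-1"

    raw_descriptor, rest = _split(body[6:], wide)
    syncs = []
    while rest:
        raw_text, rest = _split(rest, wide)
        if len(rest) < 4:
            raise ValueError("truncated time stamp")
        # Big-endian byte order is an assumption of this sketch.
        (stamp,) = struct.unpack_from(">I", rest)
        rest = rest[4:]
        syncs.append((raw_text.decode(codec), stamp))

    return {
        "language": body[1:4].decode("latin-1"),
        "time_stamp_format": ts_format,
        "content_type": content_type,
        "descriptor": raw_descriptor.decode(codec),
        "syncs": syncs,
    }
```

Because the terminator is located before the time stamp is consumed, zero bytes inside a time stamp are not mistaken for a sync identifier.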
4.11. Comments
This frame is intended for any kind of full text information that
does not fit in any other frame. It consists of a frame header
followed by encoding, language and content descriptors and is ended
with the actual comment as a text string. Newline characters are
allowed in the comment text string. There may be more than one
comment frame in each tag, but only one with the same language and
content descriptor.
<Header for 'Comment', ID: "COMM">
Text encoding $xx
Language $xx xx xx
Short content descrip. <text string according to encoding> $00 (00)
The actual text <full text string according to encoding>
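Going the other way, the following Python sketch assembles the data of a "COMM" frame from its parts. The name build_comm_body is hypothetical and the result excludes the frame header. The sketch assumes encoding $00 means ISO-8859-1 and $01 means Unicode, and it relies on Python's "utf-16" codec, which writes a byte order mark followed by text in the platform's byte order.

```python
def build_comm_body(language: str, description: str, text: str,
                    use_unicode: bool = False) -> bytes:
    """Assemble encoding, language, short description and comment text."""
    if len(language) != 3:
        raise ValueError("language must be a three character code")
    if use_unicode:
        encoding_byte, codec, terminator = 0x01, "utf-16", b"\x00\x00"
    else:
        encoding_byte, codec, terminator = 0x00, "latin-1", b"\x00"

    def encode(value: str) -> bytes:
        # An empty string is written as nothing at all, so that an empty
        # description ends up as the terminator only.
        return value.encode(codec) if value else b""

    return (bytes([encoding_byte])
            + language.encode("ascii")
            + encode(description) + terminator
            + encode(text))
```

For instance, build_comm_body("eng", "", "Great track") produces the byte $00, the three bytes "eng", a single $00 for the empty description, and then the comment text itself.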
4.12. Relative volume adjustment
This is a more subjective function than the previous ones. It allows
the user to say how much he wants to increase/decrease the volume on
each channel while the file is played. The purpose is to be able to
align all files to a reference volume, so that you don't have to
change the volume constantly. This frame may also be used to adjust
the balance of the audio. If the volume peak levels are known, they can
be described with the 'Peak volume right' and 'Peak volume left'
fields. If the peak volume is not known, these fields could be left zeroed
or, if no other data follows, be completely omitted. There may only
be one "RVAD" frame in each tag.
<Header for 'Relative volume adjustment', ID: "RVAD">
Increment/decrement %00xxxxxx
Bits used for volume descr. $xx
Relative volume change, right $xx xx (xx ...)
Relative volume change, left $xx xx (xx ...)
Peak volume right $xx xx (xx ...)
Peak volume left $xx xx (xx ...)
In the increment/decrement field bit 0 is used to indicate the right
channel and bit 1 is used to indicate the left channel. 1 is
increment and 0 is decrement.
The 'bits used for volume description' field is normally $10 (16
bits) for MPEG 2 layer I, II and III [MPEG] and MPEG 2.5. This value
may not be $00. The volume is always represented with whole bytes,
padded in the beginning (highest bits) when 'bits used for volume
description' is not a multiple of eight.
This datablock is then optionally followed by a volume definition for
the left and right back channels. If this information is appended to
the frame the first two channels will be treated as front channels.
In the increment/decrement field bit 2 is used to indicate the right
back channel and bit 3 for the left back channel.
Relative volume change, right back $xx xx (xx ...)
Relative volume change, left back $xx xx (xx ...)
Peak volume right back $xx xx (xx ...)
Peak volume left back $xx xx (xx ...)
If the center channel adjustment is present the following is appended
to the existing frame, after the left and right back channels. The
center channel is represented by bit 4 in the increment/decrement
field.
Relative volume change, center $xx xx (xx ...)
Peak volume center $xx xx (xx ...)
If the bass channel adjustment is present the following is appended
to the existing frame, after the center channel. The bass channel is
represented by bit 5 in the increment/decrement field.
Relative volume change, bass $xx xx (xx ...)
Peak volume bass $xx xx (xx ...)
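The field order and the increment/decrement bits described in this section can be combined as in the following Python sketch. The names _LAYOUT and parse_rvad are hypothetical, the input is assumed to be the frame data after the frame header, and values are assumed to be stored most significant byte first. Decrements are returned as negative numbers, and parsing stops at the first field that is not fully present, which covers the optional trailing fields.

```python
# (kind, channel, bit in the increment/decrement field) in frame order.
_LAYOUT = [
    ("change", "right", 0), ("change", "left", 1),
    ("peak", "right", None), ("peak", "left", None),
    ("change", "right back", 2), ("change", "left back", 3),
    ("peak", "right back", None), ("peak", "left back", None),
    ("change", "center", 4), ("peak", "center", None),
    ("change", "bass", 5), ("peak", "bass", None),
]


def parse_rvad(body: bytes):
    """Return {channel: {"change": signed value, "peak": value}}."""
    if len(body) < 2:
        raise ValueError("RVAD body too short")
    flags, bits = body[0], body[1]
    if bits == 0:
        raise ValueError("'bits used for volume description' may not be $00")
    width = (bits + 7) // 8  # values always occupy whole bytes

    result = {}
    pos = 2
    for kind, channel, flag_bit in _LAYOUT:
        if pos + width > len(body):
            break  # remaining fields were omitted
        value = int.from_bytes(body[pos:pos + width], "big")
        pos += width
        # A set bit means increment, a cleared bit means decrement.
        if kind == "change" and not (flags >> flag_bit) & 1:
            value = -value
        result.setdefault(channel, {})[kind] = value
    return result
```

A frame that carries only the two relative volume changes, with the peak fields omitted, yields just the "change" entries for the right and left channels.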
4.13. Equalisation
This is another subjective, alignment frame. It allows the user to
predefine an equalisation curve within the audio file. There may only
be one "EQUA" frame in each tag.