%/* ----------------------------------------------------------- */
%/*                                                             */
%/*                          ___                                */
%/*                       |_| | |_/   SPEECH                    */
%/*                       | | | | \   RECOGNITION               */
%/*                       =========   SOFTWARE                  */
%/*                                                             */
%/*                                                             */
%/* ----------------------------------------------------------- */
%/*         Copyright: Microsoft Corporation                    */
%/*          1995-2000 Redmond, Washington USA                  */
%/*                    http://www.microsoft.com                 */
%/*                                                             */
%/*   Use of this software is governed by a License Agreement   */
%/*    ** See the file License for the Conditions of Use  **    */
%/*    **     This banner notice must not be removed      **    */
%/*                                                             */
%/* ----------------------------------------------------------- */
%
% HTKBook - Steve Young 1/12/97
%
\mychap{Transcriptions and Label Files}{labels}

\sidepic{Tool.labs}{80}{}
Many of the operations performed by \HTK\ which involve speech data
files assume that the speech is divided into segments and each segment
has a name or \textit{label}. The set of labels associated with a
speech file constitutes a \textit{transcription} and each transcription is
stored in a separate \textit{label file}. Typically, the name of the
label file will be the same as the corresponding speech file but with
a different extension. For convenience, label files are often stored
in a separate directory and all \HTK\ tools have an option to specify
this. When very large numbers of files are being processed, label
file access can be greatly facilitated by using\index{master label files}
\textit{Master Label Files (MLFs)}. MLFs may be regarded as index\index{MLF}
files holding pointers to the actual label files which can either be
embedded in the same index file or stored anywhere else in the file system.
Thus, MLFs allow large sets of label files to be stored in a single file,
allow a single transcription to be shared by many logical label files,
and allow arbitrary file redirection.
\index{label files}

The \HTK\ interface to label files is provided by the module \htool{HLabel}
which implements the MLF facility and support for a number of external
label file formats.
All of the facilities supplied by \htool{HLabel}, including the
supported label file formats, are described in this chapter.
In addition, \HTK\ provides a tool called \htool{HLEd} for simple
batch editing of label files and this is also described.
Before proceeding to the details, however, the general structure of
label files will be reviewed.

\mysect{Label File Structure}{labstruct}

Most transcriptions are single-alternative and single-level, that is
to say, the associated speech file is described by a single sequence
of labelled segments. Most standard label formats are of this kind.
Sometimes, however, it is useful to have several levels of labels associated
with the same basic segment sequence. For example, in training an HMM
system it is useful to have both the word level transcriptions and the
phone level transcriptions \textit{side-by-side}.
\index{labels!side-by-side}
Orthogonal to the requirement for multiple levels of description,
a transcription may also need to include multiple alternative
descriptions of the same speech file. For example, the output
of a speech recogniser may be in the form of an \textit{N-best} list
where each word sequence in the list represents one possible interpretation
of the input.
\index{labels!multiple level}

As an example, Fig.~\ref{f:labegs} shows a speech file and three
different ways in which it might be labelled. In part (a), just a simple
orthography is given and this single-level single-alternative type of
transcription is the commonest case. Part (b) shows a 2-level
transcription where the basic level consists of a sequence of phones but
a higher level of word labels is also provided. Notice that there is a
distinction between the basic level and the higher levels, since only the
basic level has explicit boundary locations marked for every segment.
The higher levels do not have explicit boundary information since this
can always be inferred from the basic level boundaries.
Finally, part (c)
shows the case where knowledge of the contents of the speech file is
uncertain and three possible word sequences are given.

\HTK\ label files support multiple-alternative and multiple-level
transcriptions. In addition to start and end times on the basic level, a
label at any level may also have a score associated with it. When a
transcription is loaded, all but one specific alternative can be discarded by
setting the configuration variable
\texttt{TRANSALT}\index{transalt@\texttt{TRANSALT}} to the required
alternative \texttt{N}, where the first (i.e. normal) alternative is numbered
1. Similarly, all but a specified level can be discarded by setting the
configuration variable \texttt{TRANSLEV}\index{translev@\texttt{TRANSLEV}}
to the required level number where again the first (i.e.
normal) level is numbered 1.
All non-\HTK\ formats are limited to
single-level single-alternative transcriptions.

\mysect{Label File Formats}{labform}

As with speech data files, \HTK\ not only defines its own format for
label files but also supports a number of external formats. Defining
an external format is similar to the case for speech data files except
that the relevant configuration variables for specifying a format
other than \HTK\ are called
\texttt{SOURCELABEL}\index{sourcelabel@\texttt{SOURCELABEL}} and
\texttt{TARGETLABEL}.
The source label format can also be specified using the
\texttt{-G}\index{standard options!aaag@\texttt{-G}} command
line option. As with using the
\texttt{-F}\index{standard options!aaaf@\texttt{-F}} command
line option for speech data files, the \texttt{-G} option overrides any
setting of \texttt{SOURCELABEL}.
\index{labels!external formats}

\subsection{HTK Label Files}

The \HTK\ label format is text based. As noted above, a single label
file can contain multiple alternatives and multiple levels.
Each line of a \HTK\ label file contains\index{label files!HTK format}
the actual label optionally preceded by start and end times, and
optionally followed by a match score.
\begin{verbatim}
    [start [end] ] name [score] { auxname [auxscore] } [comment]
\end{verbatim}
where \texttt{start} denotes the start time of the labelled segment
in 100ns units, \texttt{end}
denotes the end time in 100ns units, \texttt{name} is the name
of the segment and \texttt{score} is a floating point confidence score.
All fields except the name are optional. If \texttt{end} is omitted then
it is set equal to $-1$ and ignored. This case would occur with data which had
been labelled frame synchronously. If \texttt{start} and \texttt{end} are both
missing then both are set to $-1$ and the label file is treated as a
simple symbolic transcription. The
optional score would typically be a log probability generated by a
recognition tool. When omitted the score is set to 0.0.

The following example corresponds to the transcription shown
in part (a) of Fig.~\ref{f:labegs}
\begin{verbatim}
    0000000 3600000 ice
    3600000 8200000 cream
\end{verbatim}
Multiple levels are described by adding further names alongside
the basic name.
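As a reading aid, one line of a two-level transcription can be matched
field by field against the syntax given above. The line is the first one
of the part (b) transcription in Fig.~\ref{f:labegs}. The column headings
are annotations only and are not part of a label file. With times in 100ns
units, the end time is
$2200000 \times 100\,\mathrm{ns} = 0.22\,\mathrm{s}$.
\begin{verbatim}
    start    end      name  auxname
    0000000  2200000  ay    ice
\end{verbatim}
Here \texttt{ay} is the basic-level (phone) label and \texttt{ice}
occupies the \texttt{auxname} position as the word-level label.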
The lowest level (shortest segments) should be
given first since only the lowest level has start and end times.
The label file corresponding to the transcription illustrated in
part (b) of Fig.~\ref{f:labegs} would be as follows.
\begin{verbatim}
    0000000 2200000 ay ice
    2200000 3600000 s
    3600000 4300000 k  cream
    4300000 5000000 r
    5000000 7400000 iy
    7400000 8200000 m
\end{verbatim}
Finally, multiple alternatives are written as a sequence of separate
label lists separated by three slashes (///).
The label file corresponding to the transcription illustrated in
part (c) of Fig.~\ref{f:labegs} would therefore be as follows.
\begin{verbatim}
    0000000 2200000 I
    2200000 8200000 scream
    ///
    0000000 3600000 ice
    3600000 8200000 cream
    ///
    0000000 3600000 eyes
    3600000 8200000 cream
\end{verbatim}

Actual label names can be any sequence of characters.
However, the \texttt{-} and \texttt{+} characters are reserved for identifying
the left and right context\index{labels!context markers}, respectively,
in a context-dependent phone
label. For example, the label \texttt{N-aa+V} might be used to denote
the phone \texttt{aa} when preceded by a nasal and followed by a vowel.
These context-dependency conventions are used in the label editor \htool{HLEd},
and are understood by all \HTK\ tools.

\subsection{ESPS Label Files}

An \ESPSwaves\ label file is a text file with one label stored per
line. Each label indicates a segment boundary.
\index{label files!ESPS format}
A complete description
of the \ESPSwaves\ label format is given in the \ESPSwaves\ manual pages
{\bf xwaves (1-ESPS)} and {\bf xlabel (1-ESPS)}.
Only details required for use with \HTK\ are given here. The label data follows
a header which ends with a line containing only
a \texttt{\#}.
The header contents are generally ignored by \htool{HLabel}.
The labels follow the header in the form
\begin{verbatim}
    time ccode name
\end{verbatim}
where \texttt{time} is a floating point number which denotes the boundary
location in seconds,
\texttt{ccode} is an integer color map entry used by \ESPSwaves\ in drawing
segment boundaries and \texttt{name} is the name of the segment boundary. A
typical value for \texttt{ccode} is \texttt{121}.

While each \HTK\ label can contain both a start and an end time which
indicate
the boundaries of a labelled segment, \ESPSwaves\ labels
contain a single time in seconds which (by convention)
refers to the end of the labelled segment. The starting time
of the segment is taken to be the end of the previous
segment, or \texttt{0} for the first segment.

\ESPSwaves\ label files may have several boundary names per line.
However, \htool{HLabel} only reads \ESPSwaves\ label files with a single name
per boundary. Multiple-alternative and/or multiple-level \HTK\ label
data structures cannot be saved using \ESPSwaves\ format label files.

\subsection{TIMIT Label Files}
\index{label files!TIMIT format}

TIMIT label files are identical to single-alternative single-level \HTK\
label files without scores except that the start and end times are
given as sample numbers rather than absolute times. TIMIT label files
are used on both the prototype and final versions of the TIMIT CD ROM.

\subsection{SCRIBE Label Files}
\index{label files!SCRIBE format}

The SCRIBE label file format is a subset of the European SAM label file format.
SAM label files are text files and each line begins with a label identifying
the type of information stored on that line. The \HTK\ SCRIBE format recognises
just three label types
\begin{tabbing}
++ \= +++++++ \= \kill
\> LBA \>-- acoustic label \\
\> LBB \>-- broad class label \\
\> UTS \>-- utterance
\end{tabbing}
For each of these, the rest of the line is divided into comma separated
fields.
The LBA and LBB types have 4 fields: start sample, centre sample, end sample
and label. \HTK\ expects the centre sample to be blank. The UTS type has 3 fields:
start sample, end sample and label. UTS labels may be multi-word since they can
refer to a complete utterance. In order to make such labels usable within \HTK\ tools,
between-word blanks are converted to underscore characters. The
\texttt{EX}\index{ex@\texttt{EX} command} command
in the \HTK\ label editor \htool{HLEd} can then be used to split such
a compound label into individual word labels if required.

\mysect{Master Label Files}{mlfs}

\subsection{General Principles of MLFs}
\index{master label files}

Logically, the organisation of data and label files is very simple.
Every data file has a label file of the same name (but
different extension) which is stored either in the same directory as
the data file or in some other specified directory.

\sidefig{labegs}{60}{Example Transcriptions}{2}{}
This scheme is sufficient for most needs and commendably simple.
However, there are many cases where either it makes unnecessarily
inefficient use of the operating system or it seriously inconveniences
the user. For example, to use a training tool with
isolated word data may require the generation of hundreds or thousands of
label files each having just one label entry. Even where individual
label files are appropriate (as in the phonetically transcribed TIMIT
database), each label file must be
stored in the same directory as the data file it transcribes, or all label files
must be stored in the same directory. One cannot, for example, have a
different directory of label files for each TIMIT dialect region and
then run the \HTK\ training tool \htool{HERest} on the whole database.

All of these problems can be solved by the use of Master Label Files
(MLFs). Every \HTK\ tool which uses label files has a
\texttt{-I}\index{standard options!aaai@\texttt{-I}} option
which can be used to specify the name of an MLF file.
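As a sketch of how this option is typically passed, the command below
supplies an MLF to \htool{HERest}. The file names \texttt{config},
\texttt{labels.mlf}, \texttt{train.scp}, \texttt{hmmdefs} and
\texttt{hmmlist} are hypothetical placeholders, and the options other
than \texttt{-I} are included only to make the command line complete.
\begin{verbatim}
    HERest -C config -I labels.mlf -S train.scp -H hmmdefs hmmlist
\end{verbatim}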
When an MLF has been
loaded, the normal rules for locating a label file apply except that
the MLF is searched first. If the required label file \texttt{f} is found via