JULIUS(1)                                                            JULIUS(1)



NAME
       Julius - open source multi-purpose LVCSR engine

SYNOPSIS
       julius [-C jconffile] [options ...]

DESCRIPTION
       Julius  is  a  high-performance, multi-purpose, free speech recognition
       engine for researchers and developers.  It  is  capable  of  performing
       almost  real-time  recognition  of continuous speech with over 60k-word
       vocabulary on most current PCs.

       Julius needs an N-gram language model, a word dictionary and an
       acoustic model to perform recognition.  Standard model formats (i.e.
       ARPA and HTK) with any word/phone units and sizes are supported, so
       users can build a recognition system for various targets using their
       own language and acoustic models.  For details about basic models and
       their availability, please see the documents contained in this
       package.

       Julius can perform recognition on audio files, live microphone input,
       network input and feature parameter files.  The maximum vocabulary
       size is 65,535 words.

RECOGNITION MODELS
       Julius supports the following models.

       Acoustic Models
                 Sub-word HMMs (Hidden Markov Models) in HTK ASCII format are
                 supported.  Phoneme models (monophone), context-dependent
                 phoneme models (triphone), tied-mixture and phonetic
                 tied-mixture models of any unit can be used.  When using
                 context-dependent models, interword context is also handled.
                 The tool mkbinhmm can convert the ASCII HMM definition file
                 to a binary format to speed up startup (this binary format is
                 incompatible with HTK).

       Language model
                 2-gram and reverse 3-gram language models are used.  The
                 standard ARPA format is supported.  In addition, a binary
                 N-gram format is supported for efficiency; the tool mkbingram
                 converts ARPA language models into this binary format.
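                 The reverse 3-gram predicts a word from the two words that
                 follow it, P(w1 | w2, w3), instead of P(w3 | w1, w2).  A
                 minimal sketch of maximum-likelihood estimation from counts,
                 assuming a toy tokenized corpus with no smoothing or back-off
                 (a real ARPA model includes both).  The function name is
                 hypothetical.

```python
from collections import Counter

def reverse_trigram_mle(sentences):
    """Estimate P(w1 | w2, w3) by maximum likelihood (no smoothing).

    Maps (w1, w2, w3) to count(w1 w2 w3) / count(* w2 w3).
    """
    tri = Counter()
    right_ctx = Counter()  # counts of the (w2, w3) context that follows w1
    for words in sentences:
        for i in range(len(words) - 2):
            w1, w2, w3 = words[i], words[i + 1], words[i + 2]
            tri[(w1, w2, w3)] += 1
            right_ctx[(w2, w3)] += 1
    return {k: c / right_ctx[(k[1], k[2])] for k, c in tri.items()}

corpus = [
    ["<s>", "i", "want", "to", "go", "</s>"],
    ["<s>", "you", "want", "to", "go", "</s>"],
]
probs = reverse_trigram_mle(corpus)
print(probs[("i", "want", "to")])  # 0.5, i.e. P(i | want, to)
```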

SPEECH INPUT
       Both live speech input and recorded speech file input are supported.
       Live input streams from a microphone device, a DatLink (NetAudio)
       device, and TCP/IP network input via adintool are supported.  Speech
       waveform files in 16-bit uncompressed WAV or RAW format are accepted,
       and many other formats can be read if Julius is compiled with the
       libsndfile library.  Feature parameter files in HTK format are also
       supported.

       Note that Julius itself can only extract MFCC_E_D_N_Z features from
       speech data.  If you use an acoustic HMM trained on another feature
       type, you must supply HTK parameter files of that same feature type.
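       HTK names a feature type by a base kind plus qualifier suffixes.  In
       MFCC_E_D_N_Z, _E appends log energy, _D appends delta coefficients,
       _N suppresses absolute log energy, and _Z applies cepstral mean
       subtraction.  The sketch below checks whether a model's parameter
       kind matches the one Julius computes itself.  It treats qualifiers
       as an unordered set, which is an assumption about how the comparison
       should be done, and the function names are hypothetical.

```python
JULIUS_BUILTIN_KIND = "MFCC_E_D_N_Z"

def parse_param_kind(kind):
    """Split an HTK parameter kind such as 'MFCC_E_D_N_Z' into
    its base kind and a set of qualifier letters."""
    base, *quals = kind.upper().split("_")
    return base, frozenset(quals)

def julius_can_extract(model_kind):
    """True if the model's feature type equals the one Julius computes from
    raw speech; otherwise HTK parameter files must be supplied instead."""
    return parse_param_kind(model_kind) == parse_param_kind(JULIUS_BUILTIN_KIND)

print(julius_can_extract("MFCC_E_N_D_Z"))  # True: qualifier order is ignored
print(julius_can_extract("MFCC_0_D_A"))    # False
```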

SEARCH ALGORITHM OVERVIEW
       The recognition algorithm of Julius is based on a two-pass strategy.
       A word 2-gram is used on the first pass and a reverse word 3-gram on
       the second.  The first pass processes the entire input; the second
       pass then searches the input again, using the result of the first pass
       to narrow the search space.  Specifically, the algorithm is a
       tree-trellis heuristic search that combines a left-to-right
       frame-synchronous beam search with a right-to-left stack decoding
       search.
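       A frame-synchronous beam search keeps, at every frame, only those
       hypotheses whose scores are close to the best one.  The following
       generic sketch shows that pruning idea over a tiny left-to-right HMM
       with made-up log scores.  It is not Julius's lexicon-tree search.
       Pruning by score difference, rather than keeping a fixed number of
       hypotheses, is an assumption made for brevity.

```python
import math

def beam_viterbi(obs_logprob, trans_logprob, beam):
    """Frame-synchronous Viterbi search with score-based beam pruning.

    obs_logprob[t][s]: log-likelihood of frame t in state s.
    trans_logprob[(a, b)]: log transition probability from state a to b.
    Hypotheses scoring more than `beam` below the frame's best are dropped.
    Returns (best score, state sequence) among surviving final hypotheses.
    """
    active = {0: (obs_logprob[0][0], [0])}  # state -> (score, path); start in state 0
    for t in range(1, len(obs_logprob)):
        nxt = {}
        for s, (score, path) in active.items():
            for (a, b), lp in trans_logprob.items():
                if a != s:
                    continue
                cand = score + lp + obs_logprob[t][b]
                if b not in nxt or cand > nxt[b][0]:
                    nxt[b] = (cand, path + [b])
        best = max(sc for sc, _ in nxt.values())
        # Beam pruning: only hypotheses close to the current best survive.
        active = {s: h for s, h in nxt.items() if h[0] >= best - beam}
    return max(active.values())

# Toy 3-state left-to-right HMM (made-up numbers).
half = math.log(0.5)
trans = {(0, 0): half, (0, 1): half, (1, 1): half, (1, 2): half, (2, 2): 0.0}
obs = [[-1, -9, -9], [-5, -1, -9], [-9, -1, -5], [-9, -9, -1]]
score, path = beam_viterbi(obs, trans, beam=1.0)
print(path)  # [0, 1, 1, 2]
```

       With this narrow beam, only one hypothesis survives at several
       frames, yet the best path matches the one found with a wide beam.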

       When using context dependent phones (triphones), interword contexts are
       taken  into  consideration.  For tied-mixture and phonetic tied-mixture
       models, high-speed acoustic likelihood calculation  is  possible  using
       Gaussian pruning.
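       Gaussian pruning saves time by computing the likelihood from only
       the few Gaussians in a tied-mixture codebook that lie nearest to the
       current frame.  The sketch below shows one common way to find them
       for diagonal-covariance Gaussians.  It accumulates the
       variance-normalized squared distance one dimension at a time and
       abandons a Gaussian once that partial distance exceeds the k-th best
       complete distance found so far.  This is an illustration of the
       idea, not a copy of Julius's pruning methods, and the names are
       hypothetical.

```python
import heapq

def top_k_gaussians(x, means, variances, k):
    """Indices of the k diagonal Gaussians closest to frame x, measured by
    variance-normalized squared distance, using partial-distance pruning."""
    heap = []  # holds (-distance, index) for the k best so far
    for g, (mu, var) in enumerate(zip(means, variances)):
        # Current k-th best distance; nothing is pruned until k are found.
        thresh = -heap[0][0] if len(heap) == k else float("inf")
        dist = 0.0
        for xi, mi, vi in zip(x, mu, var):
            dist += (xi - mi) ** 2 / vi
            if dist > thresh:
                break  # pruned: this Gaussian cannot enter the top k
        else:
            if len(heap) < k:
                heapq.heappush(heap, (-dist, g))
            else:
                heapq.heapreplace(heap, (-dist, g))  # evict the current worst
    return sorted(g for _, g in heap)

means = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.5, 0.0], [9.0, 9.0]]
variances = [[1.0, 1.0]] * 5
print(top_k_gaussians([0.2, 0.1], means, variances, 2))  # [0, 3]
```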

       For more details, see the related document or web page below.

OPTIONS
       The options below specify the models, system behavior and various
       search parameters.  All of them can be given on the command line, but
       it is recommended that you write them in a text file, a "jconf file",
       and specify that file with the "-C" option.

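       A jconf file lists the same options that would otherwise be typed on
       the command line.  A minimal sketch that writes one from Python and
       builds the matching command.  The file name "sample.jconf" and the
       option values are placeholders, and only options documented on this
       page are used.  A working setup also needs the model options, which
       are not covered here.

```python
import os
import tempfile

def write_jconf(path, options):
    """Write one option (plus its argument, if any) per line."""
    with open(path, "w", encoding="utf-8") as f:
        for name, value in options:
            f.write(name if value is None else f"{name} {value}")
            f.write("\n")

options = [
    ("-input", "mic"),
    ("-smpFreq", 16000),
    ("-lv", 2000),
    ("-headmargin", 300),
    ("-tailmargin", 400),
    ("-cutsilence", None),
]
path = os.path.join(tempfile.gettempdir(), "sample.jconf")
write_jconf(path, options)
cmd = ["julius", "-C", path]  # e.g. pass to subprocess.run(cmd)
```
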
   Speech Input
       -input {rawfile|mfcfile|mic|adinnet|netaudio|stdin}
              Select the speech data input source.  'rawfile' is a waveform
              file and 'mfcfile' is an HTK parameter file (the file names to
              recognize are specified from stdin after startup).  'mic' means
              the microphone device, and 'adinnet' means receiving waveform
              data via TCP/IP network from an adinnet client.  'netaudio' is
              from DatLink/NetAudio input, and 'stdin' means data input from
              standard input.

              WAV (no compression) and RAW (no header, 16-bit, big-endian)
              are supported for waveform file input.  Other formats can be
              supported using an external library.  To see which formats are
              actually supported, check the help message printed with option
              "-help".  For stdin input, only WAV and RAW are supported.
              (default: mfcfile)
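              RAW input is headerless 16-bit big-endian PCM, so its bytes
              must be decoded with an explicit byte order.  A minimal sketch
              using the standard struct module; the helper names are
              hypothetical.

```python
import struct

def read_raw_be16(data):
    """Decode headerless 16-bit signed big-endian PCM bytes into ints."""
    if len(data) % 2:
        raise ValueError("RAW data must contain an even number of bytes")
    n = len(data) // 2
    return list(struct.unpack(f">{n}h", data))

def write_raw_be16(samples):
    """Encode ints in -32768..32767 as headerless 16-bit big-endian PCM."""
    return struct.pack(f">{len(samples)}h", *samples)

raw = write_raw_be16([1, -2, 32767])
print(raw.hex())           # 0001fffe7fff
print(read_raw_be16(raw))  # [1, -2, 32767]
```

              To read a file, pass the bytes from open(name, "rb").read().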

       -filelist file
              (With  -input  rawfile|mfcfile) perform recognition on all files
              listed in the file.

       -adport portnum
              (With -input adinnet) adinnet port number (default: 5530)

       -NA server:unit
              (With -input netaudio) set the server name and unit ID of the
              DatLink unit.

       -zmean  -nozmean
              This option enables/disables DC offset removal of the input
              waveform.  For speech file input, the mean is computed over the
              whole input.  For microphone / network input, the mean of the
              first 48000 samples (3 seconds at 16kHz sampling) is computed
              and subtracted from the rest of the input.
              (default: disabled (-nozmean))
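              The two zero-mean modes differ only in where the mean comes
              from: the whole file, or the first 48000 samples of a stream
              (48000 / 16000 Hz = 3 s).  A minimal sketch.  Passing the
              samples that arrive before the estimate is ready through
              unchanged is an assumption, as is returning floats instead of
              rounding back to 16-bit integers.

```python
def zmean_file(samples):
    """File input: subtract the mean of the whole input."""
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]

def zmean_stream(samples, n_est=48000):
    """Live input: estimate the DC offset from the first n_est samples,
    then subtract it from the rest of the stream.
    ASSUMPTION: samples seen before the estimate is ready pass through unchanged."""
    total, count, mean = 0, 0, None
    for s in samples:
        if mean is None:
            total += s
            count += 1
            yield s
            if count == n_est:
                mean = total / n_est
        else:
            yield s - mean

print(zmean_file([1, 2, 3]))                          # [-1.0, 0.0, 1.0]
print(list(zmean_stream([10, 12, 11, 15], n_est=2)))  # [10, 12, 0.0, 4.0]
```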

       -zmeanframe  -nozmeanframe
              With speech input, this option enables/disables frame-wise DC
              offset removal.  This is the same as HTK's ZMEANSOURCE option,
              and cannot be combined with "-zmean".
              (default: disabled (-nozmeanframe))
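              Frame-wise DC removal subtracts each analysis frame's own mean
              rather than one global mean.  A minimal sketch that uses the
              default window and shift listed under Acoustic Analysis (400
              and 160 samples).  It assumes that trailing samples which do
              not fill a whole frame are dropped, and the function name is
              hypothetical.

```python
def frames_zero_mean(samples, fsize=400, fshift=160):
    """Cut samples into overlapping frames and remove each frame's mean."""
    out = []
    for start in range(0, len(samples) - fsize + 1, fshift):
        frame = samples[start:start + fsize]
        mean = sum(frame) / fsize
        out.append([s - mean for s in frame])
    return out

print(frames_zero_mean([1, 2, 3, 4, 5, 6], fsize=4, fshift=2))
# [[-1.5, -0.5, 0.5, 1.5], [-1.5, -0.5, 0.5, 1.5]]
```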

       -nostrip
              By default, Julius removes zero samples from the input speech
              data, since such invalid data is sometimes recorded at the
              start or end of a recording.  This option inhibits this
              automatic removal.

       -record directory
              Automatically save each input speech segment under the given
              directory, one file per segment.  The file name is generated
              from the system time when the input starts, in the form
              "YYYY.MMDD.HHMMSS.wav".  The file format is 16-bit mono WAV.
              Invalid for mfcfile input.  When inputs are rejected by
              "-rejectshort", the rejected inputs are recorded as well.
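              The file naming and format above map directly onto the
              standard datetime and wave modules.  A minimal sketch.  The
              16 kHz rate, the temporary directory and the function name
              are assumptions for illustration.

```python
import datetime
import os
import struct
import tempfile
import wave

def record_segment(directory, samples, start_time, rate=16000):
    """Save one segment as 16-bit mono WAV named YYYY.MMDD.HHMMSS.wav."""
    name = start_time.strftime("%Y.%m%d.%H%M%S") + ".wav"
    path = os.path.join(directory, name)
    with wave.open(path, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(rate)
        # WAV stores PCM samples little-endian.
        w.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return path

t = datetime.datetime(2024, 3, 5, 14, 7, 9)
p = record_segment(tempfile.gettempdir(), [0, 100, -100], t)
print(os.path.basename(p))  # 2024.0305.140709.wav
```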

       -rejectshort msec
              Reject input shorter than the specified number of milliseconds.
              The search is terminated and no result is output.  In module
              mode, the message '<REJECTED REASON="..."/>' is sent to the
              client.  With "-record", rejected inputs are still recorded.
              (default: 0 = off)
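              The rejection test compares the segment duration, samples *
              1000 / sampling rate, against the threshold.  A threshold of 0
              disables the check.  A minimal sketch that assumes the default
              16 kHz rate; the function name is hypothetical.

```python
def should_reject(num_samples, reject_msec, rate=16000):
    """True if the segment is shorter than reject_msec (0 disables the check)."""
    if reject_msec <= 0:
        return False
    duration_msec = num_samples * 1000 / rate
    return duration_msec < reject_msec

print(should_reject(4000, 300))  # True: 4000 samples = 250 ms
print(should_reject(8000, 300))  # False: 500 ms
print(should_reject(100, 0))     # False: check disabled
```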

   Speech Detection
       Options in this section are invalid for mfcfile input.

       -cutsilence  -nocutsilence
              Force silence cutting (speech segment detection) ON or OFF.
              (default: ON for mic/adinnet, OFF for files)

       -lv threslevel
              Level threshold (0 - 32767) for speech triggering.  If the
              audio input amplitude exceeds this threshold for a period,
              Julius begins the 1st pass recognition.  When the level falls
              below this threshold after triggering, the speech segment ends.
              (default: 2000)

       -zc zerocrossnum
              Zero-crossing threshold per second (default: 60)
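              The level and zero-crossing thresholds can be combined into a
              trigger test on a short window of samples.  The window passes
              if its amplitude exceeds the level threshold and its
              zero-crossing count, scaled to a per-second rate, reaches the
              zero-crossing threshold.  This is a hedged sketch of the
              general idea only.  The window length and the exact way Julius
              combines the two tests are assumptions, and the function names
              are hypothetical.

```python
def zero_crossings(window):
    """Count sign changes between consecutive samples."""
    return sum(1 for a, b in zip(window, window[1:]) if (a < 0) != (b < 0))

def is_triggered(window, rate=16000, lv=2000, zc=60):
    """ASSUMED rule: trigger if the peak amplitude exceeds lv and the
    zero-crossing rate (per second) is at least zc."""
    if max(abs(s) for s in window) <= lv:
        return False
    zc_per_sec = zero_crossings(window) * rate / len(window)
    return zc_per_sec >= zc

# 0.1 s of a 100 Hz square wave at +/-3000 (80 samples per half period).
loud = [3000 if (i // 80) % 2 == 0 else -3000 for i in range(1600)]
print(zero_crossings(loud), is_triggered(loud))  # 19 True
```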

       -headmargin msec
              Margin at the start of speech segment in milliseconds. (default:
              300)

       -tailmargin msec
              Margin at the end of speech segment in  milliseconds.  (default:
              400)
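              At the default 16 kHz rate, the margins convert to 300 ms ->
              4800 samples and 400 ms -> 6400 samples.  A minimal sketch of
              extending a detected segment.  It assumes the segment is given
              as sample indices [start, end) and that the margins are clipped
              to the signal bounds.  The function name is hypothetical.

```python
def apply_margins(start, end, total, head_msec=300, tail_msec=400, rate=16000):
    """Extend [start, end) by the head/tail margins, clipped to [0, total)."""
    head = head_msec * rate // 1000
    tail = tail_msec * rate // 1000
    return max(0, start - head), min(total, end + tail)

print(apply_margins(10000, 20000, 48000))  # (5200, 26400)
print(apply_margins(1000, 46000, 48000))   # (0, 48000)
```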

   Acoustic Analysis
       -smpFreq frequency
              Set the sampling frequency of input speech in Hz.  The sampling
              rate can also be specified using "-smpPeriod".  Note that this
              frequency must match the conditions under which the acoustic
              model was trained.  It should be specified for microphone input
              and RAW file input when using a rate other than the default.
              Also see "-fsize", "-fshift", "-delwin" and "-accwin".
              (default: 16000 (Hz), i.e. a period of 625 x 100ns).

       -smpPeriod period
              Set the sampling frequency of input speech by its sampling
              period, in units of 100 nanoseconds.  The sampling rate can
              also be specified using "-smpFreq".  Note that the input
              frequency must match the conditions under which the acoustic
              model was trained.  It should be specified for microphone input
              and RAW file input when using a rate other than the default.
              Also see "-fsize", "-fshift", "-delwin" and "-accwin".
              (default: 625 (x 100ns) = 16000Hz).
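              Pairing a period of 625 with 16000 Hz means the period is
              measured in 100 ns units: 1/16000 s = 62.5 us = 625 x 100 ns.
              The two options are therefore related by period = 10^7 /
              frequency.  A minimal conversion sketch (function names
              hypothetical):

```python
def freq_to_period(freq_hz):
    """Sampling frequency (Hz) -> sampling period in 100 ns units."""
    return round(10_000_000 / freq_hz)

def period_to_freq(period_100ns):
    """Sampling period in 100 ns units -> sampling frequency (Hz)."""
    return 10_000_000 / period_100ns

print(freq_to_period(16000))  # 625
print(period_to_freq(625))    # 16000.0
print(freq_to_period(8000))   # 1250
```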

       -fsize sample
              Analysis window size in number of samples. (default: 400).

       -fshift sample
              Frame shift in number of samples (default: 160).
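              At 16 kHz, the default window of 400 samples spans 25 ms and
              the default shift of 160 samples is 10 ms.  A sketch of the
              resulting frame count.  It assumes that frames running past
              the end of the input are discarded (no padding), and the
              function names are hypothetical.

```python
def frame_params_msec(fsize=400, fshift=160, rate=16000):
    """Window length and frame shift in milliseconds."""
    return fsize * 1000 / rate, fshift * 1000 / rate

def num_frames(num_samples, fsize=400, fshift=160):
    """Number of complete analysis windows (no padding at the end)."""
    if num_samples < fsize:
        return 0
    return 1 + (num_samples - fsize) // fshift

print(frame_params_msec())  # (25.0, 10.0)
print(num_frames(16000))    # 98 frames for one second of audio
```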

       -preemph value
              Pre-emphasis coefficient (default: 0.97)
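              Pre-emphasis is the first-order filter y[n] = x[n] - k*x[n-1],
              with k = 0.97 by default.  A minimal sketch.  Implementations
              differ in how they treat the first sample of a frame.  Here
              that sample is assumed to be its own predecessor, which is an
              assumption.

```python
def preemphasize(frame, k=0.97):
    """y[n] = x[n] - k * x[n-1]; ASSUMPTION: x[-1] is taken to equal x[0]."""
    out = [frame[0] * (1.0 - k)]
    for prev, cur in zip(frame, frame[1:]):
        out.append(cur - k * prev)
    return out

print(preemphasize([0, 10, 10]))
```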

       -fbank num
              Number of filterbank channels (default: 24)

       -ceplif num
              Cepstral liftering coefficient (default: 22)
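              Cepstral liftering with coefficient L rescales the n-th
              cepstral coefficient as c'_n = (1 + (L/2) sin(pi*n/L)) c_n,
              the sinusoidal lifter used by HTK.  A minimal sketch with
              L = 22.  It assumes the coefficients are numbered from 1
              (c_1 ... c_N).

```python
import math

def lifter(cep, L=22):
    """Apply sinusoidal liftering to cepstral coefficients c_1..c_N."""
    return [(1 + (L / 2) * math.sin(math.pi * n / L)) * c
            for n, c in enumerate(cep, start=1)]

weights = lifter([1.0] * 12)
print(round(weights[10], 6))  # 12.0: peak weight 1 + L/2 at n = L/2 = 11
```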

       -rawe / -norawe
              Enable/disable using raw energy  before  pre-emphasis  (default:
              disabled)

       -enormal / -noenormal
              Enable/disable normalizing log energy (default: disabled).
              Note: log energy normalization should not be used with live
              input, in either training or recognition (see sec. 5.9 "Direct
              Audio Input/Output" in HTKBook).

       -escale value
              Scaling  factor  of  log  energy  when  normalizing  log  energy
              (default: 1.0)

       -silfloor value
              Energy silence floor in dB when normalizing log energy (default:
              50.0)
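              Log energy normalization needs the maximum energy of the whole
              utterance, so it is not available frame by frame on live input.
              A sketch of an HTK-style scheme, assuming natural-log energies.
              Values are first floored at silfloor dB below the maximum and
              then mapped to 1.0 - (Emax - E) * escale, so the loudest frame
              becomes 1.0.  Check the HTKBook for the exact definition before
              relying on this.

```python
import math

def normalize_log_energy(log_e, escale=1.0, silfloor=50.0):
    """Normalize per-frame natural-log energies over a whole utterance."""
    e_max = max(log_e)
    # Silence floor: convert silfloor dB to natural-log units.
    e_min = e_max - silfloor * math.log(10) / 10
    return [1.0 - (e_max - max(e, e_min)) * escale for e in log_e]

print(normalize_log_energy([10.0, 5.0]))  # [1.0, -4.0]
```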

       -delwin frame
              Delta window size in number of frames (default: 2).

       -accwin frame
              Acceleration window size in number of frames (default: 2).
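              Delta coefficients follow the HTK regression formula d_t =
              sum_{th=1..W} th*(c_{t+th} - c_{t-th}) / (2 * sum_{th=1..W}
              th^2), where W is the window given by -delwin.  Acceleration
              coefficients apply the same formula to the deltas, using the
              -accwin window.  A minimal sketch for one coefficient track.
              It assumes the first and last frames are replicated at the
              utterance edges.

```python
def deltas(track, win=2):
    """Regression deltas of a 1-D feature track with edge replication."""
    T = len(track)
    denom = 2 * sum(th * th for th in range(1, win + 1))
    out = []
    for t in range(T):
        num = 0.0
        for th in range(1, win + 1):
            nxt = track[min(t + th, T - 1)]  # replicate last frame
            prv = track[max(t - th, 0)]      # replicate first frame
            num += th * (nxt - prv)
        out.append(num / denom)
    return out

d = deltas([0, 1, 2, 3, 4, 5, 6])
print(d)  # [0.5, 0.8, 1.0, 1.0, 1.0, 0.8, 0.5]
acc = deltas(d)  # acceleration: deltas of the deltas
```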

       -lofreq frequency
              Enable band-limiting for MFCC filterbank computation: set  lower
              frequency cut-off.  Also see "-hifreq".
              (default: -1 = disabled)

       -hifreq frequency
              Enable  band-limiting for MFCC filterbank computation: set upper
              frequency cut-off.  Also see "-lofreq".
              (default: -1 = disabled)
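              Band-limiting makes the mel filterbank span the cut-off
              frequencies instead of 0 Hz to the Nyquist frequency.  The
              sketch below places the center frequencies of 24 triangular
              filters evenly on the mel scale Mel(f) = 1127 ln(1 + f/700),
              the warping used by HTK.  Treating a negative cut-off as 0 Hz
              or Nyquist is an assumption, and the function names are
              hypothetical.

```python
import math

def mel(f):
    return 1127.0 * math.log(1.0 + f / 700.0)

def inv_mel(m):
    return 700.0 * (math.exp(m / 1127.0) - 1.0)

def filter_centers(num_chans=24, rate=16000, lofreq=-1, hifreq=-1):
    """Center frequencies (Hz) of num_chans mel-spaced filters.
    ASSUMPTION: a negative cut-off means disabled -> 0 Hz / Nyquist."""
    lo = 0.0 if lofreq < 0 else lofreq
    hi = rate / 2 if hifreq < 0 else hifreq
    mlo, mhi = mel(lo), mel(hi)
    step = (mhi - mlo) / (num_chans + 1)  # equal spacing on the mel axis
    return [inv_mel(mlo + step * (i + 1)) for i in range(num_chans)]

full = filter_centers()
band = filter_centers(lofreq=300, hifreq=3400)
```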

       -sscalc
              Perform spectral subtraction using the head part of each file.
              With this option, Julius assumes that each input file begins
              with a certain length of silence.  Valid only for rawfile
              input.  Conflicts with "-ssload".

       -sscalclen msec
              With "-sscalc", specify the length of the head-part silence in
              milliseconds (default: 300)

       -ssload filename
              Perform spectral subtraction for speech input using a
              pre-estimated noise spectrum from a file.  The noise spectrum
              data should be computed beforehand with mkss.  Valid for all
              speech input.  Conflicts with "-sscalc".
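              Spectral subtraction removes a noise estimate from each frame's
              magnitude spectrum.  With -sscalc the estimate is the average
              spectrum over the leading silence.  With -ssload it is read
              from a file produced by mkss.  A minimal sketch on precomputed
              magnitude spectra that omits the FFT step.  The number of head
              frames, the floor at zero and the example alpha values are
              assumptions, and the function names are hypothetical.

```python
def estimate_noise(spectra, head_frames):
    """Average magnitude spectrum over the first head_frames frames."""
    head = spectra[:head_frames]
    return [sum(bins) / len(head) for bins in zip(*head)]

def spectral_subtract(spectra, noise, alpha):
    """Subtract alpha * noise from every bin, flooring at 0 (assumed floor)."""
    return [[max(x - alpha * n, 0.0) for x, n in zip(frame, noise)]
            for frame in spectra]

spectra = [[1.0, 2.0], [3.0, 2.0], [10.0, 5.0]]
noise = estimate_noise(spectra, 2)         # [2.0, 2.0]
print(spectral_subtract(spectra, noise, 1.0))
# [[0.0, 0.0], [1.0, 0.0], [8.0, 3.0]]
```

              A larger alpha removes more noise but also more speech energy.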

       -ssalpha value
              Alpha coefficient of spectral subtraction for "-sscalc" and
              "-ssload".  Noise will be subtracted more strongly as this value gets