JULIAN(1)                                                            JULIAN(1)



NAME
       Julian - grammar based continuous speech recognition parser

SYNOPSIS
       julian [-C jconffile] [options ...]

DESCRIPTION
       Julian is a high-performance, multi-purpose, free speech recognition
       parser based on finite state grammars.  It can perform real-time
       recognition of continuous speech with a vocabulary of several
       thousand words.

       Julian is derived from Julius, and almost all components are the same
       except for the parts related to the language model.

       To run recognition, Julian needs an acoustic model and a finite state
       grammar that describes the sentence patterns to be recognized.  The
       grammar format is specific to Julian, and tools to create a
       recognition grammar are included in the distribution.  For the
       acoustic model, the standard HTK format is supported, with any
       word/phone units and sizes.  Users can therefore build a recognition
       system customized for a specific task from their own task grammar and
       acoustic models.  For details about the models and how to write a
       grammar, see the documents contained in this package.

       Julian can perform recognition on audio files, live  microphone  input,
       network input and feature parameter files.  The maximum size of vocabu-
       lary is 65,535 words.

RECOGNITION MODELS
       Julian supports the following models.

       Acoustic Models
                 Same as Julius: sub-word HMMs (Hidden Markov Models) in HTK
                 format are supported.  Phoneme models (monophone), context
                 dependent phoneme models (triphone), tied-mixture and
                 phonetic tied-mixture models of any unit can be used.  When
                 context dependent models are used, interword context is also
                 handled.  The tool mkbinhmm can convert an ASCII HMM
                 definition file to a binary format to speed up startup
                 (this format is incompatible with that of HTK).

       Language model
                 The grammar format is specific to Julian, and tools to create
                 a recognition grammar are included in the distribution.  A
                 grammar consists of two files: a 'grammar' file that
                 describes sentence structures in a BNF style, using word
                 'categories' as terminal symbols, and a 'voca' file that
                 defines the words in each category together with their
                 pronunciations (i.e. phoneme sequences).  Both should be
                 converted by mkdfa.pl(1) to a deterministic finite automaton
                 file (.dfa) and a dictionary file (.dict), respectively.

SPEECH INPUT
       Same as Julius: both live speech input and recorded speech file input
       are supported.  Live input is accepted from a microphone device, a
       DatLink (NetAudio) device, or over a TCP/IP network using adintool.
       Speech waveform files are accepted in 16-bit WAV (no compression) and
       RAW format; many other formats are accepted if Julian is compiled
       with the libsndfile library.  Feature parameter files in HTK format
       are also supported.

       Note that Julian itself can extract only MFCC_E_D_N_Z features from
       speech data.  If you use an acoustic HMM trained on another feature
       type, only HTK parameter files of that same feature type can be used
       as input.

SEARCH ALGORITHM OVERVIEW
       The recognition algorithm of Julian is based on a two-pass strategy.
       In the first pass, a high-speed approximate search is performed using
       weaker constraints than the given grammar: an LR beam search that
       uses only the inter-category constraints extracted from the grammar.
       The second pass searches the input again, using the original grammar
       rules and the intermediate results from the first pass, to obtain a
       high-precision result quickly.  In the second pass, A* search
       theoretically guarantees the optimal solution.

       When context dependent phones (triphones) are used, interword contexts
       are taken into consideration.  For tied-mixture and phonetic
       tied-mixture models, high-speed acoustic likelihood calculation is
       possible using Gaussian pruning.

       For more details, see the related document or web page below.

OPTIONS
       The options below specify the models, system behaviors and various
       search parameters.  These options can all be set on the command line,
       but it is recommended that you write them in a text file (a "jconf
       file") and specify that file with the "-C" option.

       Most are the same as Julius.
       Options  only  in Julian: -gram, -gramlist, -dfa, -penalty1, -penalty2,
       -looktrellis
       Options only in Julius: -nlr, -nrl, -d, -lmp, -lmp2, -transp, -silhead,
       -siltail, -spdur, -sepnum, -separatescore


   Speech Input
       -input {rawfile|mfcfile|mic|adinnet|netaudio|stdin}
              Select speech data input source.  'rawfile' is a waveform file
              and 'mfcfile' is an HTK feature parameter file (the file name
              is specified after startup from stdin).  'mic' means microphone
              device, and 'adinnet' means receiving waveform data via a
              TCP/IP network from an adinnet client.  'netaudio' is
              DatLink/NetAudio input, and 'stdin' means data input from
              standard input.

              WAV (no compression) and RAW (no header, 16-bit, big endian)
              are supported for waveform file input.  Other formats can be
              supported using an external library.  To see which formats are
              actually supported, see the help message printed by the
              "-help" option.  For stdin input, only WAV and RAW are
              supported.
              (default: mfcfile)

       -filelist file
              (With -input rawfile|mfcfile) perform recognition on  all  files
              listed in the file.

       -adport portnum
              (with -input adinnet) adinnet port number (default: 5530)

       -NA server:unit
              (with  -input  netaudio)  set the server name and unit ID of the
              Datlink unit.

       -zmean  -nozmean
              Enable/disable DC offset removal for the input waveform.  For
              speech file input, the mean is computed from the whole input.
              For microphone / network input, the mean of the first 48000
              samples (3 seconds at 16kHz sampling) is used for the rest of
              the input.  (default: disabled (-nozmean))
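
              The two ways of estimating the offset described above can be
              sketched as follows.  This is only an illustration of the
              arithmetic, not Julian's implementation; the function name
              remove_dc_offset and its estimate_from parameter are
              hypothetical.

```python
def remove_dc_offset(samples, estimate_from=None):
    """Subtract the mean of `samples`, or of its first `estimate_from` values."""
    head = samples if estimate_from is None else samples[:estimate_from]
    if not head:
        return list(samples)
    mean = sum(head) / len(head)
    return [s - mean for s in samples]


# File input: the offset is estimated from the whole input.
file_out = remove_dc_offset([102, 98, 101, 99])

# Live input: the offset is estimated from the leading samples only and then
# reused for everything that follows (48000 samples in the text above,
# shortened to 2 here to keep the example small).
live_out = remove_dc_offset([102, 98, 110, 90], estimate_from=2)
```

              In the live case the estimate is fixed after the leading
              samples, so a later drift in the offset is not corrected.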

       -zmeanframe  -nozmeanframe
              With speech input, this option enables/disables frame-wise DC
              offset removal.  This is the same as HTK's ZMEANSOURCE option,
              and cannot be combined with "-zmean".
              (default: disabled (-nozmeanframe))

       -nostrip
              By default, Julian removes zero samples from the input speech
              data.  In some cases, such invalid data may be recorded at the
              start or end of a recording.  This option inhibits this
              automatic removal.

       -record directory
              Automatically save input speech data under the directory, one
              file per segmented input.  The file name is generated from the
              system time at which the input starts, in the form
              "YYYY.MMDD.HHMMSS.wav".  The file format is 16-bit monaural
              WAV.  Invalid for mfcfile input.  Input rejected by
              "-rejectshort" is recorded as well.
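
              As an illustration of the "YYYY.MMDD.HHMMSS.wav" naming
              pattern, the sketch below builds such a name from a start
              time.  The helper record_filename is hypothetical and only
              mirrors the pattern given above.

```python
from datetime import datetime


def record_filename(start_time):
    """Format a start time as YYYY.MMDD.HHMMSS.wav."""
    return start_time.strftime("%Y.%m%d.%H%M%S") + ".wav"


# Input that started on 7 March 2005 at 14:05:09.
example = record_filename(datetime(2005, 3, 7, 14, 5, 9))
```

              Because the name has one-second resolution, it is useful for
              locating a recording by the time the utterance began.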

       -rejectshort msec
              Reject input shorter than the specified number of milliseconds.
              The search is terminated and no result is output.  In module
              mode, a '<REJECTED REASON="..."/>' message is sent to the
              client.  With "-record", rejected input is recorded as well.
              (default: 0 = off)

   Speech Detection
       Options in this section are invalid for mfcfile input.

       -cutsilence

       -nocutsilence
              Force  silence  cutting  (=speech  segment detection) to ON/OFF.
              (default: ON for mic/adinnet, OFF for files)

       -lv threslevel
              Level threshold (0 - 32767) for speech triggering.  If the
              audio input amplitude stays over this threshold for a period,
              Julian begins the 1st pass recognition.  If the level goes
              below this threshold after triggering, the speech segment
              ends.  (default: 2000)

       -zc zerocrossnum
              Zero crossing threshold per second (default: 60)

       -headmargin msec
              Margin at the start  of  the  speech  segment  in  milliseconds.
              (default: 300)

       -tailmargin msec
              Margin  at  the  end  of  the  speech  segment  in milliseconds.
              (default: 400)
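
              A rough sketch of how the level threshold and the two margins
              relate to a segment boundary.  This is a deliberate
              simplification and not Julian's detector: it ignores the zero
              crossing count ("-zc") and the requirement that the level be
              exceeded for a period, and it treats the whole input as a
              single segment.  The function name and its parameters are
              hypothetical; the default values are the ones listed above.

```python
def find_speech_segment(samples, level=2000, rate=16000,
                        head_ms=300, tail_ms=400):
    """Return (start, end) sample indices of the segment, or None if silent."""
    loud = [i for i, s in enumerate(samples) if abs(s) > level]
    if not loud:
        return None
    head = rate * head_ms // 1000   # head margin in samples
    tail = rate * tail_ms // 1000   # tail margin in samples
    start = max(0, loud[0] - head)
    end = min(len(samples), loud[-1] + 1 + tail)
    return start, end


# 10000 silent samples, 100 loud samples, 10000 silent samples.
signal = [0] * 10000 + [5000] * 100 + [0] * 10000
segment = find_speech_segment(signal)
```

              At 16kHz the default margins correspond to 4800 samples before
              and 6400 samples after the part that exceeds the threshold.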

   Acoustic Analysis
       -smpFreq frequency
              Set the sampling frequency of input speech in Hz.  The sampling
              rate can also be specified using "-smpPeriod".  This frequency
              must match the conditions under which the acoustic model you
              use was trained.  It should be specified for microphone input
              and RAW file input when using a rate other than the default.
              Also see "-fsize", "-fshift", "-delwin" and "-accwin".
              (default: 16000 (Hz = period of 625 in 100ns units))

       -smpPeriod period
              Set the sampling frequency of input speech by its sampling
              period, in units of 100 nanoseconds as in HTK (a period of 625
              is 62.5 microseconds).  The sampling rate can also be specified
              using "-smpFreq".  The input frequency must match the
              conditions under which the acoustic model you use was trained.
              This should be specified for microphone input and RAW file
              input when using a rate other than the default.  Also see
              "-fsize", "-fshift", "-delwin" and "-accwin".
              (default: 625 (100ns units = 16000Hz))
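
              The two defaults (16000 Hz and a period of 625) agree only if
              the period is counted in units of 100 ns, as HTK does; the
              sketch below assumes that unit.  The helper names are
              hypothetical.

```python
UNITS_PER_SECOND = 10_000_000  # number of 100 ns units in one second


def freq_to_period(freq_hz):
    """Sampling frequency in Hz -> sampling period in 100 ns units."""
    return round(UNITS_PER_SECOND / freq_hz)


def period_to_freq(period):
    """Sampling period in 100 ns units -> sampling frequency in Hz."""
    return round(UNITS_PER_SECOND / period)


default_period = freq_to_period(16000)
telephone_period = freq_to_period(8000)
```

              Under this assumption, 8kHz input would correspond to a
              period of 1250.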

       -fsize sample
              Analysis window size in number of samples (default: 400).

       -fshift sample
              Frame shift in number of samples (default: 160).
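
              Since "-fsize" and "-fshift" are given in samples, their
              duration depends on the sampling rate.  The sketch below
              converts them to milliseconds and counts frames, assuming the
              usual convention that only complete windows are analysed (an
              assumption, not taken from this page).  The helper names are
              hypothetical.

```python
def samples_to_ms(n_samples, rate=16000):
    """Duration of n_samples at the given sampling rate, in milliseconds."""
    return 1000.0 * n_samples / rate


def frame_count(n_samples, fsize=400, fshift=160):
    """Number of complete analysis windows that fit into n_samples."""
    if n_samples < fsize:
        return 0
    return 1 + (n_samples - fsize) // fshift


window_ms = samples_to_ms(400)     # default window at 16kHz
shift_ms = samples_to_ms(160)      # default shift at 16kHz
frames_per_second = frame_count(16000)
```

              With the defaults at 16kHz this gives a 25 ms window shifted
              by 10 ms.  At another sampling rate the same sample counts
              give different durations, which is why these options are
              cross-referenced from "-smpFreq".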

       -preemph value
              Pre-emphasis coefficient (default: 0.97)
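
              Pre-emphasis is a first-order high-pass filter,
              y[n] = x[n] - k * x[n-1].  The sketch below applies it to one
              frame.  Scaling the first sample by (1 - k) follows HTK's
              convention and is assumed here rather than taken from this
              page; the function name is hypothetical.

```python
def pre_emphasize(frame, k=0.97):
    """Apply y[n] = x[n] - k * x[n-1] to one frame of samples."""
    if not frame:
        return []
    out = [frame[0] * (1.0 - k)]  # first sample has no predecessor
    out.extend(frame[n] - k * frame[n - 1] for n in range(1, len(frame)))
    return out


emphasized = pre_emphasize([4, 8, 6], k=0.5)
```

              A coefficient of 0 leaves the frame unchanged; values close
              to 1 attenuate slowly varying components more strongly.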

       -fbank num
              Number of filterbank channels (default: 24)

       -ceplif num
              Cepstral liftering coefficient (default: 22)
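
              As a sketch of what the liftering coefficient L controls, HTK
              rescales the n-th cepstral coefficient by
              1 + (L/2) * sin(pi * n / L).  It is assumed here that Julian
              follows the same formula, since it works with HTK-format
              models.  The function name is hypothetical.

```python
import math


def lifter(cepstra, L=22):
    """Rescale cepstral coefficients c_1..c_N with the HTK liftering weights."""
    return [(1.0 + (L / 2.0) * math.sin(math.pi * n / L)) * c
            for n, c in enumerate(cepstra, start=1)]


# Applying the lifter to all-ones input exposes the weights themselves.
weights = lifter([1.0] * 12)
```

              With L = 22 the weight grows with n and peaks at n = 11,
              which raises the higher-order coefficients to a magnitude
              comparable to the lower-order ones.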

       -rawe / -norawe
              Enable/disable  using  raw  energy before pre-emphasis (default:
              disabled)

       -enormal / -noenormal
              Enable/disable normalizing log energy (default: disabled).
              Note: log energy normalization should not be used with live
              input, either at training or at recognition (see sec. 5.9
              "Direct Audio Input/Output" in the HTKBook).

       -escale value
              Scaling  factor  of  log  energy  when  normalizing  log  energy
              (default: 1.0)

       -silfloor value
              Energy silence floor in dB when normalizing log energy (default:
              50.0)

       -delwin frame
              Delta window size in number of frames (default: 2).

       -accwin frame
              Acceleration window size in number of frames (default: 2).
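
              The window sizes of "-delwin" and "-accwin" are the half-width
              of a regression over neighbouring frames.  The sketch below
              uses the HTK regression formula
              d[t] = sum(th * (c[t+th] - c[t-th])) / (2 * sum(th^2)),
              with th running from 1 to the window size, and repeats the
              first and last frame at the edges.  That Julian uses exactly
              this formula and edge handling is an assumption; the function
              name is hypothetical and the sequence is a single coefficient
              track.

```python
def deltas(seq, window=2):
    """Regression-based delta coefficients for one coefficient track."""
    denom = 2 * sum(th * th for th in range(1, window + 1))
    last = len(seq) - 1
    out = []
    for t in range(len(seq)):
        num = 0.0
        for th in range(1, window + 1):
            ahead = seq[min(t + th, last)]   # repeat last frame past the end
            behind = seq[max(t - th, 0)]     # repeat first frame before start
            num += th * (ahead - behind)
        out.append(num / denom)
    return out


track = [0, 1, 2, 3, 4, 5]
delta = deltas(track, window=2)    # window as in "-delwin"
accel = deltas(delta, window=2)    # window as in "-accwin", applied to deltas
```

              For a track that rises by one per frame, the delta is 1.0 in
              the interior and smaller near the edges, where repeated
              frames flatten the slope.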

       -lofreq frequency
              Enable  band-limiting for MFCC filterbank computation: set lower
              frequency cut-off.
              (default: -1 = disabled)

       -hifreq frequency
              Enable band-limiting for MFCC filterbank computation: set  upper