JULIAN(1) JULIAN(1)
NAME
Julian - grammar based continuous speech recognition parser
SYNOPSIS
julian [-C jconffile] [options ...]
DESCRIPTION
Julian is a high-performance, multi-purpose, free speech recognition
parser based on finite state grammar. It can perform real-time
recognition of continuous speech with a vocabulary of several thousand
words.
Julian is derived from Julius; almost all components are the same
except for the parts related to the language model.
To run recognition, Julian needs an acoustic model and a finite state
grammar that describes the sentence patterns to be recognized. The
grammar format is original to Julian, and tools to create a recognition
grammar are included in the distribution. Acoustic models in the
standard format (i.e. HTK) are supported, with any word/phone units and
sizes. Users can therefore build a recognition system customized for a
specific task from their own task grammar and acoustic models. For
details about the models and how to write a grammar, see the documents
contained in this package.
Julian can perform recognition on audio files, live microphone input,
network input and feature parameter files. The maximum vocabulary size
is 65,535 words.
RECOGNITION MODELS
Julian supports the following models.
Acoustic Models
Same as Julius: sub-word HMMs (Hidden Markov Models) in HTK
format are supported. Phoneme models (monophone), context
dependent phoneme models (triphone), tied-mixture models and
phonetic tied-mixture models of any unit can be used. When
context dependent models are used, interword context is also
handled. The tool mkbinhmm converts an ASCII HMM definition
file to a binary format that speeds up startup (this binary
format is incompatible with that of HTK).
Language model
The grammar format is original to Julian, and tools to create
a recognition grammar are included in the distribution. A
grammar consists of two files. The 'grammar' file describes
sentence structures in a BNF style, using word 'categories'
as terminal symbols. The 'voca' file defines the words of
each category together with their pronunciations (i.e.
phoneme sequences). The two files should be converted by
mkdfa.pl(1) into a deterministic finite automaton file (.dfa)
and a dictionary file (.dict), respectively.
SPEECH INPUT
Same as Julius: both live speech input and recorded speech file input
are supported. Live input can come from a microphone device, a DatLink
(NetAudio) device, or a tcpip network stream sent by adintool.
Speech waveform files are accepted in 16bit WAV (no compression) and RAW
format; many other formats are accepted if Julian is compiled with the
libsndfile library. Feature parameter files in HTK format are also
supported.
Note that Julian itself can only extract MFCC_E_D_N_Z features from
speech data. If your acoustic HMM was trained with another feature type,
only HTK parameter files of that same feature type can be used as input.
SEARCH ALGORITHM OVERVIEW
The recognition algorithm of Julian is based on a two-pass strategy. In
the first pass, a high-speed approximate search is performed using
weaker constraints than the given grammar: an LR beam search that uses
only the inter-category constraints extracted from the grammar. The
second pass re-searches the input, using the original grammar rules and
the intermediate results of the first pass, to obtain a high-precision
result quickly. In the second pass the optimal solution is
theoretically guaranteed by the A* search.
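The gap between the two passes can be illustrated with a toy example.
The sketch below is not Julian's implementation: the category names and
sentence patterns are hypothetical, and the grammar is reduced to a
plain list of category sequences. It only shows that constraints on
adjacent category pairs are weaker than the grammar they were extracted
from, which is why the first pass is approximate and the second pass
re-checks candidates against the full grammar.

```python
# Hypothetical toy grammar: each sentence pattern is a sequence of
# word categories. These names are illustrative, not from Julian.
PATTERNS = [
    ("GET", "FRUIT", "PLEASE"),
    ("PUT", "FRUIT", "THERE"),
]


def category_pairs(patterns):
    """Collect every adjacent category pair that occurs in some pattern."""
    return {(a, b) for p in patterns for a, b in zip(p, p[1:])}


def passes_pair_constraint(seq, pairs):
    """First-pass style check: only adjacent pairs are verified."""
    return all(pair in pairs for pair in zip(seq, seq[1:]))


def accepted_by_grammar(seq, patterns):
    """Second-pass style check: the whole sequence must match a pattern."""
    return tuple(seq) in set(patterns)


pairs = category_pairs(PATTERNS)
candidate = ("GET", "FRUIT", "THERE")  # mixes the two patterns
print(passes_pair_constraint(candidate, pairs))  # pair check alone lets it through
print(accepted_by_grammar(candidate, PATTERNS))  # the full grammar rejects it
```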
When context dependent phones (triphones) are used, interword contexts
are taken into consideration. For tied-mixture and phonetic tied-mixture
models, high-speed acoustic likelihood calculation is possible using
Gaussian pruning.
For more details, see the related documents or web page below.
OPTIONS
The options below specify the models, system behaviors and various
search parameters. These options can all be given on the command
line, but it is recommended that you write them in a text file, called
a "jconf file", and specify that file with the "-C" option.
Most options are the same as in Julius.
Options only in Julian: -gram, -gramlist, -dfa, -penalty1, -penalty2,
-looktrellis
Options only in Julius: -nlr, -nrl, -d, -lmp, -lmp2, -transp, -silhead,
-siltail, -spdur, -sepnum, -separatescore
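As an illustration of the jconf approach, the sketch below collects a
few options in a text file and builds the command line that points
Julian at it. The file name "sample.jconf" and the choice of one option
per line are assumptions made for the example; the option names and
values are the ones documented in this manual. A working setup would
also need the model options, which are omitted here. The script only
prints the result and does not run Julian.

```python
import shlex

# Option names and default values as documented in this manual.
OPTIONS = [
    ("-input", "mic"),
    ("-lv", "2000"),
    ("-zc", "60"),
    ("-headmargin", "300"),
    ("-tailmargin", "400"),
]


def jconf_text(options):
    """Render options as jconf text, one option and argument per line."""
    return "".join(f"{name} {value}\n" for name, value in options)


def command_line(jconf_path):
    """Argument vector that passes the jconf file with -C."""
    return ["julian", "-C", jconf_path]


text = jconf_text(OPTIONS)
cmd = command_line("sample.jconf")  # hypothetical file name
print(text, end="")
print(shlex.join(cmd))
```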
Speech Input
-input {rawfile|mfcfile|mic|adinnet|netaudio|stdin}
Select speech data input source. 'rawfile' is a waveform file
and 'mfcfile' is a feature parameter file in HTK format (the
file name is specified after startup from stdin). 'mic' means
microphone device, and 'adinnet' means receiving waveform data
via tcpip network from an adinnet client. 'netaudio' is
DatLink/NetAudio input, and 'stdin' means data input from
standard input.
WAV (no compression) and RAW (no header, 16bit, big endian) are
supported for waveform file input. Other formats can be
supported using an external library. To see which formats are
actually supported, see the help message printed by the "-help"
option. For stdin input, only WAV and RAW are supported.
(default: mfcfile)
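Because the RAW format above is headerless 16bit big endian while WAV
stores samples little endian, preparing a RAW file from a WAV file
means dropping the header and swapping each byte pair. The following
is a minimal sketch using only the Python standard library; the helper
names are made up for this example, and it assumes uncompressed 16bit
PCM input.

```python
import io
import struct
import wave


def make_wav(samples, rate=16000):
    """Build a small mono 16bit WAV in memory (test data only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return buf.getvalue()


def wav_to_raw_be(wav_bytes):
    """Strip the WAV header and return 16bit big endian sample data."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16bit samples")
        frames = wf.readframes(wf.getnframes())
    raw = bytearray(frames)
    # WAV data is little endian: swap the two bytes of every sample.
    raw[0::2], raw[1::2] = frames[1::2], frames[0::2]
    return bytes(raw)


raw = wav_to_raw_be(make_wav([1, -2, 256]))
print(raw.hex())
```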
-filelist file
(With -input rawfile|mfcfile) perform recognition on all files
listed in the file.
-adport portnum
(with -input adinnet) adinnet port number (default: 5530)
-NA server:unit
(with -input netaudio) set the server name and unit ID of the
DatLink unit.
-zmean -nozmean
Enable/disable DC offset removal of the input waveform. For
speech file input, the zero mean is computed from the whole
input. For microphone / network input, the mean of the first
48000 samples (3 seconds at 16kHz sampling) is used for the
rest of the input. (default: disabled (-nozmean))
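The two DC offset strategies described for "-zmean" can be sketched as
follows. This is an illustration rather than Julian's code: the
function names are invented, and for the streaming case the manual
does not say how the first 48000 samples themselves are treated, so
this sketch simply subtracts the estimated mean from every sample.

```python
def remove_dc_file(samples):
    """File input: subtract the mean of the whole input."""
    if not samples:
        return []
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


def remove_dc_stream(samples, estimate_len=48000):
    """Live input: estimate the mean from the first samples only.

    Assumption: the estimate is applied to the head as well as the rest.
    """
    if not samples:
        return []
    head = samples[:estimate_len]
    mean = sum(head) / len(head)
    return [s - mean for s in samples]


print(remove_dc_file([1, 2, 3]))
print(remove_dc_stream([10, 10, 20, 20], estimate_len=2))
```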
-zmeanframe -nozmeanframe
With speech input, enable/disable frame-wise DC offset removal.
This is the same as HTK's ZMEANSOURCE option, and cannot be
combined with "-zmean". (default: disabled (-nozmeanframe))
-nostrip
Julian by default removes zero samples from the input speech
data, since in some cases such invalid data is recorded at the
start or end of a recording. This option inhibits the automatic
removal.
-record directory
Automatically save the input speech data, one file after another,
under the given directory. Each segmented input is recorded to its
own file. The file name is generated from the system time at which
the input starts, in the form "YYYY.MMDD.HHMMSS.wav". The file
format is 16bit monaural WAV. Invalid for mfcfile input.
When "-rejectshort" is also given, rejected inputs are recorded
as well.
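The file naming scheme used by "-record" maps directly onto a time
format string. The sketch below reproduces the "YYYY.MMDD.HHMMSS.wav"
pattern in Python; the function names are hypothetical, and zero
padding of each field is an assumption based on the fixed-width
pattern.

```python
import os
from datetime import datetime


def record_filename(start_time):
    """File name in the YYYY.MMDD.HHMMSS.wav style for a given start time."""
    return start_time.strftime("%Y.%m%d.%H%M%S") + ".wav"


def record_path(directory, start_time):
    """Full path of the recording under the -record directory."""
    return os.path.join(directory, record_filename(start_time))


print(record_filename(datetime(2005, 3, 7, 14, 5, 9)))
```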
-rejectshort msec
Reject input shorter than the specified number of milliseconds.
The search is terminated and no result is output. In module mode,
a '<REJECTED REASON="..."/>' message is sent to the client. With
"-record", rejected inputs are still recorded.
(default: 0 = off)
Speech Detection
Options in this section are invalid for mfcfile input.
-cutsilence
-nocutsilence
Force silence cutting (=speech segment detection) to ON/OFF.
(default: ON for mic/adinnet, OFF for files)
-lv threslevel
Level threshold (0 - 32767) for speech triggering. If the audio
input amplitude goes over this threshold for a period, Julian
begins the 1st pass recognition. If the level goes below this
threshold after triggering, the speech segment ends.
(default: 2000)
-zc zerocrossnum
Zero crossing threshold per second (default: 60)
-headmargin msec
Margin at the start of the speech segment in milliseconds.
(default: 300)
-tailmargin msec
Margin at the end of the speech segment in milliseconds.
(default: 400)
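To make the interplay of "-lv", "-zc", "-headmargin" and "-tailmargin"
more concrete, here is a deliberately simplified detector. It is not
Julian's algorithm: the way the level and zero crossing thresholds are
combined, the 100 ms analysis chunk, and the handling of a single
segment only are all assumptions of this sketch. It counts swings
between +lv and -lv, marks chunks whose crossing rate reaches the
threshold, and widens the detected region by the two margins.

```python
def gated_crossings(samples, lv):
    """Indices where the signal swings from beyond one threshold to the other."""
    crossings = []
    sign = 0  # becomes +1 / -1 once the signal has exceeded the level
    for i, s in enumerate(samples):
        if s > lv and sign <= 0:
            if sign < 0:
                crossings.append(i)
            sign = 1
        elif s < -lv and sign >= 0:
            if sign > 0:
                crossings.append(i)
            sign = -1
    return crossings


def detect_segment(samples, rate=16000, lv=2000, zc=60,
                   head_ms=300, tail_ms=400, chunk_ms=100):
    """Return (start, end) sample indices of the speech region, or None."""
    chunk = rate * chunk_ms // 1000
    counts = {}
    for i in gated_crossings(samples, lv):
        counts[i // chunk] = counts.get(i // chunk, 0) + 1
    # A chunk is active when its crossing rate per second reaches zc.
    active = [c for c, n in counts.items() if n * 1000 / chunk_ms >= zc]
    if not active:
        return None
    start = min(active) * chunk - rate * head_ms // 1000
    end = (max(active) + 1) * chunk + rate * tail_ms // 1000
    return max(start, 0), min(end, len(samples))


# 1 s of silence, 1 s of a 200 Hz square wave, 1 s of silence.
tone = [5000 if (i // 40) % 2 == 0 else -5000 for i in range(16000)]
signal = [0] * 16000 + tone + [0] * 16000
print(detect_segment(signal))
```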
Acoustic Analysis
-smpFreq frequency
Set the sampling frequency of the input speech in Hz. The sampling
rate can also be specified with "-smpPeriod". Make sure that this
frequency matches the conditions under which your acoustic model
was trained. It should be specified for microphone input and RAW
file input when the rate differs from the default. See also
"-fsize", "-fshift", "-delwin" and "-accwin".
(default: 16000 (Hz = period of 625 x 100ns))
-smpPeriod period
Set the sampling frequency of the input speech by its sampling
period, counted in units of 100 nanoseconds. The sampling rate
can also be specified with "-smpFreq". Make sure that the input
frequency matches the conditions under which your acoustic model
was trained. It should be specified for microphone input and RAW
file input when the rate differs from the default. See also
"-fsize", "-fshift", "-delwin" and "-accwin".
(default: 625 (x 100ns = 16000Hz))
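The relation between "-smpFreq" and "-smpPeriod" is a simple
reciprocal. The sketch below assumes the period is counted in units of
100 ns, the HTK convention, which is the reading under which the two
defaults (625 and 16000 Hz) agree: 625 x 100 ns = 62.5 microseconds =
1/16000 s. The function names are hypothetical.

```python
UNITS_PER_SECOND = 10_000_000  # number of 100 ns units in one second


def period_to_freq(period_100ns):
    """Sampling frequency in Hz for a period given in 100 ns units."""
    return UNITS_PER_SECOND / period_100ns


def freq_to_period(freq_hz):
    """Sampling period in 100 ns units for a frequency given in Hz."""
    return UNITS_PER_SECOND / freq_hz


print(period_to_freq(625))
print(freq_to_period(8000))
```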
-fsize sample
Analysis window size in number of samples. (default: 400).
-fshift sample
Frame shift in number of samples (default: 160).
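Since "-fsize" and "-fshift" are given in samples, their meaning in
time depends on the sampling rate: at the default 16000 Hz, 400
samples is a 25 ms window and 160 samples is a 10 ms shift. The sketch
below does this conversion and counts the full frames that fit in an
input. Dropping a trailing partial frame is an assumption of the
sketch; the manual does not say how Julian treats it.

```python
def samples_to_ms(num_samples, rate=16000):
    """Duration in milliseconds of a number of samples."""
    return num_samples * 1000 / rate


def frame_count(num_samples, fsize=400, fshift=160):
    """Number of complete analysis windows (partial tail frame dropped)."""
    if num_samples < fsize:
        return 0
    return 1 + (num_samples - fsize) // fshift


print(samples_to_ms(400), samples_to_ms(160))
print(frame_count(16000))
```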
-preemph value
Pre-emphasis coefficient (default: 0.97)
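The pre-emphasis coefficient k is used in a first-order high-pass
filter, y[n] = x[n] - k * x[n-1]. The sketch below applies it to one
frame. Scaling the first sample by (1 - k), as HTK does, is an
assumption here; the manual only gives the coefficient.

```python
def preemphasize(frame, k=0.97):
    """Apply y[n] = x[n] - k * x[n-1] to one frame of samples."""
    if not frame:
        return []
    out = [frame[0] * (1.0 - k)]  # assumed treatment of the first sample
    out.extend(frame[i] - k * frame[i - 1] for i in range(1, len(frame)))
    return out


print(preemphasize([1.0, 1.0, 1.0], k=0.5))
```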
-fbank num
Number of filterbank channels (default: 24)
-ceplif num
Cepstral liftering coefficient (default: 22)
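The manual names only the liftering coefficient. Assuming Julian
follows the HTK liftering formula, each cepstral coefficient c_n
(n = 1, 2, ...) is scaled by 1 + (L/2) * sin(pi * n / L), where L is
the value of "-ceplif". A small sketch under that assumption:

```python
import math


def lifter_weights(num_ceps, L=22):
    """Liftering weights for cepstral indices 1..num_ceps (HTK formula assumed)."""
    return [1.0 + (L / 2.0) * math.sin(math.pi * n / L)
            for n in range(1, num_ceps + 1)]


def lifter(cepstra, L=22):
    """Scale a vector of cepstral coefficients c_1..c_N."""
    return [c * w for c, w in zip(cepstra, lifter_weights(len(cepstra), L))]


weights = lifter_weights(12)
print([round(w, 3) for w in weights])
```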
-rawe / -norawe
Enable/disable using raw energy before pre-emphasis (default:
disabled)
-enormal / -noenormal
Enable/disable normalizing log energy (default: disabled).
Note: log energy normalization should not be used for live
input, in either training or recognition (see sec. 5.9 "Direct
Audio Input/Output" in HTKBook).
-escale value
Scaling factor of log energy when normalizing log energy
(default: 1.0)
-silfloor value
Energy silence floor in dB when normalizing log energy (default:
50.0)
-delwin frame
Delta window size in number of frames (default: 2).
-accwin frame
Acceleration window size in number of frames (default: 2).
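The window sizes "-delwin" and "-accwin" control how many neighbouring
frames enter the delta and acceleration computation. Assuming the HTK
regression formula, d_t = sum_{k=1..W} k * (c_{t+k} - c_{t-k}) /
(2 * sum_{k=1..W} k^2), with the first and last frames repeated at the
edges, the computation for one coefficient track can be sketched as
below. Acceleration is then the same regression applied to the deltas.
Both the formula and the edge handling are assumptions; the manual
states only the window sizes.

```python
def deltas(track, window=2):
    """Regression deltas of one coefficient track (HTK formula assumed)."""
    denom = 2 * sum(k * k for k in range(1, window + 1))
    last = len(track) - 1
    out = []
    for t in range(len(track)):
        num = 0.0
        for k in range(1, window + 1):
            ahead = track[min(t + k, last)]  # repeat last frame at the end
            behind = track[max(t - k, 0)]    # repeat first frame at the start
            num += k * (ahead - behind)
        out.append(num / denom)
    return out


ramp = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
delta = deltas(ramp, window=2)   # -delwin
accel = deltas(delta, window=2)  # -accwin
print(delta)
print(accel)
```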
-lofreq frequency
Enable band-limiting for MFCC filterbank computation: set lower
frequency cut-off.
(default: -1 = disabled)
-hifreq frequency
Enable band-limiting for MFCC filterbank computation: set upper