JULIUS(1) JULIUS(1)
NAME
Julius - open source multi-purpose LVCSR engine
SYNOPSIS
julius [-C jconffile] [options ...]
DESCRIPTION
Julius is a high-performance, multi-purpose, free speech recognition
engine for researchers and developers. It can perform near real-time
recognition of continuous speech with a vocabulary of over 60k words
on most current PCs.
Julius needs an N-gram language model, a word dictionary and an
acoustic model to perform recognition. Standard model formats (i.e.
ARPA and HTK) with any word/phone units and sizes are supported, so
users can build a recognition system for various targets using their
own language and acoustic models. For details about the basic models
and their availability, please see the documents contained in this
package.
Julius can perform recognition on audio files, live microphone input,
network input and feature parameter files. The maximum vocabulary size
is 65,535 words.
RECOGNITION MODELS
Julius supports the following models.
Acoustic Models
Sub-word HMMs (Hidden Markov Models) in HTK ASCII format are
supported. Phoneme models (monophones), context-dependent phoneme
models (triphones), tied-mixture and phonetic tied-mixture models
of any unit can be used. When using context-dependent models,
interword context is also handled. You can also use the tool
mkbinhmm to convert the ASCII HMM definition file to a binary
format to speed up startup (this binary format is incompatible
with HTK).
Language model
Word 2-gram and reverse word 3-gram language models are used. The
standard ARPA format is supported. In addition, a binary N-gram
format is supported for efficiency; the tool mkbingram converts
ARPA language models into this binary format.
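To illustrate how an ARPA-format 2-gram is read and queried, here is a
minimal Python sketch. It assumes that only the \1-grams: and \2-grams:
sections matter, that all values are log10 as in standard ARPA files,
and it applies the usual back-off rule. The names parse_arpa and
bigram_log10 are hypothetical; this is not Julius's model loader and
says nothing about the mkbingram binary layout.

```python
# Illustrative sketch only: a tiny ARPA 2-gram reader with back-off lookup.
# Not Julius's loader; 3-gram sections and <unk> handling are ignored.


def parse_arpa(text):
    """Return (unigrams, bigrams) from the 1-gram and 2-gram sections.

    unigrams: word -> (log10 prob, log10 back-off weight)
    bigrams:  (w1, w2) -> log10 prob
    """
    unigrams, bigrams = {}, {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("\\"):
            # Section markers such as \data\, \1-grams:, \2-grams:, \end\
            section = {"\\1-grams:": 1, "\\2-grams:": 2}.get(line)
            continue
        fields = line.split()
        if section == 1:
            bow = float(fields[2]) if len(fields) > 2 else 0.0
            unigrams[fields[1]] = (float(fields[0]), bow)
        elif section == 2:
            bigrams[(fields[1], fields[2])] = float(fields[0])
    return unigrams, bigrams


def bigram_log10(unigrams, bigrams, w1, w2):
    """log10 P(w2 | w1), backing off to bow(w1) + log10 P(w2) if unseen."""
    if (w1, w2) in bigrams:
        return bigrams[(w1, w2)]
    return unigrams[w1][1] + unigrams[w2][0]


SAMPLE_ARPA = """\\data\\
ngram 1=3
ngram 2=1

\\1-grams:
-1.0 <s> -0.5
-0.5 a -0.3
-0.7 b

\\2-grams:
-0.2 <s> a

\\end\\
"""

uni, bi = parse_arpa(SAMPLE_ARPA)
print(bigram_log10(uni, bi, "<s>", "a"))  # seen bigram
print(bigram_log10(uni, bi, "a", "b"))    # backs off: bow(a) + log10 P(b)
```

The back-off lookup is what makes a compact 2-gram usable on the first
pass: unseen word pairs still receive a score without being stored.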
SPEECH INPUT
Both live speech input and recorded speech files are supported. Live
input can come from a microphone device, a DatLink (NetAudio) device,
or over a TCP/IP network using adintool. Speech waveform files in
16-bit uncompressed WAV and RAW format are supported, and many other
formats are accepted if Julius is compiled with the libsndfile
library. Feature parameter files in HTK format are also supported.
Note that Julius itself can only extract MFCC_E_D_N_Z features from
speech data. If you use an acoustic HMM trained on another feature
type, only HTK parameter files of that same feature type can be used
as input.
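With "-input mfcfile", the feature type is recorded in the file itself.
An HTK parameter file starts with a 12-byte header: the number of
frames, the frame period in 100 ns units, the bytes per frame, and a
parameter-kind code. The low 6 bits of that code give the base kind
(6 = MFCC), and higher bits flag qualifiers such as _E, _D, _N and _Z.
The sketch below decodes that header so you can check that a file
matches your acoustic model. It assumes big-endian byte order and
uncompressed 4-byte floats, and read_htk_header is a hypothetical
helper, not part of Julius.

```python
import struct

# Illustrative sketch only; assumes big-endian, uncompressed float features.
# HTK base parameter kinds (subset) and qualifier flags (octal, as defined by HTK),
# listed in the order HTK writes qualifiers in a kind name.
BASE_KINDS = {0: "WAVEFORM", 1: "LPC", 6: "MFCC", 7: "FBANK", 9: "USER", 11: "PLP"}
QUALIFIERS = [("_E", 0o100), ("_D", 0o400), ("_N", 0o200), ("_A", 0o1000),
              ("_T", 0o100000), ("_C", 0o2000), ("_K", 0o10000),
              ("_Z", 0o4000), ("_0", 0o20000), ("_V", 0o40000)]


def read_htk_header(data):
    """Decode the 12-byte header of an HTK parameter file."""
    n_frames, period_100ns, samp_size, parm_kind = struct.unpack(">iihH", data[:12])
    base = parm_kind & 0o77  # low 6 bits: base kind
    kind = BASE_KINDS.get(base, "KIND%d" % base)
    kind += "".join(q for q, bit in QUALIFIERS if parm_kind & bit)
    return {
        "n_frames": n_frames,
        "frame_period_ms": period_100ns / 10000.0,  # 100 ns units -> ms
        "dim": samp_size // 4,                      # assumes 4-byte floats
        "kind": kind,
    }


# Usage: a header for 100 frames of 25-dim MFCC_E_D_N_Z at a 10 ms frame shift.
kind_code = 6 | 0o100 | 0o400 | 0o200 | 0o4000
header = struct.pack(">iihH", 100, 100000, 25 * 4, kind_code)
info = read_htk_header(header)
print(info["kind"], info["dim"], info["frame_period_ms"])
```

A 25-dimensional MFCC_E_D_N_Z vector is the common case of 12 cepstra
plus 12 deltas plus delta energy, with absolute energy removed by _N.
The 12-cepstra figure is an assumption, not something this page states.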
SEARCH ALGORITHM OVERVIEW
The recognition algorithm of Julius is based on a two-pass strategy.
A word 2-gram and a reverse word 3-gram are used on the first and
second pass, respectively. The first pass processes the entire input.
The second pass then searches the same input again to produce the
final result, using the output of the first pass to narrow the search
space. Specifically, the recognition algorithm is a tree-trellis
heuristic search that combines a left-to-right frame-synchronous beam
search (first pass) with a right-to-left stack decoding search (second
pass).
When using context-dependent phones (triphones), interword contexts
are taken into account. For tied-mixture and phonetic tied-mixture
models, acoustic likelihood computation can be sped up with Gaussian
pruning.
For more details, see the related document or web page below.
OPTIONS
The options below specify the models, system behavior and various
search parameters. These options can all be given on the command
line, but it is recommended to write them in a text file (a "jconf
file") and specify that file with the "-C" option.
Speech Input
-input {rawfile|mfcfile|mic|adinnet|netaudio|stdin}
Select the speech data input source. 'rawfile' means waveform file
input and 'mfcfile' means HTK-format feature parameter file input
(the file names are specified from stdin after startup). 'mic' means
the microphone device, and 'adinnet' means receiving waveform data via
a TCP/IP network from an adinnet client. 'netaudio' means
DatLink/NetAudio input, and 'stdin' means data input from standard
input.
WAV (no compression) and RAW (no header, 16-bit, big-endian) are
supported for waveform file input. Other formats can be supported
using an external library. To see which formats are actually
supported, see the help message shown by the "-help" option. For
stdin input, only WAV and RAW are supported.
(default: mfcfile)
-filelist file
(With -input rawfile|mfcfile) Perform recognition on all files
listed in file.
-adport portnum
(With -input adinnet) adinnet port number (default: 5530)
-NA server:unit
(With -input netaudio) Set the server name and unit ID of the
DatLink unit.
-zmean -nozmean
This option enables/disables DC offset removal of input wave-
form. For speech file input, zero mean will be computed from
the whole input. For microphone / network input, zero mean of
the first 48000 samples (3 seconds in 16kHz sampling) will be
used at the rest. (default: disabled (-nozmean))
-zmeanframe -nozmeanframe
With speech input, this option enables/disables frame-wise DC
offset removal. This is the same as HTK's ZMEANSOURCE option,
and cannot be set with "-zmean". (default: disabled (-nozmean-
frame))
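The difference between "-zmean" on file input and "-zmeanframe" can be
sketched as follows. The former subtracts one mean computed over the
whole input. The latter subtracts each analysis frame's own mean, as
HTK's ZMEANSOURCE does. This simplified Python sketch assumes samples
given as a list of numbers and the default 400/160-sample framing. The
live-input variant (mean taken from the first 48000 samples) is not
modeled, and all function names are hypothetical.

```python
# Illustrative sketch only; not Julius's implementation.


def frames(samples, fsize=400, fshift=160):
    """Split samples into overlapping analysis windows (complete windows only)."""
    return [samples[i:i + fsize] for i in range(0, len(samples) - fsize + 1, fshift)]


def remove_mean(x):
    """Subtract the arithmetic mean of x from every element."""
    m = sum(x) / len(x)
    return [v - m for v in x]


def zmean_utterance(samples, fsize=400, fshift=160):
    """-zmean on file input: one mean for the whole input, then framing."""
    return frames(remove_mean(samples), fsize, fshift)


def zmean_frames(samples, fsize=400, fshift=160):
    """-zmeanframe: framing first, then each frame's own mean is removed."""
    return [remove_mean(f) for f in frames(samples, fsize, fshift)]


# Usage: a DC offset that jumps from 0 to 100 halfway through the input.
signal = [0] * 8000 + [100] * 8000
per_utt = zmean_utterance(signal)
per_frame = zmean_frames(signal)
print(per_utt[0][0], per_frame[0][0])
```

With a single global mean, frames on either side of the jump keep a
residual offset of -50 or +50. The frame-wise variant removes it in
every frame.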
-nostrip
Julius by default removes zero samples in input speech data. In
some cases, such invalid data may be recorded at the start or
end of recording. This option inhibit this automatic removal.
-record directory
Auto-save input speech data successively under the directory.
Each segmented inputs are recorded to a file each by one. The
file name of the recorded data is generated from system time
when the input starts, in a style of "YYYY.MMDD.HHMMSS.wav".
File format is 16bit monoral WAV. Invalid for mfcfile input.
With input rejection by "-rejectshort", the rejected input will
also be recorded even if they are rejected.
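The file naming scheme used by "-record" can be reproduced with
strftime. In this minimal Python sketch, the start time is assumed to
be available as a datetime and the segment as little-endian 16-bit PCM
bytes, which is the byte order WAV uses. record_filename and
save_segment are hypothetical helpers, not part of Julius.

```python
import os
import wave
from datetime import datetime

# Illustrative sketch only; mimics the documented naming, not Julius's code.


def record_filename(start):
    """Name a recorded segment "YYYY.MMDD.HHMMSS.wav" from its start time."""
    return start.strftime("%Y.%m%d.%H%M%S.wav")


def save_segment(directory, start, pcm, rate=16000):
    """Write one segment (little-endian 16-bit PCM bytes) as a mono WAV file."""
    path = os.path.join(directory, record_filename(start))
    with wave.open(path, "wb") as w:
        w.setnchannels(1)   # monaural
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(rate)
        w.writeframes(pcm)
    return path


print(record_filename(datetime(2004, 3, 9, 7, 5, 1)))  # 2004.0309.070501.wav
```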
-rejectshort msec
Reject input shorter than specified milliseconds. Search will
be terminated and no result will be output. In module mode,
'<REJECTED REASON="..."/>' message will be sent to client. With
"-record", the rejected input will also be recorded even if they
are rejected. (default: 0 = off)
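The "-rejectshort" test amounts to converting the segment length from
samples to milliseconds. Here is a tiny sketch, assuming the segment
length and sampling rate are known. should_reject is a hypothetical
name.

```python
# Illustrative sketch only; the real check lives inside Julius.


def should_reject(n_samples, rate_hz, rejectshort_ms):
    """True if a segment is shorter than rejectshort_ms (0 disables the check)."""
    if rejectshort_ms <= 0:
        return False
    duration_ms = n_samples * 1000.0 / rate_hz
    return duration_ms < rejectshort_ms


# 4000 samples at 16 kHz last 250 ms, so "-rejectshort 300" rejects them.
print(should_reject(4000, 16000, 300))
```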
Speech Detection
Options in this section are invalid for mfcfile input.
-cutsilence
-nocutsilence
Force silence cutting (speech segment detection) ON or OFF.
(default: ON for mic/adinnet, OFF for files)
-lv threslevel
Level threshold (0 - 32767) for speech triggering. If the audio
input amplitude exceeds this threshold for a period, Julius begins
the 1st pass of recognition. If the level falls below this
threshold after triggering, the speech segment ends.
(default: 2000)
-zc zerocrossnum
Zero-crossing threshold per second (default: 60)
-headmargin msec
Margin at the start of speech segment in milliseconds. (default:
300)
-tailmargin msec
Margin at the end of speech segment in milliseconds. (default:
400)
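How "-lv", "-zc", "-headmargin" and "-tailmargin" interact can be shown
with a deliberately simplified detector. It marks 10 ms blocks whose
peak amplitude exceeds the level threshold and whose zero-crossing rate
exceeds the per-second threshold. It then pads the first and last such
blocks with the head and tail margins. This sketch rests on several
assumptions (block size, a single segment, simple sign-change counting)
and is not Julius's actual segmentation algorithm. The function names
are hypothetical.

```python
# Illustrative sketch only; Julius's real speech detection differs in detail.


def count_zero_crossings(block):
    """Count sign changes between consecutive samples."""
    return sum(1 for a, b in zip(block, block[1:]) if (a < 0) != (b < 0))


def detect_segment(samples, rate=16000, lv=2000, zc=60,
                   headmargin_ms=300, tailmargin_ms=400, block_ms=10):
    """Return (start, end) sample indices of one speech segment, or None."""
    block = rate * block_ms // 1000
    active = []
    for start in range(0, len(samples) - block + 1, block):
        chunk = samples[start:start + block]
        peak = max(abs(s) for s in chunk)
        zc_per_sec = count_zero_crossings(chunk) * 1000 / block_ms
        if peak > lv and zc_per_sec > zc:  # both triggers must fire
            active.append(start)
    if not active:
        return None
    head = rate * headmargin_ms // 1000
    tail = rate * tailmargin_ms // 1000
    begin = max(0, active[0] - head)
    end = min(len(samples), active[-1] + block + tail)
    return begin, end


# Usage: 1 s silence, 0.5 s of a 500 Hz square wave, 1 s silence (16 kHz).
tone = [5000 if (i // 16) % 2 == 0 else -5000 for i in range(8000)]
sig = [0] * 16000 + tone + [0] * 16000
print(detect_segment(sig))
```

The tone occupies samples 16000 to 23999. The returned segment is
widened by 4800 samples (300 ms) at the start and 6400 samples (400 ms)
at the end.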
Acoustic Analysis
-smpFreq frequency
Set the sampling frequency of input speech in Hz. The sampling rate
can also be specified using "-smpPeriod". Note that this frequency
must be the same as the one used to train the acoustic model. It
should be specified for microphone input and RAW file input when
using a rate other than the default. Also see "-fsize", "-fshift",
"-delwin" and "-accwin".
(default: 16000 (Hz) = 625 in 100ns units).
-smpPeriod period
Set the sampling frequency of input speech by its sampling period,
in units of 100 nanoseconds. The sampling rate can also be specified
using "-smpFreq". Note that the input frequency must be the same as
the one used to train the acoustic model. It should be specified for
microphone input and RAW file input when using a rate other than the
default. Also see "-fsize", "-fshift", "-delwin" and "-accwin".
(default: 625 (in 100ns units) = 16000Hz).
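Because the period is expressed in 100 ns units, the two options are
related by period = 10^7 / frequency. Here is a small sketch of the
conversion. The function names are hypothetical. Rates such as
11025 Hz do not map to a whole number of 100 ns units, so the sketch
rounds the period.

```python
# Illustrative sketch only: converting between -smpFreq and -smpPeriod values.


def smp_period_from_freq(freq_hz):
    """Sampling period in 100 ns units for a rate in Hz (rounded to an integer)."""
    return round(10_000_000 / freq_hz)


def smp_freq_from_period(period_100ns):
    """Sampling rate in Hz for a period given in 100 ns units."""
    return 10_000_000 / period_100ns


print(smp_period_from_freq(16000))  # 625
print(smp_freq_from_period(1250))   # 8000.0
```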
-fsize sample
Analysis window size in number of samples. (default: 400).
-fshift sample
Frame shift in number of samples (default: 160).
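At the default 16 kHz rate, "-fsize 400" and "-fshift 160" correspond
to a 25 ms window advanced every 10 ms. The sketch below converts the
sample counts to milliseconds and counts the complete frames in an
input. It assumes that only whole windows are analysed, which may
differ from how Julius treats a trailing partial frame. frame_stats is
a hypothetical name.

```python
# Illustrative sketch only; assumes only complete windows are counted.


def frame_stats(n_samples, rate=16000, fsize=400, fshift=160):
    """Return (window length in ms, frame shift in ms, number of complete frames)."""
    win_ms = fsize * 1000 / rate
    shift_ms = fshift * 1000 / rate
    n_frames = 0 if n_samples < fsize else 1 + (n_samples - fsize) // fshift
    return win_ms, shift_ms, n_frames


print(frame_stats(16000))  # (25.0, 10.0, 98) for one second of audio
```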
-preemph value
Pre-emphasis coefficient (default: 0.97)
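Pre-emphasis is the first-order filter y[n] = x[n] - k*x[n-1], applied
within each frame with k set by "-preemph". The sketch scales the
first sample of the frame by (1 - k), following HTK's convention.
Treating the first sample this way is an assumption carried over from
HTK rather than something this page states.

```python
# Illustrative sketch only; first-sample handling follows HTK's convention (assumed).


def preemphasis(frame, k=0.97):
    """Apply y[n] = x[n] - k*x[n-1] within one frame."""
    out = [frame[0] * (1.0 - k)]
    out.extend(frame[n] - k * frame[n - 1] for n in range(1, len(frame)))
    return out


print(preemphasis([0.0, 1.0, 0.0]))  # [0.0, 1.0, -0.97]
```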
-fbank num
Number of filterbank channels (default: 24)
-ceplif num
Cepstral liftering coefficient (default: 22)
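Cepstral liftering rescales coefficient c_n by 1 + (L/2) sin(pi n / L),
where L is the "-ceplif" value. This is the formula given in the
HTKBook, and this sketch assumes Julius uses the same one. The function
name lifter is hypothetical.

```python
import math

# Illustrative sketch only; uses the HTKBook liftering formula (assumed to match Julius).


def lifter(ceps, L=22):
    """Lift cepstra c_1..c_N (list index 0 holds c_1)."""
    return [(1.0 + L / 2.0 * math.sin(math.pi * n / L)) * c
            for n, c in enumerate(ceps, start=1)]


weights = lifter([1.0] * 12)  # with unit cepstra the output equals the weights
print(weights[10])            # n = 11 = L/2, where the weight peaks at 1 + L/2
```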
-rawe / -norawe
Enable/disable using raw energy before pre-emphasis (default:
disabled)
-enormal / -noenormal
Enable/disable normalizing log energy (default: disabled).
Note: log energy normalization should not be used for live input,
either at training or at recognition time (see sec. 5.9 "Direct
Audio Input/Output" in the HTKBook).
-escale value
Scaling factor of log energy when normalizing log energy
(default: 1.0)
-silfloor value
Energy silence floor in dB when normalizing log energy (default:
50.0)
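The HTKBook describes log energy normalization in three steps. First,
each frame's log energy is clamped to at least
Emax - silfloor*ln(10)/10, which is the silence floor converted from
dB. Second, the clamped value is mapped to 1 - (Emax - E)*escale.
Because Emax is the maximum over the whole utterance, the complete
input is needed, which is why normalization is unsuitable for live
input. The sketch assumes natural-log energies and that Julius follows
this HTK scheme. The function name is hypothetical.

```python
import math

# Illustrative sketch only; follows the HTKBook scheme, assumed to match Julius.


def normalize_log_energy(log_e, escale=1.0, silfloor_db=50.0):
    """Normalize per-frame natural-log energies over one whole utterance."""
    e_max = max(log_e)
    e_min = e_max - silfloor_db * math.log(10.0) / 10.0  # silence floor
    return [1.0 - (e_max - max(e, e_min)) * escale for e in log_e]


print(normalize_log_energy([10.0, 5.0, -100.0]))
```

The loudest frame always maps to 1.0. The very quiet last frame is
clamped to the floor instead of being driven towards -100.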
-delwin frame
Delta window size in number of frames (default: 2).
-accwin frame
Acceleration window size in number of frames (default: 2).
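Delta coefficients are usually computed with the HTKBook regression
formula:

d_t = sum_{k=1..W} k (c_{t+k} - c_{t-k}) / (2 sum_{k=1..W} k^2)

Here W is the "-delwin" value, and the first and last frames are
repeated at the edges. Applying the same function to the deltas with
the "-accwin" value gives acceleration coefficients. This sketch works
under those assumptions, and the function name is hypothetical.

```python
# Illustrative sketch only; HTK-style regression deltas with edge replication (assumed).


def deltas(seq, win=2):
    """Regression deltas for a list of feature vectors (lists of floats)."""
    T = len(seq)
    denom = 2 * sum(k * k for k in range(1, win + 1))
    out = []
    for t in range(T):
        d = [0.0] * len(seq[t])
        for k in range(1, win + 1):
            nxt = seq[min(t + k, T - 1)]  # replicate last frame at the end
            prv = seq[max(t - k, 0)]      # replicate first frame at the start
            for i in range(len(d)):
                d[i] += k * (nxt[i] - prv[i])
        out.append([v / denom for v in d])
    return out


ramp = [[float(t)] for t in range(10)]
d = deltas(ramp)             # -delwin 2
acc = deltas(d, win=2)       # -accwin 2 applied to the deltas
print(d[5][0], d[0][0])
```

On a linear ramp the interior deltas equal the slope (1.0). The edge
frames are damped because of the replicated boundary frames.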
-lofreq frequency
Enable band-limiting for MFCC filterbank computation: set lower
frequency cut-off. Also see "-hifreq".
(default: -1 = disabled)
-hifreq frequency
Enable band-limiting for MFCC filterbank computation: set upper
frequency cut-off. Also see "-lofreq".
(default: -1 = disabled)
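With "-lofreq" and "-hifreq", the filterbank channels are spread
evenly on the mel scale between the two cut-offs instead of between
0 Hz and the Nyquist frequency. The sketch computes the channel centre
frequencies using the HTK mel formula Mel(f) = 1127 ln(1 + f/700).
Treating -1 as 0 Hz or Nyquist is an assumption that mirrors the
"disabled" default. The function names are hypothetical.

```python
import math

# Illustrative sketch only; HTK mel formula, channel placement assumed to match Julius.


def mel(f):
    return 1127.0 * math.log(1.0 + f / 700.0)


def mel_inv(m):
    return 700.0 * (math.exp(m / 1127.0) - 1.0)


def filterbank_centers(num=24, rate=16000, lofreq=-1, hifreq=-1):
    """Centre frequencies (Hz) of num channels equally spaced on the mel scale."""
    lo = 0.0 if lofreq < 0 else float(lofreq)       # -1: no lower cut-off
    hi = rate / 2.0 if hifreq < 0 else float(hifreq)  # -1: up to Nyquist
    mlo, mhi = mel(lo), mel(hi)
    step = (mhi - mlo) / (num + 1)
    return [mel_inv(mlo + k * step) for k in range(1, num + 1)]


band = filterbank_centers(24, 16000, lofreq=300, hifreq=3400)
print(round(band[0]), round(band[-1]))
```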
-sscalc
Perform spectral subtraction using the head part of each file. With
this option, Julius assumes that each input file begins with a
certain length of silence. Valid only for rawfile input. Conflicts
with "-ssload".
-sscalclen
With "-sscalc", specify the length of the head part silence in
milliseconds (default: 300)
-ssload filename
Perform spectral subtraction for speech input using a pre-estimated
noise spectrum from file. The noise spectrum data should be computed
beforehand by mkss. Valid for all speech input. Conflicts with
"-sscalc".
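In its simplest magnitude-domain form, spectral subtraction estimates
an average noise spectrum and subtracts alpha times it from each
frame's spectrum. The noise estimate comes from the leading silence
for "-sscalc", or from a file made by mkss for "-ssload". The sketch
below adds a spectral floor so that magnitudes never go negative. The
floor parameter and the function names are assumptions for
illustration, and Julius's exact formula may differ.

```python
# Illustrative sketch only; generic magnitude spectral subtraction with a floor.
# The "floor" parameter is a hypothetical safeguard, not a documented option here.


def estimate_noise(noise_frames):
    """Average magnitude spectrum over noise-only frames (e.g. the leading silence)."""
    n = len(noise_frames)
    return [sum(frame[k] for frame in noise_frames) / n
            for k in range(len(noise_frames[0]))]


def spectral_subtract(mag, noise, alpha, floor):
    """Subtract alpha * noise per bin, keeping at least floor * original magnitude."""
    return [max(m - alpha * nz, floor * m) for m, nz in zip(mag, noise)]


noise = estimate_noise([[1.0, 2.0], [3.0, 4.0]])  # -> [2.0, 3.0]
clean = spectral_subtract([10.0, 3.0], noise, alpha=1.0, floor=0.1)
print(clean)
```

In the second bin the noise estimate equals the observed magnitude, so
the floor keeps a small positive residue instead of zero.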
-ssalpha value
Alpha coefficient of spectral subtraction for "-sscalc" and
"-ssload". Noise is subtracted more strongly as this value gets
larger.