📄 adin-cut.c
/**
 * @file   adin-cut.c
 * @author Akinobu LEE
 * @date   Sat Feb 12 13:20:53 2005
 *
 * <EN>
 * @brief  Read in speech waveform and detect speech segment
 *
 * This file contains functions to get speech waveform from an audio device
 * and detect speech segment.
 *
 * Speech detection is based on level threshold and zero cross count.
 * The number of zero crossings is counted for each incoming speech fragment.
 * If the number becomes larger than the specified threshold, the fragment
 * is treated as the beginning of speech input (trigger on).  If the number
 * goes below the threshold, the fragment will be treated as the
 * end of speech input (trigger off).
 * In actual detection, margins are considered at the beginning and ending
 * points, which will be treated as head and tail silence parts.  DC offset
 * normalization will also be performed if so configured.
 *
 * The triggered input speech data should be processed concurrently with the
 * detection for real-time recognition.  For this purpose, after the
 * beginning of speech input has been detected, the following triggered input
 * fragments (samples of a certain period in live input, or buffer size in
 * file input) are passed sequentially to a callback function.
 * The callback function should be specified by the caller, typically to
 * store the recorded speech, or to feed it into a frame-synchronous
 * recognition process.
 *
 * When the source is a live input such as a microphone, the device buffer
 * will overflow if the processing callback is slow.  In that case, some input
 * fragments may be lost.  To prevent this, the A/D-in part together with
 * speech detection will become an independent thread if @em pthread functions
 * are supported.  The A/D-in and detection thread will cooperate with
 * the original main thread through the @a speech buffer, as follows:
 *
 *    - Thread 1: A/D-in and speech detection thread
 *        - reads audio input from the source device and performs speech
 *          detection.  The detected fragments are immediately appended
 *          to the @a speech buffer.
 *        - will be detached after creation, and runs until the main
 *          thread dies.
 *    - Thread 2: Main thread
 *        - performs speech processing and recognition.
 *        - watches the @a speech buffer, and if it detects new samples
 *          appended by Thread 1, processes the appended samples
 *          and purges the finished samples from the @a speech buffer.
 *
 * adin_setup_func() is used to switch audio input by specifying device-dependent
 * open/read/close functions, and should be called first.
 * Function adin_setup_param() should be called after adin_setup_func() to
 * set various parameters for speech detection.
 * The adin_go() function is the top-level function that will be called from
 * outside to perform actual input processing.  adin_cut() is
 * the main function to read audio input and detect speech segments.
 * </EN>
 *
 * @sa adin.c
 *
 * $Revision: 1.6 $
 *
 */
/*
 * Copyright (c) 1991-2006 Kawahara Lab., Kyoto University
 * Copyright (c) 2000-2005 Shikano Lab., Nara Institute of Science and Technology
 * Copyright (c) 2005-2006 Julius project team, Nagoya Institute of Technology
 * All rights reserved
 */

#include <sent/stddefs.h>
#include <sent/speech.h>
#include <sent/adin.h>
#ifdef HAVE_PTHREAD
#include <pthread.h>
#endif

/// Define this if you want to output a debug message for threading
#undef THREAD_DEBUG
/// Enable some fixes relating adinnet+module
#define TMP_FIX_200602

/**
 * @name Variables of zero-cross parameters and buffer sizes
 *
 */
//@{
static int c_length = 5000;       ///< Computed length of cycle buffer for zero-cross, actually equal to head margin length
static int c_offset = 0;          ///< Static data DC offset (obsolete, should be 0)
static int wstep = DEFAULT_WSTEP; ///< Data fragment size
static int thres;                 ///< Input level threshold (0-32767)
static int noise_zerocross;       ///< Computed threshold of zerocross num in the cycle buffer
static int nc_max;                ///< Computed number of fragments for tail margin
//@}

/**
 * @name Variables for delayed tail silence processing
 *
 */
//@{
static SP16 *swapbuf;             ///< Buffer for re-triggering in tail margin
static int sbsize, sblen;         ///< Size and current length of @a swapbuf
static int rest_tail;             ///< Samples not processed yet in swap buffer
//@}

/**
 * @name Work area for device configurations for local use
 *
 */
//@{
static boolean (*ad_resume)();      ///< Function pointer to (re)start input
static boolean (*ad_pause)();       ///< Function pointer to stop input
static int (*ad_read)(SP16 *, int); ///< Function pointer to read in input samples
static boolean adin_cut_on;         ///< TRUE if do input segmentation by silence
static boolean silence_cut_default; ///< Device-dependent default value of @a adin_cut_on
static boolean strip_flag;          ///< TRUE if skip invalid zero samples
static boolean enable_thread = FALSE; ///< TRUE if input device needs threading
static boolean ignore_speech_while_recog = TRUE; ///< TRUE if ignore speech input between call, while waiting recognition process
static boolean need_zmean;          ///< TRUE if perform zmeansource
//@}

#ifdef HAVE_PTHREAD
static void adin_thread_create(); ///< create and start A/D-in and detection thread
#endif

/**
 * Store the given device-dependent functions and configuration values
 * to local work area.  This function will be called from adin_select()
 * via adin_register_func().
 *
 */
void
adin_setup_func(int (*cad_read)(SP16 *, int), ///< [in] function to read input samples
                boolean (*cad_pause)(),       ///< [in] function to stop input
                boolean (*cad_resume)(),      ///< [in] function to (re-)start input
                boolean use_cut_def,          ///< [in] TRUE if the device needs speech segment detection by default
                boolean need_thread           ///< [in] TRUE if the device is live input and needs threading
                )
{
  ad_read = cad_read;
  ad_pause = cad_pause;
  ad_resume = cad_resume;
  silence_cut_default = use_cut_def;
#ifdef HAVE_PTHREAD
  enable_thread = need_thread;
#else
  if (need_thread == TRUE) {
    j_printerr("Warning: thread not supported, input may be corrupted on slow machines\n");
  }
#endif
}

/**
 * Setup silence detection parameters (should be called after adin_select()).
 * If using pthread, the A/D-in and detection thread will be started at the end
 * of this function.
 *
 * @param silence_cut [in] whether to perform silence cutting.
 *                 0=force off, 1=force on, 2=keep device-specific default
 * @param strip_zero [in] TRUE to enable stripping of zero samples
 * @param cthres [in] input level threshold (0-32767)
 * @param czc [in] zero-cross count threshold in a second
 * @param head_margin [in] header margin length in msec
 * @param tail_margin [in] tail margin length in msec
 * @param sample_freq [in] sampling frequency: just providing value for computing other variables
 * @param ignore_speech [in] TRUE if ignore speech input between call, while waiting recognition process
 * @param need_zeromean [in] TRUE if perform zero-mean subtraction
 */
void
adin_setup_param(int silence_cut, boolean strip_zero, int cthres, int czc, int head_margin, int tail_margin, int sample_freq, boolean ignore_speech, boolean need_zeromean)
{
  float samples_in_msec;
  if (silence_cut < 2) {
    adin_cut_on = (silence_cut == 1) ? TRUE : FALSE;
  } else {
    adin_cut_on = silence_cut_default;
  }
  strip_flag = strip_zero;
  thres = cthres;
  ignore_speech_while_recog = ignore_speech;
  need_zmean = need_zeromean;
  /* calc & set internal parameter from configuration */
  samples_in_msec = (float) sample_freq / (float)1000.0;
  /* cycle buffer length = head margin length */
  c_length = (int)((float)head_margin * samples_in_msec); /* head_margin is in msec */
  /* compute zerocross trigger count threshold in the cycle buffer */
  noise_zerocross = czc * c_length / sample_freq;
  /* process step */
  wstep = DEFAULT_WSTEP;
  /* variables that come from the tail margin length (in wstep) */
  nc_max = (int)((float)(tail_margin * samples_in_msec / (float)wstep)) + 2;
  sbsize = tail_margin * samples_in_msec + (c_length * czc / 200);
#ifdef HAVE_PTHREAD
  if (enable_thread) {
    /* create A/D-in thread here */
    adin_thread_create();
  }
#endif
}

/**
 * Query function to check whether the input speech detection is on or off.
 *
 * @return TRUE if on, FALSE if off.
 */
boolean
query_segment_on()
{
  return adin_cut_on;
}

/**
 * Query function to check whether the input threading is on or off.
 *
 * @return TRUE if on, FALSE if off.
 */
boolean
query_thread_on()
{
  return enable_thread;
}

/**
 * Reset zero mean data to re-estimate zero mean at the next input.
 *
 */
void
adin_reset_zmean()
{
  if (need_zmean) zmean_reset();
}

#ifdef HAVE_PTHREAD
/**
 * @name Variables related to POSIX threading
 *
 */
//@{
static pthread_t adin_thread;     ///< Thread information
static pthread_mutex_t mutex;     ///< Lock primitive
static SP16 *speech;              ///< Unprocessed samples recorded by A/D-in thread
static int speechlen;             ///< Current length of @a speech
/**
 * @brief Semaphore to start/stop recognition.
 *
 * If TRUE, A/D-in thread will store incoming samples to @a speech and
 * main thread will detect and process them.
 * If FALSE, A/D-in thread will still get input and check trigger the same
 * as in the TRUE case, but does not store them to @a speech.
 *
 */
static boolean transfer_online = FALSE;
static boolean adinthread_buffer_overflowed = FALSE; ///< Will be set to TRUE if @a speech has overflowed.
//@}
#endif

/**
 * @name Input data buffer
 *
 */
//@{
static SP16 *buffer = NULL;       ///< Temporary buffer to hold input samples
static int bpmax;                 ///< Maximum length of @a buffer
static int bp;                    ///< Current point to store the next data
static int current_len;           ///< Current length of stored samples
static SP16 *cbuf;                ///< Buffer for flushing cycle buffer just after detecting trigger
//@}

/**
 * Purge samples already processed in the temporary buffer @a buffer.
 *
 * @param from [in] Purge samples in range [0..from-1].
 */
static void
adin_purge(int from)
{
  if (from > 0 && current_len-from > 0) {
    memmove(buffer, &(buffer[from]), (current_len - from) * sizeof(SP16));
  }
  bp = current_len - from;
}

/**
 * @brief  Main A/D-in function
 *
 * In threaded mode, this function will detach and loop forever in ad-in
 * thread, storing triggered samples in @a speech, and telling the status
 * to another process thread via @a transfer_online.
 * The process thread, called from adin_go(), polls the length of
 * @a speech and @a transfer_online, and if there are stored samples,
 * processes them.
 *
 * In non-threaded mode, this function will be called directly from
 * adin_go(), and triggered samples are immediately processed within here.
 *
 * In module mode, the function argument @a ad_check should be specified
 * to poll the status of incoming command from client while recognition.
 *
 * @return -1 on error, 0 on end of stream, >0 when paused by external process.
 */
static int
adin_cut(
         int (*ad_process)(SP16 *, int), ///< function to process the triggered samples
         int (*ad_check)())              ///< function periodically called while input processing
{
  static int i;
  static boolean is_valid_data;   ///< TRUE if we are now triggered
  int ad_process_ret;
  int imax, len, cnt;
  static boolean end_of_stream;   /* will be set to TRUE if current input stream has reached the end (in case of file input or adinnet input).  If TRUE, no more input will be got by ad_read, but just process the already stored samples until it becomes empty */
  static int need_init = TRUE;    /* if TRUE, initialize buffer on startup */
  static int end_status;          /* return value */
  static boolean transfer_online_local; /* local repository of transfer_online */
  static int zc;                  /* count of zero cross */
  static int nc;                  /* count of current tail silence segments */

  /*
   * there are 3 buffers:
   *   temporary storage queue: buffer[]
   *   cycle buffer for zero-cross counting: (in zc_e)
   *   swap buffer for re-starting after short tail silence
   *
   * Each sample is first read to buffer[], then passed to count_zc_e()
   * to find trigger.  Samples between trigger and end of speech are
   * passed to (*ad_process) with pointer to the first sample and its length.
   *
   */

  /**********************/