📄 en991231.glm
;;
;; File: en981118.glm
;; Desc: This file contains the transcript filtering rules for the ARPA
;;       Hub4-E and Hub5-E Evaluations. This file originated from
;;       et96_1.glm.
;;
;; Date: 970904
;;       - initial creation
;; Date: 971128
;;       - added Hub-4E evaluation contractions
;; Date: 981118
;;       - added Hub-4E compound words and contractions
;; Date: 991231
;;       - added Broadcast News compound words and contractions
;;
* name "en971128.glm"
* desc "The Universal GLM file for the ARPA Hub4-E and Hub5-E Eval Test Alternate Spellings and Contractions Map"
* format = 'NIST1'
* max_nrules = '2000'
* copy_no_hit = 'T'
* case_sensitive = 'F'
;; File spellalt6.rls Version 6.0 dated 11/21/97
;;
;; Standard alternate spellings, usually per AHD
;; meant to apply to SNOR transcriptions.
;;
'CAUSE => BECAUSE ;; per AHD
'EM => THEM ;; per AHD
'TIL => TILL ;; not exactly per AHD, but common
;;
;; Unambiguously handle mr. mrs. and ms.
MR => MISTER / [ ] __ [ ]
MR. => MISTER / [ ] __ [ ]
MRS. => MRS / [ ] __ [ ]
MS. => MS / [ ] __ [ ]
;; actually, "adviser" is preferred, but doing it this way
;; avoids misspelling "advisory":
ADVISER => ADVISOR / [ ] __
AFTERALL => AFTER ALL ;; per AHD
ALRIGHT => ALL RIGHT ;; "alright" is nonstandard per AHD
BAGDAD => BAGHDAD ;; per AHD
BALLGAME => BALL GAME ;; PER AHD
BASELINE => BASE LINE ;; WWW hits
BAZAR => BAZAAR ;; per AHD
BENEFITTED => BENEFITED ;; per AHD
BENEFITTING => BENEFITING ;; per AHD
BINYAMIN => BENJAMIN / __ [ NETANYAHU]
BUILDUP => BUILD UP ;; WWW hits
CAMELBACK => CAMEL BACK ;; literally; AHD is silent
CANCELLATION => CANCELATION ;; per AHD
CANCELLED => CANCELED ;; per AHD
CANCELLING => CANCELING ;; per AHD
CATALOGUE => CATALOG ;; per AHD
CEASEFIRE => CEASE FIRE ;; per AHD
CHANNELLING => CHANNELING ;; per AHD
CHANNELLED => CHANNELED ;; per AHD
COFOUNDER => CO FOUNDER ;; not found in AHD
COMBATTED => COMBATED ;; per AHD
COMBATTING => COMBATING ;; per AHD
COLOSSEUM => COLISEUM ;; per AHD
COUNSELLOR => COUNSELOR ;; per AHD
COUNSELLED => COUNSELED ;; per AHD
COUNSELLING => COUNSELING ;; per AHD
COVERUP => COVER UP ;; per AHD, with hyphen
CSAR => CZAR ;; WWW hits
DAYCARE => DAY CARE ;; per AHD
DEPALMA => DE PALMA ;; Brian _
DEUTSCHEMARK => DEUTSCHE MARK ;; per AHD
DISFUNCTION => DYSFUNCTION ;; per AHD
DONUT => DOUGHNUT ;; per AHD
DOWN SYNDROME => DOWN'S SYNDROME ;; per AHD
DUNKERQUE => DUNKIRK ;; per AHD
ENDGAME => END GAME ;; per AHD
ESTHETIC => AESTHETIC / [ ] __ ;; per AHD
FALKNER => FAULKNER / [WILLIAM ] __ ;; the writer, per AHD
FALLOUT => FALL OUT ;; WWW hits
FATIMAH => FATIMA ;; per AHD
FIREFIGHTER => FIRE FIGHTER ;; per AHD
FIREFIGHTERS => FIRE FIGHTERS ;; per AHD
FO'C'SLE => FORECASTLE ;; per AHD
FOCUSSED => FOCUSED ;; per AHD
FOCUSSES => FOCUSES ;; per AHD
FOCUSSING => FOCUSING ;; per AHD
FOLLOWUP => FOLLOW UP ;; per AHD, with a hyphen
FREELANCE => FREE LANCE ;; per AHD
FRONTLINE => FRONT LINE ;; per AHD (adj is also front-line)
FTSE => FOOTSIE
FT SE => FOOTSIE ;; officially spelled w/hyphen
FUNDRAIS => FUND RAIS ;; per AHD (actually "fund-raiser")
;; applies to fundraiser, fundraising, etc.
FUNELLED => FUNELED ;; per AHD
FUNELLING => FUNELING ;; per AHD
GADDAFI => QADDAFI ;; per AHD
GELATINE => GELATIN ;; per AHD
GENNADI => GENNADY ;; Russian transliteration not 100% fixed
GOODBYE => GOOD BYE ;; per AHD (actually "good-bye")
GOOD BY => GOOD BYE ;; per AHD (actually "good-by")
;;GOODNIGHT => GOOD NIGHT ;; check other dicts; AHD is silent on this
HARDLINE => HARD LINE ;; per AHD (adj or stem-r only)
HEALTHCARE => HEALTH CARE ;; per AHD
HEALTHDESK => HEALTH DESK ;; NPR spells theirs as 1 word
HIZBOLLAH => HEZBOLLAH ;; WWW hits
HOTLINE => HOT LINE ;; per AHD
JETLINER => JET LINER ;; per AHD
JOHNNY-COME-LATELYS => JOHNNY-COME-LATELIES ;; per AHD
KADDAFI => QADDAFI ;; per AHD
KEYWORD => KEY WORD ;; per AHD
KHADAFY => QADDAFI ;; per AHD
KIPPAH => KIPAH ;; WWW hits
KNOCKOFF => KNOCK OFF ;; WWW hits
LAGUARDIA => LA GUARDIA ;; WWW hits, altho AHD has only 2-word form
LIFTGATE => LIFT GATE ;; eq. hits on WWW
LOONEY => LOONY ;; per AHD
MA'ARIV => MAARIV ;; WWW hits
MAKEUP => MAKE UP ;; actually hyphenated, per AHD
MARKETPLACE => MARKET PLACE ;; per AHD
MIKASUKI => MICCOSUKEE ;; per AHD
MINDSET => MIND SET ;; actually hyphenated per AHD
MISLABELLED => MISLABELED ;; per AHD
MISLABELLING => MISLABELING ;; per AHD
MONTHLONG => MONTH LONG ;; WWW hits, hyphenated
NEWSHOUR => NEWS HOUR ;; WWW hits, but ?? (Lehrer's is only 1 word)
NEWSROOM => NEWS ROOM ;; WWW hits
NONECONOMIC => NON ECONOMIC ;; WWW hits
OK => OKAY / [ ] __ [ ] ;; per AHD (also O.K.)
;; but we can't allow o.k. because it may mean Oklahoma
ONBOARD => ON BOARD ;; per AHD, with hyphen
OSHKOSH => OSH KOSH ;; many net hits both ways
OVERPOLL => OVER POLL ;; neologism
PAPERWORK => PAPER WORK ;; per AHD
PERCENT => PER CENT ;; per AHD
PIROGHI => PIROGI ;; per AHD
PLAYOFF => PLAY OFF ;; actually hyphenated per AHD
POSTCARD => POST CARD ;; per AHD
PRIMETIME => PRIME TIME ;; per AHD, noun-2 words, adj-hyphenated
PROGRAMED => PROGRAMMED ;; per AHD
PROGRAMER => PROGRAMMER ;; per AHD
PROGRAMING => PROGRAMMING ;; per AHD
QADHAFI => QADDAFI ;; WWW HITS
REBURY => RE BURY ;; WWW hits
REELECT => RE ELECT ;; per AHD, with hyphen
REENFORC => RE ENFORC ;; per AHD, with hyphen
REENTER => RE ENTER ;; per AHD, with hyphen
REEXAMIN => RE EXAMIN ;; per AHD, with hyphen
REHOSPITAL => RE HOSPITAL ;; WWW hits
REIMAGINE => RE IMAGINE ;; WWW hits
RIVALING => RIVALLING ;; per AHD
ROADMAP => ROAD MAP ;; per AHD
ROUNDTABLE => ROUND TABLE ;; per AHD, w/ or w/o hyphen
SABRETECH => SABRE TECH ;; both ways on WWW
SANDIEGO => SAN DIEGO ;; per AHD, two words
SHABUOTH => SHAVUOT ;; per AHD
SHIAH => SHIA ;; per AHD
SHOOTOUT => SHOOT OUT ;; per AHD, with hyphen
SIZEABLE => SIZABLE ;; per AHD
SNOWSHOWERS => SNOW SHOWERS ;; AHD is silent on this one
SPINOFF => SPIN OFF ;; per AHD, with hyphen
SUPERBOWL => SUPER BOWL ;; 1 word 38k, 2 words 70k WWW hits
SUPERFIGHTER => SUPER FIGHTER ;; AHD silent; about eq WWW hits
T. => TEE / __ [ SHIRT] ;; per AHD, with hyphen
TAIPEH => TAIPEI ;; per AHD
TAKEOVER => TAKE OVER ;; per AHD, with hyphen
TEENAGE => TEEN AGE ;; per AHD, with hyphen
THROUGHWAY => THRUWAY ;; in particular, the NY State ___
TORNADOES => TORNADOS ;; PER AHD
TSAR => CZAR ;; WWW hits
TZAR => CZAR ;; WWW hits
UNDERWAY => UNDER WAY ;; per AHD
VINCENTS => VINCENT'S / [SAINT ] __ ;; possessives often drop from names
WALMART => WAL MART ;; WWW hits, but officially 2 words w/hyphen
WHIMSEY => WHIMSY ;; per AHD
WHISTLEBLOWER => WHISTLE BLOWER ;; per AHD
WHITEWATER => WHITE WATER ;; see it both ways -- no standard
WILFUL => WILLFUL ;; per AHD
WORKFORCE => WORK FORCE ;; per AHD
WORKPLACE => WORK PLACE ;; per AHD
WORKSHEET => WORK SHEET ;; per AHD
WORLDVIEW => WORLD VIEW ;; per AHD, with hyphen
YASSER => YASIR ;; per AHD, but alt common on web
YASSIR => YASIR ;; per AHD, but alt common on web
ZAIREAN => ZAIRIAN ;; per AHD
ZUGANOV => ZYUGANOV ;; Russian transliteration not 100% fixed
;; end of standard_alt_spellings.rls
;;
;; File uncertain_compound_words.rls Version 2 dated 12/13/96
;;
;; Rules for mapping alternate spellings of some
;; compound words into just one form, for scoring.
;; For these words, I'm not sure what the "correct"
;; form is.
;;
;; - W. M. Fisher, 12/10/96
[AIRWING] => [AIR WING]
[BEANCOUNTER] => [BEAN COUNTER]
[BUTTHEAD] => [BUTT HEAD]
[CANNOT ] => [CAN NOT ]
[HOMEPAGE] => [HOME PAGE]
[PHOTOCALL] => [PHOTO CALL]
[DUMPTRUCK] => [DUMP TRUCK]
[GUNBATTLE] => [GUN BATTLE]
[GOODMORNING AMERICA ] => [GOOD MORNING AMERICA ] ;; see it both ways on Web
;;[GUNNYSACK ] => [GUNNY SACK ] ;; 1 word per AHD, but I've often seen it hyphenated
[TRAINSPOTTING ] => [TRAIN SPOTTING ]
[WEBSITE] => [WEB SITE]
;; end of uncertain_compound_words.rls
;; File training_errs.rls Version 3 dated 12/12/96
;;
;; Rules to apply to REF and HYP LSN transcriptions
;; in scoring to compensate for errors in the
;; training data for the 1996 CSR Hub4 evaluations.
;; Generally speaking, if a certain word occurs
;; in the official training data (which includes
;; the American Heritage Dictionary, AHD) sometimes
;; spelled X and sometimes Y, the difference between
;; X and Y will not be counted an error in the
;; evaluation test. This forgiveness will not
;; be done, however, in cases of typos
;; that result in a valid word spelling.
;;
;; - Jon Fiscus & Bill Fisher, 12/10/96
;;
;; Note - the left hand and right hand sides of
;; many-to-one rules have been interchanged
;; in order to give a finer grain to scoring, even
;; when the single token is in fact the correct version.
;; Because of this, these rules are *not* appropriate
;; to be used to correct transcriptions in general!