.\" Automatically generated by Pod::Man 2.16 (Pod::Simple 3.05)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sh \" Subsection heading
.br
.if t .Sp
.ne 5
.PP
\fB\\$1\fR
.PP
..
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings.  \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote.  \*(C+ will
.\" give a nicer C++.  Capital omega is used to do unbreakable dashes and
.\" therefore won't be available.  \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is turned on, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.Sh), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.ie \nF \{\
.    de IX
.    tm Index:\\$1\t\\n%\t"\\$2"
..
.    nr % 0
.    rr F
.\}
.el \{\
.    de IX
..
.\}
.\"
.\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2).
.\" Fear.  Run.  Save yourself.  No user-serviceable parts.
.    \" fudge factors for nroff and troff
.if n \{\
.    ds #H 0
.    ds #V .8m
.    ds #F .3m
.    ds #[ \f1
.    ds #] \fP
.\}
.if t \{\
.    ds #H ((1u-(\\\\n(.fu%2u))*.13m)
.    ds #V .6m
.    ds #F 0
.    ds #[ \&
.    ds #] \&
.\}
.    \" simple accents for nroff and troff
.if n \{\
.    ds ' \&
.    ds ` \&
.    ds ^ \&
.    ds , \&
.    ds ~ ~
.    ds /
.\}
.if t \{\
.    ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u"
.    ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u'
.    ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u'
.    ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u'
.    ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u'
.    ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u'
.\}
.    \" troff and (daisy-wheel) nroff accents
.ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V'
.ds 8 \h'\*(#H'\(*b\h'-\*(#H'
.ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#]
.ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H'
.ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u'
.ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#]
.ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#]
.ds ae a\h'-(\w'a'u*4/10)'e
.ds Ae A\h'-(\w'A'u*4/10)'E
.    \" corrections for vroff
.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u'
.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u'
.    \" for low resolution devices (crt and lpr)
.if \n(.H>23 .if \n(.V>19 \
\{\
.    ds : e
.    ds 8 ss
.    ds o a
.    ds d- d\h'-1'\(ga
.    ds D- D\h'-1'\(hy
.    ds th \o'bp'
.    ds Th \o'LP'
.    ds ae ae
.    ds Ae AE
.\}
.rm #[ #] #H #V #F C
.\" ========================================================================
.\"
.IX Title "Text::Soundex 3"
.TH Text::Soundex 3 "2007-12-18" "perl v5.10.0" "Perl Programmers Reference Guide"
.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
Text::Soundex \- Implementation of the soundex algorithm.
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 1
\&  use Text::Soundex;
\&
\&  # Original algorithm.
\&  $code = soundex($name);    # Get the soundex code for a name.
\&  @codes = soundex(@names);  # Get the list of codes for a list of names.
\&
\&  # American Soundex variant (NARA) \- Used for US census data.
\&  $code = soundex_nara($name);    # Get the soundex code for a name.
\&  @codes = soundex_nara(@names);  # Get the list of codes for a list of names.
\&
\&  # Redefine the value that soundex() will return if the input string
\&  # contains no identifiable sounds within it.
\&  $Text::Soundex::nocode = \*(AqZ000\*(Aq;
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
Soundex is a phonetic algorithm for indexing names by sound, as
pronounced in English. The goal is for names with the same
pronunciation to be encoded to the same representation so that they
can be matched despite minor differences in spelling. Soundex is the
most widely known of all phonetic algorithms and is often used
(incorrectly) as a synonym for \*(L"phonetic algorithm\*(R". Improvements to
Soundex are the basis for many modern phonetic algorithms. (Wikipedia,
2007)
.PP
This module implements the original soundex algorithm developed by
Robert Russell and Margaret Odell, patented in 1918 and 1922, as well
as a variation called \*(L"American Soundex\*(R" used for \s-1US\s0 census data and
currently maintained by the National Archives and Records Administration
(\s-1NARA\s0).
.PP
The soundex algorithm may be recognized from Donald Knuth's
\&\fBThe Art of Computer Programming\fR. The algorithm described by
Knuth is the \s-1NARA\s0 algorithm.
.PP
The value returned for strings which have no soundex encoding is
defined using \f(CW$Text::Soundex::nocode\fR. The default value is \f(CW\*(C`undef\*(C'\fR;
however, values such as \f(CW\*(AqZ000\*(Aq\fR are commonly used alternatives.
.PP
For backward compatibility with older versions of this module,
\&\f(CW$Text::Soundex::nocode\fR is exported into the caller's namespace as
\&\f(CW$soundex_nocode\fR.
.PP
In scalar context, \f(CW\*(C`soundex()\*(C'\fR returns the soundex code of its first
argument. In list context, a list is returned in which each element is the
soundex code for the corresponding argument passed to \f(CW\*(C`soundex()\*(C'\fR. For
example, the following code assigns \f(CW@codes\fR the value \f(CW\*(C`(\*(AqM200\*(Aq, \*(AqS320\*(Aq)\*(C'\fR:
.PP
.Vb 1
\&  @codes = soundex qw(Mike Stok);
.Ve
.PP
To use \f(CW\*(C`Text::Soundex\*(C'\fR to generate codes that can be used to search one
of the publicly available \s-1US\s0 Censuses, a variant of the soundex
algorithm must be used:
.PP
.Vb 2
\&  use Text::Soundex;
\&  $code = soundex_nara($name);
.Ve
.PP
An example of where these algorithms differ follows:
.PP
.Vb 3
\&  use Text::Soundex;
\&  print soundex("Ashcraft"), "\en";       # prints: A226
\&  print soundex_nara("Ashcraft"), "\en";  # prints: A261
.Ve
.SH "EXAMPLES"
.IX Header "EXAMPLES"
Donald Knuth's examples of names and the soundex codes they map to
are listed below:
.PP
.Vb 6
\&  Euler, Ellery \-> E460
\&  Gauss, Ghosh \-> G200
\&  Hilbert, Heilbronn \-> H416
\&  Knuth, Kant \-> K530
\&  Lloyd, Ladd \-> L300
\&  Lukasiewicz, Lissajous \-> L222
.Ve
.PP
so:
.PP
.Vb 2
\&  $code = soundex \*(AqKnuth\*(Aq;         # $code contains \*(AqK530\*(Aq
\&  @list = soundex qw(Lloyd Gauss);  # @list contains \*(AqL300\*(Aq, \*(AqG200\*(Aq
.Ve
.SH "LIMITATIONS"
.IX Header "LIMITATIONS"
As the soundex algorithm was originally used a \fBlong\fR time ago in the \s-1US\s0,
it considers only the English alphabet and pronunciation. In particular,
non-ASCII characters will be ignored. The recommended method of dealing
with characters that have accents, or other Unicode characters, is to use
the Text::Unidecode module available from \s-1CPAN\s0. Either use the module
explicitly:
.PP
.Vb 2
\&  use Text::Soundex;
\&  use Text::Unidecode;
\&
\&  print soundex(unidecode("Fran\exE7ais")), "\en";  # Prints "F652\en"
.Ve
.PP
Or use the convenient wrapper routine:
.PP
.Vb 1
\&  use Text::Soundex \*(Aqsoundex_unicode\*(Aq;
\&
\&  print soundex_unicode("Fran\exE7ais"), "\en";  # Prints "F652\en"
.Ve
.PP
Since the soundex algorithm maps a large space (strings of arbitrary
length) onto a small space (single letter plus 3 digits), no inference
can be made about the similarity of two strings which end up with the
same soundex code. For example, both \f(CW\*(C`Hilbert\*(C'\fR and \f(CW\*(C`Heilbronn\*(C'\fR end
up with a soundex code of \f(CW\*(C`H416\*(C'\fR.
.SH "MAINTAINER"
.IX Header "MAINTAINER"
This module is currently maintained by Mark Mielke (\f(CW\*(C`mark@mielke.cc\*(C'\fR).
.SH "HISTORY"
.IX Header "HISTORY"
Version 3 is a significant update to provide support for versions of
Perl later than Perl 5.004. Specifically, the \s-1XS\s0 version of the
\&\fIsoundex()\fR subroutine understands strings that are encoded using \s-1UTF\-8\s0
(Unicode strings).
.PP
Version 2 of this module was a rewrite by Mark Mielke (\f(CW\*(C`mark@mielke.cc\*(C'\fR)
to improve the speed of the subroutines. The \s-1XS\s0 version of the \fIsoundex()\fR
subroutine was introduced in 2.00.
.PP
Version 1 of this module was written by Mike Stok (\f(CW\*(C`mike@stok.co.uk\*(C'\fR)
and was included in the Perl core library set.
.PP
Dave Carlsen (\f(CW\*(C`dcarlsen@csranet.com\*(C'\fR) made the request for the \s-1NARA\s0
algorithm to be included. The \s-1NARA\s0 soundex page can be viewed at:
\&\f(CW\*(C`http://www.nara.gov/genealogy/soundex/soundex.html\*(C'\fR
.PP
Ian Phillips (\f(CW\*(C`ian@pipex.net\*(C'\fR) and Rich Pinder (\f(CW\*(C`rpinder@hsc.usc.edu\*(C'\fR)
supplied ideas and spotted mistakes for v1.x.