page220.html

来自「Data Structures And Algorithms With Obje」· HTML 代码 · 共 222 行 · 第 1/2 页

HTML
222
字号
<HTML><HEAD><TITLE>Character String Keys</TITLE></HEAD><BODY bgcolor="#FFFFFF"> <a href="../index.html" target="_top"><img src="../icons/usins.gif" alt="Logo" align=right></a><b>Data Structures and Algorithms with Object-Oriented Design Patterns in Python</b><br><A NAME="tex2html3738" HREF="page221.html"><IMG WIDTH=37 HEIGHT=24 ALIGN=BOTTOM ALT="next" SRC="../icons/next_motif.gif"></A> <A NAME="tex2html3736" HREF="page217.html"><IMG WIDTH=26 HEIGHT=24 ALIGN=BOTTOM ALT="up" SRC="../icons/up_motif.gif"></A> <A NAME="tex2html3730" HREF="page219.html"><IMG WIDTH=63 HEIGHT=24 ALIGN=BOTTOM ALT="previous" SRC="../icons/previous_motif.gif"></A>  <A NAME="tex2html3740" HREF="page611.html"><IMG WIDTH=43 HEIGHT=24 ALIGN=BOTTOM ALT="index" SRC="../icons/index_motif.gif"></A> <BR><HR><H2><A NAME="SECTION008330000000000000000">Character String Keys</A></H2><P>Strings of characters are represented in Python as instancesof the <tt>str</tt> class.A character string is simply a sequence of characters.Since such a sequence may be arbitrarily long,to devise a suitable hash functionwe must find a mapping from an unbounded domaininto the finite range of a 32-bit integer.<P>We can view a character string, <I>s</I>,as a sequence of <I>n</I> characters,<P> <IMG WIDTH=311 HEIGHT=16 ALIGN=BOTTOM ALT="displaymath62033" SRC="img892.gif"  ><P>where <I>n</I> is the length of the string.(The length of a string can be determined usingthe Python built-in method <tt>len</tt>).One very simple way to hash such a string would be to simplysum the numeric values associated with each character:<P><A NAME="eqnhashingstring1">&#160;</A> <IMG WIDTH=500 HEIGHT=46 ALIGN=BOTTOM ALT="equation10832" SRC="img893.gif"  ><P>As it turns out,this is not a particularly good way to hash character strings.Given that the integer value of a Python character is an 8-bit quantity, <IMG WIDTH=105 HEIGHT=29 ALIGN=MIDDLE ALT="tex2html_wrap_inline62041" SRC="img894.gif"  >, for all  <IMG WIDTH=65 HEIGHT=24 ALIGN=MIDDLE ALT="tex2html_wrap_inline62043" SRC="img895.gif"  >.As a result,  <IMG WIDTH=143 HEIGHT=28 ALIGN=MIDDLE ALT="tex2html_wrap_inline62045" SRC="img896.gif"  >.For example,given a string of length <I>n</I>=5,the value of <I>f</I>(<I>s</I>) falls between zero and  <IMG WIDTH=33 HEIGHT=12 ALIGN=BOTTOM ALT="tex2html_wrap_inline62051" SRC="img897.gif"  >.In fact, the situation is even worse,in North America we typically use onlythe <em>ASCII</em><A NAME=10840>&#160;</A> character set.The ASCII character set uses only the least-significant seven bitsof a <tt>char</tt>.If the string is comprised of only ASCII characters,the result falls in the range between zero and 640.<P>Essentially the problem with a function <I>f</I> which produces a resultin a relatively small intervalis the situation which arises when that function is composed withthe function  <IMG WIDTH=116 HEIGHT=24 ALIGN=MIDDLE ALT="tex2html_wrap_inline62055" SRC="img898.gif"  >.If the size of the range of the function <I>f</I> is less than <I>M</I>,then  <IMG WIDTH=62 HEIGHT=25 ALIGN=MIDDLE ALT="tex2html_wrap_inline61941" SRC="img872.gif"  > does not spread its values uniformlyon the interval [0,<I>M</I>-1].For example, if <I>M</I>=1031 only the first 640 values(62% of the range) are used!<P>Alternatively,suppose we have <em>a priori</em> knowledgethat character strings are limited to length <I>n</I>=4.Then, we can construct an integer by concatenatingthe binary representations of each of the characters.For example, given  <IMG WIDTH=123 HEIGHT=24 ALIGN=MIDDLE ALT="tex2html_wrap_inline62069" SRC="img899.gif"  >,we can construct an integer with the function<P><A NAME="eqnhashingstring2">&#160;</A> <IMG WIDTH=500 HEIGHT=19 ALIGN=BOTTOM ALT="equation10843" SRC="img900.gif"  ><P>where  <IMG WIDTH=49 HEIGHT=14 ALIGN=BOTTOM ALT="tex2html_wrap_inline62071" SRC="img901.gif"  >.Since <I>B</I> is a power of two,this function is easy to write in Python:<PRE>def f(s):    return ord(s[0]) &lt;&lt; 21 | ord(s[1]) \	&lt;&lt; 14 | ord(s[2]) &lt;&lt; 7 | ord(s[3])</PRE>While this function certainly has a larger range,it still has a problems--it cannot deal strings of arbitrary length.<P>Equation&nbsp;<A HREF="page220.html#eqnhashingstring2"><IMG  ALIGN=BOTTOM ALT="gif" SRC="../icons/cross_ref_motif.gif"></A> can be generalized to dealwith strings of arbitrary length as follows:<P> <IMG WIDTH=500 HEIGHT=46 ALIGN=BOTTOM ALT="equation10847" SRC="img902.gif"  ><P>This function produces a unique integer for every possible string.Unfortunately, the range of <I>f</I>(<I>s</I>) is unbounded.A simple modification of this algorithm suffices to bound the range:<P><A NAME="eqnhashingstring3">&#160;</A> <IMG WIDTH=500 HEIGHT=48 ALIGN=BOTTOM ALT="equation10852" SRC="img903.gif"  ><P>where  <IMG WIDTH=55 HEIGHT=12 ALIGN=BOTTOM ALT="tex2html_wrap_inline62077" SRC="img904.gif"  > such that <I>w</I> is word size of the machine.Unfortunately, since <I>W</I> and <I>B</I> are both powers of two,the value computed by this hash function depends only onthe last <I>W</I>/<I>B</I> characters in the character string.For example, for  <IMG WIDTH=60 HEIGHT=15 ALIGN=BOTTOM ALT="tex2html_wrap_inline61675" SRC="img845.gif"  > and  <IMG WIDTH=49 HEIGHT=14 ALIGN=BOTTOM ALT="tex2html_wrap_inline62071" SRC="img901.gif"  >,this result depends only on the last five characters in the string--all character strings having exactly the same last five characters collide!<P>Writing the code to compute Equation&nbsp;<A HREF="page220.html#eqnhashingstring3"><IMG  ALIGN=BOTTOM ALT="gif" SRC="../icons/cross_ref_motif.gif"></A> is actuallyquite straightforward if we realize that <I>f</I>(<I>s</I>)can be viewed as a polynomial in <I>B</I>,the coefficients of which are  <IMG WIDTH=13 HEIGHT=14 ALIGN=MIDDLE ALT="tex2html_wrap_inline62095" SRC="img905.gif"  >,  <IMG WIDTH=11 HEIGHT=14 ALIGN=MIDDLE ALT="tex2html_wrap_inline62097" SRC="img906.gif"  >, ...,  <IMG WIDTH=15 HEIGHT=14 ALIGN=MIDDLE ALT="tex2html_wrap_inline62099" SRC="img907.gif"  >.Therefore, we can use <em>Horner's rule</em><A NAME=10861>&#160;</A>(see Section&nbsp;<A HREF="page41.html#secmodelhorner"><IMG  ALIGN=BOTTOM ALT="gif" SRC="../icons/cross_ref_motif.gif"></A>) to compute <I>f</I>(<I>s</I>) as follows:<PRE>def f(s):    result = 0    for c in s:	result = result * B + ord(c)    return result</PRE>This implementation can be simplified even further if we makeuse of the fact that  <IMG WIDTH=47 HEIGHT=14 ALIGN=BOTTOM ALT="tex2html_wrap_inline62103" SRC="img908.gif"  >, where <I>b</I>=7.Since <I>B</I> is a power of two,in order to multiply the variable <tt>result</tt> by <I>B</I>all we need to do is to shift it left by <I>b</I> bits.Furthermore, having just shifted <tt>result</tt> left by <I>b</I> bits,we know that the least significant <I>b</I> bits of the result are zero.And since each character has no more than <I>b</I>=7 bits,we can replace the addition operationwith an <em>exclusive or</em><A NAME=10866>&#160;</A> operation.<PRE>def f(s):    result = 0    for c in s:	result = result &lt;&lt; b ^ ord(c)    return result</PRE>

⌨️ 快捷键说明

复制代码Ctrl + C
搜索代码Ctrl + F
全屏模式F11
增大字号Ctrl + =
减小字号Ctrl + -
显示快捷键?