opt_bin.html

来自「Data Structure Ebook」· HTML 代码 · 共 174 行
HTML
174 行
<HTML><HEAD>
<TITLE>Data Structures and Algorithms: Optimal Binary Search Tree</TITLE>

<META name="description" content="Data Structures and Algorithms Course Notes,
PLDS210 University of Western Australia">
<META name="keywords" content="data structures, algorithms,
binary search tree,
optimal binary search tree, dynamic algorithms, animation">
</HEAD>
<BODY BGCOLOR="#ffffff">
<TABLE BGCOLOR="#00c0f0" WIDTH="100%" CELLSPACING=0 CELLPADDING=0>
<TR BGCOLOR="#00f0f0"><TD ALIGN=right>
<FONT FACE=helvetica SIZE=+1><I>Data Structures and Algorithms</I></FONT>
</TD></TR>

<TR><TD><FONT FACE=helvetica SIZE=+2><B>9.3 Optimal Binary Search Trees</B></FONT>
</TD></TR>
</TABLE>
<P>

Up to this point, we have assumed that an <I>optimal search tree</I>
is one in which the probability of occurrence of all keys is
equal (or is unknown, in which case we assume it to be equal).
Thus we concentrated on <I>balancing</I> the tree so as to make
the cost of finding any key <I>at most</I> <B>log n</B>.
<P>
However, consider a dictionary of words used by a spelling checker
for English language documents.
It will be searched many more times
for 'a', 'the', 'and', <I>etc</I> than for the thousands of
uncommon words which are in the dictionary just in case someone
happens to use one of them. 
Such a dictionary needs to be large: the average educated person
has a vocabulary of 30 000 words, so it needs ~100 000 words in 
it to be effective. It is also reasonably easy to produce a
table of the frequency of occurrence of words: words are simply
counted in any suitable collection of documents considered to be
representative of those for which the spelling checker will be used.

A balanced binary tree is likely to end up with a word such as
'miasma' at its root, guaranteeing that in 99.99+% of searches,
at least one comparison is wasted!
<P>
If key, <B>k</B>, has relative frequency, <B>r<SUB>k</SUB></B>,
then in an <B>optimal tree</B>, 
<CENTER><B>sum(d<SUB>k</SUB>r<SUB>k</SUB>)</B>
</CENTER>
where <B>d<SUB>k</SUB></B> is the distance of the key, <B>k</B>,
from the root (<I>ie</I> the number of comparisons which must be
made before <B>k</B> is found), is minimised.
<P>
We make use of the property:
<BLOCKQUOTE>
<H4>Lemma</H4>
Sub-trees of optimal trees are themselves optimal trees.
<H4>Proof</H4>
If a sub-tree of a search tree is not an optimal tree, 
then a better search tree will be produced if the sub-tree
is replaced by an optimal tree.
</BLOCKQUOTE>
Thus the problem is to determine which key should be placed
at the root of the tree.
Then the process can be repeated for the left- and right-sub-trees.
However, a divide-and-conquer approach would choose each
key as a candidate root and repeat the process for each 
sub-tree.
Since there are <B>n</B> choices for the root and
2<B>O(n)</B> choices for roots of the two sub-trees,
this leads to an <B>O(n<SUP>n</SUP></B>)</B> algorithm.
<P>
An efficient algorithm can be generated by the <B>dynamic</B>
approach.
We calculate the <B>O(n)</B> best trees consisting of just
two elements (the neighbours in the sorted list of keys).
<TABLE>
<TR><TD><IMG SRC="optbin_a.gif" tppabs="http://www.ee.uwa.edu.au/~plsd210/ds/fig/optbin_a.gif"></TD>
<TD>In the figure, there are two possible
arrangements for the tree containing <B>F</B>
and <B>G</B>.<P>
The cost for (a) is
<CENTER> 5.1 + 7.2 = 19</CENTER>
and for (b)
<CENTER> 7.1 + 5.2 = 17</CENTER>
<P>
Thus (b) is the optimum tree and its
cost is saved as c(f,g).
We also store <B>g</B> as the root of the
best f-g sub-tree in best(f,g).
<P>
Similarly, we calculate the best cost for all <B>n-1</B>
sub-trees with two elements, c(g,h), c(h,i), <I>etc</I>.
</TD></TR>
</TABLE>
The sub-trees containing two elements are then used to
calculate the best costs for sub-trees of 3 elements.
This process is continued until we have calculated the
cost and the root for the optimal search tree with
<B>n</B> elements.
<P>There are <B>O(n<SUP>2</SUP>)</B> such sub-tree costs.
Each one requires <B>n</B> operations to determine,
if the cost of the smaller sub-trees is known.
<P>
Thus the overall algorithm is <B>O(n<SUP>3</SUP>)</B>.
<P>
<A HREF="optbin.c" tppabs="http://www.ee.uwa.edu.au/~plsd210/ds/source/optbin.c" TARGET=source>Code for optimal binary
search tree</A><BR>
Note some C 'tricks' to handle dynamically-allocated two-dimensional
arrays using pre-processor macros for C and BEST!<BR>
This <A HREF="OBSearch.java" tppabs="http://www.ee.uwa.edu.au/~plsd210/ds/Java/opt_bin/OBSearch.java" TARGET=source>Java code</A> 
may be easier to comprehend for some!
It uses this class for 
<A HREF="IntMatrix.java" tppabs="http://www.ee.uwa.edu.au/~plsd210/ds/Java/opt_bin/IntMatrix.java" TARGET=source1>integer matrices</A>.
<P>
The data structures used may be represented:
<BR>
<IMG SRC="optbin1.gif" tppabs="http://www.ee.uwa.edu.au/~plsd210/ds/fig/optbin1.gif">
<P>
After the initialisation steps, the data structures used
contain the frequencies, <B>rf<SUB>i</SUB></B>,
in <B>c<SUB>ii</SUB></B> (the costs of single element trees),
<B>max</B> everywhere below the diagonal and zeroes in
the positions just above the diagonal (to allow for the
trees which don't have a left or right branch):
<BR>
<IMG SRC="optbin2.gif" tppabs="http://www.ee.uwa.edu.au/~plsd210/ds/fig/optbin2.gif">
<P>
In the first iteration, all the positions below the
diagonal (<B>c<SUB>i,i+1</SUB></B>) will be filled in
with the optimal costs of two-element trees from
<B>i</B> to <B>i+1</B>.
<P>
In subsequent iterations, the optimal costs of <B>k-1</B>
element trees (<B>c<SUB>i,i+k</SUB></B>) are filled in
using previously calculated costs of smaller trees.

<H3>Animation</H3>

<A NAME=opt_bin_anim>
<TABLE BGCOLOR="#00f0f0" WIDTH="100%"> 
<TR><TD >
<FONT COLOR=blue FACE=helvetica>
<B>Optimal Binary Search Tree Animation</B><BR>
This animation was written by John Morris and (mostly) Woi Ang</FONT></TD>
<TD ALIGN=center>
  <TABLE BORDER=0>
  <TR><TD>
    <applet CODEBASE="tppjava" tppabs="http://www.ee.uwa.edu.au/~plsd210/ds/Java/opt_bin/" code = "AlgAnimApp.class" tppabs="http://www.ee.uwa.edu.au/~plsd210/ds/Java/opt_bin/AlgAnimApp.class" width = 200 height = 35>
    <param name = "filename" value = "AlgThread.java">
    <param name = "buttonname" value = "Run the Animation">
    <param name = "algname" value = "Optimal Binary Search Tree">
		<param name = "algfile" value = "graph.dij">
    </applet>
    </TD>
  </TR>
</TABLE>
</TD>
<TD><FONT FACE=helvetica>Please email comments to:<BR>
<A HREF=mailto:morris@ee.uwa.edu.au>morris@ee.uwa.edu.au</A>
</TR>
</TABLE>
<P>

<TABLE CELLPADDING=5 WIDTH="100%" BGCOLOR="#00f0f0" CELLSPACING=4>
<TR><TD WIDTH=50%>
Continue on to <A HREF="mat_chain.html" tppabs="http://www.ee.uwa.edu.au/~plsd210/ds/mat_chain.html">Matrix Chain Multiplication</A></TD>
<TD>Back to the <A HREF="ds_ToC.html" tppabs="http://www.ee.uwa.edu.au/~plsd210/ds/ds_ToC.html">Table of Contents</A>
</TD></TR></TABLE>
<SMALL>
&copy; <A HREF=mailto:morris@ee.uwa.edu.au>John Morris</A>, 1998
</SMALL>
</BODY>
</HTML>
opt_bin.html - 源码说明

本页面展示了「Data Structure Ebook」中的 opt_bin.html 源码文件，采用 HTML 编程语言编写，共 174 行代码。您可以在线阅读完整代码内容，也可以返回资源详情页下载完整源码包进行本地学习和开发。
虫虫下载站收录了大量与Structure相关的技术资源，包括源代码、技术文档、电路图等，是电子工程师和嵌入式开发者的专业学习平台。
⌨️ 快捷键说明

复制代码Ctrl + C
搜索代码Ctrl + F
全屏模式F11
增大字号Ctrl + =
减小字号Ctrl + -
显示快捷键?