
decluster.w,v

Lin-Kernighan heuristic for the TSP and minimum weight perfect matching
described in the opening sections.
% Relate this to dynamic MST or whatever that SODA'97 paper was about.
% They use the term ``topology tree'' as well, but in a different
% sense.

When $T$ has $n$ vertices, $T'$ has $2n-1$ vertices.  Each is an entry
in an array of |decluster_edge_t| nodes.  The |city| array in each node
is aliased to the name |child|, and stores child indices.  ($T'$ is
binary.)  A value of $-1$ indicates no child.
The first $n$ entries of $T'$ correspond to the vertices of the original
graph.  They have zero cost and no children.
% We may want to do an `at s T_prime TeX'
The \LCA\ queries don't actually need |T_prime|, so we can throw
it away if need be.  We defend against a memory leak in case we
reallocate for a different size.
@^memory leak@>

@d child city
@d NO_CHILD (-1)

@<Allocate and pre-initialize |T_prime|@>=
if ( T_prime==NULL || T_prime->n!=n+n-1 ) {
	int i;
	@<Clean up |T_prime|@> /* Defend against a memory leak. */
	T_prime = new_of(decluster_tree_t);
	T_prime->n = n+n-1;
	T_prime->edge = new_arr_of(decluster_edge_t,n+n-1);
	for (i=0;i<n;i++) {
		T_prime->edge[i].child[0]=NO_CHILD;
		T_prime->edge[i].child[1]=NO_CHILD;
		T_prime->edge[i].cost=0;
	}
}

@ @<Other setup code@>=
@<Allocate and pre-initialize |T_prime|@>@;

@ @<Clean up |T_prime|@>=
if ( T_prime ) {
	const int n = T_prime->n;
	free_mem(T_prime->edge); mem_deduct(n*sizeof(decluster_edge_t));
	free_mem(T_prime); mem_deduct(sizeof(decluster_tree_t));
	T_prime=NULL;
}

@ @<Other cleanup code@>=
@<Clean up |T_prime|@>@;

@ We must define |T_prime|.

@<Module variables@>=
static decluster_tree_t *T_prime=NULL;

@ The first $n$ entries of |T_prime| are filled out at memory allocation
time.
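As a rough illustration, the shape of |T_prime| described above can be written out in plain C. This is a hedged sketch, not the module's code: the real module aliases |child| to |city| with a macro, and |length_t|, the type names, and |alloc_topology_tree| are invented here; a union makes the aliasing explicit instead.

```c
#include <assert.h>
#include <stdlib.h>

#define NO_CHILD (-1)

typedef double length_t;    /* assumed cost type; the module's may differ */

/* One entry of T': the two |city| slots double as the two |child|
   slots.  Internal entries hold child indices; the first n entries
   (original vertices) hold NO_CHILD. */
typedef struct {
    union {
        int city[2];    /* endpoints of an original tree edge */
        int child[2];   /* indices of the two children, or NO_CHILD */
    } u;
    length_t cost;
} edge_node;

typedef struct {
    int n;              /* 2n-1 entries for an n-vertex tree */
    edge_node *edge;
} topology_tree;

/* Allocate the 2n-1 entries and pre-initialize the first n, which
   represent original vertices: zero cost, no children. */
static topology_tree *alloc_topology_tree(int n) {
    int i;
    topology_tree *t = malloc(sizeof *t);
    t->n = n + n - 1;
    t->edge = malloc((size_t)t->n * sizeof *t->edge);
    for (i = 0; i < n; i++) {
        t->edge[i].u.child[0] = NO_CHILD;
        t->edge[i].u.child[1] = NO_CHILD;
        t->edge[i].cost = 0;
    }
    return t;
}
```

The last $n-1$ entries are left uninitialized here, since the module fills them from $T$ during preprocessing.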
We then copy $T$ into the last $n-1$ positions of |T_prime|
in increasing order while making the appropriate links.
Tree |T_prime| is digested into an array of four integers: |level|,
|inlabel|, |ascendant|, and |head|.
In the normal case we throw away |T_prime| after preprocessing because
it isn't needed for \LCA\ queries.
In this case we may have to reallocate |T_prime| here.

@<Subroutines@>=
void
decluster_preprocess(decluster_tree_t *T) {
	errorif(T->n!=n-1,"decluster_preprocess: MST size %d should be %d",
		T->n, n-1);
	if ( T_prime==NULL ) @<Allocate and pre-initialize |T_prime|@>@;
	@<Fill the last $n-1$ entries of |T_prime| with data from $T$@>@;
	print_tree(T_prime,"T_prime");
	@<Copy |cost| fields from |T_prime|@>@;
	@<Compute |level| numbers@>@;
	@<Compute |inlabel| numbers@>@;
	@<Compute |ascendant| numbers@>@;
	@<Compute |parent_of_head| numbers@>@;
	@<Verbose: print the preprocessing data@>@;
	if ( decluster_discard_topology_tree ) {
		@<Clean up |T_prime|@>@;
	}
}

@ First off, in most cases we want to save space by throwing away
topology tree |T_prime| once we're through processing it.
But sometimes we want to keep it around.  We make it optional through the
globally visible variable |decluster_discard_topology_tree|.

@<Global variables@>=
int decluster_discard_topology_tree=1;

@ @<Exported variables@>=
extern int decluster_discard_topology_tree;

@ We add each edge of $T$ in ascending order.  Essentially
we simulate Kruskal's algorithm, but with the knowledge that every
edge ends up in the output.  Tree |T_prime| in some sense records
the history of the Kruskal algorithm computation.
Array |component| stores
the evolving component structure of the topology tree.
Entry |component[i]| points to the root of the component containing
vertex |i|.
Entries may go out of date by the addition of new edges, but they can
always be updated by following links up the tree.
A value of |NO_PARENT| indicates that a vertex is a root.

@d NO_PARENT (-2)

@<Fill the last $n-1$ entries of |T_prime| with data from $T$@>=
{ int r, w, i, *component=new_arr_of(int,n+n);
	/* |r| is a read cursor, |w| is a write cursor */
for (i=0;i<n;i++) component[i]=NO_PARENT;
sort(T->edge,(size_t)(n-1),sizeof(decluster_edge_t),decluster_edge_cmp);
print_tree(T,"T");
for (r=0,w=n;r<n-1;r++,w++) {
	T_prime->edge[w] = T->edge[r];
	@<Link |T_prime->edge[w]| with its two children and update |component|@>@;
}
free_mem(component); mem_deduct((n+n)*sizeof(int));
}

@ We need to import the sorting routine from module \module{lk}.  The
user has the option of changing it.

@<Module headers@>=
#include "lk.h"

@ Does an amortized argument prove that the following runs in linear time
over the entire sequence?
It is similar to the classic path compression solution to the
merge/find problem that is a standard component in implementations of
Kruskal's algorithm.  That has worst-case time that is almost
linear, $O(n \alpha(n))$.  But I'm confused as to whether it does less or
more work.  I'll have to think about it more.
(Some thoughts: Cf. combining tree in asynchronous PRAM (Nishimura);
linear time heapify.)

We save some buffer space and some time because we already know where
the nodes will all be pointing to.  They'll point to the vertex labeled
with the edge we're adding, namely |T_prime->edge[w]|.

@<Link |T_prime->edge[w]| with its two children and update |component|@>=
{ int i, here, parent;
component[w]=NO_PARENT;
for ( i=0; i<2; i++ ) {
	here=T_prime->edge[w].city[i];
	while ((parent=component[here])!=NO_PARENT) {
		component[here]=w;
		here=parent;
	}
	component[here]=w;
	T_prime->edge[w].child[i]=here; /* |child[i]| is aliased to |city[i]| */
}
}

@* Bit twiddling.
The least common ancestor code below requires some fast bit manipulation.
In several places we need to copy the top 1 bit of a word
down through all the lower bits.  We can do this very quickly without
any branch instructions.

@<Module definitions@>=
#if SIZEOF_INT==8
# define copy_1_down(X) @t}\3{-5@>\
	(((X) |= (X)>>1), @t}\3{-5@>\
	 ((X) |= (X)>>2), @t}\3{-5@>\
	 ((X) |= (X)>>4), @t}\3{-5@>\
	 ((X) |= (X)>>8), @t}\3{-5@>\
	 ((X) |= (X)>>16), @t}\3{-5@>\
	 ((X) |= (X)>>32) )
#else /* |!SIZEOF_INT==8|.  Then hopefully integers are at most 4 bytes wide. */
# define copy_1_down(X) @t}\3{-5@>\
	(((X) |= (X)>>1), @t}\3{-5@>\
	 ((X) |= (X)>>2), @t}\3{-5@>\
	 ((X) |= (X)>>4), @t}\3{-5@>\
	 ((X) |= (X)>>8), @t}\3{-5@>\
	 ((X) |= (X)>>16) )
#endif

@ Array |pow_2| stores powers of two.

@<Module variables@>=
static const int pow_2[]={@t}\3{-5@>
	0x1,	0x2,	0x4,	0x8,@t}\3{-5@>
	0x10,	0x20,	0x40,	0x80,@t}\3{-5@>
	0x100,	0x200,	0x400,	0x800,@t}\3{-5@>
	0x1000,	0x2000,	0x4000,	0x8000,@t}\3{-5@>
	0x10000,	0x20000,	0x40000,	0x80000,@t}\3{-5@>
	0x100000,	0x200000,	0x400000,	0x800000,@t}\3{-5@>
	0x1000000,	0x2000000,	0x4000000,	0x8000000,@t}\3{-5@>
	0x10000000,	0x20000000,	0x40000000,	0x80000000@t}\3{-5@>
#if SIZEOF_INT==8
	,
	0x100000000,	0x200000000,	0x400000000,	0x800000000,@t}\3{-5@>
	0x1000000000,	0x2000000000,	0x4000000000,	0x8000000000,@t}\3{-5@>
	0x10000000000,	0x20000000000,	0x40000000000,	0x80000000000,@t}\3{-5@>
	0x100000000000,	0x200000000000,	0x400000000000,	0x800000000000,@t}\3{-5@>
	0x1000000000000,	0x2000000000000,	0x4000000000000,	0x8000000000000,@t}\3{-5@>
	0x10000000000000,	0x20000000000000,	0x40000000000000,	0x80000000000000,@t}\3{-5@>
	0x100000000000000,	0x200000000000000,	0x400000000000000,	0x800000000000000,@t}\3{-5@>
	0x1000000000000000,	0x2000000000000000,	0x4000000000000000,	0x8000000000000000
#endif
};

@ It will be useful to find logarithms of small numbers quickly.

@<Module variables@>=
static const int floor_log_2_small[16]= {	/* dummy */@+0, @#
	0, @#
	1, 1, @#
	2, 2, 2, 2, @#
	3, 3, 3, 3,	3, 3, 3, 3};

@ The first bit manipulation
routine I'll define is |floor_log_2(x)|, which computes
$\lfloor \log_2 (x) \rfloor$, as you might imagine.  Here I assume
that the argument $x$ is a positive integer.
We use a binary search on the bits to find the highest 1 bit.
Let the \term{grey width} be the number of bits of $x$ that might be
set to 1, counting from the least significant bit.
Initially, the grey width of $x$ is the word size.  Each
iteration cuts the grey width in half.  When $x$ is known to be small,
we just use table lookup.
@^grey width@>
@^binary search@>

The loop is unrolled as the body of a |switch| statement; we depend
on each case falling through to the next.  Unrolling loops is as old as
the sun (Sun?), but unrolling a binary search in this way is pretty neat.
I first saw this ``unrolling a binary search'' trick in Knuth's book
{\sl Literate Programming}.  It is also reminiscent of ``Duff's device'';
see the \CEE/ Language FAQ, found on {\tt comp.lang.c}.
The table lookup at the end saves a couple of
possible branches at the cost of a possible cache miss.  Is it worth it?
% That trick is inspired by the well-known meta-transformation
% of $O(\log n)$ parallel algorithms with $n$ processors into an $O(\log n)$
% algorithm on $n/\log n$ processors.
% log log n / log log log n ?

As for all the shifting, the shifts in the |if| conditions are constant
expressions and so will be precomputed by the compiler.
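The two primitives discussed in this section can be sketched in ordinary, 32-bit-only C. This is an illustration under assumed names (|copy_1_down32|, |floor_log_2_32|), not the module's code; in particular, this sketch of |floor_log_2| finishes with two more branches instead of the module's small-table lookup.

```c
#include <assert.h>

/* Smear the highest set 1 bit down through all lower bit positions,
   branch-free: each shift-or doubles the width of the smeared run. */
static unsigned copy_1_down32(unsigned x) {
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return x;
}

/* Binary search on the bits for the index of the highest 1 bit;
   requires x > 0.  Each test halves the "grey width". */
static int floor_log_2_32(unsigned x) {
    int ans = 0;
    if (x & 0xffff0000u) { ans += 16; x >>= 16; }
    if (x & 0xff00u)     { ans += 8;  x >>= 8;  }
    if (x & 0xf0u)       { ans += 4;  x >>= 4;  }
    if (x & 0x0cu)       { ans += 2;  x >>= 2;  }
    if (x & 0x02u)       { ans += 1; }
    return ans;
}
```

For example, |copy_1_down32(0x48)| yields |0x7f|, and |floor_log_2_32(1000)| yields 9, since $2^9 \le 1000 < 2^{10}$.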
The shifts
of variable |x| are by powers of two, and so benefit from fast barrel
shifters in modern processors.
@^barrel shifters@>

@<Module subroutines@>=
#if SIZEOF_INT>8
#error "The bit twiddling handles integers of at most 64 bits."
#endif
static inline int
floor_log_2(unsigned int x) {
	int ans=0;
#if SIZEOF_UNSIGNED_INT==8
	if ( x & (0xffffffff<<32) ) ans += 32, x>>=32;
#endif
	if ( x & (0xffff<<16) ) ans += 16, x>>=16;
	if ( x & (0xff<<8) ) ans += 8, x>>=8;
	if ( x & (0xf<<4) ) ans += 4, x>>=4;
	ans += floor_log_2_small[x];
	return ans;
}

@ We will be masking out portions of words.
Macro |lo_mask| lets us keep only the bottom bits of a word:
|w & lo_mask(i)| is the bottom $i$ bits of $w$.
Macro |hi_mask| lets us discard the bottom bits of a word:
|w & hi_mask(i)| is all but the bottom $i$ bits of $w$.
% Although the code in this module is not entirely independent of
% word-width, we parameterize some of it by defining the macro
% |word_width| to be the number of bits in a machine word.
% at d word_width (sizeof(unsigned int)*8)
% at d all_ones ((unsigned int)(-1)) /* Assume two's complement arithmetic? */

@d lo_mask(X) (pow_2[(X)]-1)
@d hi_mask(X) (~lo_mask(X))

@* Least common ancestors.
Now we're ready to get to the meat of the module: least common ancestors.

In a rooted tree, vertex $u$ is an \term{ancestor} of vertex $v$ if $u$
appears on the path from $v$ to the root; we also say that $v$ is a
\term{descendant} of $u$.  Equivalently, $u$ is an ancestor of $v$ if
$v$ appears in the subtree rooted at $u$.  Note that each node is its
own ancestor and its own descendant.
Any two vertices in a rooted tree have a non-empty set of common
ancestors; in fact, the root is an ancestor of every node.  The
\term{least common ancestor} of $u$ and $v$, denoted by $\LCA(u,v)$,
is the ancestor of both $u$ and $v$ that is a descendant of all the
common ancestors of $u$ and $v$.
In other words, $\LCA(u,v)$ is the root
of the smallest subtree in which both $u$ and $v$ appear.
@^ancestor@>
@^descendant@>
@^least common ancestor@>

@ Now that we've fixed our nomenclature, let's think about algorithms to
compute least common ancestors.  A simple algorithm to find $\LCA(u,v)$
would be to write down the lists of vertices on the paths from both $u$
and $v$ to the root and to find their convergence point.  In the worst
case this would take time proportional to the depth of the tree, which
could be very bad.  One might envision a scheme where we could digest the
tree into a corresponding balanced tree and run the simple algorithm
on the balanced tree.  But this would still run in logarithmic time.

It turns out that the predigestion idea can be worked into a scheme so
that the \LCA\ queries can be answered in constant time.  (``Constant
time'' assumes a fixed word size.  Some people might complain about
this.  For machine words with $w$ bits, the $\LCA$ query algorithm
presented here runs in $O(\log w)$ time.)  The first known algorithm to
achieve constant-time queries is presented in D.~Harel and
R.~E.~Tarjan, ``Fast algorithms for finding nearest common ancestors,''
{\sl SIAM Journal on Computing}, {\bf 13}(2):338--355, May 1984.  Here
I use a simpler algorithm, a serial version of a parallel algorithm:
see B.~Schieber and U.~Vishkin, ``On finding lowest common ancestors:
simplification and parallelization,'' {\sl SIAM Journal on Computing},
{\bf 17}(6):1253--1262, December 1988, and also Chapter 6, ``Parallel
Lowest Common Ancestor Computation,'' in the book {\sl The Synthesis of
Parallel Algorithms}, (INSERT PUBLISHER), 1993, edited by J.~Reif.
I've used this latter paper as the basis for this implementation.

@ The tree is digested into three integer arrays indexed by vertex number:
|level|, |inlabel|, and |ascendant|.
We attempt to reduce cache misses when
processing \LCA\ queries by grouping these values for each vertex in a
single structure, |digest_t|.
I also include a |cost| field in the digest structure because I
throw |T_prime| away when not debugging.
See also the |head| numbers below.
@^head numbers@>
@^level numbers@>
@^inlabel numbers@>
@^ascendant numbers@>

@<Module types@>=
typedef struct {
	int level, inlabel, ascendant;
	length_t cost;
} digest_t;

@ @<Module variables@>=
static digest_t *digest;

@ The first |n| entries correspond to vertices of the original graph,
and therefore have zero cost.  We initialize those fields at allocation
time instead of processing time because they are a function of
|n| only.

@<Other setup code@>=
digest = new_arr_of(digest_t,n+n-1);
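For contrast with the constant-time digest scheme, the simple walk-to-the-root algorithm described earlier can be sketched in plain C using parent pointers and precomputed depths (the |level| numbers). All names here are invented for the sketch; it answers each query in $O(\hbox{depth})$ time rather than constant time.

```c
#include <assert.h>

/* Naive LCA: lift the deeper vertex until depths match, then walk
   both vertices up in lockstep until they meet.  parent[] gives each
   vertex's parent (the root may point to itself); level[] gives depth. */
static int naive_lca(const int *parent, const int *level, int u, int v) {
    while (level[u] > level[v]) u = parent[u];  /* equalize depths */
    while (level[v] > level[u]) v = parent[v];
    while (u != v) {                            /* converge in lockstep */
        u = parent[u];
        v = parent[v];
    }
    return u;   /* first common vertex is the least common ancestor */
}
```

In the worst case (a path-shaped tree) this walks $\Theta(n)$ edges per query, which is exactly the behavior the digested representation is built to avoid.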
