📄 gb_books.w
字号:
% This file is part of the Stanford GraphBase (c) Stanford University 1993@i boilerplate.w %<< legal stuff: PLEASE READ IT BEFORE MAKING ANY CHANGES!@i gb_types.w\def\title{GB\_\,BOOKS}\def\<#1>{\hbox{$\langle$\rm#1$\rangle$}}\prerequisites{GB\_\,GRAPH}{GB\_\,IO}@* Introduction. This GraphBase module contains the |book|subroutine, which creates a family of undirected graphs that are based onclassic works of literature. It also contains the |bi_book|subroutine, which creates a related family of bipartite graphs.An example of the use of |book| can be found in the demonstrationprogram {\sc BOOK\_\kern.05emCOMPONENTS}.@(gb_books.h@>=extern Graph *book();extern Graph *bi_book();@ The subroutine call |book(@[@t\<title>@>@],n,x,first_chapter,last_chapter,in_weight,out_weight,seed)|constructs a graph based on the information in \<title>\.{.dat},where \<title> is either \.{"anna"} (for {\sl Anna Karenina\/}),\.{"david"} (for {\sl David Copperfield\/}),\.{"jean"} (for {\sl Les Mis\'erables\/}),\.{"huck"} (for {\sl Huckleberry Finn\/}), or\.{"homer"} (for {\sl The Iliad\/}).Each vertex of the graph corresponds to one of the characters in theselected book. Edges between vertices correspond to encounters betweenthose characters. The length of each edge is~1.Subsets of the book can be selected by specifying that the edge data should berestricted to chapters between |first_chapter| and |last_chapter|,inclusive. If |first_chapter=0|, the result is the same as if|first_chapter=1|. If |last_chapter=0| or if |last_chapter| exceedsthe total number of chapters in the book, the result is the same asif |last_chapter| were the number of the book's final chapter.The constructed graph will have $\min(n,N)-x$ vertices, where $N$ is thetotal number of characters in the selected book.However, if |n| is zero, |n| is automatically made equal to the maximumpossible value,~$N$. If |n| is less than~$N$, the |n-x| characters will beselected by assigning a weight to each character and choosing the |n| withlargest weight, then excluding the largest~|x| of these,using random numbers to break ties in case of equal weights.Weights are computed by the formula$$ |in_weight|\cdot\\{chapters\_in}+|out_weight|\cdot\\{chapters\_out}, $$where \\{chapters\_in} is the number of chapters between |first_chapter|and |last_chapter| in which a particular character appears, and\\{chapters\_out} is the number of other chapters in which thatcharacter appears. Both |in_weight| and |out_weight| must be at most1,000,000 in absolute value.Vertices of the graph will appear in order of decreasing weight.The |seed| parameter defines the pseudo-random numbers used wherevera ``random'' choice between equal-weight vertices needs to be made.As usual with GraphBase routines, different choices of |seed|will in general produce different selections,but in a system-independent manner; identical results will be obtained onall computers when identical parameters have been specified.Any |seed| value between 0 and $2^{31}-1$ is permissible.@ Examples: The call |book("anna",0,0,0,0,0,0,0)| will construct agraph on 138 vertices that represent all 138 characters of Tolstoy's{\sl Anna Karenina\/}, as recorded in \.{anna.dat}. Two vertices willbe adjacent if the corresponding charactersencounter each other anywhere in the book. The call|book("anna",50,0,0,0,1,1,0)| is similar, but it is restricted tothe 50 characters that occur most frequently, i.e., in the most chapters.The call |book("anna",50,0,10,120,1,1,0)| has the same vertices, but ithas edges only for encounters that take place between chapter~10and chapter~120, inclusive. The call |book("anna",50,0,10,120,1,0,0)| issimilar, but its vertices are the 50 characters that occur most often inchapters 10 through~120, without regard to how often they occur inthe rest of the book. The call |book("anna",50,0,10,120,0,0,0)| isalso similar, but it chooses 50 characters completely at random(possibly from those that don't occur in the selected chapters at all).Parameter |x|, which causes the |x| vertices of highest weight to beexcluded, is usually either 0 or~1. It is provided primarily so thatusers can set |x=1| with respect to {\sl David Copperfield\/} and {\slHuckleberry Finn}; those novels are narrated by their principalcharacter, so they have edges between the principal character andalmost everybody else. (Characters cannot get into the action of afirst-person account unless they encounter the narrator or unless thenarrator is quoting some other person's story.) The correspondinggraphs tend to have more interesting connectivity properties if weleave the narrator out by setting |x=1|. For example, there are 87characters in {\sl David Copperfield\/}; the call|book("david",0,1,0,0,1,1,0)| produces a graph with 86 vertices, onefor every character except David Copperfield himself.@ The subroutine call |bi_book(@[@t\<title>@>@],n,x,first_chapter,last_chapter,in_weight,out_weight,seed)| produces a bipartite graph in which thevertices of the first part are exactly the same as the vertices of thegraph returned by |book|, while the vertices of the second part arethe selected chapters. For example,$|bi_book|(|"anna"|,\allowbreak 50,0,10,120,1,1,0)$creates a bipartite graph with $50+111$ vertices. There is an edge betweeneach character and the chapters in which that character appears.@ Chapter numbering needs further explanation. {\sl Anna Karenina\/}has 239 chapters, which are numbered 1.1 through 8.19 in thework itself but renumbered 1 through 239 as far as the |book| routineis concerned. Thus, setting |first_chapter=10| and |last_chapter=120|turns out to be equivalent to selecting chapters 1.10 through 4.19(more precisely, chapter~10 of book~1 through chapter~19 of book~4).{\sl Les Mis\'erables\/} has an even more involved scheme; its356 chapters range from 1.1.1 (part~1, book~1, chapter~1) to5.9.6 (part~5, book~9, chapter~6). After |book| or |bi_book| has createda graph, the external integer variable |chapters| will contain the totalnumber of chapters, and |chap_name| will be an array of stringscontaining the structured chapter numbers. For example, after|book("jean",@[@t\dots@>@])|, we will have |chapters=356|,|chap_name[1]="1.1.1"|, \dots, |chap_name[356]="5.9.6"|;|chap_name[0]| will be~|""|.@d MAX_CHAPS 360 /* no book will have this many chapters */@<External variables@>=long chapters; /* the total number of chapters in the selected book */char *chap_name[MAX_CHAPS]={""}; /* string names of those chapters */@ As usual, we put declarations of the external variables into the header filefor users to {\bf include}.@(gb_books.h@>=extern long chapters; /* the total number of chapters in the selected book */extern char *chap_name[]; /* string names of those chapters */@ If the |book| or |bi_book| routine encounters a problem, itreturns |NULL| (\.{NULL}),after putting a code number into the external variable|panic_code|. This code number identifies the type of failure.Otherwise |book| returns a pointer to the newly created graph, whichwill be represented with the data structures explained in {\sc GB\_\,GRAPH}.(The external variable |panic_code| is itself defined in {\sc GB\_\,GRAPH}.)@d panic(c) @+{@+panic_code=c;@+gb_trouble_code=0;@+return NULL;@+}@#@f node long /* the \&{node} type is defined below */@ The \CEE/ file \.{gb\_books.c} has the overall shape shown here.It makes use of an internal subroutinecalled |bgraph|, which combines the work of |book| and |bi_book|.@p#include "gb_io.h" /* we will use the {\sc GB\_\,IO} routines for input */#include "gb_flip.h" /* we will use the {\sc GB\_\,FLIP} routines for random numbers */#include "gb_graph.h" /* we will use the {\sc GB\_\,GRAPH} data structures */#include "gb_sort.h" /* and the |gb_linksort| routine */@h@#@<Type declarations@>@;@<Private variables@>@;@<External variables@>@;@#static Graph *bgraph(bipartite, title,n,x,first_chapter,last_chapter,in_weight,out_weight,seed) long bipartite; /* should we make the graph bipartite? */ char *title; /* identification of the selected book */ unsigned long n; /* number of vertices desired before exclusion */ unsigned long x; /* number of vertices to exclude */ unsigned long first_chapter, last_chapter; /* interval of chapters leading to edges */ long in_weight; /* weight coefficient pertaining to chapters in that interval */ long out_weight; /* weight coefficient pertaining to chapters not in that interval */ long seed; /* random number seed */{@+@<Local variables@>@;@# gb_init_rand(seed); @<Check that the parameters are valid@>; @<Skim the data file, recording the characters and computing their statistics@>; @<Choose the vertices and put them into an empty graph@>; @<Read the data file more carefully and fill the graph as instructed@>; if (gb_trouble_code) { gb_recycle(new_graph); panic(alloc_fault); /* (expletive deleted) we ran out of memory somewhere back there */ } return new_graph;}@#Graph *book(title,n,x,first_chapter,last_chapter,in_weight,out_weight,seed) char *title; unsigned long n, x, first_chapter, last_chapter; long in_weight,out_weight,seed;{@+return bgraph(0L,title,n,x,first_chapter,last_chapter, in_weight,out_weight,seed);@+}Graph *bi_book(title,n,x,first_chapter,last_chapter,in_weight,out_weight,seed) char *title; unsigned long n, x, first_chapter, last_chapter; long in_weight,out_weight,seed;{@+return bgraph(1L,title,n,x,first_chapter,last_chapter, in_weight,out_weight,seed);@+}@ @<Local var...@>=Graph *new_graph; /* the graph constructed by |book| or |bi_book| */register long j,k; /* all-purpose indices */long characters; /* the total number of characters in the selected book */register node *p; /* information about the current character */@ @d MAX_CHARS 600 /* there won't be more characters than this */@<Check that the parameters are valid@>=if (n==0) n=MAX_CHARS;if (first_chapter==0) first_chapter=1;if (last_chapter==0) last_chapter=MAX_CHAPS;if (in_weight>1000000 || in_weight<-1000000 || out_weight>1000000 || out_weight<-1000000) panic(bad_specs); /* the magnitude of at least one weight is too big */sprintf(file_name,"%.6s.dat",title);if (gb_open(file_name)!=0) panic(early_data_fault); /* couldn't open the file; |io_errors| tells why */@ @<Priv...@>=static char file_name[]="xxxxxx.dat";@*Vertices.Each character in a book has been given a two-letter code name forinternal use. The code names are explained at the beginning of eachdata file by a number of lines that look like this:$$\hbox{\tt XX \<name>,\<description>}$$For example, here's one of the lines near the beginning of |"anna.dat"|:$$\hbox{\tt AL Alexey Alexandrovitch Karenin, minister of state}$$The \<name> does not contain a comma; the \<description> might.A blank line follows the cast of characters.Internally, we will think of the two-letter code as a radix-36 integer.Thus \.{AA} will be the number $10\times36+10$, and \.{ZZ} will be$35\times36+35$. The |gb_number| routine in {\sc GB\_\,IO} is set up toinput radix-36 integers just as it does hexadecimal ones.In {\sl The Iliad}, many of the minor characters have numeric digitsin their code names because the total number of characters is toolarge to permit mnemonic codes for everybody.@d MAX_CODE 1296 /* $36\times36$, the number of two-digit codes in radix 36 */@ In order to choose the vertices, we want to represent each characteras a node whose key corresponds to its weight; then the |gb_linksort|routine of {\sc GB\_\,SORT} will provide the desired rank-ordering. We willfind it convenient to use these nodes for all the data processing that|bgraph| has to do.@<Type dec...@>=typedef struct node_struct { /* records to be sorted by |gb_linksort| */ long key; /* the nonnegative sort key (weight plus $2^{30}$) */ struct node_struct *link; /* pointer to next record */ long code; /* code number of this character */ long in; /* number of occurrences in selected chapters */ long out; /* number of occurrences in unselected chapters */ long chap; /* seen most recently in this chapter */ Vertex *vert; /* vertex corresponding to this character */} node;@ Not only do nodes point to codes, we also want codes to point to nodes.@<Priv...@>=static node node_block[MAX_CHARS]; /* array of nodes for working storage */static node *xnode[MAX_CODE]; /* the node, if any, having a given code */@ We will read the data file twice, once quickly (to collect statistics)and once more thoroughly (to record detailed information). Here is thequick version.@<Skim the data file, recording the characters...@>=@<Read the character codes at the beginning of the data file, and prepare a node for each one@>;@<Skim the chapter information, counting the number of chapters in which each character appears@>;
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -