program processes about 1000 English words per processor second.
Unless the
.I \-k
option is used (and the input files are long enough for
it to take effect)
the output of
.I mkey
is comparable in size to its input.
.PP
.B
Hash and invert.
.R
The
.I inv
program computes the hash codes and writes
the inverted files.
It reads the output of
.I mkey
and writes the set of files described earlier
in this section.
It expects one argument, which is used as the base name for
the three (or four) files to be written.
Assuming an argument of
.I Index
(the default)
the entry file is named
.I Index.ia ,
the posting file
.I Index.ib ,
the tag file
.I Index.ic ,
and the key file (if present)
.I Index.id .
The
.I inv
program recognizes the following options:
.TS
center;
lB lw(4i).
\-a	T{
Append the new keys to a previous set of inverted files,
making new files if there is no old set using the same base name.
T}
\-d	T{
Write the optional key file.
This is needed when you can not check for false drops by looking
for the keys in the original inputs, i.e.
when the key derivation
procedure is complicated and
the output keys are not words from the input files.
T}
\-h\f2n	T{
The hash table size is
.I n
(default 997);
.I n
should be prime.
Making \f2n\f1 bigger saves search time and spends disk space.
T}
\-i[u] \f2name	T{
Take input from file
.I name ,
instead of the standard input;
if
.B u
is present
.I name
is unlinked when the sort is started.
Using this option permits the sort scratch space
to overlap the disk space used for input keys.
T}
\-n	T{
Make a completely new set of inverted files, ignoring
previous files.
T}
\-p	T{
Pipe into the sort program, rather than writing a temporary
input file.
This saves disk space and spends processor time.
T}
\-v	T{
Verbose mode; print a summary of the number of keys which
finished indexing.
T}
.TE
.PP
About half the time used in
.I inv
is in the contained sort.
Assuming the sort is roughly linear, however,
a guess at the total timing for
.I inv
is 250 keys per second.
The space used is usually of more importance:
the entry file uses four bytes per possible hash (note
the
.B \-h
option),
and the tag file around 15-20 bytes per item indexed.
Roughly, the posting file contains one item for each key instance
and one item for each possible hash code; the items are two bytes
long if the tag file is less than 65536 bytes long, and the
items are four bytes wide if the tag file is greater than
65536 bytes long.
To minimize storage, the hash tables should be
over-full;
for most of the files indexed in this way, there is no
other real choice, since the
.I entry
file must fit in memory.
.PP
.B
Searching and Retrieving.
.R
The
.I hunt
program retrieves items from an index.
It combines, as mentioned above, the two parts of phase (B):
search and delivery.
The reason why it is efficient to combine delivery and search
is partly to avoid starting unnecessary processes, and partly
because the delivery operation must be a part of the search
operation in any case.
Because of the hashing, the search part takes place in two stages:
first items are retrieved which have the
right hash codes associated with them,
and then the actual items are inspected to determine false drops, i.e.
to determine if anything with the right hash codes doesn't really have the right
keys.
Since the original item is retrieved to check on false drops,
it is efficient to present it immediately, rather than only
giving the tag as output and later retrieving the
item again.
If there were a separate key file, this argument would not apply,
but separate key files are not common.
.PP
Input to
.I hunt
is taken from the standard input,
one query per line.
Each query should be in
.I "mkey \-s"
output format;
all lower case, no punctuation.
The
.I hunt
program takes one argument which specifies the base name of the index
files to be searched.
Only one set of index files can be searched at a time,
although many text files may be indexed as a group, of course.
If one of the text files has been changed since the index was made, that file
is searched with
.I fgrep ;
this may occasionally slow down the searching, and care should be taken to
avoid having many out of date files.
The following option arguments are recognized by
.I hunt :
.TS
center;
lB lw(4i).
\-a	T{
Give all output; ignore checking for false drops.
T}
\-C\f2n	T{
Coordination level
.I n ;
retrieve items with not more than
.I n
terms of the input missing;
default
.I C0 ,
implying that each search term must be in the output items.
T}
\-F[yn\f2d\f3\|]	T{
``\-Fy'' gives the text of all the items found;
``\-Fn'' suppresses them.
``\-F\f2d\|\f1'' where \f2d\f1\| is an integer
gives the text of the first \f2d\f1 items.
The default is
.I \-Fy .
T}
\-g	T{
Do not use
.I fgrep
to search files changed since the index was made;
print an error comment instead.
T}
\-i \f2string	T{
Take
.I string
as input, instead of reading the standard input.
T}
\-l \f2n	T{
The maximum length of internal lists of candidate
items is
.I n ;
default 1000.
T}
\-o \f2string	T{
Put text output (``\-Fy'') in
.I string ;
of use
.I only
when
invoked from another program.
T}
\-p	T{
Print hash code frequencies; mostly
for use in optimizing hash
table sizes.
T}
\-T[yn\f2d\|\f3]	T{
``\-Ty'' gives the tags of the items found;
``\-Tn'' suppresses them.
``\-T\f2d\f1\|'' where \f2d\f1\| is an integer
gives the first \f2d\f1 tags.
The default is
.I \-Tn .
T}
\-t \f2string	T{
Put tag output (``\-Ty'') in
.I string ;
of use
.I only
when invoked from another program.
T}
.TE
.PP
The timing of
.I hunt
is complex.
Normally the hash table is overfull, so that there will
be many false drops on any single term;
but a multi-term query will have few false drops on
all terms.
Thus if a query is underspecified (one search term)
many potential items will be examined and discarded as false
drops, wasting time.
If the query is overspecified (a dozen search terms)
many keys will be examined only to verify that
the single item under consideration has that key posted.
The variation of search time with number of keys is
shown in the table below.
Queries of varying length were constructed to retrieve
a particular document from the file of references.
In the sequence to the left, search terms were chosen so as
to select the desired paper as quickly as possible.
In the sequence on the right, terms were chosen inefficiently,
so that the query did not uniquely select the desired document
until four keys had been used.
The same document was the target in each case,
and the final sets of eight keys are also identical; the differences
at five, six and seven keys are produced by measurement error, not
by the slightly different key lists.
.TS
center;
c s s s5 | c s s s
cp8 cp8 cp8 cp8 | cp8 cp8 cp8 cp8
cp8 cp8 cp8 cp8 | cp8 cp8 cp8 cp8
n n n n | n n n n .
Efficient Keys	Inefficient Keys
No. keys	Total drops	Retrieved	Search time	No. keys	Total drops	Retrieved	Search time
	(incl. false)	Documents	(seconds)		(incl. false)	Documents	(seconds)
1	15	3	1.27	1	68	55	5.96
2	1	1	0.11	2	29	29	2.72
3	1	1	0.14	3	8	8	0.95
4	1	1	0.17	4	1	1	0.18
5	1	1	0.19	5	1	1	0.21
6	1	1	0.23	6	1	1	0.22
7	1	1	0.27	7	1	1	0.26
8	1	1	0.29	8	1	1	0.29
.TE
As would be expected, the optimal search is achieved
when the query just specifies the answer; however,
overspecification is quite cheap.
Roughly, the time required by
.I hunt
can be approximated as
30 milliseconds per search key plus 75 milliseconds
per dropped document (whether it is a false drop or
a real answer).
In general, overspecification can be recommended;
it protects the user against additions to the data base
which turn previously uniquely-answered queries
into ambiguous queries.
.PP
The careful reader will have noted an enormous discrepancy between these times
and the earlier quoted time of around 1.9 seconds for a search.
The times here are purely for the search and retrieval: they are measured by
running many searches through a single invocation of the
.I hunt
program alone.
Usually, the UNIX command processor (the shell) must start both
the
.I mkey
and
.I hunt
processes for each query, and arrange for the output of
.I mkey
to be fed to
the
.I hunt
program.
This adds a fixed overhead of about 1.7 seconds
of processor time
to any single search.
Furthermore, remember that all these times are processor times:
on a typical morning on our \s-2PDP\s0 11/70 system, with about one dozen
people logged on,
to obtain 1 second of processor time for the search program
took between 2 and 12 seconds of real time, with a median of
3.9 seconds and a mean of 4.8 seconds.
Thus, although the work involved in a single search may be only
200 milliseconds, after you add the 1.7 seconds of startup processor
time
and then assume a 4:1 elapsed/processor time
ratio, it will be 8 seconds before any response is printed.
.NH
Selecting and Formatting References for T\s-2ROFF\s0
.PP
The major application of the retrieval software
is
.I refer ,
which is a
.I troff
preprocessor
like
.I eqn .
.[
kernighan cherry acm 1975
.]
It scans its input
looking for items of the form
.DS
\*.[
imprecise citation
\*.\^]
.DE
where an imprecise citation is merely a string
of words found in the relevant bibliographic citation.
This is translated into a properly formatted reference.
If the imprecise citation does not correctly identify
a single paper
(either
selecting no papers or too many) a message is given.
The data base of citations searched may be tailored to each
system, and individual users may specify their own
citation
files.
On our system, the default data base is accumulated from
the publication lists of the members of our organization, plus
about half a dozen personal bibliographies that were collected.
The present total is about 4300 citations, but this increases steadily.
Even now,
the data base covers a large fraction of local citations.
.PP
For example, the reference for the
.I eqn
paper above was specified as
.DS
\&\*.\*.\*.
\&preprocessor like
\&.I eqn.
\&.[
\&kernighan cherry acm 1975
\&.]
\&It scans its input looking for items
\&\*.\*.\*.
.DE
This paper was itself printed using
.I refer .
The above input text was processed by
.I refer
as well as
.I tbl
and
.I troff
by the command
.DS
.ft I
refer memo-file | tbl | troff \-ms
.ft R
.DE
and the reference was automatically translated into a correct
citation to the ACM paper on mathematical typesetting.
.PP
The procedure to use to place a reference in a paper
using
.I refer
is as follows.
First, use the
.I lookbib
command to check that the paper is in the data base
and to find out what keys are necessary to retrieve it.
This is done by typing
.I lookbib
and then typing some potential queries until
a suitable query is found.
For example, had one started to find
the
.I eqn
paper shown above by presenting the query
.DS
	$ lookbib
	kernighan cherry
	(EOT)
.DE
.I lookbib
would have found several items; experimentation would quickly
have shown that the query given above is adequate.
Overspecifying the query is of course harmless; it is even desirable,
since it decreases the risk that a document added to the publication
data base in the
future will be retrieved in addition to the
intended document.
The extra time taken by even a grossly overspecified query is
quite small.
A particularly careful reader may have noticed that ``acm'' does not
appear in the printed citation;
we have supplemented some of the data base items with
extra keywords, such as common abbreviations for journals
or other sources, to aid in searching.
.PP
If the reference is in the data base, the query
that retrieved it can be inserted in the text,
between
.B \*.[
and
.B \*.\^]
brackets.
If it is not in the data base, it can be typed
into a private file of references, using the format
discussed in the next section, and then
the
.B \-p
option
used to search this private file.
Such a command might read
(if the private references are called
.I myfile )
.DS
.ft 2
refer \-p myfile document | tbl | eqn | troff \-ms \*. \*. \*.
.ft 1
.DE
where
.I tbl
and/or
.I eqn
could be omitted if not needed.
The use
of the
.I \-ms
macros
.[
lesk typing documents unix gcos
.]
or some other macro package, however,
is essential.
.I Refer
only generates the data for the references; exact formatting
is done by some macro package, and if none is supplied the
references will not be printed.
.PP
By default,
the references are numbered sequentially,
and
the
.I \-ms
macros format references as footnotes at the bottom of the page.
This memorandum is an example of that style.
Other possibilities are discussed in section 5 below.
.NH
Reference Files.
.PP
A reference file is a set of bibliographic references usable with
.I refer .
It can be indexed using the software described in section 2
for fast searching.
What
.I refer
does is to read the input document stream,
looking for imprecise citation references.
It then searches through reference files to find
the full citations, and inserts them into the
document.
The format of the full citation is arranged to make it
convenient for a macro package, such as the
.I \-ms
macros, to format the reference
for printing.
Since
the format of the final reference is determined
by the desired style of
output,
which is determined by the macros used,
.I refer
avoids forcing any kind of reference appearance.
All it does is define a set of string registers which
contain the basic information about the reference;
and provide a macro call which is expanded by the macro
package to format the reference.
It is the responsibility of the final macro package
to see that the reference is actually printed; if no
macros are used, and the output of
.I refer
is fed untranslated to
.I troff ,
nothing at all will be printed.
.PP
The strings defined by
.I refer
are taken directly from the files of references, which
are in the following format.
The references should be separated
by blank lines.
Each reference is a sequence of lines beginning with
.B %
and followed
by a key-letter.
The remainder of that line, and successive lines until the next line beginning
with
.B % ,
contain the information specified by the key-letter.
In general,
.I refer
does not interpret the information, but merely presents
it to the macro package for final formatting.
A user with a separate macro package, for example,
can add new key-letters or use the existing ones for other purposes
without bothering
.I refer .
.PP
The meaning of the key-letters given below, in particular,
is that assigned by the
.I \-ms
macros.
Not all information, obviously, is used with each citation.
For example, if a document is both an internal memorandum and a journal article,
the macros ignore the memorandum version and cite only the journal article.
Some kinds of information are not used at all in printing the reference;
if a user does not like finding references by specifying title
or author keywords, and prefers to add specific keywords to the