wp2x.c

harvest: a robot for downloading HTML web pages
Language: C
Page 1 of 3
/* wp2x.c,v 1.9 1994/03/26 19:28:43 mcr Exp */

/* Before compiling, read the section titled `portability concerns'. */

/************************************************************************
 * wp2x.c,v
 * Revision 1.9  1994/03/26  19:28:43  mcr
 * Hacker's guide added.
 * getopt routines included
 * Regression test outputs automated.
 *
 * Revision 1.8  1994/02/13  22:18:35  mcr
 * Fixed bug in -v option
 *
 * Revision 1.7  1994/02/13  22:13:41  mcr
 * Reorganized wp2x files into sub-directories. Updated man page.
 *
 * Revision 1.6  1994/01/06  14:32:28  mcr
 * Checkin of patch1
 *
 * Revision 1.5  1993/09/07  18:19:21  mcr
 * Fixed html.cfg to ignore soft hyphens.
 * Fixed wp2x.c to deal with options correctly.
 *
 * Revision 1.4  1993/09/05  20:29:17  mcr
 * First alpha at 5.1 compatible wp2x.
 *
 * Revision 1.3  1993/09/03  23:09:37  mcr
 * Hacking on files.
 *
 * Revision 1.2  1993/08/27  14:50:17  mcr
 * More hacks to html.cfg.
 *
 * Revision 1.1.1.1  1993/08/23  19:09:21  mcr
 * Local hacks included
 *
 * Revision 1.10  91/08/18  15:05:41  raymond
 * Descriptor file stuff.
 *
 * Revision 1.9  91/08/06  09:08:09  raymond
 * add missing `break' in check_arity
 *
 * Revision 1.8  91/08/06  08:31:21  raymond
 * Avoid infinite loop if file is corrupted.
 * Better error-checking on configuration file (new output scheme).
 *
 * Revision 1.7  91/08/02  13:35:37  raymond
 * Epsilonically better handling of environments that didn't end properly.
 * Change return type of main() to keep gcc quiet.
 * MSC support.
 *
 * Revision 1.6  91/07/28  21:08:53  raymond
 * BeginTabs et al, FNote#, ENote#, NegateTotal, more unsupported codes
 * Improve character tokens, Header, Footer
 * Take care when people don't end lines with HRt
 * Fix major bugs in endnote processing, footnote numbering (and nobody
 *    noticed!)
 * More worries about signed characters.
 *
 * Revision 1.5  91/07/23  22:59:43  raymond
 * Add COMMENT token, and some bug fixes.
 *
 * Revision 1.4  91/07/23  22:09:23  raymond
 * Concessions to slightly non-ANSI compilers. (`const', `unsigned char')
 * More patches for machines with signed characters.
 * Fix blatant bug in hex constants.  (Amazed nobody noticed.)
 * New tags SetFn#, Header, Footer.
 * Warning messages for unsupported tokens.
 * Backslashes processed in character tags.
 * Fixed(?) footnotes, endnotes, page length changes.
 * Inserted missing `break's into the huge switch.
 *
 * Revision 1.3  91/07/12  15:39:44  raymond
 * Spiffy Turbo C support.
 * Some <stdlib.h>'s don't declare errno et al.
 * Command line switches `-s' and `-n' added.
 * More cute warning messages.
 * Dots periodically emitted.
 * Give the enum of token types a name, to placate QuickC.
 * Fix problems with pitch changes and signed characters.
 *
 * Revision 1.2  91/06/22  08:18:22  raymond
 * <process.h> and fputchar() aren't sufficiently portable.
 * strerror() fails to exist on some so-called ANSI platforms.
 * Removed assumption that characters are unsigned.
 * Forgot to #include <stdarg.h>
 *
 */

/************************************************************************
 * PORTABILITY CONCERNS
 ************************************************************************
 *
 * If possible, compile with unsigned characters.  (Though I think
 * I've taken care of all the places where I assumed characters are
 * unsigned.)
 *
 * This program assumes that your compiler is fully ANSI-conformant.
 * Depending on how non-conformant your compiler is, you may need to
 * set the following symbols at compile time:
 *
 * NO_CONST -- set this if your compiler does not know what `const' means.
 * Cdecl    -- how to tag functions that are variadic.
 *
 * Cdecl is used if you need special declarations for variadic functions.
 * This is used by IBM PC compilers so that you can make the default
 * parameter passing Pascal-style or Fastcalls.
 *
 * Some very machine-dependent stuff happens when trying to open the
 * descriptor file.  Please read dopen.c as well.
 */

#ifdef NO_CONST
#define const
#endif

#ifndef Cdecl                       /* default is nothing */
#define Cdecl
#endif

/************************************************************************
 * This program divides naturally into two parts.
 *
 * The first part reads in the descriptor file and builds the expansions
 * for each of the identifiers listed in the names[] table below.
 * This is the easy part.
 *
 * The second part reads the input file and uses the expansions collected
 * in the first part to transform the file into the output.
 * This is the hard part.
 *
 ************************************************************************/

#include "config.h"

/* And now, the code.
 * We start off with some obvious header files.
 */

#include <stdio.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <sys/types.h>
#include <unistd.h>
#include <netinet/in.h>
#include <assert.h>
#include "tokens.h"
#include "patchlevel.h"
#include <errno.h>

/************************************************************************/
/* Some common idioms                                                   */
/************************************************************************/

#define do_nothing /* twiddle thumbs */

/************************************************************************/
/* Blowing up                                                           */
/************************************************************************/

/* The function "error" accepts two arguments: a FILE pointer and
 * a printf-style argument list.  The printf-style arguments are
 * printed to stderr.  If the FILE is non-NULL, the remaining
 * contents of the file are printed as well (to provide context), up
 * to 80 characters.
 */

void Cdecl error(FILE *fp, char *fmt, ...)
{
  int i, c;
  va_list ap;

  fputs("Error: ", stderr);
  va_start(ap, fmt); vfprintf(stderr, fmt, ap); va_end(ap);
  fputc('\n', stderr);
  if (fp) {
    fprintf(stderr, "Unread text: ");
    /* Stop at end of file instead of printing EOF as a character. */
    for (i = 0; i < 80 && (c = getc(fp)) != EOF; i++) fputc(c, stderr);
    fputc('\n', stderr);
  }
  exit(1);
}

/************************************************************************/
/* Command-line switches                                                */
/************************************************************************/

int silent = 0;
int blipinterval = 1024;                /* display blips every 1K */
int blipcount;

/************************************************************************/
/* Basic file manipulations                                             */
/************************************************************************/

/* We here define a few basic functions.  Let us hope that the first
 * three functions' names are self-descriptive.
 */

int next_non_whitespace(FILE *fp)
{
  register int c;
  while ((c = getc(fp)) != EOF && isspace(c)) do_nothing;
  return c;
}

int next_non_space_or_tab(FILE *fp)
{
  register int c;
  while ((c = getc(fp)) != EOF && (c == ' ' || c == '\t')) do_nothing;
  return c;
}

void eat_until_newline(FILE *fp)
{
  register int c;
  while ((c = getc(fp)) != EOF && c != '\n') do_nothing;
}

/* The function parse_hex grabs a (no-more-than-two-character) hex
 * constant.  Similarly, parse_octal does the same for octal constants
 * (at most three digits; the caller passes in the first digit, which
 * it has already read).
 */

int parse_hex(FILE *fp)
{
  register int c, value;

  if (!isxdigit(c = toupper(getc(fp))))
    error(fp, "Expecting a hex digit");
  if ((value = c - '0') > 9) value += '0' - 'A' + 10;
  if (!isxdigit(c = getc(fp))) { ungetc(c, fp); return value; }
  c = toupper(c);
  value = (value << 4) + c - '0';
  if (c > '9') value += '0' - 'A' + 10;
  return value;
}

int parse_octal(FILE *fp, register int c)
{
  register int value = c - '0';
  if ( (c = getc(fp)) < '0' || c > '7') { ungetc(c, fp); return value; }
  value = (value << 3) + c - '0';
  if ( (c = getc(fp)) < '0' || c > '7') { ungetc(c, fp); return value; }
  return (value << 3) + c - '0';
}

/************************************************************************/
/* Storing the input strings                                            */
/************************************************************************/

/* The input strings are allocated from a large pool we set up at
 * startup.  This lets us do our thing without having to fight
 * with people like malloc and friends.  This method does limit
 * our configuration file to 32K, however.  We hope that this is
 * not a problem.  (It also means that the program can be translated
 * to almost any other language without too much difficulty.)
 *
 * Here's how it works.
 *
 * "pool" is an array of POOL_SIZE characters.  The value of POOL_SIZE
 * is flexible, but shouldn't exceed 65535, since that's the size of
 * an IBM PC segment.  If your configuration file is more than 64K,
 * then there's probably something wrong.
 *
 * "pool_ptr" points to the next character in "pool" that hasn't been
 * used for anything yet.
 *
 * "top_of_pool" points one character beyond the end of pool, so we can
 * see if we've run out of memory.
 *
 * When we want to put something into the pool, we simply store into "pool"
 * and increment "pool_ptr" appropriately.
 *
 * Access to these variables is done through the following functions,
 * implemented as macros.
 *
 * "anchor_string()" is called before you start throwing things into
 * the pool.  It returns a pointer to the beginning of the string
 * being built up.
 *
 * "add_to_string(c)" adds the character "c" to the string being built up.
 *
 * "finish_string()" gets ready for building a new string.  We check
 * that we did not overflow our pool.  (add_to_string() itself does no
 * bounds checking, so an overflow is only caught here, after the fact.)
 * We pull the sneaky trick of a dummy else clause so that [1] "else"s
 * match up properly if this is nested inside an "if" statement, [2] the
 * semicolon gets eaten up correctly.
 *
 * "remove_string(s)" removes all strings from the one called "s" onwards.
 *
 */

#define POOL_SIZE   32768U

char pool[POOL_SIZE];
char *pool_ptr = pool;
#define top_of_pool (pool + POOL_SIZE)

#define anchor_string() pool_ptr
#define add_to_string(c) (*pool_ptr++ = c)
#define finish_string() \
     if (pool_ptr >= top_of_pool) error(NULL, "string pool overflow."); \
     else do_nothing
#define remove_string(s) (pool_ptr = s)

char *expansion[LastToken];

/************************************************************************/
/* Naming the identifiers                                               */
/************************************************************************/

/* Extreme care must be taken to ensure that this list parallels the list
 * of token names in tokens.h.
 */

typedef struct identifier {
    char *name;
    int arity;
} Identifier;

Identifier names[] = {
    { "typeout", 0 },
    { "BEGIN", 0 },
    { "END", 0 },
    { "Comment", 0 },
    { "comment", 0 },
    { "PageNo", 0 },
    { "RomanPage", 1 },
    { "ArabicPage", 1 },
    { "HSpace", 0 },
    { "Tab", 0 },
    { "BeginTabs", 0 },
    { "SetTab", 1 },
    { "SetTabCenter", 1 },
    { "SetTabRight", 1 },
    { "SetTabDecimal", 1 },
    { "EndTabs", 0 },
    { "HPg", 0 },
    { "CondEOP", 1 },
    { "HRt", 0 },
    { "SRt", 0 },
    { "-", 0 },        /* NHyph */
    { "--", 0 },       /* NHyphE */
    { "=", 0 },        /* HHyph */
    { "\\-", 0 },      /* DHyph */
    { "\\--", 0 },     /* DHyphE */
    { "NoHyphWord", 0 },
    { "Marg", 2 },
    { "TopMarg", 1 },
    { "PageLength", 1 },
    { "SS", 0 },
    { "DS", 0 },
    { "1.5S", 0 },    /* OHS */
    { "TS", 0 },
    { "LS", 1 },
    { "LPI", 1 },
    { "Bold", 0 },
    { "bold", 0 },
    { "Und", 0 },
    { "und", 0 },
    { "DoubleUnd", 0 },
    { "doubleund", 0 },
    { "Red", 0 },
    { "red", 0 },
    { "Strike", 0 },
    { "strike", 0 },
    { "Rev", 0 },
    { "rev", 0 },
    { "Outline", 0},
    { "outline", 0},
    { "Fine", 0},
    { "fine", 0},
    { "Over", 0 },
    { "over", 0 },
    { "Sup", 0 },
    { "sup", 0 },
    { "Sub", 0 },
    { "sub", 0 },
    { "Large",0},
    { "large",0},
    { "Small",0},
    { "small",0},
    { "VeryLarge",0},
    { "verylarge",0},
    { "ExtraLarge",0},
    { "extralarge",0},
    { "Italics",0},
    { "italics",0},
    { "Shadow", 0},
    { "shadow",0},
    { "SmallCaps",0},
    { "smallcaps",0},
    { "UpHalfLine", 0 },
    { "DownHalfLine", 0 },
    { "AdvanceToHalfLine", 2 },
    { "Indent", 0 },
    { "DIndent", 0 },
    { "indent", 0 },
    { "dindent", 0 },
    { "MarginRelease", 1 },
    { "Center", 0 },
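The constant parsers parse_hex and parse_octal can be exercised on their own. The sketch below copies their logic into a standalone file so the push-back behaviour is easy to see. fail() and stream_from() are hypothetical helpers, standing in for wp2x's error() and for an open descriptor file (the stream comes from the standard tmpfile()), and the assert in parse_octal is an added precondition; this is an illustration, not code from wp2x.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for wp2x's error(): report and exit. */
static void fail(const char *msg)
{
  fprintf(stderr, "Error: %s\n", msg);
  exit(1);
}

/* Same logic as parse_hex: read one or two hex digits, pushing back
 * the first character that is not part of the constant. */
static int parse_hex(FILE *fp)
{
  int c, value;

  if (!isxdigit(c = toupper(getc(fp))))
    fail("Expecting a hex digit");
  if ((value = c - '0') > 9) value += '0' - 'A' + 10;
  if (!isxdigit(c = getc(fp))) { ungetc(c, fp); return value; }
  c = toupper(c);
  value = (value << 4) + c - '0';
  if (c > '9') value += '0' - 'A' + 10;
  return value;
}

/* Same logic as parse_octal: the caller has already read the first
 * digit c; at most two more octal digits are consumed. */
static int parse_octal(FILE *fp, int c)
{
  int value;

  assert(c >= '0' && c <= '7');      /* added precondition (assumption) */
  value = c - '0';
  if ((c = getc(fp)) < '0' || c > '7') { ungetc(c, fp); return value; }
  value = (value << 3) + c - '0';
  if ((c = getc(fp)) < '0' || c > '7') { ungetc(c, fp); return value; }
  return (value << 3) + c - '0';
}

/* Hypothetical helper: put text in a temporary file, rewound for reading. */
static FILE *stream_from(const char *text)
{
  FILE *fp = tmpfile();

  if (fp == NULL) fail("tmpfile() failed");
  fputs(text, fp);
  rewind(fp);
  return fp;
}
```

For example, parse_hex on a stream holding `1fZ` returns 31 and leaves `Z` as the next character, while parse_octal(fp, getc(fp)) on `101x` returns 65 (octal 101) and leaves `x` unread.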

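The string-pool macros amount to a bump allocator. anchor_string() remembers where a string starts, add_to_string() appends to it, and remove_string() rolls the pointer back, freeing that string and everything after it. Because add_to_string() writes without checking, an overflow is only noticed when finish_string() runs. The sketch below is a hypothetical function-based variant that checks on every append instead. The demo_ names and the 64-byte pool are assumptions made for illustration; wp2x itself uses the macros above with POOL_SIZE of 32768U.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical small pool for illustration (wp2x uses 32768U). */
#define DEMO_POOL_SIZE 64

static char demo_pool[DEMO_POOL_SIZE];
static char *demo_ptr = demo_pool;

/* Like anchor_string(): the start of the string about to be built. */
static char *demo_anchor(void)
{
  return demo_ptr;
}

/* Like add_to_string(), but refuses to write past the end of the pool. */
static void demo_add(char c)
{
  if (demo_ptr >= demo_pool + DEMO_POOL_SIZE) {
    fputs("Error: string pool overflow.\n", stderr);
    exit(1);
  }
  *demo_ptr++ = c;
}

/* Copy a C string, including its terminating NUL, into the pool. */
static char *demo_store(const char *s)
{
  char *start = demo_anchor();

  do {
    demo_add(*s);
  } while (*s++ != '\0');
  return start;
}

/* Like remove_string(s): release s and every string stored after it. */
static void demo_remove(char *s)
{
  assert(s >= demo_pool && s <= demo_ptr);
  demo_ptr = s;
}

/* Number of bytes currently in use. */
static size_t demo_used(void)
{
  return (size_t)(demo_ptr - demo_pool);
}
```

Storing "Bold" and then "und" uses 9 bytes, because each string keeps its NUL. Calling demo_remove() on the second string frees only that string, and the first one stays intact.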
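The names[] table pairs each descriptor-file identifier with its arity, and its order must parallel the token enum. The revision history mentions a check_arity routine, but the code that consults this table is not on this page. The sketch below is therefore a hypothetical illustration of a lookup, not wp2x's own. It maps a name to its index with a linear, case-sensitive search, because "Bold" and "bold" are separate entries, and then compares the arity column. The seven-entry demo_names table is copied from the start of names[]. lookup_identifier() and arity_matches() are assumed names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct identifier {
    char *name;
    int arity;
} Identifier;

/* A few entries from wp2x's names[]; the real table is much longer. */
static Identifier demo_names[] = {
    { "typeout", 0 },
    { "BEGIN", 0 },
    { "END", 0 },
    { "RomanPage", 1 },
    { "Marg", 2 },
    { "Bold", 0 },
    { "bold", 0 },
};

#define DEMO_COUNT (sizeof demo_names / sizeof demo_names[0])

/* Hypothetical: return the table index of name, or -1 if unknown.
 * The comparison is case-sensitive. */
static int lookup_identifier(const char *name)
{
  size_t i;

  for (i = 0; i < DEMO_COUNT; i++)
    if (strcmp(demo_names[i].name, name) == 0)
      return (int)i;
  return -1;
}

/* Hypothetical: does the identifier at index take nargs arguments? */
static int arity_matches(int index, int nargs)
{
  assert(index < (int)DEMO_COUNT);
  return index >= 0 && demo_names[index].arity == nargs;
}
```

With this table, lookup_identifier("Marg") returns 4 and its arity of 2 matches. An unknown name such as "NoSuchCode" returns -1, which arity_matches() rejects.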