
📄 gb_io.w

% This file is part of the Stanford GraphBase (c) Stanford University 1993
@i boilerplate.w %<< legal stuff: PLEASE READ IT BEFORE MAKING ANY CHANGES!

\def\title{GB\_\,IO}

@* Introduction. This is {\sc GB\_\,IO}, the input/output module used
by all GraphBase routines to access data~files. It doesn't actually do
any output; but somehow `input/output' sounds like a more useful title
than just `input'.

All files of GraphBase data are designed to produce identical results on
almost all existing computers and operating systems. Each line of each file
contains at most 79 characters. Each character is either a blank or a
digit or an uppercase letter or a lowercase letter or a standard punctuation
mark. Blank characters at the end of each line are ``invisible''; that is,
they have no perceivable effect. Hence identical results will be obtained on
record-oriented systems that pad every line with blanks.

The data is carefully sum-checked so that defective input files have little
chance of being accepted.

@ Changes might be needed when these routines are ported to different
systems. Sections of the program that are most likely to require such changes
are listed under `system dependencies' in the index.

A validation program is provided so that installers can tell if {\sc GB\_\,IO}
is working properly. To make the test, simply run \.{test\_io}.

@(test_io.c@>=
#include "gb_io.h"
  /* all users of {\sc GB\_\,IO} should include this header file */
#define exit_test(m) /* we invoke this macro if something goes wrong */\
 {@+fprintf(stderr,"%s!\n(Error code = %ld)\n",m,io_errors);@+return -1;@+}
@t\2@>@/
int main()
{
  @<Test the |gb_open| routine; exit if there's trouble@>;
  @<Test the sample data lines; exit if there's trouble@>;
  @<Test the |gb_close| routine; exit if there's trouble@>;
  printf("OK, the gb_io routines seem to work!\n");
  return 0;
}

@ The external variable |io_errors| mentioned in the previous section
will be set nonzero if any anomalies are detected.
Errors won't occur
in normal use of GraphBase programs, so no attempt has been made to
provide a user-friendly way to decode the nonzero values that
|io_errors| might assume.  Information is simply gathered in binary
form; system wizards who might need to do a bit of troubleshooting
should be able to decode |io_errors| without great pain.

@d cant_open_file 0x1 /* bit set in |io_errors| if |fopen| fails */
@d cant_close_file 0x2 /* bit set if |fclose| fails */
@d bad_first_line 0x4 /* bit set if the data file's first line isn't legit */
@d bad_second_line 0x8 /* bit set if the second line doesn't pass muster */
@d bad_third_line 0x10 /* bit set if the third line is awry */
@d bad_fourth_line 0x20 /* guess when this bit is set */
@d file_ended_prematurely 0x40 /* bit set if |fgets| fails */
@d missing_newline 0x80 /* bit set if line is too long or |'\n'| is missing */
@d wrong_number_of_lines 0x100 /* bit set if the line count is wrong */
@d wrong_checksum 0x200 /* bit set if the checksum is wrong */
@d no_file_open 0x400 /* bit set if user tries to close an unopened file */
@d bad_last_line 0x800 /* bit set if final line has incorrect form */

@ The \CEE/ code for {\sc GB\_\,IO} doesn't have a main routine; it's just a
bunch of subroutines to be incorporated into programs at a higher level
via the system loading routine. Here is the general outline of \.{gb\_io.c}:

@p
@<Header files to include@>@;
@h
@<External declarations@>@;
@<Private declarations@>@;
@<Internal functions@>@;
@<External functions@>

@ Every external variable is declared twice in this \.{CWEB} file:
once for {\sc GB\_\,IO} itself (the ``real'' declaration for storage
allocation purposes) and once in \.{gb\_io.h} (for cross-references
by {\sc GB\_\,IO} users).

@<External declarations@>=
long io_errors; /* record of anomalies noted by {\sc GB\_\,IO} routines */

@ @(gb_io.h@>=
@<Header...@>@;
extern long io_errors; /* record of anomalies noted by {\sc GB\_\,IO} routines */

@ We will stick to standard \CEE/-type input conventions.
We'll also have
occasion to use some of the standard string operations.

@<Header...@>=
#include <stdio.h>
#ifdef SYSV
#include <string.h>
#else
#include <strings.h>
#endif

@* Inputting a line. The {\sc GB\_\,IO} routines get their input from
an array called |buffer|. This array is internal to {\sc
GB\_\,IO}---its contents are hidden from user programs. We make it 81
characters long, since the data is supposed to have at most 79
characters per line, followed by newline and null.

@<Private...@>=
static char buffer[81]; /* the current line of input */
static char *cur_pos=buffer; /* the current character of interest */
static FILE *cur_file; /* current file, or |NULL| if none is open */

@ Here's a basic subroutine to fill the |buffer|. The main feature of interest
is the removal of trailing blanks. We assume that |cur_file| is open.

Notice that a line of 79 characters (followed by |'\n'|) will just fit into
the buffer, and will cause no errors. A line of 80 characters will
be split into two lines and the |missing_newline|
message will occur, because of the way |fgets| is defined. A |missing_newline|
error will also occur if the file ends in the middle of a line, or if
a null character (|'\0'|) occurs within a line.

@<Internal...@>=
static void fill_buf()
{@+register char *p;
  if (!fgets(buffer,sizeof(buffer),cur_file)) {
    io_errors |= file_ended_prematurely; buffer[0]=more_data=0;
  }
  for (p=buffer; *p; p++) ; /* advance to first null character */
  if (p--==buffer || *p!='\n') {
    io_errors |= missing_newline; p++;
  }
  while (--p>=buffer && *p==' ') ; /* move back over trailing blanks */
  *++p='\n'; *++p=0; /* newline and null are always present at end of line */
  cur_pos=buffer; /* get ready to read |buffer[0]| */
}

@* Checksums.
Each data file has a ``magic number,'' which is defined to be
$$\biggl(\sum_l 2^l c_l\biggr) \bmod p\,.$$
Here $p$ is a large prime number, and $c_l$ denotes the internal code
corresponding to the $l$th-from-last
data character read (including newlines but not nulls).

The ``internal codes'' $c_l$ are computed in a system-independent way:
Each character |c| in the actual encoding scheme being used has a
corresponding |icode|, which is the same on all systems. For example,
the |icode| of |'0'| is zero, regardless of whether |'0'| is actually
represented in ASCII or EBCDIC or some other scheme. (We assume that
every modern computer system is capable of printing at least 95
different characters, including a blank space.)

We will accept a data file as error-free if it has the correct number of
lines and ends with the proper magic number.

@<Private...@>=
static char icode[256]; /* mapping of characters to internal codes */
static long checksum_prime=(1L<<30)-83;
  /* large prime such that $2p+|unexpected_char|$ won't overflow */
static long magic; /* current checksum value */
static long line_no; /* current line number in file */
static long final_magic; /* desired final magic number */
static long tot_lines; /* total number of data lines */
static char more_data; /* is there data still waiting to be read? */

@ The |icode| mapping is defined by a single string, |imap|, such that
character |imap[k]| has |icode| value~|k|. There are 96 characters
in |imap|, namely the 94 standard visible ASCII codes plus space
and newline. If EBCDIC code is used instead of ASCII, the
cents sign \rlap{\.{\kern.05em/}}\.c should take the place of single-left-quote
\.{\char`\`}, and \.{\char5}~should take the place of\/~\.{\char`\~}.

All characters that don't appear in |imap| are given the same |icode|
value, called |unexpected_char|. Such characters should be avoided in
GraphBase files whenever possible.
(If they do appear, they can still
get into a user's data, but we don't distinguish them from each other
for checksumming purposes.)

The |icode| table actually plays a dual role, because we've rigged it so that
codes 0--15 come from the characters |"0123456789ABCDEF"|. This facilitates
conversion of decimal and hexadecimal data. We can also use it for
radices higher than 16.

@d unexpected_char 127 /* default |icode| value */

@<Private...@>=
static char *imap="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ\
abcdefghijklmnopqrstuvwxyz_^~&@@,;.:?!%#$+-*/|\\<=>()[]{}`'\" \n";

@ Users of {\sc GB\_\,IO} can look at the |imap|, but they can't change it.

@<External fun...@>=
char imap_chr(d)
  long d;
{
  return d<0 || d>strlen(imap)? '\0': imap[d];
}
@#
long imap_ord(c)
  char c;
{
  @<Make sure that |icode| has been initialized@>;
  return (c<0||c>255)? unexpected_char: icode[c];
}

@ @(gb_io.h@>=
#define unexpected_char @t\quad@> 127
extern char imap_chr(); /* the character that maps to a given character */
extern long imap_ord(); /* the ordinal number of a given character */

@ @<Make sure that |icode| has been initialized@>=
if (!icode['1']) icode_setup();

@ @<Internal...@>=
static void icode_setup()
{@+register long k;
  register char *p;
  for (k=0;k<256;k++) icode[k]=unexpected_char;
  for (p=imap,k=0; *p; p++,k++) icode[*p]=k;
}

@ Now we're ready to specify some external subroutines that do
input.
Calling |gb_newline()| will read the next line of
data into |buffer| and update the magic number accordingly.

@(gb_io.h@>=
extern void gb_newline(); /* advance to next line of the data file */
extern long new_checksum(); /* compute change in magic number */

@ Users can compute checksums as |gb_newline| does, but they can't
change the (private) value of |magic|.

@<External f...@>=
long new_checksum(s,old_checksum)
  char *s; /* a string */
  long old_checksum;
{@+register long a=old_checksum;
  register char*p;
  for (p=s; *p; p++)
    a=(a+a+imap_ord(*p)) % checksum_prime;
  return a;
}

@ The magic checksum is not affected by lines that begin with \.*.

@<External f...@>=
void gb_newline()
{
  if (++line_no>tot_lines) more_data=0;
  if (more_data) {
    fill_buf();
    if (buffer[0]!='*')
      magic=new_checksum(buffer,magic);
  }
}

@ Another simple routine allows a user to read (but not write) the
variable |more_data|.

@(gb_io.h@>=
extern long gb_eof(); /* has the data all been read? */

@ @<External f...@>=
long gb_eof() { return !more_data; }

@* Parsing a line. The user can input characters from the buffer in several
ways. First, there's a basic |gb_char()| routine, which returns
a single character. The character is |'\n'| if the last character on the
line has already been read (and it continues to be |'\n'| until the user calls
|gb_newline|).

The current position in the line, |cur_pos|, always advances when |gb_char|
is called, unless |cur_pos| was already at the end of the line.
There's also a |gb_backup()| routine, which moves |cur_pos| one place
to the left unless it was already at the beginning.

@(gb_io.h@>=
extern char gb_char(); /* get next character of current line, or |'\n'| */
extern void gb_backup(); /* move back ready to scan a character again */

@ @<External f...@>=
char gb_char()
{
  if (*cur_pos) return (*cur_pos++);
  return '\n';
}
@#
void gb_backup()
{
  if (cur_pos>buffer)
    cur_pos--;
}

@ There are two ways to read numerical data.
The first, |gb_digit(d)|,
expects to read a single character in radix~|d|, using |icode| values
