📄 main.java

/**************************************************************
  JLex: A Lexical Analyzer Generator for Java(TM)
  Written by Elliot Berk <ejberk@cs.princeton.edu>. Copyright 1996.
  Maintained by C. Scott Ananian <cananian@alumni.princeton.edu>.
  See below for copyright notice, license, and disclaimer.
  New releases from http://www.cs.princeton.edu/~appel/modern/java/JLex/

  Version 1.2.5, 7/25/99-5/16/00, [C. Scott Ananian]
   Stomped on one more 8-bit character bug.  Should work now (really!).
   Added unicode support, including unicode escape sequences.
   Rewrote internal JavaLexBitSet class as SparseBitSet for efficient
     unicoding.
   Added an NFA character class simplification pass for unicode efficiency.
   Changed byte- and stream-oriented I/O routines to use characters and
     java.io.Reader and java.io.Writer instead --- which means we read in
     unicode specifications correctly and write out a proper unicode java
     source file.  As a happy side-effect, the output java file is written
     with your platform's preferred newline character(s).
   Rewrote CInput to fix bugs with line-counting in the specification file
     and "unusual behaviour" when the last line of the specification wasn't
     terminated with a newline. Thanks to Matt Hanna <mhanna@cs.caltech.edu>
     for pointing out the bug.
   Fixed a bug that would cause JLex not to terminate given certain input
     specifications.  Thanks to Mark Greenstreet <mrg@cs.ubc.ca> and
     Frank B. Brokken <frank@suffix.icce.rug.nl> for reporting this.
   CUP parser integration improved according to suggestions made by
     David MacMahon <davidm@smartsc.com>.  The %cup directive now tells
     JLex to generate a parser conforming to the java_cup.runtime.Scanner
     interface; see manual for more details.
   Fixed bug with null string literals ("") in regexps.  Reported by
     Charles Fischer <fischer@cs.wisc.edu>.
   Rewrote start-of-line and end-of-line handling, closing active bug #5.
     Also fixed line-counting code, closing active bug #12.  All
     new-line handling is now platform-independent.
   Used unpackFromString more extensively to allow larger cmap, etc,
     tables.  This helps unicode support work reliably.  It's also
     prettier now if you happen to read the source to the generated
     lexer.
   Generated lexer now accepts unicode LS (U+2028) and PS (U+2029) as
     line separators for strict unicode compliance; see
     http://www.unicode.org/unicode/reports/tr18/
   Fixed bug with character constants in action strings.  Reported by
     Andrew Appel against 1.2.5b3.
   Fixed bug with illegal \^C-style escape sequences.  Reported by
     Toshiya Iwai <iwai@isdnet.co.jp> against 1.2.5b4.
   Fixed "newline in quoted string" error when unpaired single- or
     double-quotes were present in comments in the action phrase.
     Reported by Stephen Ostermiller <1010JLex@ostermiller.com>
     against 1.2.5b4.  Reported by Eric Esposito <eric.esposito@unh.edu>
     against 1.2.4 and 1.2.5b2.
   Fixed "newline in quoted string" error when /* or // appeared
     in quoted strings in the action phrase.  Reported by
     David Eichmann <david-eichmann@uiowa.edu> against 1.2.5b5.
   Fixed 'illegal constant' errors in case statements caused by
     Sun's JDK 1.3 more closely adhering to the Java Language
     Specification.  Reported by a number of people, but
     Harold Grovesteen <hgrovesteen@home.com> was the first to direct me to
     a Sun bug report (4119776) which quoted the relevant section of the
     JLS (15.27) to convince me that the JLex construction actually was
     illegal.  Reported against 1.2.5b6, but this bit of code has been
     present since the very first version of JLex (1.1.1).

  Version 1.2.4, 7/24/99, [C. Scott Ananian]
   Correct the parsing of '-' in character classes, closing active
     bug #1.  Behaviour follows egrep: leading and trailing dashes in
     a character class lose their special meaning, so [-+] and [+-] do
     what you would expect them to.
   New %ignorecase directive for generating case-insensitive lexers by
     expanding matched character classes in a unicode-friendly way.
   Handle unmatched braces in quoted strings or comments within
     action code blocks.
   Fixed input lexer to allow whitespace in character classes, closing
     active bug #9.  Whitespace in quotes had been previously fixed.
   Made Yylex.YYEOF and %yyeof work like the manual says they should.

  Version 1.2.3, 6/26/97, [Raimondas Lencevicius]
   Fixed the yy_nxt[][] assignment that has generated huge code
   exceeding 64K method size limit. Now the assignment
   is handled by unpacking a string encoding of integer array.
   To achieve that, added
   "private int [][] unpackFromString(int size1, int size2, String st)"
   function and coded the yy_nxt[][] values into a string
   by printing integers into a string and representing
   integer sequences as "value:length" pairs.
   Improvement: generated .java file reduced 2 times, .class file
     reduced 6 times for sample grammar. No 64K errors.
   Possible negatives: Some editors and OSs may not be able to handle
     the huge one-line generated string. String unpacking may be slower
     than direct array initialization.

  Version 1.2.2, 10/24/97, [Martin Dirichs]
  Notes:
    Changed yy_instream to yy_reader of type BufferedReader. This reflects
     the improvements in the JDK 1.1 concerning InputStreams. As a
     consequence, changed yy_buffer from byte[] to char[].
     The lexer can now be initialized with either an InputStream
     or a Reader. A third, private constructor is called by the other
     two to execute user specified constructor code.

  Version 1.2.1, 9/15/97 [A. Appel]
   Fixed bugs 6 (character codes > 127) and 10 (deprecated String constructor).

  Version 1.2, 5/5/97, [Elliot Berk]
  Notes:
    Simply changed the name from JavaLex to JLex.  No other changes.

  Version 1.1.5, 2/25/97, [Elliot Berk]
  Notes:
    Simple optimization to the creation of the source files.
     Added a BufferedOutputStream in the creation of the DataOutputStream
     field m_outstream of the class CLexGen.  This helps performance by
     doing some buffering, and was suggested by Max Hailperin,
     Associate Professor of Computer Science, Gustavus Adolphus College.

  Version 1.1.4, 12/12/96, [Elliot Berk]
  Notes:
    Added %public directive to make generated class public.

  Version 1.1.3, 12/11/96, [Elliot Berk]
  Notes:
    Converted Jassertion failure on invalid character class
     when a dash '-' is not preceded with a start-of-range character.
     Converted this into parse error E_DASH.

  Version 1.1.2, October 30, 1996 [Elliot Berk]
    Fixed BitSet bugs by installing a BitSet class of my own,
     called JavaLexBitSet.  Fixed support for '\r', non-UNIX
     sequences.  Added try/catch block around lexer generation
     in main routine to moderate error information presented
     to user.  Fixed macro expansion, so that macros following
     quotes are expanded correctly in regular expressions.
     Fixed dynamic reallocation of accept action buffers.

  Version 1.1.1, September 3, 1996 [Andrew Appel]
    Made the class "Main" instead of "JavaLex",
     improved the installation instructions to reflect this.

  Version 1.1, August 15, 1996  [Andrew Appel]
    Made yychar, yyline, yytext global to the lexer so that
     auxiliary functions can access them.
**************************************************************/

/***************************************************************
       JLEX COPYRIGHT NOTICE, LICENSE, AND DISCLAIMER

  Copyright 1996-2000 by Elliot Joel Berk and C. Scott Ananian

  Permission to use, copy, modify, and distribute this software and its
  documentation for any purpose and without fee is hereby granted,
  provided that the above copyright notice appear in all copies and that
  both the copyright notice and this permission notice and warranty
  disclaimer appear in supporting documentation, and that the name of
  the authors or their employers not be used in advertising or publicity
  pertaining to distribution of the software without specific, written
  prior permission.

  The authors and their employers disclaim all warranties with regard to
  this software, including all implied warranties of merchantability and
  fitness. In no event shall the authors or their employers be liable
  for any special, indirect or consequential damages or any damages
  whatsoever resulting from loss of use, data or profits, whether in an
  action of contract, negligence or other tortious action, arising out
  of or in connection with the use or performance of this software.
**************************************************************/

/***************************************************************
  Package Declaration
  **************************************************************/
package JLex;

/***************************************************************
  Imported Packages
  **************************************************************/
import java.lang.System;
import java.lang.Integer;
import java.lang.Character;

import java.util.Enumeration;
import java.util.Stack;
import java.util.Hashtable;
import java.util.Vector;

/******************************
  Questions:
  2) How should I use the Java package system
  to make my tool more modularized and
  coherent?

  Unimplemented:
  !) Fix BitSet issues -- expand only when necessary.
  2) Repeated accept rules.
  6) Clean up the CAlloc class and use buffered
  allocation.
  9) Add to spec about extending character set.
  11) m_verbose -- what should be done with it?
  12) turn lexical analyzer into a coherent
  Java package
  13) turn lexical analyzer generator into a
  coherent Java package
  16) pretty up generated code
  17) make it possible to have white space in
  regular expressions
  18) clean up all of the class files the lexer
  generator produces when it is compiled,
  and reduce this number in some way.
  24) character format to and from file: writeup
  and implementation
  25) Debug by testing all arcane regular expression cases.
  26) Look for and fix all UNDONE comments below.
  27) Fix package system.
  28) Clean up unnecessary classes.
  *****************************/

/***************************************************************
  Class: CSpec
 **************************************************************/
class CSpec
{
  /***************************************************************
    Member Variables
    **************************************************************/

  /* Lexical States. */
  Hashtable m_states; /* Hashtable taking state indices (Integer)
                         to state name (String). */

  /* Regular Expression Macros. */
  Hashtable m_macros; /* Hashtable taking macro name (String)
                         to corresponding char buffer that
                         holds macro definition. */

  /* NFA Machine. */
  CNfa m_nfa_start; /* Start state of NFA machine. */
  Vector m_nfa_states; /* Vector of states, with index
                          corresponding to label. */

  Vector m_state_rules[]; /* An array of Vectors of Integers.
                             The ith Vector represents the lexical state
                             with index i.  The contents of the ith
                             Vector are the indices of the NFA start
                             states that can be matched while in
                             the ith lexical state. */

  int m_state_dtrans[];

  /* DFA Machine. */
  Vector m_dfa_states; /* Vector of states, with index
                          corresponding to label. */
  Hashtable m_dfa_sets; /* Hashtable taking set of NFA states
                           to corresponding DFA state,
                           if the latter exists. */

  /* Accept States and Corresponding Anchors. */
  Vector m_accept_vector;
  int m_anchor_array[];

  /* Transition Table. */
  Vector m_dtrans_vector;
  int m_dtrans_ncols;
  int m_row_map[];
  int m_col_map[];

  /* Special pseudo-characters for beginning-of-line and end-of-file. */
  static final int NUM_PSEUDO=2;
  int BOL; // beginning-of-line
  int EOF; // end-of-file

  /** NFA character class minimization map. */
  int m_ccls_map[];

  /* Regular expression token variables. */
  int m_current_token;
  char m_lexeme;
  boolean m_in_quote;
  boolean m_in_ccl;

  /* Verbose execution flag. */
  boolean m_verbose;

  /* JLex directives flags. */
  boolean m_integer_type;
  boolean m_intwrap_type;
  boolean m_yyeof;
  boolean m_count_chars;
  boolean m_count_lines;
  boolean m_cup_compatible;
  boolean m_unix;
  boolean m_public;
  boolean m_ignorecase;

  char m_init_code[];
  int m_init_read;

  char m_init_throw_code[];
  int m_init_throw_read;

  char m_class_code[];
  int m_class_read;

  char m_eof_code[];
  int m_eof_read;

  char m_eof_value_code[];
  int m_eof_value_read;

  char m_eof_throw_code[];
  int m_eof_throw_read;

  char m_yylex_throw_code[];
  int m_yylex_throw_read;

  /* Class, function, type names. */
  char m_class_name[] = {
    'Y', 'y', 'l',
    'e', 'x'
  };
  char m_implements_name[] = {};
  char m_function_name[] = {
    'y', 'y', 'l',
    'e', 'x'
  };
  char m_type_name[] = {
    'Y', 'y', 't',
    'o', 'k', 'e',
    'n'
  };

  /* Lexical Generator. */
  private CLexGen m_lexGen;

  /***************************************************************
    Constants
    ***********************************************************/
  static final int NONE = 0;
  static final int START = 1;
  static final int END = 2;

  /***************************************************************
    Function: CSpec
    Description: Constructor.
    **************************************************************/
  CSpec
    (
     CLexGen lexGen
     )
    {
      m_lexGen = lexGen;

      /* Initialize regular expression token variables. */
      m_current_token = m_lexGen.EOS;
      m_lexeme = '\0';
      m_in_quote = false;
      m_in_ccl = false;

      /* Initialize hashtable for lexer states. */
      m_states = new Hashtable();
      m_states.put(new String("YYINITIAL"),new Integer(m_states.size()));

      /* Initialize hashtable for lexical macros. */
      m_macros = new Hashtable();

      /* Initialize variables for lexer options. */
      m_integer_type = false;
      m_intwrap_type = false;
      m_count_lines = false;
      m_count_chars = false;
      m_cup_compatible = false;
      m_unix = true;
      m_public = false;
      m_yyeof = false;
      m_ignorecase = false;

      /* Initialize variables for JLex runtime options. */
      m_verbose = true;

      m_nfa_start = null;
      m_nfa_states = new Vector();

      m_dfa_states = new Vector();
      m_dfa_sets = new Hashtable();

      m_dtrans_vector = new Vector();
      m_dtrans_ncols = CUtility.MAX_SEVEN_BIT + 1;
      m_row_map = null;
      m_col_map = null;

      m_accept_vector = null;
      m_anchor_array = null;

      m_init_code = null;
      m_init_read = 0;

      m_init_throw_code = null;
      m_init_throw_read = 0;

      m_yylex_throw_code = null;
      m_yylex_throw_read = 0;

      m_class_code = null;
      m_class_read = 0;

      m_eof_code = null;
      m_eof_read = 0;

      m_eof_value_code = null;
      m_eof_value_read = 0;

      m_eof_throw_code = null;
      m_eof_throw_read = 0;

      m_state_dtrans = null;

      m_state_rules = null;
    }
}

/***************************************************************
  Class: CEmit
  **************************************************************/
class CEmit