/**************************************************************
  JLex: A Lexical Analyzer Generator for Java(TM)
  Written by Elliot Berk <ejberk@cs.princeton.edu>. Copyright 1996.
  Maintained by C. Scott Ananian <cananian@alumni.princeton.edu>.
  See below for copyright notice, license, and disclaimer.
  New releases from http://www.cs.princeton.edu/~appel/modern/java/JLex/

  Version 1.2.5, 7/25/99-5/16/00, [C. Scott Ananian]
  Stomped on one more 8-bit character bug. Should work now (really!).
  Added unicode support, including unicode escape sequences.
  Rewrote internal JavaLexBitSet class as SparseBitSet for efficient
  unicoding.
  Added an NFA character class simplification pass for unicode efficiency.
  Changed byte- and stream-oriented I/O routines to use characters and
  java.io.Reader and java.io.Writer instead --- which means we read in
  unicode specifications correctly and write out a proper unicode java
  source file. As a happy side-effect, the output java file is written
  with your platform's preferred newline character(s).
  Rewrote CInput to fix bugs with line-counting in the specification file
  and "unusual behaviour" when the last line of the specification wasn't
  terminated with a newline. Thanks to Matt Hanna <mhanna@cs.caltech.edu>
  for pointing out the bug.
  Fixed a bug that would cause JLex not to terminate given certain input
  specifications. Thanks to Mark Greenstreet <mrg@cs.ubc.ca> and
  Frank B. Brokken <frank@suffix.icce.rug.nl> for reporting this.
  CUP parser integration improved according to suggestions made by
  David MacMahon <davidm@smartsc.com>. The %cup directive now tells
  JLex to generate a parser conforming to the java_cup.runtime.Scanner
  interface; see manual for more details.
  Fixed bug with null string literals ("") in regexps. Reported by
  Charles Fischer <fischer@cs.wisc.edu>.
  Rewrote start-of-line and end-of-line handling, closing active bug #5.
  Also fixed line-counting code, closing active bug #12. All
  new-line handling is now platform-independent.
  Used unpackFromString more extensively to allow larger cmap, etc,
  tables. This helps unicode support work reliably. It's also
  prettier now if you happen to read the source to the generated
  lexer.
  Generated lexer now accepts unicode LS (U+2028) and PS (U+2029) as
  line separators for strict unicode compliance; see
  http://www.unicode.org/unicode/reports/tr18/
  Fixed bug with character constants in action strings. Reported by
  Andrew Appel against 1.2.5b3.
  Fixed bug with illegal \^C-style escape sequences. Reported by
  Toshiya Iwai <iwai@isdnet.co.jp> against 1.2.5b4.
  Fixed "newline in quoted string" error when unpaired single- or
  double-quotes were present in comments in the action phrase.
  Reported by Stephen Ostermiller <1010JLex@ostermiller.com>
  against 1.2.5b4. Reported by Eric Esposito <eric.esposito@unh.edu>
  against 1.2.4 and 1.2.5b2.
  Fixed "newline in quoted string" error when /* or // appeared
  in quoted strings in the action phrase. Reported by
  David Eichmann <david-eichmann@uiowa.edu> against 1.2.5b5.
  Fixed 'illegal constant' errors in case statements caused by
  Sun's JDK 1.3 more closely adhering to the Java Language
  Specification. Reported by a number of people, but
  Harold Grovesteen <hgrovesteen@home.com> was the first to direct me to
  a Sun bug report (4119776) which quoted the relevant section of the
  JLS (15.27) to convince me that the JLex construction actually was
  illegal. Reported against 1.2.5b6, but this bit of code has been
  present since the very first version of JLex (1.1.1).

  Version 1.2.4, 7/24/99, [C. Scott Ananian]
  Correct the parsing of '-' in character classes, closing active
  bug #1. Behaviour follows egrep: leading and trailing dashes in
  a character class lose their special meaning, so [-+] and [+-] do
  what you would expect them to.
  New %ignorecase directive for generating case-insensitive lexers by
  expanding matched character classes in a unicode-friendly way.
  Handle unmatched braces in quoted strings or comments within
  action code blocks.
  Fixed input lexer to allow whitespace in character classes, closing
  active bug #9. Whitespace in quotes had been previously fixed.
  Made Yylex.YYEOF and %yyeof work like the manual says they should.

  Version 1.2.3, 6/26/97, [Raimondas Lencevicius]
  Fixed the yy_nxt[][] assignment that has generated huge code
  exceeding 64K method size limit. Now the assignment
  is handled by unpacking a string encoding of integer array.
  To achieve that, added
  "private int [][] unpackFromString(int size1, int size2, String st)"
  function and coded the yy_nxt[][] values into a string
  by printing integers into a string and representing
  integer sequences as "value:length" pairs.
  Improvement: generated .java file reduced 2 times, .class file
  reduced 6 times for sample grammar. No 64K errors.
  Possible negatives: Some editors and OSs may not be able to handle
  the huge one-line generated string. String unpacking may be slower
  than direct array initialization.

  Version 1.2.2, 10/24/97, [Martin Dirichs]
  Notes:
    Changed yy_instream to yy_reader of type BufferedReader. This reflects
    the improvements in the JDK 1.1 concerning InputStreams. As a
    consequence, changed yy_buffer from byte[] to char[].
    The lexer can now be initialized with either an InputStream
    or a Reader. A third, private constructor is called by the other
    two to execute user specified constructor code.

  Version 1.2.1, 9/15/97 [A. Appel]
  Fixed bugs 6 (character codes > 127) and 10 (deprecated String
  constructor).

  Version 1.2, 5/5/97, [Elliot Berk]
  Notes:
    Simply changed the name from JavaLex to JLex. No other changes.

  Version 1.1.5, 2/25/97, [Elliot Berk]
  Notes:
    Simple optimization to the creation of the source files.
    Added a BufferedOutputStream in the creation of the DataOutputStream
    field m_outstream of the class CLexGen. This helps performance by
    doing some buffering, and was suggested by Max Hailperin,
    Associate Professor of Computer Science, Gustavus Adolphus College.

  Version 1.1.4, 12/12/96, [Elliot Berk]
  Notes:
    Added %public directive to make generated class public.

  Version 1.1.3, 12/11/96, [Elliot Berk]
  Notes:
    Converted assertion failure on invalid character class
    when a dash '-' is not preceded with a start-of-range character.
    Converted this into parse error E_DASH.

  Version 1.1.2, October 30, 1996 [Elliot Berk]
    Fixed BitSet bugs by installing a BitSet class of my own,
    called JavaLexBitSet. Fixed support for '\r', non-UNIX
    sequences. Added try/catch block around lexer generation
    in main routine to moderate error information presented
    to user. Fixed macro expansion, so that macros following
    quotes are expanded correctly in regular expressions.
    Fixed dynamic reallocation of accept action buffers.

  Version 1.1.1, September 3, 1996 [Andrew Appel]
    Made the class "Main" instead of "JavaLex",
    improved the installation instructions to reflect this.

  Version 1.1, August 15, 1996 [Andrew Appel]
    Made yychar, yyline, yytext global to the lexer so that
    auxiliary functions can access them.
 **************************************************************/

/***************************************************************
  JLEX COPYRIGHT NOTICE, LICENSE, AND DISCLAIMER
  Copyright 1996-2000 by Elliot Joel Berk and C. Scott Ananian

  Permission to use, copy, modify, and distribute this software and its
  documentation for any purpose and without fee is hereby granted,
  provided that the above copyright notice appear in all copies and that
  both the copyright notice and this permission notice and warranty
  disclaimer appear in supporting documentation, and that the name of
  the authors or their employers not be used in advertising or publicity
  pertaining to distribution of the software without specific, written
  prior permission.

  The authors and their employers disclaim all warranties with regard to
  this software, including all implied warranties of merchantability and
  fitness.
  In no event shall the authors or their employers be liable
  for any special, indirect or consequential damages or any damages
  whatsoever resulting from loss of use, data or profits, whether in an
  action of contract, negligence or other tortious action, arising out
  of or in connection with the use or performance of this software.
 **************************************************************/

/***************************************************************
  Package Declaration
 **************************************************************/
package jlex;

/***************************************************************
  Imported Packages
 **************************************************************/
import java.lang.System;
import java.lang.Integer;
import java.lang.Character;

import java.util.Enumeration;
import java.util.Stack;
import java.util.Hashtable;
import java.util.Vector;

/******************************
  Questions:
  2) How should I use the Java package system
  to make my tool more modularized and
  coherent?

  Unimplemented:
  !) Fix BitSet issues -- expand only when necessary.
  2) Repeated accept rules.
  6) Clean up the CAlloc class and use buffered
  allocation.
  9) Add to spec about extending character set.
  11) m_verbose -- what should be done with it?
  12) turn lexical analyzer into a coherent
  Java package
  13) turn lexical analyzer generator into a
  coherent Java package
  16) pretty up generated code
  17) make it possible to have white space in
  regular expressions
  18) clean up all of the class files the lexer
  generator produces when it is compiled,
  and reduce this number in some way.
  24) character format to and from file: writeup
  and implementation
  25) Debug by testing all arcane regular expression cases.
  26) Look for and fix all UNDONE comments below.
  27) Fix package system.
  28) Clean up unnecessary classes.
 *****************************/

/***************************************************************
  Class: CSpec
 **************************************************************/
class CSpec
{
  /***************************************************************
    Member Variables
   **************************************************************/

  /* Lexical States. */
  Hashtable m_states; /* Hashtable taking state name (String)
                         to state index (Integer). */

  /* Regular Expression Macros. */
  Hashtable m_macros; /* Hashtable taking macro name (String)
                         to corresponding char buffer that
                         holds macro definition. */

  /* NFA Machine. */
  CNfa m_nfa_start; /* Start state of NFA machine. */
  Vector m_nfa_states; /* Vector of states, with index
                          corresponding to label. */

  Vector m_state_rules[]; /* An array of Vectors of Integers.
                             The ith Vector represents the lexical state
                             with index i. The contents of the ith
                             Vector are the indices of the NFA start
                             states that can be matched while in
                             the ith lexical state. */

  int m_state_dtrans[];

  /* DFA Machine. */
  Vector m_dfa_states; /* Vector of states, with index
                          corresponding to label. */
  Hashtable m_dfa_sets; /* Hashtable taking set of NFA states
                           to corresponding DFA state,
                           if the latter exists. */

  /* Accept States and Corresponding Anchors. */
  Vector m_accept_vector;
  int m_anchor_array[];

  /* Transition Table. */
  Vector m_dtrans_vector;
  int m_dtrans_ncols;
  int m_row_map[];
  int m_col_map[];

  /* Special pseudo-characters for beginning-of-line and end-of-file. */
  static final int NUM_PSEUDO=2;
  int BOL; // beginning-of-line
  int EOF; // end-of-file

  /** NFA character class minimization map. */
  int m_ccls_map[];

  /* Regular expression token variables. */
  int m_current_token;
  char m_lexeme;
  boolean m_in_quote;
  boolean m_in_ccl;

  /* Verbose execution flag. */
  boolean m_verbose;

  /* JLex directives flags. */
  boolean m_integer_type;
  boolean m_intwrap_type;
  boolean m_yyeof;
  boolean m_count_chars;
  boolean m_count_lines;
  boolean m_cup_compatible;
  boolean m_unix;
  boolean m_public;
  boolean m_ignorecase;

  char m_init_code[];
  int m_init_read;

  char m_init_throw_code[];
  int m_init_throw_read;

  char m_class_code[];
  int m_class_read;

  char m_eof_code[];
  int m_eof_read;

  char m_eof_value_code[];
  int m_eof_value_read;

  char m_eof_throw_code[];
  int m_eof_throw_read;

  char m_yylex_throw_code[];
  int m_yylex_throw_read;

  /* Class, function, type names. */
  char m_class_name[] = { 'Y', 'y', 'l', 'e', 'x' };
  char m_implements_name[] = {};
  char m_function_name[] = { 'y', 'y', 'l', 'e', 'x' };
  char m_type_name[] = { 'Y', 'y', 't', 'o', 'k', 'e', 'n' };

  /* Lexical Generator. */
  private CLexGen m_lexGen;

  /***************************************************************
    Constants
   ***********************************************************/
  static final int NONE = 0;
  static final int START = 1;
  static final int END = 2;

  /***************************************************************
    Function: CSpec
    Description: Constructor.
   **************************************************************/
  CSpec
    (
     CLexGen lexGen
     )
  {
    m_lexGen = lexGen;

    /* Initialize regular expression token variables. */
    m_current_token = m_lexGen.EOS;
    m_lexeme = '\0';
    m_in_quote = false;
    m_in_ccl = false;

    /* Initialize hashtable for lexer states. */
    m_states = new Hashtable();
    m_states.put(new String("YYINITIAL"),new Integer(m_states.size()));

    /* Initialize hashtable for lexical macros. */
    m_macros = new Hashtable();

    /* Initialize variables for lexer options. */
    m_integer_type = false;
    m_intwrap_type = false;
    m_count_lines = false;
    m_count_chars = false;
    m_cup_compatible = false;
    m_unix = true;
    m_public = false;
    m_yyeof = false;
    m_ignorecase = false;

    /* Initialize variables for JLex runtime options. */
    m_verbose = true;

    m_nfa_start = null;
    m_nfa_states = new Vector();

    m_dfa_states = new Vector();
    m_dfa_sets = new Hashtable();

    m_dtrans_vector = new Vector();
    m_dtrans_ncols = CUtility.MAX_SEVEN_BIT + 1;
    m_row_map = null;
    m_col_map = null;

    m_accept_vector = null;
    m_anchor_array = null;