FILENAME: FREEGRAM5.TXT
AUTHOR: Jim Roskind
Independent Consultant
516 Latania Palm Drive
Indialantic FL 32903
(407)729-4348
jar@hq.ileaf.com
or ...uunet!leafusa!jar
7/4/91
Dear C++ and C Grammar User,
I have written a YACC debugging tool, and a set of grammars for C and
C++ in order to use them within my own personal project development. I
have made the results of my work in this area available to other
developers at no charge with the hope that they would use my work. I
believe the entire C++ community can benefit from such
standardization. If any of the copyright notices on the grammars
(which are VERY liberal) prevent using my work, please notify me of
the problem.
Note that the grammars can each be processed by YACC, but they are
very clean, and make NO USE of the precedence setting (i.e., %prec) or
associativity setting (i.e., %assoc) constructs of YACC. This feature
should make them easily portable to other parser generator input
formats. This "cleanliness" also provides brutal exposure of all
the complex constructs in C++, and the complexity of the grammar as a
whole (the C++ grammar is 2 to 3 times as large as the C grammar)
reflects this exposure.
The files included in this set are:
    FREEGRM5.TXT    This introductory file
    GRAMMAR5.TXT    Parsing ambiguities in C++, and in my grammar
    CPP5.Y          My YACC compatible C++ grammar
    C5.Y            My YACC compatible, ANSI C conformant grammar
    CPP5.L          Flex input file defining a C++ lexical analyzer
    SKELGRPH.C      A hacked file from the Berkeley YACC distribution
    AUTODOC5.TXT    Documentation for my machine generated analysis
    Y.OUTPUT        Machine generated analysis of the C++ grammar
Aside from the addition of several files, this release of my grammar
corrects a few problems located in my prior release. I have also
transitioned to using names in my grammar that are more acceptable to
a wider variety of parser generators. This release also includes
support for nested types (at least grammatically, as there is no
symbol table provided). It does not support templates and exception
handling, as the ANSI C++ Committee is still discussing variations
(and trying to deal with a variety of ambiguities that the initial
proposals, such as what is described in the ARM, would entail).
Since my first public release of my grammar, I have received a number
of requests. One of the most common requests was for a lexical
analyzer to go with the grammar. This release of the grammar
continues to provide such a "bare bones" lexical analyzer. The
analyzer does not support preprocessing, or even comment removal. In
addition, since I have not included a symbol table, or semantic
actions in the grammar to maintain proper context (i.e., current
scope), typedef names and struct/class/union/enum tags are not
*really* defined. To allow users to experiment with my grammar
without a symbol table, my lexer assumes that if the first letter of
the name is upper case, then the name is a type name. This hack is
far from sufficient for parsing full blown programs, but it is more
than sufficient for experimenting with the grammar to determine the
acceptability of a token sequence, and to understand how my grammar
parsed the sequence.
Since I did not believe that a lexical analyzer alone would be
sufficient to assist many people with playing with my grammar, I have
also provided the basis for a tool to explain what a grammar is doing.
Specifically, I have modified a file that is included in the Berkeley
YACC distribution so that parsers generated by such a YACC would
automatically display a syntax tree in graphical-ASCII format during a
parse. The instructions for using and building this yacc tool are
presented in the next section. Note that there are no significant
special hooks in my grammar or parser to excite this yacc tool, and
the tool can be used equally well on any grammar that you are working
with. This graphical debugging tool is probably one of the most
popular aspects of my releases, and its presence and usefulness to
grammar developers should not be underestimated.
Significantly new to this release is a large file that contains
machine generated documentation (re: Y.OUTPUT). This file goes well
beyond what is provided in a typical verbose output, and provides both
detailed conflict analysis, and a number of cross-references which
make it **MUCH** easier to read the associated grammar. I have not
yet decided whether to market, shareware, or plain give-away my
program, so the best I can do at this point is to release the machine
generated documentation. Unfortunately, this file is *very* large,
and I have decided (for the time being) to distribute it only via the
ftp sites. I am doing this to lessen the global bandwidth
utilization during my grammar posting to the network. I will however
post the file (AUTODOC5.TXT) which documents the contents of the
Y.OUTPUT file, so that users can decide if they want to download the
larger file. To hint at what is included in the automatic
documentation, the following are the sections:
Reference Grammar
Alphabetized Grammar
Sample Expansions for the Non-terminal Tokens
Summary Descriptions of the Terminal Tokens
Symbol and Grammar Cross Reference for the Tokens
Sample Stack Context and Accessing Sentences for each State
Concise list of Conflicts
Canonical Description of Conflicts
Verbose listing of state transitions
Explanations for all reductions suggested in conflicts
Please see AUTODOC5.TXT for more details.
I have posted 7 of the 8 files to comp.lang.c++ (I will not post
Y.OUTPUT due to its size), to make this information as available as
possible to users and developers. I will also post this introductory
note to comp.compilers, and comp.lang.c. I am arranging for archival
support via several ftp sites, and updates will be posted to those
sites. I will also try to get the source to Berkeley YACC posted to
these ftp sites, although it is certainly available at more central
sites.
Currently, Doug Lea and Doug Schmidt have graciously offered to
provide anonymous ftp sites for all 8 files, as well as the
Berkeley YACC source (if you need it). The ftp addresses are:
ics.uci.edu (128.195.1.1) in the ftp/pub directory as:
c++grammar2.0.tar.Z
byacc1.8.tar.Z
mach1.npac.syr.edu (128.230.7.14) in the ftp/pub/C++ directory as:
c++grammar2.0.tar.Z
byacc1.8.tar.Z
HOW TO EXPERIMENT WITH THE C++ GRAMMAR
The following describes how to use the graphical debugging extensions
to Berkeley YACC to explore the grammar.
Note that the following instructions assume that you have the Berkeley
YACC source on hand. You can actually use any YACC to process the
grammar, but running the resulting demo (which has no semantic
actions) will tend to be quite boring. If you can't get hold of the
Berkeley YACC, you should at least try to enable the "debugging
options" in your parser so that you can see in some way what
reductions are taking place. (Hint: search for the letters "debug" in
the C file that your yacc produces...).
1) Get the entire source for Berkeley YACC 1.8 1/2/91
2) Verify that you can make the YACC
3) Rename SKELETON.C to SKELOLD.C, and implant my SKELGRPH.C
   in that directory as SKELETON.C
4) Make the yacc using this new SKELETON.C
5) Using the above yacc, process my CPP5.Y file
       yacc -dvl cpp5.y
   The result should be the files y.tab.c and y.tab.h
6) Use Flex (a replacement for lex) to process my CPP5.L file
       flex cpp5.l
   The result should be lex.yy.c
7) Compile the two files
       cc -o cpp5 y.tab.c lex.yy.c
   The result should be an executable called cpp5
8) Set the environment variable YYDEBUG to 6
       setenv YYDEBUG 6
   If you don't do this, the graphical output will not appear!
9) Run the program cpp5
       cpp5
10) Try the input:
       int a;
11) You should see a nice parse tree. Enjoy. Note that
    the lexer DOES NOT INCLUDE A SYMBOL TABLE, and does
    NOT KEEP TRACK OF CURRENT SCOPES. The hack (see the
    CPP5.L file for details) is to assume that any identifier
    that begins with a capital letter is a typedef name.
Send complaints about code that doesn't parse "correctly".
HISTORICAL NOTES
Developing the C grammar (that is intended to be compatible with the
ANSI C standard) was relatively straightforward (compared to the C++
grammar). The one difficulty in this process was the desire to avoid
use of %prec and %assoc constructs in YACC, which would tend to
obscure ambiguities. Since I didn't know what ambiguities were lying
in wait in C++, obscuring ambiguities was unacceptable. It took
several weeks to remove the conflicts that typically appear, and the
tedious process exposed several ambiguities that are not currently
disambiguated by the ANSI standard. The quality of the C grammar is
(IMHO) dramatically higher than what has been made available within
the public domain. Specifically, a C grammar's support of
redefinition of typedef names within inner scopes (the most difficult
area of the grammar) is typically excluded from public domain grammars,
and even excluded from most grammars that are supplied commercially
with parser generators! I expect that this grammar will be very
useful in the development of C related tools.
The development of the C++ grammar (initially compatible with version
1.2, but enhanced to support version 2.0 specifications as they were
made available) was anything but straightforward. The requirement
that I set to NOT USE %prec and %assoc proved both a blessing and a
curse. The blessing was that I could see what the problems were in
the language, the curse was that there were A LOT of conflicts (I can