recall times during the development effort when the number of
conflicts was well in excess of 200). The most recent addition of
nested types probably took about 2 weeks to implement. On the other
hand, I probably spent several months developing the automated
documentation tools that allowed me to debug the grammar additions
this quickly.
Towards the end of the initial development of the C++ grammar, which
took roughly 3 months of my time (circa summer 1989), I began to see
the folly in part of my quest. I came to the conclusion that further
attempts to modify my grammar, so as to defer resolutions of
ambiguities, would lead to an unreadable language. Specifically, my
feeling was that I was entering into a battle of wits with the
compiler, and the compiler was starting to win. It was beginning to
be the case that the parser COULD figure out what I said, but I
couldn't. Indeed, even examples in a version of the C++ 2.0 reference
manual (and published in the ARM) demonstrated this problem (my parser
could parse some examples that neither I nor the authors parsed
correctly!). At this point I decided to stop my quest to FURTHER
defer resolutions of ambiguities, and let the grammar commit in one
direction (always in favor of declarations), at the late point that is
provided by my grammar. If this direction proved "incorrect in light
of the context that followed", then I generated a syntax error. I
believe this strategy provides ample room for expressiveness. In
support of this expressiveness, I have (based on my discussions with
language experts) deferred disambiguation far longer than other
attempts at producing an LR(1) grammar. I would strongly argue that
any code that my grammar identifies as having a "syntax error" (based
on "premature" disambiguation), but cfront allows, should ABSOLUTELY
be rewritten in a less ambiguous (and hence more portable) form.
It should be noted that my grammar cannot be in constant agreement
with such implementations as cfront because a) my grammar is
internally consistent (mostly courtesy of its formal nature and YACC
verification), and b) YACC generated parsers don't dump core. (I will
probably take a lot of flak for that last snipe, but... every time I
have had difficulty figuring what was meant syntactically by some
construct that the ARM was vague about, and I fed it to cfront, cfront
dumped core.)
One major motivation for using the C++ grammar that I have provided is
that it is capable of supporting old style function definitions (e.g.:
main(argc, argv) int argc; char*argv[]; {...} ). I believe this
capability was removed from the C++ specification in order to reduce
ambiguities in a specific implementation (cfront). As my grammar
demonstrates, this action was not necessary. Supporting old style
function definition GREATLY eases the transition to the use of C++
from traditional C. I expect that, as some parsers begin to support
this option, other parsing engines will be forced in this
direction by a competitive marketplace. Using my grammar, and the
standards it implies, appears to be a very straightforward approach to
this support.
A second motivation for using my grammar is that it can be processed
by YACC. The advantage in this fact lies with YACC's capability to
identify ambiguities. For software manufacturers that are heavily
concerned with correctness, this is an INCREDIBLE advantage. My
experience with hand written parsers (which usually represent a
translation by a fallible human from a grammar to parsing code) is
that they evolve and become more correct with time. Ambiguous cases
are often misparsed, without the author ever realizing there was a
conflict! In contrast, either a YACC grammar supports a given
construct, or it doesn't. If a YACC grammar supports a construct, the
semantic interpretation is usually rather straightforward. The
likelihood of internal errors in the parser is therefore SIGNIFICANTLY
reduced. The fact that the grammars I supplied are free of %assoc and
%prec operators implies that the grammars are fairly portable, and that
the conflicts are open to peer code review (and not obscured).
Most recently I have joined the ANSI C++ committee (X3J16), and have
tried to follow their progress with hopes of maintaining compliance in
my grammar. Unfortunately, political pressures within X3J16 appear to
make it IMHO more desirable to quickly approve a standard that matches
cfront's performance (when it is not dumping core), than to provide a
clean, consistent and formal syntax as part of the standard. Rather
than fixing inconsistent hackery within the syntax, there is IMHO a
tendency to want to "hack further" to match cfront's current
performance (or the ARM's prophecy). As an example of this, the
fundamental hack in all of C is the feedback from the parser to the
lexer to identify typedefnames. There is discussion afoot to (for no
reason other than compliance with a *proposed* cfront feature) extend
this another notch and require feedback to distinguish template names.
This hackery was not required by the syntax, rather it was "desirable"
to match the performance of beta-cfront (and the ARM). When cfront
changes, and old code is obsoleted, the arguments abound that it is
for the good of humanity. When cfront is hacking inconsistently, then
no change can be made, because of the thousands of lines of code that
depend on this pseudo-standard. Perhaps my grammar will, in some
small way, help Microsoft, Zortech, Borland, and dozens of other
entrepreneurs work toward building a standard for a language that has
enough consistency to grow and flourish (note that none of the above
vendors use my grammar in their products, but I think they would all
share my desire for a cleaner syntax). If I am successful with my
grammar, then I will be able to write C++ tools in a consistent and
open marketplace. From my perspective, the outcome is not clear. If
you have a channel to support the use of a cleaner syntax in the X3J16
standard, I would heartily invite you to exercise that channel.
As it currently stands, my grammar teeters on the edge of being
unusable due to its size. The size in turn is due to the variety of
special cases that must be dealt with within C++ parsing. With only a
few more inconsistent additions to the "standard language", my grammar
will surely become completely unusable. I am trying to develop a yacc
preprocessor that will allow me to rein back in the complexity of the
grammar. If I can do this, then it will continue to be possible to
update my grammar to match the emerging ANSI Standard. I can only
promise to try.
FEEDBACK ABOUT THE GRAMMARS
If you find any errors in my grammars, I would be DELIGHTED to hear
mention of them!!!! These should fall into one of the following
categories:
    1) The grammar left out the following features of C++...
or
    2) The grammar mis-parses the following sequences...
or
    3) The discussion of the following ambiguity is incorrect...
or
    4) The grammar could be simplified as follows...
Please send correspondence of this form to jar@hq.ileaf.com. My
response to 1's will be to add the feature (if possible!); feel sad
that I made a mistake; and feel glad that YOU found it. I will have a
similar response to 2's. Responses of type 3 are GREAT, but I haven't
found many folks that really get into YACC ambiguities, so I have low
expectations... feel free to surprise me!!! :-) :-). Items of type 4
are interesting, but since simplicity is in the eye of the beholder,
such suggestions are subject to debate. I would be interested in
seeing suggestions in this area with the constraint that they do not
increase the number of conflicts in the grammar! Please use YACC to
check your suggestions before submitting them. (It is often AMAZING
how the slightest change in the grammar can change the number of
conflicts, and it took a great deal of work to reach the current
state). Distribution site(s) will be set up to distribute updates and/
or corrections. Postings about the presence of corrections will be
made on the net.
Since the two grammars (C and C++) were generated in parallel, you
should be able to compare non-terminals directly. This will hopefully
make it easier to identify the complexities involved with the C++
grammar, and "ignore" those that result from standard ANSI C. In both
cases I have left the standard if-if-else ambiguity intact. In the
case of the ANSI C grammar, this is the only shift-reduce conflict in the
grammar. Although there are a number of conflicts in the C++ grammar,
there are actually very few classes of problems. In order to
disambiguate the C++ grammar enough that YACC can figure out what to
do, I was commonly forced to "inline expand" non-terminals found in
the C grammar. This expansion allowed YACC to defer disambiguation
until it was possible for an LR(1) parser to understand the context.
The unfortunate consequence of this inline expansion is a large growth
in the number of rules, and the presence of an effective "multiplier"
in most cases where conflicts do occur. As a result, any conflicts
that arise are multiplied by a factor corresponding to the number of
rules I had to list separately. I have grouped the C++ grammar
conflicts in the "Status" section of the GRAMMAR5.TXT paper, but you
are welcome to explore my grammars using YACC directly (be warned that
you will need a robust version of YACC to handle the C++ sized
grammar). PLEASE do not be put off by the number of conflicts in the
C++ grammar. There are VERY FEW CONFLICTS, but my elaborated grammar
confuses the count.
The GRAMMAR5.TXT paper is FAR from a publishable quality paper, but it
discusses many of the issues involved in ambiguities in my grammar,
and more generally in the C++ language. I hope GRAMMAR5.TXT is a
vast improvement over "nothing at all", but apologize in advance for
lack of polished structure, and the presence of many typos (which must
surely be present). I hope you find this almost-paper interesting. My
attempts at documenting conflicts have certainly clarified the
problems in my mind. Based on my experience with the conflicts I have
identified, most current compilers and translators fall prey to the
situations that I have uncovered. I hope that other compilers, that
do not make use of the grammar I have made available, will at least
seek to standardize the resolution of the problems identified.
The AUTODOC5.TXT file provides interesting reading for readers
interested in LR and LALR parsing (and the subtle connections and
distinctions between them), as well as for any user who wishes to fully
comprehend the contents of the Y.OUTPUT file. It includes an
extensive discussion of ambiguities, how they are removed, how
LALR-only ambiguities arise, and how they can be dealt with.
With this release of the grammar I have begun to distribute machine
generated documentation for my grammar. As a result, if my analysis
of conflicts is questionable, the supporting data is immediately
present to confirm or deny my analysis. If you wish to correct any of
my analysis, please use and refer to the Y.OUTPUT file that I have
provided.
As a small commercial message... I am a freelance consultant, and I
travel far and wide to perform contracts. If you like the work that I
am presenting in this group of documents, and would like to see a
resume or at least talk, please feel free to contact me.
I hope that the grammars that I have provided will lead to many
successful C++ processing projects.
Jim Roskind
Independent Consultant
516 Latania Palm Drive
Indialantic FL 32903
(407)729-4348
jar@hq.ileaf.com or ...!uunet!leafusa!jar