changes_from_133_before_mr13.txt
------------------------------------------------------------
This is the second part of a two part file.
This is a list of changes to pccts 1.33 prior to MR13
For more recent information see CHANGES_FROM_133.txt
------------------------------------------------------------
DISCLAIMER
The software and these notes are provided "as is". They may include
typographical or technical errors and their authors disclaim all
liability of any kind or nature for damages due to error, fault,
defect, or deficiency regardless of cause. All warranties of any
kind, either express or implied, including, but not limited to, the
implied warranties of merchantability and fitness for a particular
purpose are disclaimed.
#153. (Changed in MR12b) Bug in computation of -mrhoist suppression set
Consider the following grammar with k=1 and "-mrhoist on":
r1 : (A)? => <<p>>? x /* l1 */
| r2 /* l2 */
;
r2 : A /* l4 */
| (B)? => <<q>>? y /* l5 */
;
In earlier versions the mrhoist routine would see that both l1 and
l2 contained predicates and would assume that this prevented either
from acting to suppress the other predicate. In the example above
it didn't realize the A at line l4 is capable of suppressing the
predicate at l1 even though alt l2 contains (indirectly) a predicate.
This is fixed in MR12b.
Reported by Reinier van den Born (reinier@vnet.ibm.com)
#153. (Changed in MR12a) Bug in computation of -mrhoist suppression set
An oversight similar to that described in Item #152 appeared in
the computation of the set that "covered" a predicate. If a
predicate expression included a term such as p=AND(q,r) the context
of p was taken to be context(q) & context(r), when it should have
been context(q) | context(r). This is fixed in MR12a.
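The corrected rule can be illustrated with a small sketch. It is not
taken from the pccts sources: PredNode, Kind, TokenSet and the helper
functions are hypothetical names, and token sets are modelled as plain
bit masks. It only shows that the context of an AND term is the union
of the contexts of its operands, and what the earlier intersection
would have produced.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model: one bit per token type.
using TokenSet = std::uint32_t;

enum class Kind { Leaf, And, Or };

struct PredNode {
    Kind kind;
    TokenSet leafContext;              // meaningful only for Kind::Leaf
    std::vector<PredNode> children;    // meaningful only for And / Or
};

PredNode leaf(TokenSet ctx) { return PredNode{Kind::Leaf, ctx, {}}; }

PredNode combine(Kind kind, const PredNode& a, const PredNode& b) {
    return PredNode{kind, 0, {a, b}};
}

// Rule described for MR12a: the context of AND(q,r) and of OR(q,r)
// is context(q) | context(r).
TokenSet context(const PredNode& p) {
    if (p.kind == Kind::Leaf) return p.leafContext;
    TokenSet result = 0;
    for (const PredNode& child : p.children) result |= context(child);
    return result;
}

// Behaviour described for earlier releases, kept only for comparison:
// AND terms intersect the contexts of their operands.
TokenSet contextBeforeMR12a(const PredNode& p) {
    if (p.kind == Kind::Leaf) return p.leafContext;
    assert(!p.children.empty());
    TokenSet result = contextBeforeMR12a(p.children.front());
    for (const PredNode& child : p.children) {
        TokenSet c = contextBeforeMR12a(child);
        result = (p.kind == Kind::And) ? (result & c) : (result | c);
    }
    return result;
}
```

With q guarded by one token and r by a different one, the intersection
is empty, so the combined predicate appears to have no context at all.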
#152. (Changed in MR12) Bug in generation of predicate expressions
The primary purpose for MR12 is to make quite clear that MR11 is
obsolete and to fix the bug related to predicate expressions.
In MR10 code was added to optimize the code generated for
predicate expression tests. Unfortunately, there was a
significant oversight in the code which resulted in a bug in
the generation of code for predicate expression tests which
contained predicates combined using AND:
r0 : (r1)* "@" ;
r1 : (AAA)? => <<p LATEXT(1)>>? r2 ;
r2 : (BBB)? => <<q LATEXT(1)>>? Q
| (BBB)? => <<r LATEXT(1)>>? Q
;
In MR11 (and MR10 when using "-mrhoist on") the code generated
for r0 to predict r1 would be equivalent to:
if ( LA(1)==Q &&
(LA(1)==AAA && LA(1)==BBB) &&
( p && ( q || r )) ) {
This is incorrect because it expresses the idea that LA(1)
*must* be AAA in order to attempt r1, and *must* be BBB to
attempt r2. The result was that r1 became unreachable since
both conditions cannot be true simultaneously.
The general philosophy of code generation for predicates
can be summarized as follows:
a. If the context is true don't enter an alt
for which the corresponding predicate is false.
If the context is false then it is okay to enter
the alt without evaluating the predicate at all.
b. A predicate created by ORing of predicates has
context which is the OR of their individual contexts.
c. A predicate created by ANDing of predicates has
(surprise) context which is the OR of their individual
contexts.
d. Apply these rules recursively.
e. Remember rule (a)
The correct code should express the idea that *if* LA(1) is
AAA then p must be true to attempt r1, but if LA(1) is *not*
AAA then it is okay to attempt r1, provided that *if* LA(1) is
BBB then one of q or r must be true.
if ( LA(1)==Q &&
( !(LA(1)==AAA || LA(1)==BBB) ||
( ( !(LA(1)==AAA) || p ) &&
( !(LA(1)==BBB) || q || r ) ) ) ) {
I believe this is fixed in MR12.
Reported by Reinier van den Born (reinier@vnet.ibm.com)
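The difference between the two guards can be checked with a small
stand-alone sketch. It models only the context and predicate part of
each test (the leading lookahead test is left out), and the Token
enumeration and function names are assumptions made for illustration;
real token codes come from the generated token definitions.

```cpp
#include <cassert>

// Hypothetical token codes, for illustration only.
enum Token { AAA, BBB, Q, OTHER };

// Context/predicate part of the test as generated by MR11:
// LA(1) would have to be AAA and BBB at once, so this is never true.
bool guardMR11(Token la1, bool p, bool q, bool r) {
    return (la1 == AAA && la1 == BBB) && (p && (q || r));
}

// Corrected form: a predicate is only consulted when its own context
// matches; outside every context the alternative may be entered.
bool guardMR12(Token la1, bool p, bool q, bool r) {
    return !(la1 == AAA || la1 == BBB) ||
           ((!(la1 == AAA) || p) && (!(la1 == BBB) || q || r));
}
```

For example, guardMR12(AAA, true, false, false) is true because only p
is relevant when LA(1) is AAA, while guardMR11 is false for every
combination of arguments.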
#151a. (Changed in MR12) ANTLRParser::getLexer()
As a result of several requests, I have added public methods to
get a pointer to the lexer belonging to a parser.
ANTLRTokenStream *ANTLRParser::getLexer() const
Returns a pointer to the lexer being used by the
parser. ANTLRTokenStream is the base class of
DLGLexer.
ANTLRTokenStream *ANTLRTokenBuffer::getLexer() const
Returns a pointer to the lexer being used by the
ANTLRTokenBuffer. ANTLRTokenStream is the base
class of DLGLexer.
You must manually cast the ANTLRTokenStream to your program's
lexer class. Because the name of the lexer's class is not fixed,
it is impossible to incorporate it into the DLGLexerBase class.
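A sketch of the manual cast follows. To keep it self-contained it does
not include the pccts headers: the classes inside namespace sketch are
minimal stand-ins that copy only the getLexer() signature quoted
above, and MyLexer and lexerOf() are hypothetical names for the
application's lexer class and a convenience helper.

```cpp
#include <cassert>

namespace sketch {

// Minimal stand-in for the pccts base class; only what the cast needs.
class ANTLRTokenStream {
public:
    virtual ~ANTLRTokenStream() {}
};

// Hypothetical application lexer. In a real project this is the lexer
// class under whatever name the grammar author chose.
class MyLexer : public ANTLRTokenStream {
public:
    int line() const { return line_; }
    void setLine(int n) { line_ = n; }
private:
    int line_ = 1;
};

// Stand-in parser exposing the accessor described above.
class ANTLRParser {
public:
    explicit ANTLRParser(ANTLRTokenStream* lexer) : lexer_(lexer) {}
    ANTLRTokenStream* getLexer() const { return lexer_; }
private:
    ANTLRTokenStream* lexer_;
};

// The manual cast: the parser only knows the base class, so the caller
// has to name the concrete lexer type.
inline MyLexer* lexerOf(const ANTLRParser& parser) {
    assert(parser.getLexer() != nullptr);
    return static_cast<MyLexer*>(parser.getLexer());
}

} // namespace sketch
```

static_cast is used here on the assumption that the caller knows which
lexer class the parser was constructed with; dynamic_cast is the
checked alternative when that is not certain.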
#151b. (Changed in MR12) ParserBlackBox member getLexer()
The template class ParserBlackBox now has a member getLexer()
which returns a pointer to the lexer.
#150. (Changed in MR12) syntaxErrCount and lexErrCount now public
See Item #127 for more information.
#149. (Changed in MR12) antlr option -info o (letter o for orphan)
If there is more than one rule which is not referenced by any
other rule then all such rules are listed. This is useful for
alerting one to rules which are not used, but which can still
contribute to ambiguity. For example:
start : a Z ;
unused: a A ;
a : (A)+ ;
will cause an ambiguity report for rule "a" which will be
difficult to understand if the user forgets about rule "unused"
simply because it is not used in the grammar.
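The idea of an orphan rule can be sketched as a simple pass over the
rule references. This is an illustration of the definition given
above, not a description of how antlr implements "-info o"; the
Grammar type and both function names are assumptions.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Rule name -> names of the rules referenced on its right-hand side
// (token references are omitted).
using Grammar = std::map<std::string, std::vector<std::string>>;

// Returns the rules that are not referenced by any other rule,
// in alphabetical order.
std::vector<std::string> orphanRules(const Grammar& g) {
    std::set<std::string> referenced;
    for (const auto& entry : g)
        for (const auto& callee : entry.second)
            if (callee != entry.first)       // a self-reference does not count
                referenced.insert(callee);

    std::vector<std::string> orphans;
    for (const auto& entry : g)
        if (referenced.count(entry.first) == 0)
            orphans.push_back(entry.first);
    return orphans;
}

// The three-rule grammar from the example above.
Grammar exampleGrammar() {
    Grammar g;
    g["start"].push_back("a");
    g["unused"].push_back("a");
    g["a"];                                  // references tokens only
    return g;
}
```

For the example grammar the result contains both "start" and "unused":
more than one rule is unreferenced, which is the situation in which
the option lists them.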
#148. (Changed in MR11) #token names appearing in zztokens,token_tbl
In a #token statement like the following:
#token Plus "\+"
the string "Plus" appears in the zztokens array (C mode) and
token_tbl (C++ mode). This string is used in most error
messages. In MR11 one has the option of using some other string,
(e.g. "+") in those tables.
In MR11 one can write:
#token Plus ("+") "\+"
#token RP ("(") "\("
#token COM ("comment begin") "/\*"
A #token statement is allowed to appear in more than one #lexclass
with different regular expressions. However, the token name appears
only once in the zztokens/token_tbl array. This means that only
one substitute can be specified for a given #token name. The second
attempt to define a substitute name (different from the first) will
result in an error message.
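The one-substitute-per-name rule can be modelled as follows. The class
and its members are hypothetical and are not the zztokens or token_tbl
arrays themselves; the sketch only shows why a second, different
substitute for the same #token name has to be rejected when each name
has a single table entry.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical table: token name -> string shown in error messages.
class TokenNameTable {
public:
    // Records a substitute for a token name. Returns false when a
    // different substitute was already recorded for that name.
    bool setSubstitute(const std::string& token, const std::string& shown) {
        auto inserted = names_.insert({token, shown});
        return inserted.second || inserted.first->second == shown;
    }

    // Name used in messages: the substitute if one exists, otherwise
    // the token name itself.
    std::string displayName(const std::string& token) const {
        auto it = names_.find(token);
        return it == names_.end() ? token : it->second;
    }

private:
    std::map<std::string, std::string> names_;
};
```

Repeating the same substitute in another #lexclass is accepted in this
sketch; only a conflicting one is refused.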
#147. (Changed in MR11) Bug in follow set computation
There is a bug in 1.33 vanilla and all maintenance releases
prior to MR11 in the computation of the follow set. The bug is
different than that described in Item #82 and probably more
common. It was discovered in the ansi.g grammar while testing
the "ambiguity aid" (Item #119). The search for a bug started
when the ambiguity aid was unable to discover the actual source
of an ambiguity reported by antlr.
The problem appears when an optimization of the follow set
computation is used inappropriately. The result is that the
follow set used is the "worst case". In other words, the error
can lead to false reports of ambiguity. The good news is that
if you have a grammar in which you have addressed all reported
ambiguities you are ok. The bad news is that you may have spent
time fixing ambiguities that were not real, or used k=2 when
ck=2 might have been sufficient, and so on.
The following grammar demonstrates the problem:
------------------------------------------------------------
expr : ID ;
start : stmt SEMI ;
stmt : CASE expr COLON
| expr SEMI
| plain_stmt
;
plain_stmt : ID COLON ;
------------------------------------------------------------
When compiled with k=1 and ck=2 it will report:
warning: alts 2 and 3 of the rule itself ambiguous upon
{ IDENTIFIER }, { COLON }
When antlr analyzes "stmt" it computes the first[1] set of all
alternatives. It finds an ambiguity between alts 2 and 3 for ID.
It then computes the first[2] set for alternatives 2 and 3 to resolve
the ambiguity. In computing the first[2] set of "expr" (which is
only one token long) it needs to determine what could follow "expr".
Under a certain combination of circumstances antlr forgets that it
is trying to analyze "stmt" which can only be followed by SEMI and
adds to the first[2] set of "expr" the "global" follow set (including
"COLON") which could follow "expr" (under other conditions) in the
phrase "CASE expr COLON".
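The effect of using the global follow set can be reproduced by hand
for this grammar. The sketch below is not antlr's algorithm; it
hard-codes the two alternatives in question and all names in it are
assumptions. It compares the two-token lookahead of alt 2
("expr SEMI") with that of alt 3 ("ID COLON"), once with the follow
set that applies inside alt 2 and once with the global follow set of
"expr".

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

using Token = std::string;
using Pair = std::pair<Token, Token>;     // (LA(1), LA(2))
using PairSet = std::set<Pair>;

// expr : ID ;  is one token long, so LA(2) is whatever may follow expr.
PairSet lookahead2OfExpr(const std::set<Token>& followOfExpr) {
    PairSet result;
    for (const Token& next : followOfExpr) result.insert(Pair("ID", next));
    return result;
}

bool intersects(const PairSet& a, const PairSet& b) {
    for (const Pair& p : a)
        if (b.count(p) != 0) return true;
    return false;
}

// Alt 2 of stmt is "expr SEMI"; alt 3 is plain_stmt, i.e. "ID COLON".
bool alts2And3Ambiguous(bool useGlobalFollow) {
    std::set<Token> follow;
    follow.insert("SEMI");                        // follows expr inside alt 2
    if (useGlobalFollow) follow.insert("COLON");  // from "CASE expr COLON"
    PairSet alt2 = lookahead2OfExpr(follow);
    PairSet alt3;
    alt3.insert(Pair("ID", "COLON"));
    return intersects(alt2, alt3);
}
```

With the local follow set the lookahead pairs are disjoint; adding
COLON from the global follow set produces the pair (ID, COLON) in both
alternatives, which corresponds to the false ambiguity report.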
#146. (Changed in MR11) Option -treport for locating "difficult" alts
It can be difficult to determine which alternatives are causing
pccts to work hard to resolve an ambiguity. In some cases the
ambiguity is successfully resolved after much CPU time so there
is no message at all.
A rough measure of the amount of work being performed which is
independent of the CPU speed and system load is the number of
tnodes created. Using "-info t" gives information about the
total number of tnodes created and the peak number of tnodes.
Tree Nodes: peak 1300k created 1416k lost 0
It also puts in the generated C or C++ file the number of tnodes
created for a rule (at the end of the rule). However this
information is not sufficient to locate the alternatives within
a rule which are causing the creation of tnodes.
Using:
antlr -treport 100000 ....
causes antlr to list on stdout any alternatives which require the
creation of more than 100,000 tnodes, along with the lookahead sets
for those alternatives.
The following is a trivial case from the ansi.g grammar which shows
the format of the report. This report might be of more interest
in cases where 1,000,000 tuples were created to resolve the ambiguity.
-------------------------------------------------------------------------
There were 0 tuples whose ambiguity could not be resolved
by full lookahead
There were 157 tnodes created to resolve ambiguity between:
Choice 1: statement/2 line 475 file ansi.g
Choice 2: statement/3 line 476 file ansi.g
Intersection of lookahead[1] sets:
IDENTIFIER
Intersection of lookahead[2] sets:
LPARENTHESIS COLON AMPERSAND MINUS
STAR PLUSPLUS MINUSMINUS ONESCOMPLEMENT
NOT SIZEOF OCTALINT DECIMALINT
HEXADECIMALINT FLOATONE FLOATTWO IDENTIFIER
STRING CHARACTER
-------------------------------------------------------------------------
#145. (Documentation) Generation of Expression Trees
Item #99 was misleading because it implied that the optimization
for tree expressions was available only for trees created by
predicate expressions and neglected to mention that it required
the use of "-mrhoist on". The optimization applies to tree
expressions created for grammars with k>1 and for predicates with
lookahead depth >1.
In MR11 the optimized version is always used so the -mrhoist on
option need not be specified.
#144. (Changed in MR11) Incorrect test for exception group
In testing for a rule's exception group, the label pointer itself
was compared against '\0'. The intention was to test "*pointer".
Reported by Jeffrey C. Fried (Jeff@Fried.net).
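The difference between the two tests is shown below in a minimal
sketch. The function names are hypothetical and the code is not taken
from antlr; it assumes the label is a C string that may be empty.
Comparing the pointer with '\0' only asks whether the pointer is null,
which is written here as a comparison with nullptr.

```cpp
#include <cassert>

// What the faulty test amounts to: it looks at the pointer, not at
// the characters, so an empty label goes undetected.
bool labelMissingBuggy(const char* label) {
    return label == nullptr;
}

// Intended test: look at the first character the pointer refers to.
// The null check is an extra safety assumption of this sketch.
bool labelMissingFixed(const char* label) {
    return label == nullptr || *label == '\0';
}
```

For an empty string the first function reports that a label is
present, while the second reports that it is missing.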
#143. (Changed in MR11) Optional ";" at end of #token statement
Fixes problem of:
#token X "x"
<<
parser action