=======================================================================
List of Implemented Fixes and Changes for Maintenance Releases of PCCTS
=======================================================================

                               DISCLAIMER

 The software and these notes are provided "as is". They may include
 typographical or technical errors and their authors disclaim all
 liability of any kind or nature for damages due to error, fault,
 defect, or deficiency regardless of cause. All warranties of any
 kind, either express or implied, including, but not limited to, the
 implied warranties of merchantability and fitness for a particular
 purpose are disclaimed.

#197. (Changed in MR14) Resetting the lookahead buffer of the parser

      Explanation and fix by Sinan Karasu (sinan.karasu@boeing.com)

      Consider the code used to prime the lookahead buffer LA(i)
      of the parser when init() is called:

        void
        ANTLRParser::
        prime_lookahead()
        {
            int i;
            for(i=1;i<=LLk; i++) consume();
            dirty=0;
            //lap = 0;      // MR14 - Sinan Karasu (sinan.karusu@boeing.com)
            //labase = 0;   // MR14
            labase=lap;     // MR14
        }

      When the parser is instantiated, lap=0 and labase=0 are set.

      The "for" loop runs LLk times. In consume(), lap = lap + 1 (mod LLk)
      is computed. Therefore, lap (before the loop) == lap (after the loop).

      The only problem comes when one does an init() of the parser after
      an Eof has been seen. At that time lap could be non-zero. Assume it
      was lap==1. Now we do a prime_lookahead(). If LLk is 2, then

        consume()
        {
            NLA = inputTokens->getToken()->getType();
            dirty--;
            lap = (lap+1)&(LLk-1);
        }

      or, expanding NLA,

            token_type[lap&(LLk-1)] = inputTokens->getToken()->getType();
            dirty--;
            lap = (lap+1)&(LLk-1);

      so now we prime locations 1 and 2 (that is, slots 1 and 0, since
      the index wraps modulo LLk). prime_lookahead() used to set lap=0
      and labase=0 at this point. With that reset, the next token is read
      from location 0, NOT 1 as it should be.
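The wrap-around argument above can be checked with a small simulation. The sketch below is a hypothetical, simplified model, not the PCCTS runtime: the class LookaheadRing and the helpers advance(), set_input() and reprime_after_one_match() are invented for illustration. It assumes that LA(i) reads slot (labase+i-1)&(LLk-1) and that LLk is a power of two, so that the mask acts as "mod LLk".

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical, simplified model of the parser's lookahead ring buffer,
// written only to illustrate item #197; it is not the PCCTS runtime.
// Slots are indexed with "& (LLk - 1)", which equals "mod LLk" only when
// LLk is a power of two, so the constructor checks for that.
class LookaheadRing {
public:
    LookaheadRing(int llk, std::vector<int> tokens)
        : LLk(llk),
          token_type(static_cast<std::size_t>(llk), 0),
          input(std::move(tokens)) {
        assert(llk > 0 && (llk & (llk - 1)) == 0);
    }

    // Store the next input token in slot lap, then advance lap (mod LLk).
    void consume() {
        token_type[static_cast<std::size_t>(lap)] = next_token();
        lap = (lap + 1) & (LLk - 1);
    }

    // Match LA(1): refill its slot with a new token and slide the window.
    void advance() {
        consume();
        labase = (labase + 1) & (LLk - 1);
    }

    // LA(i), 1 <= i <= LLk: the i-th lookahead token, counted from labase.
    int LA(int i) const {
        return token_type[static_cast<std::size_t>((labase + i - 1) & (LLk - 1))];
    }

    // Fill all LLk slots. consume() runs exactly LLk times, so lap ends
    // where it started. use_mr14_fix=false reproduces the pre-MR14 reset.
    void prime_lookahead(bool use_mr14_fix) {
        for (int i = 1; i <= LLk; i++) consume();
        if (use_mr14_fix) {
            labase = lap;   // window starts at the first primed slot
        } else {
            lap = 0;        // correct only if priming started at slot 0
            labase = 0;
        }
    }

    // Switch to a new token stream, as a re-init() after Eof might.
    void set_input(std::vector<int> tokens) {
        input = std::move(tokens);
        pos = 0;
    }

private:
    // Returns 0 once the input is exhausted; 0 stands in for Eof here.
    int next_token() { return pos < input.size() ? input[pos++] : 0; }

    int LLk;
    int lap = 0;
    int labase = 0;
    std::vector<int> token_type;
    std::vector<int> input;
    std::size_t pos = 0;
};

// Re-prime after one token has been matched, so that lap is 1, not 0.
LookaheadRing reprime_after_one_match(bool use_mr14_fix) {
    LookaheadRing ring(2, {10, 20, 30});
    ring.prime_lookahead(true);
    ring.advance();              // lap and labase are now both 1
    ring.set_input({40, 50});
    ring.prime_lookahead(use_mr14_fix);
    return ring;
}
```

Re-priming while lap == 1 makes the pre-MR14 reset return 50 for LA(1) instead of 40, while a freshly constructed buffer (lap == 0) gives the correct values with either version.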
      This was never caught before because, when a parser has just been
      instantiated, lap and labase are 0, and the offending assignments
      are effectively no-ops: the for loop wraps lap back around to 0.

#196. (Changed in MR14) Problems with "(alpha)? beta" guess

      Consider the following syntactic predicate in a grammar
      with 2 tokens of lookahead (k=2 or ck=2):

        rule  : ( alpha )? beta ;
        alpha : S t ;
        t     : T U
              | T
              ;
        beta  : S t Z ;

      When antlr computes the prediction expression with one token
      of lookahead for alts 1 and 2 of rule t, it finds an ambiguity.

      Because the grammar has a lookahead of 2, it tries to compute
      two tokens of lookahead for alts 1 and 2 of t. Alt 1 clearly
      has a lookahead of (T U). Alt 2 is one token long, so antlr
      tries to compute the follow set of alt 2, which means finding
      the things which can follow rule t in the context of (alpha)?.
      This cannot be computed, because alpha is only part of a rule,
      and antlr can't tell what part of beta is matched by alpha and
      what part remains to be matched. Thus it is impossible for antlr
      to properly determine the follow set of rule t.

      Prior to 1.33MR14, the follow of (alpha)? was computed as
      FIRST(beta) as a result of the internal representation of
      guess blocks.

      With MR14 the follow set will be the empty set for that context.

      Normally, one expects a rule appearing in a guess block to also
      appear elsewhere. When the follow context for this other use is
      "ored" with the empty set, the result is just the context from
      the other use, which is a reasonable follow context. However, if
      there is *no* other use of the rule, or it is used in a different
      manner, then the follow context will be inaccurate. It was
      inaccurate even before MR14, but it will be inaccurate in a
      different way.

      For the example given earlier, a reasonable way to rewrite the
      grammar:

        rule  : ( alpha )? beta ;
        alpha : S t ;
        t     : T U
              | T
              ;
        beta  : alpha Z ;

      If there are no other uses of the rule appearing in the guess
      block, antlr will generate a test for EOF, a workaround for
      representing a null set in the lookahead tests.

      If you encounter such a problem, you can use the -alpha option
      to get additional information:

        line 2: error: not possible to compute follow set for alpha in an "(alpha)? beta" block.

      With the antlr -alpha command line option the following information
      is inserted into the generated file:

        #if 0

        Trace of references leading to attempt to compute the follow set
        of alpha in an "(alpha)? beta" block. It is not possible for antlr
        to compute this follow set because it is not known what part of
        beta has already been matched by alpha and what part remains to
        be matched.

        Rules which make use of the incorrect follow set will also be
        incorrect

           1 #token T alpha/2 line 7 brief.g
           2 end alpha alpha/3 line 8 brief.g
           2 end (...)? block at start/1 line 2 brief.g

        #endif

      At the moment, with the -alpha option selected, the program marks
      any rules which appear in the traceback chain (above) as rules
      with possible problems computing the follow set.

      Reported by Greg Knapen (gregory.knapen@bell.ca).

#195. (Changed in MR14) #line directive not at column 1

      Under certain circumstances a predicate test could generate
      a #line directive which was not at column 1.

      Reported with fix by David Kågedal (davidk@lysator.liu.se)
      (http://www.lysator.liu.se/~davidk/).

#194. (Changed in MR14) (C Mode only) Demand lookahead with #tokclass

      In C mode with the demand lookahead option there is a bug in the
      code which handles matches for #tokclass (zzsetmatch and
      zzsetmatch_wsig).

      The bug causes the lookahead pointer to get out of synchronization
      with the current token pointer.

      The problem was reported with a fix by Ger Hobbelt (hobbelt@axa.nl).

#193. (Changed in MR14) Use of PCCTS_USE_NAMESPACE_STD

      pcctscfg.h now contains the following definitions:

        #ifdef PCCTS_USE_NAMESPACE_STD
        #define PCCTS_STDIO_H     <Cstdio>
        #define PCCTS_STDLIB_H    <Cstdlib>
        #define PCCTS_STDARG_H    <Cstdarg>
        #define PCCTS_SETJMP_H    <Csetjmp>
        #define PCCTS_STRING_H    <Cstring>
        #define PCCTS_ASSERT_H    <Cassert>
        #define PCCTS_ISTREAM_H   <istream>
        #define PCCTS_IOSTREAM_H  <iostream>
        #define PCCTS_NAMESPACE_STD namespace std {}; using namespace std;
        #else
        #define PCCTS_STDIO_H     <stdio.h>
        #define PCCTS_STDLIB_H    <stdlib.h>
        #define PCCTS_STDARG_H    <stdarg.h>
        #define PCCTS_SETJMP_H    <setjmp.h>
        #define PCCTS_STRING_H    <string.h>
        #define PCCTS_ASSERT_H    <assert.h>
        #define PCCTS_ISTREAM_H   <istream.h>
        #define PCCTS_IOSTREAM_H  <iostream.h>
        #define PCCTS_NAMESPACE_STD
        #endif

      The runtime support in pccts/h uses these pre-processor symbols
      consistently.

      Also, antlr and dlg have been changed to generate code which uses
      these pre-processor symbols rather than having the names of the
      #include files hard-coded in the generated code.

      This required the addition of #include "pcctscfg.h" to a number
      of files in pccts/h.

      It appears that this sometimes causes problems for MSVC 5 in
      combination with the "automatic" option for pre-compiled headers.
      In such cases, disable the "automatic" pre-compiled headers option.

      Suggested by Hubert Holin (Hubert.Holin@Bigfoot.com).

#192. (Changed in MR14) Change setText() to accept "const ANTLRChar *"

      Changed ANTLRToken::setText(ANTLRChar *) to setText(const ANTLRChar *).
      This allows literal strings to be used to initialize tokens. Since
      the usual token implementation (ANTLRCommonToken) makes a copy of
      the input string, this was an unnecessary limitation.

      Suggested by Bob McWhirter (bob@netwrench.com).

#191. (Changed in MR14) HP/UX aCC compiler compatibility problem

      Needed to explicitly declare zzINF_DEF_TOKEN_BUFFER_SIZE and
      zzINF_BUFFER_TOKEN_CHUNK_SIZE as ints in pccts/h/AParser.cpp.

      Reported by David Cook (dcook@bmc.com).

#190. (Changed in MR14) IBM OS/2 CSet compiler compatibility problem

      Name conflict with "_cs" in pccts/h/ATokenBuffer.cpp.

      Reported by David Cook (dcook@bmc.com).

#189. (Changed in MR14) -gxt switch in C mode

      The -gxt switch in C mode didn't work because of incorrect
      initialization.

      Reported by Sinan Karasu (sinan@boeing.com).

#188. (Changed in MR14) Added pccts/h/DLG_stream_input.h

      This is a DLG stream class based on C++ istreams.

      Contributed by Hubert Holin (Hubert.Holin@Bigfoot.com).

#187. (Changed in MR14) Rename config.h to pcctscfg.h

      The PCCTS configuration file has been renamed from config.h to
      pcctscfg.h. The problem with the original name is that it led to
      name collisions when pccts parsers were combined with other
      software.

      All of the runtime support routines in pccts/h/* have been changed
      to use the new name. Existing software can continue to use
      pccts/h/config.h. The contents of pccts/h/config.h are now just:

        #include "pcctscfg.h"

      I don't have a record of the user who suggested this.

#186. (Changed in MR14) Pre-processor symbol DllExportPCCTS class modifier

      Classes in the C++ runtime support routines are now declared:

        class DllExportPCCTS className ....

      By default, the pre-processor symbol is defined as the empty
      string. This is for use by MSVC++ users to create DLL classes.

      Suggested by Manfred Kogler (km@cast.uni-linz.ac.at).

#185. (Changed in MR14) Option to not use PCCTS_AST base class for ASTBase

      Normally, the ASTBase class is derived from PCCTS_AST, which
      contains functions useful to Sorcerer. If these are not necessary,
      the user can define the pre-processor symbol "PCCTS_NOT_USING_SOR",
      which will cause the ASTBase class to replace references to
      PCCTS_AST with references to ASTBase where necessary.

      The class ASTDoublyLinkedBase will contain a pure virtual function
      shallowCopy() that was formerly defined in class PCCTS_AST.

      Suggested by Bob McWhirter (bob@netwrench.com).

#184. (Changed in MR14) Grammars with no tokens generate invalid tokens.h

      Reported by Hubert Holin (Hubert.Holin@bigfoot.com).

#183. (Changed in MR14) -f to specify file with names of grammar files

      In DEC/VMS it is difficult to specify very long command lines.
      The -f option allows one to place the names of the grammar files
      in a data file in order to bypass limitations of the DEC/VMS
      command language interpreter.

      Addition supplied by Bernard Giroud (b_giroud@decus.ch).

#182. (Changed in MR14) Output directory option for DEC/VMS

      Fixed some problems with the -o option under DEC/VMS.

      Fix supplied by Bernard Giroud (b_giroud@decus.ch).

#181. (Changed in MR14) Allow chars > 127 in DLGStringInput::nextChar()

      Changed DLGStringInput to cast the character using (unsigned char)
      so that languages with character codes greater than 127 work
      without changes.

      Suggested by Manfred Kogler (km@cast.uni-linz.ac.at).

#180. (Added in MR14) ANTLRParser::getEofToken()

      Added "ANTLRToken ANTLRParser::getEofToken() const" to match the
      setEofToken routine.

      Requested by Manfred Kogler (km@cast.uni-linz.ac.at).

#179. (Fixed in MR14) Memory leak for BufFileInput subclass of DLGInputStream

      The BufFileInput class described in Item #142 neglected to release
      the allocated buffer when an instance was destroyed.

      Reported by Manfred Kogler (km@cast.uni-linz.ac.at).

#178. (Fixed in MR14) Bug in "(alpha)? beta" guess blocks first sets

      In 1.33 vanilla, and all maintenance releases prior to MR14,
      there is a bug in the handling of guess blocks which use the
      "long" form:

                  (alpha)? beta

      inside a (...)*, (...)+, or {...} block.

      This problem does *not* apply to the case where beta is omitted
      or when the syntactic predicate is on the leading edge of an
      alternative.