MyReadMe3.2.txt

Version 3.2 November 2007

Notes for C++ grammar file to generate ANTLR parser (in C++)
(Best viewed with tabs set to 4)

Contents

1. Past

2. Present
     If you experience any problems with running these programs please read these
   notes first.

3. Future


1. Past

This C++ grammar file was originally written and published in 1994 by,

Authors: Sumana Srinivasan, NeXT Inc.;            sumana_srinivasan@next.com
         Terence Parr, Parr Research Corporation; parrt@parr-research.com
         Russell Quong, Purdue University;        quong@ecn.purdue.edu

as VERSION 1.2 for use with PCCTS (The original C version of ANTLR).

In 1997-1999 it was adapted for use in a project to analyse data flow
in C programs by Lasitha Leelasena, Sue Black (sueblack@gmail.com(2007)) and 
David Wigg (wiggjd@bcs.org.uk(2007)). The generated parser was in C++ and 
all of our included statement code was in C.

In 2000, because ANTLR had by then been rewritten in Java and further
development of PCCTS had been suspended, we decided to convert our
version of the C grammar for PCCTS for use with ANTLR. As all our
included application code was in C, we chose the option to produce the
generated parser in C++ so that we would not have to rewrite this
application code as well.

During 2001-2002 we were fortunate enough to have the services of a 
visiting tutor, Jiangu Zuo, from Jianghan University, Wuhan, China who
carried out most of this work. However, this conversion was quite a 
lot more difficult than we had hoped and took us about a year to 
complete. We have tried to make a record of problems encountered and 
to give some solutions. If you would like further information please
contact wiggjd@bcs.org.uk .

The most difficult problem concerned the lack of 'hoisting' in ANTLR 2.7.3
which we were only able to overcome in the time available by copying 
the generated hoisting code from the PCCTS version into our new 
grammar file, hence some of the mysterious C++ statements at the 
beginning of a number of productions.

In August 2002 I reported that this grammar file would be published
'soon' when remaining problems had been cleared up and the grammar was
fit to be published. In the event, for a variety of reasons, this was
not achieved.

So, in view of the number of requests being made for access to this 
grammar I agreed in February 2003 to it being published on the 
www.antlr.org website for general use under the usual terms, in the 
hope that interested users would let me know how it could be improved.
Unfortunately, though it could handle C code and some C++ it was 
unable to handle namespaces and a lot of templates so it left a lot to 
be desired.

In September 2003 I supplied a much improved version which I called 
V2.0. This version was picked up by some users. A few problems were 
raised which have since been solved.

Since then I have been concentrating on tidying up what had become a
rather confusing system and trying to produce a cleaner, tidier and 
easier to understand system and also one easier to use in your 
application. No doubt I have not entirely succeeded yet, but I hope
it is better than it was.

I have introduced the idea of subclassing a user's application code, using
an example application program called MyCode.cpp. I hope this clear
separation between the application and the parser will make the code
easier to follow and will also enable users to install CPP_parser
updates more easily.

In July 2004 I published version 3.0 on the Antlr website (antlr.org).

Since then I have been dealing with users' queries, extending the
range of programs tested and clarifying the grammar file (CPP_parser.g)
to match more closely the language definition (grammar.txt) enclosed
with the package.

In November 2005 I published version 3.1 on the Antlr website (antlr.org).

Since then I have continued dealing with users' queries, extending the
range of programs tested and clarifying the grammar file (CPP_parser.g)
to match more closely the language definition (grammar.txt) enclosed
with the package.


2. Present, November 2007.

I have called this latest version Version 3.2, published in November 2007.

I am not proposing to do any more work on this parser other than to try
to solve any problems sent to me by users. If anyone wishes to take this
parser any further I would be pleased to know. I am anxious to hand it on
to someone who can develop it further. An important further development 
would be to convert it to run under ANTLR 3.x .

I am using MSVC 6.0 under Windows.

I created a static source library for the ANTLR code (2.7.3) with
some modifications as discussed below. I assume it will run with
any later version of ANTLR 2.7 (see HowToBuildStaticLibrary.txt).

Please note that this version continues to be used to parse pre-processed
*.i files, with or without embedded #line directives. With MSVC these
files are obtained by compiling with the /P option.

However, programs with other extensions (eg. *.cpp) will work correctly 
without being pre-processed provided no types used in the program
are undeclared and there are no pre-processing directives such as 
#ifdef, #include etc. present.

As described in the file "HowToUseCPP_parser.txt" the program MyCode.cpp shows
how a user can extract any required data from the parser.

However, it should be noted that by default no MyCode functions will be
entered unless there is a #line directive at the beginning of the
program.i file (which therefore contains the name of the main file being
processed). The reason for this is that we need to process all the
included files first to obtain typing information, but we will probably
not want to extract any other information from them. MyCode functions are
entered only when, and for as long as, the current #line directive
contains the name of the main program.
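
As a rough illustration of this rule, the sketch below tracks whether
the parser is currently inside the main file. It is not taken from
CPP_parser: the class name MainFileGate and its member functions are
hypothetical, and it assumes that the first #line directive seen names
the main file.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of CPP_parser): remembers the main file
// named by the first #line directive and reports whether the parser is
// currently inside that file.
class MainFileGate {
public:
    MainFileGate() : inMainFile_(false) {}

    // Call once for every #line directive, passing the file name it contains.
    void onLineDirective(const std::string& fileName) {
        assert(!fileName.empty());
        if (mainFile_.empty()) {
            mainFile_ = fileName;   // the first directive names the main file
        }
        inMainFile_ = (fileName == mainFile_);
    }

    // True only while the most recent #line directive named the main file.
    bool inMainFile() const { return inMainFile_; }

private:
    std::string mainFile_;
    bool inMainFile_;
};
```

An application function could then begin with a check such as
"if (!gate.inMainFile()) return;", so that declarations from included
files still supply typing information but produce no application output.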

I include a small demonstration program, quadratic.i, which you could use to 
test the set up of your system.

Although I cannot say it has been thoroughly tested (since I have
not yet found a comprehensive test package for C++) it appears to
parse a wide variety of programs of mine and of many other people,
all of which contain a considerable quantity of included files 
containing a great deal of complex code.

It should be noted that this version still handles scoping in a
relatively simplistic manner but this does not appear to be a 
problem (though I agree it might be desirable for some users). To 
do this properly would entail some work to update the antlr 
supporting code in dictionary.cpp etc.

Briefly, as far as scoping is concerned, all template parameter names and 
all type names are held in level 1. All variable names are held in lower 
levels (with higher numbers) but continue to be deleted when they go out 
of scope (See NotesScopes.txt).

Each run should end with the following two statements,

"Support exitExternalScope scope now 0 as required"
"Parse ended"

showing that the scope level had been returned to zero correctly.
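
The scoping rule described above can be pictured with the minimal sketch
below. It is an illustration only and does not reproduce the supporting
code supplied with the package: ScopeTable, Symbol and SymbolKind are
hypothetical names, and template parameter names are treated here in the
same way as type names.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch of the scoping rule; not the package's own code.
enum SymbolKind { TypeName, VariableName };

struct Symbol {
    std::string name;
    SymbolKind kind;
    int level;
};

class ScopeTable {
public:
    ScopeTable() : level_(0) {}

    void enterScope() { ++level_; }

    // Leaving a scope deletes every symbol held at that level.
    void exitScope() {
        assert(level_ > 0);
        const int leaving = level_;
        symbols_.erase(
            std::remove_if(symbols_.begin(), symbols_.end(),
                           [leaving](const Symbol& s) {
                               return s.level == leaving;
                           }),
            symbols_.end());
        --level_;
    }

    // Type names are always held in level 1; variable names are held in
    // the level that is current when they are declared.
    void declare(const std::string& name, SymbolKind kind) {
        assert(level_ > 0);
        Symbol s;
        s.name = name;
        s.kind = kind;
        s.level = (kind == TypeName) ? 1 : level_;
        symbols_.push_back(s);
    }

    bool isDeclared(const std::string& name) const {
        for (const Symbol& s : symbols_) {
            if (s.name == name) return true;
        }
        return false;
    }

    int level() const { return level_; }

private:
    int level_;
    std::vector<Symbol> symbols_;
};
```

In this sketch a type declared inside a function body survives the end
of that body, whereas a variable declared there does not, and the level
returns to 0 once the outermost scope has been left.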

I have included a C++ syntax definition (grammar.txt) which appears
to be up to date. If not, please let me know.

Notes about running this version.

* I am currently using antlr 2.7.3 but it should work with any later 
	version of antlr 2.7 .  If not, please let me know.

* A suitable version of ANTLR for MSVC users should be available from
	http://www.antlr2.org .

* The following type of warning produced during compilation of 
   CPPLexer.cpp and CPPParser.cpp can be ignored,

   CPPParser.cpp(163) : warning C4101: 'pe' : unreferenced local variable

* I have introduced a "statementTrace" feature in CPP_parser.g during
   testing which I have found useful. See CPP_parser.g . This can be
   set on (or off) by altering "statementTrace" in CPPParser.cpp and 
   recompiling and linking only (to avoid having to regenerate the lexer
   and parser from CPP_parser.g as well).
   
   With statementTrace set to 1 you get a list of statement types as
   they are detected from external_declaration and/or member_declaration
   in CPP_parser.g.

   With statementTrace set to 2 you also get a record of each symbol
   declared showing its name, scope level, and type (See list in 
   CPPSymbol.hpp).

   The trace output is displayed on screen, but you should be able to
   redirect it to a trace file like this in a DOS window,

   ...\CPP_parser3.2>debug\CPP_parser  TestIfiles\program.i > program.trace

   I have also found this feature useful for providing the ability to check
   the output from one run to another after making any modifications
   to the parser. Just keep your "standard" or "correct" version of the
   trace output in a separate "archive" file and use this to compare
   with the output of any new updated version.
   
   You can do a file compare like this in a DOS window,

   ...\CPP_parser3.2>fc  /n  programA.trace  program.trace

* I have also implemented a dynamic trace facility, which makes use of
   the excellent trace facility in ANTLR, by adding some extra code
   called antlrTrace() to LLkParser.hpp and LLkParser.cpp before
   generating the static antlr library.
   
  To use this facility you have to use my modified version of 
   LLkParser.hpp and LLkParser.cpp included in this package to replace
   the versions supplied in the ANTLR library (...\lib\cpp\antlr and 
   ...\lib\cpp\src respectively).

  If you don't want to use this facility please comment out all 7
   references to "antlrTrace(false/true);" in CPP_parser.g.

  The advantage of this facility is that, by always generating with
   antlr tracing initially (using -traceParser etc.), antlr tracing 
   can be switched on or off completely by changing the antlrTrace(false) 
   statement in init() in CPPParser.cpp appropriately and recompiling and 
   linking without also having to regenerate from the grammar file each time.

  To use this facility selectively, particularly when debugging a 
   problem statement in a large program, I have included the following
   statements in the grammar file which respectively turn ANTLR trace 
   on and then off,

   "antlrTrace_on" and "antlrTrace_off"

  To use this facility you then have to include "antlrTrace_on" 
   in the source code just before the statement you want to debug and
   "antlrTrace_off" just after the statement you want to debug.

* I have also supplied a slightly modified version of parser.cpp in 
   the ANTLR package which could be used to produce extra error messages
   to cerr (as well as cout) so that these messages would not be missed
   if and when the standard output was piped to a file for debugging 
   purposes.

* I still include MyCode.cpp (with MyCode.hpp) to demonstrate how your
   application code can be subclassed in CPPParser. You can, of course,
   take over, delete, include and amend any of these functions to suit
   your application by following the same pattern.
   
   I recommend using this feature with this grammar as I think it will 
   make it easier both for me to issue updated versions of the CPP_parser 
   grammar from time to time and for you to accept and use them since 
   the code for the parser and your application can then be kept strictly
   apart.
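
   One common way of achieving this kind of separation is sketched
   below: one class declares hook functions with empty default bodies,
   and the application overrides only those it needs. This shows the
   general pattern only and is not the actual layout of MyCode.cpp or
   MyCode.hpp; every name in it (ParserHooks, FunctionCollector,
   simulateParse) is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for the parser side: it declares the hooks and gives each one
// an empty default body. All names here are hypothetical.
class ParserHooks {
public:
    virtual ~ParserHooks() {}
    virtual void functionDefined(const std::string& /*name*/) {}
    virtual void variableDeclared(const std::string& /*name*/) {}
};

// Application side: overrides only the hooks it is interested in.
class FunctionCollector : public ParserHooks {
public:
    void functionDefined(const std::string& name) override {
        assert(!name.empty());
        names.push_back(name);
    }
    std::vector<std::string> names;
};

// Stand-in for a parse run: a real parser would call the hooks from its
// grammar actions as each construct is recognised.
void simulateParse(ParserHooks& hooks) {
    hooks.variableDeclared("a");
    hooks.functionDefined("solve");
    hooks.functionDefined("main");
}
```

   Because the application class touches the parser only through the
   hook functions, the parser side can be replaced by a newer version
   without editing the application code, provided the hook signatures
   stay the same.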

* I have made some modifications to reduce the default tab size from 8 to 4
   spaces. The first is in CharScanner.cpp in the ANTLR source library code
   where I changed the 3 constructors appropriately (eg. tabsize(4)). 

  I believe the source code of tab() supplied in CharScanner.hpp doesn't work
   correctly because for a tab at the beginning of a line tab() appears to be
   entered twice. I have therefore also included an amended version of 
   CharScanner.hpp which you could use to replace the one supplied for the 
   ANTLR source library.
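
   To show why a double entry matters, here is a small sketch of a
   conventional tab-stop calculation for 1-based column numbers. The
   function names nextTabStop and columnAfter are hypothetical, and the
   code is not copied from CharScanner.hpp.

```cpp
#include <cassert>
#include <string>

// Column (1-based) of the next tab stop after `column`.
int nextTabStop(int column, int tabSize) {
    assert(column >= 1 && tabSize >= 1);
    return ((column - 1) / tabSize + 1) * tabSize + 1;
}

// Column reached after scanning `text` from the start of a line.
int columnAfter(const std::string& text, int tabSize) {
    int column = 1;
    for (char c : text) {
        column = (c == '\t') ? nextTabStop(column, tabSize) : column + 1;
    }
    return column;
}
```

   With a tab size of 4, a tab at the beginning of a line moves the
   column from 1 to 5. If the same tab is processed twice the column
   becomes 9, so every column reported for the rest of that line is
   4 too high.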

* There is one known problem with the CPP_parser.g grammar and this is connected
   with "asm" statements. The only way I could find of dealing with them was
   to treat them rather like comments but unfortunately this was not as 
   straightforward as it sounds and they are now included in the ID production
   in the lexer. The one problem left is that in asm statements there needs to 
   be at least one space before any '(' or '{' following the "_asm".

* There should not be any ambiguity warnings generated by the ANTLR compiler 
   from CPP_parser.g .

* End of notes about running this version.


3. Future

   I would be grateful if any user of this grammar would advise me and/or
the e-mail group (antlr-interest@antlr.org) of any improvement they 
have been able to make to this grammar, for the benefit of other users.

   As I have mentioned above, I am not proposing to do any more work on
this parser other than for a limited period to try to solve any problems 
sent to me by users. 

   I am now looking for someone to take over this parser for further development
such as a conversion for it to run under the new ANTLR version 3. I should
be able to assist anyone taking over for a limited period.

   Any volunteer should contact me at wiggjd@bcs.org.uk .


Thank you,


David Wigg
Research Fellow
Centre for Systems and Software Engineering
London South Bank University
London, UK.
wiggjd@bcs.org.uk
sueblack@gmail.com

November 2007