📄 19.doc.html
字号:
<html>
<head>
<title>The Java Language Specification LALR(1) Grammar </title>
</head>
<body BGCOLOR=#eeeeff text=#000000 LINK=#0000ff VLINK=#000077 ALINK=#ff0000>
<a href="index.html">Contents</a> | <a href="18.doc.html">Prev</a> | <a href="javalang.doc.html">Next</a> | <a href="j.index.doc1.html">Index</a>
<hr><br>
<a name="52994"></a>
<p><strong>
CHAPTER 19 </strong></p>
<a name="52996"></a>
<h1>LALR(1) Grammar </h1>
<hr><p>
<a name="26238"></a>
This chapter presents a grammar for Java. The grammar has been mechanically
checked to insure that it is LALR(1).
<p><a name="26239"></a>
The grammar for Java presented piecemeal in the preceding chapters is much better for exposition, but it cannot be parsed left-to-right with one token of lookahead because of certain syntactic peculiarities, some of them inherited from C and C++. These problems and the solutions adopted for the LALR(1) grammar are presented below, followed by the grammar itself.<p>
<a name="44467"></a>
<h2>19.1 Grammatical Difficulties</h2>
<a name="44468"></a>
There are five problems with the grammar presented in preceding chapters.
<p><a name="44469"></a>
<h3>19.1.1 Problem #1: Names Too Specific</h3>
<a name="44470"></a>
Consider the two groups of productions:
<p><ul><pre>
<sub><i>PackageName:<br>
</i></sub> <i>Identifier<br>
</i> <i>PackageName</i><code> . </code><i>Identifier
</i>
<i>TypeName:<br>
</i> <i>Identifier<br>
</i> <i>PackageName</i><code> . </code><i>Identifier
</i></pre></ul><a name="44473"></a>
and:
<p><ul><pre>
<i>MethodName:<br>
</i> <i>Identifier<br>
</i> <i>AmbiguousName</i><code> . </code><i>Identifier
</i>
<i>AmbiguousName:<br>
</i> <i>Identifier<br>
</i> <i>AmbiguousName</i><code> . </code><i>Identifier
</i></pre></ul><a name="44476"></a>
Now consider the partial input:
<p><pre><a name="44477"></a>class Problem1 { int m() { hayden.
</pre><a name="44478"></a>
When the parser is considering the token <code>hayden</code>, with one-token lookahead to
symbol "<code>.</code>", it cannot yet tell whether <code>hayden</code> should be a <i>PackageName</i> that
qualifies a type name, as in:
<p><pre><a name="44479"></a>hayden.Dinosaur rex = new hayden.Dinosaur(2);
</pre><a name="44480"></a>
or an <i>AmbiguousName</i> that qualifies a method name, as in:
<p><pre><a name="44481"></a>hayden.print("Dinosaur Rex!");
</pre><a name="44482"></a>
Therefore, the productions shown above result in a grammar that is not LALR(1).
There are also other problems with drawing distinctions among different kinds of
names in the grammar.
<p><a name="44483"></a>
The solution is to eliminate the nonterminals <i>PackageName</i>, <i>TypeName</i>, <i>ExpressionName</i>, <i>MethodName</i>, and <i>AmbiguousName</i>, replacing them all with a single nonterminal <i>Name</i>:<p>
<ul><pre>
<i>Name:<br>
SimpleName<br>
QualifiedName
</i>
<i>SimpleName:<br>
Identifier
</i>
<i>QualifiedName:<br>
Name</i><code> . </code><i>Identifier
</i></pre></ul><a name="44487"></a>
A later stage of compiler analysis then sorts out the precise role of each name or
name qualifier.
<p><a name="44578"></a>
For related reasons, these productions in <a href="4.doc.html#9317">§4.3</a>:<p>
<ul><pre>
<i>ClassOrInterfaceType:<br>
</i> <i>ClassType<br>
</i> <i>InterfaceType
</i>
<i>ClassType:<br>
</i> <i>TypeName
</i>
<i>InterfaceType:<br>
</i> <i>TypeName
</i></pre></ul><a name="44679"></a>
were changed to:
<p><ul><pre>
<i>ClassOrInterfaceType:<br>
</i> <i>Name
</i>
<i>ClassType:<br>
</i><code> </code><i>ClassOrInterfaceType
</i>
<i>InterfaceType:<br>
</i><code> </code><i>ClassOrInterfaceType
</i></pre></ul><a name="44488"></a>
<h3>19.1.2 Problem #2: Modifiers Too Specific</h3>
<a name="44489"></a>
Consider the two groups of productions:
<p><ul><pre>
<i>FieldDeclaration:<br>
FieldModifiers</i><sub><i>opt</i></sub><code> </code><i>Type</i><code> </code><i>VariableDeclarators</i><code> ;
</code>
<i>FieldModifiers:<br>
</i> <i>FieldModifier<br>
</i> <i>FieldModifiers</i><code> </code><i>FieldModifier
</i>
<i>FieldModifier:</i> <i>one</i> <i>of<br>
</i> <code>public</code> <code>protected</code> <code>private<br>
final static transient volatile
</code></pre></ul><a name="44493"></a>
and:
<p><ul><pre>
<i>MethodHeader:<br>
</i> <i>MethodModifiers</i><sub><i>opt</i></sub><code> </code><i>ResultType</i><code> </code><i>MethodDeclarator</i><code> </code><i>Throws</i><sub><i>opt
</i></sub>
<i>MethodModifiers:<br>
</i> <i>MethodModifier<br>
</i> <i>MethodModifiers</i><code> </code><i>MethodModifier
</i>
<i>MethodModifier:</i> <i>one</i> <i>of<br>
</i> <code>public</code> <code>protected</code> <code>private<br>
static<br>
abstract final native synchronized
</code></pre></ul><a name="44497"></a>
Now consider the partial input:
<p><pre><a name="44498"></a>class Problem2 { public static int
</pre><a name="44499"></a>
When the parser is considering the token <code>static</code>, with one-token lookahead to
symbol <code>int</code>-or, worse yet, considering the token <code>public</code> with lookahead to
<code>static</code>-it cannot yet tell whether this will be a field declaration such as:
<p><pre><a name="44500"></a>public static int maddie = 0;
</pre><a name="44501"></a>
or a method declaration such as:
<p><pre><a name="44502"></a>public static int maddie(String art) { return art.length(); }
</pre><a name="44503"></a>
Therefore, the parser cannot tell with only one-token lookahead whether <code>static</code>
(or, similarly, <code>public</code>) should be reduced to <i>FieldModifier</i> or <i>MethodModifier</i>.
Therefore, the productions shown above result in a grammar that is not LALR(1).
There are also other problems with drawing distinctions among different kinds of
modifiers in the grammar.
<p><a name="44504"></a>
While not all contexts provoke the problem, the simplest solution is to combine all contexts in which such modifiers are used, eliminating all six of the nonterminals  <i>ClassModifiers </i><a href="8.doc.html#21613">(§8.1.2)</a>, <i>FieldModifiers </i><a href="8.doc.html#78091">(§8.3.1)</a>, <i>MethodModifiers </i><a href="8.doc.html#78188">(§8.4.3)</a>, <i>ConstructorModifiers </i><a href="8.doc.html#42018">(§8.6.3)</a>, <i>InterfaceModifiers </i><a href="9.doc.html#235947">(§9.1.2)</a>, and <i>ConstantModifiers</i>  <a href="9.doc.html#78642">(§9.3)</a> from the grammar, replacing them all with a single nonterminal <i>Modifiers</i>:<p>
<ul><pre>
<i>Modifiers:<br>
Modifier<br>
Modifiers</i><code> </code><i>Modifier
</i>
<i>Modifier: one of<br>
</i><code>public protected private<br>
static<br>
abstract final native synchronized transient volatile
</code></pre></ul><a name="44525"></a>
A later stage of compiler analysis then sorts out the precise role of each modifier
and whether it is permitted in a given context.
<p><a name="44526"></a>
<h3>19.1.3 Problem #3: Field Declaration versus Method Declaration</h3>
<a name="44527"></a>
Consider the two productions (shown after problem #2 has been corrected):
<p><ul><pre>
<i>FieldDeclaration:<br>
Modifiers</i><sub><i>opt</i></sub><code> </code><i>Type</i><code> </code><i>VariableDeclarators</i><code> ;
</code></pre></ul><a name="44529"></a>
and:
<p><ul><pre>
<i>MethodHeader:<br>
</i> <i>Modifiers</i><sub><i>opt</i></sub><code> </code><i>ResultType</i><code> </code><i>MethodDeclarator</i><code> </code><i>Throws</i><sub><i>opt
</i></sub></pre></ul><a name="44531"></a>
where <i>ResultType</i> is defined as:
<p><ul><pre>
<i>ResultType:<br>
</i> <i>Type<br>
</i> <code>void
</code></pre></ul><a name="44533"></a>
Now consider the partial input:
<p><pre><a name="44534"></a>class Problem3 { int julie
</pre><a name="44535"></a>
Note that, in this simple example, no <i>Modifiers</i> are present. When the parser is
considering the token <code>int</code>, with one-token lookahead to symbol <code>julie</code>, it cannot
yet tell whether this will be a field declaration such as:
<p><pre><a name="44536"></a>int julie = 14;
</pre><a name="44537"></a>
or a method declaration such as:
<p><pre><a name="44538"></a>int julie(String art) { return art.length(); }
</pre><a name="44539"></a>
Therefore, after the parser reduces <code>int</code> to the nonterminal <i>Type</i>, it cannot tell with
only one-token lookahead whether <i>Type</i> should be further reduced to <i>ResultType</i>
(for a method declaration) or left alone (for a field declaration). Therefore, the
productions shown above result in a grammar that is not LALR(1).
<p><a name="44540"></a>
The solution is to eliminate the <i>ResultType</i> production and to have separate alternatives for <i>MethodHeader</i>:<p>
<ul><pre>
<i>MethodHeader:<br>
</i> <i>Modifiers</i><sub><i>opt</i></sub><code> </code><i>Type</i><code> </code><i>MethodDeclarator</i><code> </code><i>Throws</i><sub><i>opt<br>
</i></sub> <i>Modifiers</i><sub><i>opt</i></sub><code> void </code><i>MethodDeclarator</i><code> </code><i>Throws</i><sub><i>opt
</i></sub></pre></ul><a name="44542"></a>
This allows the parser to reduce <code>int</code> to <i>Type</i> and then leave it as is, delaying the decision as to whether a field declaration or method declaration is in progress.<p>
<a name="44543"></a>
<h3>19.1.4 Problem #4: Array Type versus Array Access</h3>
<a name="44544"></a>
Consider the productions (shown after problem #1 has been corrected):
<p><ul><pre>
<i>ArrayType:<br>
</i><code> </code><i>Type</i><code> [ ]
</code></pre></ul><a name="44546"></a>
and:
<p><ul><pre>
<i>ArrayAccess:<br>
</i> <i>Name</i><code> [ </code><i>Expression</i><code> ]<br>
</code> <i>PrimaryNoNewArray</i><code> [ </code><i>Expression</i><code> ]
</code></pre></ul><a name="44548"></a>
Now consider the partial input:
<p><pre><a name="44549"></a>class Problem4 { Problem4() { peter[
</pre><a name="44550"></a>
When the parser is considering the token <code>peter</code>, with one-token lookahead to
symbol <code>[</code>, it cannot yet tell whether <code>peter</code> will be part of a type name, as in:
<p><pre><a name="44551"></a>peter[] team;
</pre><a name="44552"></a>
or part of an array access, as in:
<p><pre><a name="44553"></a>peter[3] = 12;
</pre><a name="44554"></a>
Therefore, after the parser reduces <code>peter</code> to the nonterminal <i>Name</i>, it cannot tell
with only one-token lookahead whether <i>Name</i> should be reduced ultimately to
<i>Type</i> (for an array type) or left alone (for an array access). Therefore, the productions
shown above result in a grammar that is not LALR(1).
<p><a name="44555"></a>
The solution is to have separate alternatives for <em>ArrayType</em>:<p>
<ul><pre>
<i>ArrayType:<br>
</i><code> </code><i>PrimitiveType</i><code> [ ]<br>
</code><i>Name</i><code> [ ]<br>
</code><i>ArrayType</i><code> [ ]
</code></pre></ul><a name="44557"></a>
This allows the parser to reduce <code>peter</code> to <i>Name</i> and then leave it as is, delaying
the decision as to whether an array type or array access is in progress.
<p><a name="44559"></a>
<h3>19.1.5 Problem #5: Cast versus Parenthesized Expression</h3>
<a name="44560"></a>
Consider the production:
<p><ul><pre>
<i>CastExpression:<br>
</i><code> ( </code><i>PrimitiveType</i><code> ) </code><i>UnaryExpression<br>
</i><code> ( </code><i>ReferenceType</i><code> ) </code><i>UnaryExpressionNotPlusMinus
</i></pre></ul><a name="44562"></a>
Now consider the partial input:
<p><pre><a name="44563"></a>class Problem5 { Problem5() { super((matthew)
</pre><a name="44564"></a>
When the parser is considering the token <code>matthew</code>, with one-token lookahead to
symbol <code>)</code>, it cannot yet tell whether <code>(matthew)</code> will be a parenthesized expression,
as in:
<p><pre><a name="44565"></a>super((matthew), 9);
</pre><a name="44566"></a>
or a cast, as in:
<p><pre><a name="44567"></a>super((matthew)baz, 9);
</pre><a name="44568"></a>
Therefore, after the parser reduces <code>matthew</code> to the nonterminal <i>Name</i>, it cannot
tell with only one-token lookahead whether <i>Name</i> should be further reduced to
<i>PostfixExpression</i> and ultimately to <i>Expression</i> (for a parenthesized expression) or
to <i>ClassOrInterfaceType</i> and then to <i>ReferenceType</i> (for a cast). Therefore, the
productions shown above result in a grammar that is not LALR(1).
<p><a name="44569"></a>
The solution is to eliminate the use of the nonterminal <i>ReferenceType</i> in the definition of <i>CastExpression</i>, which requires some reworking of both alternatives to avoid other ambiguities:<p>
<ul><pre>
<i>CastExpression:<br>
</i><code> ( </code><i>PrimitiveType</i><code> </code><i>Dims</i><sub><i>opt</i></sub><code> ) </code><i>UnaryExpression<br>
</i><code> ( </code><i>Expression</i><code> ) </code><i>UnaryExpressionNotPlusMinus<br>
</i><code>( </code><i>Name</i><code> </code><i>Dims</i><code> ) </code><i>UnaryExpressionNotPlusMinus
</i></pre></ul><a name="44571"></a>
This allows the parser to reduce <code>matthew</code> to <i>Expression</i> and then leave it there,
delaying the decision as to whether a parenthesized expression or a cast is in
progress. Inappropriate variants such as:
<p><pre><a name="44572"></a>(int[])+3
</pre><a name="44573"></a>
and:
<p><pre><a name="44574"></a>(matthew+1)baz
</pre><a name="44575"></a>
must then be weeded out and rejected by a later stage of compiler analysis.
<p><a name="28201"></a>
The remaining sections of this chapter constitute a LALR(1) grammar for Java syntax, in which the five problems described above have been solved.<p>
<a name="50220"></a>
<h2>19.2 Productions from <a href="2.doc.html#140845">§2.3: The Syntactic Grammar</a></h2>
<ul><pre>
<i>Goal:<br>
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -