<html>
<head>
<title>The Java Language Specification Grammars</title>
</head>
<body BGCOLOR=#eeeeff text=#000000 LINK=#0000ff VLINK=#000077 ALINK=#ff0000>
<a href="index.html">Contents</a> | <a href="1.doc.html">Prev</a> | <a href="3.doc.html">Next</a> | <a href="j.index.doc1.html">Index</a>
<hr><br>
<a name="14911"></a>
<p><strong>
CHAPTER 2 </strong></p>
<a name="44271"></a>
<h1>Grammars</h1>
<hr><p>
<a name="90172"></a>
This chapter describes the context-free grammars used in this specification to
define the lexical and syntactic structure of a Java program.
<p><a name="40415"></a>
<h2>2.1 Context-Free Grammars</h2>
<a name="40485"></a>
A <i>context-free </i><i>grammar</i> consists of a number of <i>productions</i>. Each production has
an abstract symbol called a <i>nonterminal</i> as its <i>left-hand side</i>, and a sequence of
one or more nonterminal and <i>terminal</i> symbols as its <i>right-hand side</i>. For each
grammar, the terminal symbols are drawn from a specified <i>alphabet</i>.
<p><a name="40491"></a>
Starting from a sentence consisting of a single distinguished nonterminal, called the <i>goal symbol</i>, a given context-free grammar specifies a <i>language</i>, namely, the (possibly infinite) set of sequences of terminal symbols that can result from repeatedly replacing any nonterminal in the sequence with a right-hand side of a production for which the nonterminal is the left-hand side.<p>
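<p>As a rough illustration of how a grammar specifies a language, the following Java sketch starts from a goal symbol and repeatedly replaces the leftmost nonterminal of each sequence with every one of its right-hand sides. The toy grammar (<code>List</code>, <code>Item</code>, and the terminals <code>a</code>, <code>b</code>, and <code>,</code>), the class name <code>DerivationSketch</code>, and the length bound are assumptions made for this example only; none of them appear in this specification.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical toy grammar used only to illustrate derivation. */
final class DerivationSketch {
    // Each nonterminal maps to its alternative right-hand sides.
    // Any symbol that is not a key of this map is treated as a terminal.
    static final Map<String, List<List<String>>> PRODUCTIONS = Map.of(
        "List", List.of(List.of("Item"), List.of("List", ",", "Item")),
        "Item", List.of(List.of("a"), List.of("b")));

    /** Returns every sentence of at most maxSymbols terminals derivable from goal. */
    static Set<String> sentences(String goal, int maxSymbols) {
        Set<String> result = new TreeSet<>();
        Set<List<String>> seen = new HashSet<>();
        Deque<List<String>> work = new ArrayDeque<>();
        work.add(List.of(goal));
        while (!work.isEmpty()) {
            List<String> form = work.remove();
            int i = firstNonterminal(form);
            if (i < 0) {
                // Only terminals remain, so this sequence is a sentence.
                result.add(String.join(" ", form));
                continue;
            }
            for (List<String> rhs : PRODUCTIONS.get(form.get(i))) {
                List<String> next = new ArrayList<>(form.subList(0, i));
                next.addAll(rhs);
                next.addAll(form.subList(i + 1, form.size()));
                // No right-hand side here is empty, so sequences never shrink;
                // anything longer than the bound can be discarded safely.
                if (next.size() <= maxSymbols && seen.add(next)) {
                    work.add(next);
                }
            }
        }
        return result;
    }

    private static int firstNonterminal(List<String> form) {
        for (int i = 0; i < form.size(); i++) {
            if (PRODUCTIONS.containsKey(form.get(i))) {
                return i;
            }
        }
        return -1;
    }
}
```

<p>With a bound of three symbols, <code>DerivationSketch.sentences("List", 3)</code> yields six sentences, from <code>a</code> to <code>b , b</code>. Without the bound the enumeration would not terminate, because <code>List</code> is defined in terms of itself.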
<a name="142375"></a>
<h2>2.2 The Lexical Grammar</h2>
<a name="142382"></a>
A <i>lexical grammar</i> for Java is given in <a href="3.doc.html#48198">§3</a>. This grammar has as its terminal symbols
the characters of the Unicode character set. It defines a set of productions,
starting from the goal symbol <i>Input </i><a href="3.doc.html#25687">(§3.5)</a>,<i> </i>that describe how sequences of Unicode
characters <a href="3.doc.html#95413">(§3.1)</a> are translated into a sequence of input elements <a href="3.doc.html#25687">(§3.5)</a>.
<p><a name="140274"></a>
These input elements, with white space <a href="3.doc.html#95710">(§3.6)</a> and comments <a href="3.doc.html#48125">(§3.7)</a> discarded, form the terminal symbols for the syntactic grammar for Java and are called Java <i>tokens</i> <a href="3.doc.html#25687">(§3.5)</a>. These tokens are the identifiers <a href="3.doc.html#40625">(§3.8)</a>, keywords <a href="3.doc.html#229308">(§3.9)</a>, literals <a href="3.doc.html#48272">(§3.10)</a>, separators <a href="3.doc.html#230752">(§3.11)</a>, and operators <a href="3.doc.html#230752">(§3.12)</a> of the Java language.<p>
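<p>The relationship between input elements and tokens can be sketched in Java as follows. This is a heavily simplified stand-in for the lexical grammar of §3, not an implementation of it: the class name <code>TokenSketch</code>, the regular expression, and the reduction of separators and operators to single characters are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, simplified scanner: input elements in, tokens out. */
final class TokenSketch {
    // One alternative per kind of input element. The real rules are far richer.
    private static final Pattern INPUT_ELEMENT = Pattern.compile(
        "(?<ws>\\s+)"
        + "|(?<comment>//[^\\r\\n]*|/\\*.*?\\*/)"
        + "|(?<word>[A-Za-z_$][A-Za-z0-9_$]*)"   // identifiers and keywords
        + "|(?<number>[0-9]+)"                   // decimal integer literals only
        + "|(?<punct>[^\\sA-Za-z0-9_$])",        // any other single character
        Pattern.DOTALL);

    /** Splits source into input elements and keeps only the tokens. */
    static List<String> tokens(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = INPUT_ELEMENT.matcher(source);
        while (m.find()) {
            boolean discarded = m.group("ws") != null || m.group("comment") != null;
            if (!discarded) {
                tokens.add(m.group());
            }
        }
        return tokens;
    }
}
```

<p>For example, <code>TokenSketch.tokens("if (x) /* note */ y = 42; // done")</code> returns the eight tokens <code>if ( x ) y = 42 ;</code>, with the white space and both comments discarded.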
<a name="140845"></a>
<h2>2.3 The Syntactic Grammar</h2>
<a name="142461"></a>
The <i>syntactic grammar</i> for Java is given in Chapters <a href="4.doc.html#95843">4</a>, <a href="6.doc.html#48086">6</a>-<a href="10.doc.html#27803">10</a>, <a href="14.doc.html#44383">14</a>, and <a href="15.doc.html#4709">15</a>. This
grammar has Java tokens defined by the lexical grammar as its terminal symbols.
It defines a set of productions, starting from the goal symbol <i>CompilationUnit</i>
<a href="7.doc.html#40031">(§7.3)</a>, that describe how sequences of tokens can form syntactically correct Java
programs.
<p><a name="15629"></a>
An LALR(1) version of the syntactic grammar is presented in Chapter <a href="19.doc.html#52994">19</a>. The grammar in the body of this specification is very similar to the LALR(1) grammar but is more readable.<p>
<a name="90767"></a>
<h2>2.4 Grammar Notation</h2>
<a name="48048"></a>
Terminal symbols are shown in <code>fixed</code> <code>width</code> font in the productions of the lexical
and syntactic grammars, and throughout this specification whenever the text is
directly referring to such a terminal symbol. These are to appear in a program
exactly as written.
<p><a name="139619"></a>
Nonterminal symbols are shown in <i>italic</i> type. The definition of a nonterminal is introduced by the name of the nonterminal being defined followed by a colon. One or more alternative right-hand sides for the nonterminal then follow on succeeding lines. For example, the syntactic definition:<p>
<ul><pre>
<i>IfThenStatement:<br>
</i> <code>if ( </code><i>Expression</i><code> ) </code><i>Statement
</i></pre></ul><a name="48054"></a>
states that the nonterminal <i>IfThenStatement </i>represents the token <code>if</code>, followed by a
left parenthesis token, followed by an <i>Expression</i>, followed by a right parenthesis
token, followed by a <i>Statement</i>. As another example, the syntactic definition:
<p><ul><pre>
<i>ArgumentList:<br>
</i> <i>Argument<br>
</i> <i>ArgumentList</i><code> , </code><i>Argument
</i></pre></ul><a name="139922"></a>
states that an <i>ArgumentList</i> may represent either a single <i>Argument</i> or an
<i>ArgumentList</i>,  followed by a comma, followed by an <i>Argument</i>. This definition of
<i>ArgumentList</i> is <i>recursive</i>, that is to say, it is defined in terms of itself. The result
is that an <i>ArgumentList</i> may contain any positive number of arguments. Such
recursive definitions of nonterminals are common.
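<p>The recursive definition of <i>ArgumentList</i> can be mirrored by a recursive method, as in the Java sketch below. It assumes, purely for illustration, that an <i>Argument</i> is any single token other than a comma; the class and method names are likewise hypothetical.

```java
import java.util.List;

/** Hypothetical recognizer for the ArgumentList production. */
final class ArgumentListSketch {
    /** Returns the number of arguments, or -1 if tokens do not form an ArgumentList. */
    static int countArguments(List<String> tokens) {
        return count(tokens, tokens.size());
    }

    // Mirrors the two alternatives, reading the first `end` tokens from the right:
    //   ArgumentList: Argument
    //   ArgumentList: ArgumentList , Argument
    private static int count(List<String> tokens, int end) {
        if (end < 1 || !isArgument(tokens.get(end - 1))) {
            return -1;                        // both alternatives end in an Argument
        }
        if (end == 1) {
            return 1;                         // first alternative: a single Argument
        }
        if (!tokens.get(end - 2).equals(",")) {
            return -1;                        // second alternative needs a comma here
        }
        int prefix = count(tokens, end - 2);  // the nested ArgumentList
        return prefix < 0 ? -1 : prefix + 1;
    }

    // Assumption for this sketch: an Argument is any one token that is not a comma.
    static boolean isArgument(String token) {
        return !token.equals(",");
    }
}
```

<p>Because the production is left-recursive, the sketch works from the right end of the token list: the last token must be an <i>Argument</i>, and whatever precedes the comma must itself be an <i>ArgumentList</i>. The result is never zero, which reflects the fact that an <i>ArgumentList</i> contains a positive number of arguments.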
<p><a name="149388"></a>
The subscripted suffix "<i>opt</i>", which may appear after a terminal or nonterminal, indicates an <i>optional symbol</i>. The alternative containing the optional symbol actually specifies two right-hand sides, one that omits the optional element and one that includes it. This means that:<p>
<ul><pre>
<i>BreakStatement:<br>
</i> <code>break </code><i>Identifier</i><sub><i>opt</i></sub><code> ;
</code></pre></ul><a name="48059"></a>
is a convenient abbreviation for:
<p><ul><pre>
<i>BreakStatement:<br>
</i> <code>break ;<br>
</code> <code>break </code><i>Identifier</i><code> ;
</code></pre></ul><a name="48061"></a>
and that:
<p><ul><pre>
<i>ForStatement</i>:<br>
<code>for ( </code><i>ForInit</i><sub><i>opt</i></sub><code> ; </code><i>Expression</i><sub><i>opt</i></sub><code> ; </code><i>ForUpdate</i><sub><i>opt</i></sub><code> ) </code><i>Statement
</i></pre></ul><a name="44856"></a>
is a convenient abbreviation for:
<p><ul><pre>
<i>ForStatement</i>:<br>
<code> for ( ; </code><i>Expression</i><sub><i>opt</i></sub><code> ; </code><i>ForUpdate</i><sub><i>opt</i></sub><code> ) </code><i>Statement<br>
</i><code> for ( </code><i>ForInit</i><code> ; </code><i>Expression</i><sub><i>opt</i></sub><code> ; </code><i>ForUpdate</i><sub><i>opt</i></sub><code> ) </code><i>Statement
</i></pre></ul><a name="49500"></a>
which in turn is an abbreviation for:
<p><ul><pre>
<i>ForStatement</i>:<br>
<code> for ( ; ; </code><i>ForUpdate</i><sub><i>opt</i></sub><code> ) </code><i>Statement<br>
</i><code> for ( ; </code><i>Expression</i><code> ; </code><i>ForUpdate</i><sub><i>opt</i></sub><code> ) </code><i>Statement<br>
</i><code> for ( </code><i>ForInit</i><code> ; ; </code><i>ForUpdate</i><sub><i>opt</i></sub><code> ) </code><i>Statement<br>
</i><code> for ( </code><i>ForInit</i><code> ; </code><i>Expression</i><code> ; </code><i>ForUpdate</i><sub><i>opt</i></sub><code> ) </code><i>Statement
</i></pre></ul><a name="49529"></a>
which in turn is an abbreviation for:
<p><ul><pre>
<i>ForStatement</i>:<br>
<code> for ( ; ; ) </code><i>Statement<br>
</i><code> for ( ; ; </code><i>ForUpdate</i><code> ) </code><i>Statement<br>
</i><code> for ( ; </code><i>Expression</i><code> ; ) </code><i>Statement<br>
</i><code> for ( ; </code><i>Expression</i><code> ; </code><i>ForUpdate</i><code> ) </code><i>Statement<br>
</i> <code>for ( </code><i>ForInit</i><code> ; ; ) </code><i>Statement<br>
</i><code> for ( </code><i>ForInit</i><code> ; ; </code><i>ForUpdate</i><code> ) </code><i>Statement<br>
</i><code> for ( </code><i>ForInit</i><code> ; </code><i>Expression</i><code> ; ) </code><i>Statement<br>
</i><code> for ( </code><i>ForInit</i><code> ; </code><i>Expression</i><code> ; </code><i>ForUpdate</i><code> ) </code><i>Statement
</i></pre></ul><a name="48067"></a>
so the nonterminal <i>ForStatement</i> actually has eight alternative right-hand sides.
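<p>The step-by-step expansion above can be sketched in Java. The representation is an assumption made for this example: each symbol is a string, and a trailing <code>?</code> stands in for the subscripted "<i>opt</i>" suffix. The class name <code>OptExpansionSketch</code> is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical expansion of optional symbols into explicit alternatives. */
final class OptExpansionSketch {
    /** Expands one right-hand side into all alternatives implied by its optional symbols. */
    static List<String> expand(List<String> rhs) {
        List<String> out = new ArrayList<>();
        expand(rhs, 0, new ArrayList<>(), out);
        return out;
    }

    private static void expand(List<String> rhs, int i, List<String> prefix, List<String> out) {
        if (i == rhs.size()) {
            out.add(String.join(" ", prefix));
            return;
        }
        String symbol = rhs.get(i);
        // A lone "?" is treated as an ordinary symbol, not as an optional marker.
        boolean optional = symbol.length() > 1 && symbol.endsWith("?");
        if (optional) {
            expand(rhs, i + 1, prefix, out);  // alternative that omits the symbol
            symbol = symbol.substring(0, symbol.length() - 1);
        }
        prefix.add(symbol);
        expand(rhs, i + 1, prefix, out);      // alternative that includes the symbol
        prefix.remove(prefix.size() - 1);
    }
}
```

<p>Calling <code>expand</code> on <code>for ( ForInit? ; Expression? ; ForUpdate? ) Statement</code> produces the eight alternatives in the order listed above, and calling it on a right-hand side with two optional symbols produces four. In general, <i>n</i> optional symbols yield 2<sup><i>n</i></sup> alternatives.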
<p><a name="27547"></a>
A very long right-hand side may be continued on a second line by substantially indenting this second line, as in:<p>
<ul><pre>
<i>ConstructorDeclaration</i>:<br>
<i> ConstructorModifiers</i><sub><i>opt</i></sub><code> </code><i>ConstructorDeclarator<br>
</i> <i>Throws</i><sub><i>opt</i></sub><code> </code><i>ConstructorBody
</i></pre></ul><a name="27550"></a>
which defines one right-hand side for the nonterminal <i>ConstructorDeclaration</i>.
(This right-hand side is an abbreviation for four alternative right-hand sides,
because of the two occurrences of "<sub><i>opt</i></sub>".)
<p><a name="139949"></a>
When the words "one of" follow the colon in a grammar definition, they signify that each of the terminal symbols on the following line or lines is an alternative definition. For example, the lexical grammar for Java contains the production:<p>
<ul><pre>
<i>ZeroToThree: one of<br>
</i> <code>0 1 2 3
</code></pre></ul><a name="46646"></a>
which is merely a convenient abbreviation for:
<p><ul><pre>
<i>ZeroToThree:<br>
</i> <code>0<br>
</code> <code>1<br>
</code> <code>2<br>
</code> <code>3
</code></pre></ul><a name="46654"></a>
When an alternative in a lexical production appears to be a token, it represents the sequence of characters that would make up such a token. Thus, the definition:<p>
<ul><pre>
<i>BooleanLiteral: one of<br>
</i><code> true false
</code></pre></ul><a name="7071"></a>
in a lexical grammar production is shorthand for:
<p><ul><pre>
<i>BooleanLiteral:<br>
</i><code> t r u e<br>
f a l s e
</code></pre></ul><a name="19987"></a>
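<p>Both abbreviations are mechanical, as the following Java sketch suggests. The class and method names are hypothetical, and the sketch assumes that the symbols on a "one of" line are separated by white space.

```java
import java.util.List;

/** Hypothetical helpers for the "one of" and token-as-characters shorthands. */
final class OneOfSketch {
    /** Splits the line that follows "one of" into one alternative per symbol. */
    static List<String> alternatives(String oneOfLine) {
        return List.of(oneOfLine.trim().split("\\s+"));
    }

    /** Spells a token-like alternative as the sequence of its characters. */
    static String asCharacters(String alternative) {
        return String.join(" ", alternative.split(""));
    }
}
```

<p>Here <code>alternatives("0 1 2 3")</code> gives the four alternatives of <i>ZeroToThree</i>, and <code>asCharacters("true")</code> gives <code>t r u e</code>.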
The right-hand side of a lexical production may specify that certain expansions are not permitted by using the phrase "but not" and then indicating the expansions to be excluded, as in the productions for <i>InputCharacter</i> <a href="3.doc.html#231571">(§3.4)</a> and <i>Identifier</i> <a href="3.doc.html#40625">(§3.8)</a>:<p>
<ul><pre>
<i>InputCharacter:<br>
</i> <i>UnicodeInputCharacter</i> but not CR or LF
<i>Identifier:<br>
</i> <i>IdentifierName</i> but not a <i>Keyword</i> or <i>BooleanLiteral</i> or <i>NullLiteral
</i></pre></ul><a name="23502"></a>
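<p>A "but not" clause amounts to a membership test followed by an exclusion, as in this Java sketch for <i>Identifier</i>. Two assumptions are made for illustration: <code>Character.isJavaIdentifierStart</code> and <code>Character.isJavaIdentifierPart</code> stand in for the first nonterminal of the production, and the excluded set is deliberately partial (the full keyword list is in §3.9). The class name <code>ButNotSketch</code> is hypothetical.

```java
import java.util.Set;

/** Hypothetical check illustrating a "but not" exclusion. */
final class ButNotSketch {
    // Deliberately partial: a few keywords plus the boolean and null literals.
    static final Set<String> EXCLUDED =
        Set.of("if", "for", "break", "true", "false", "null");

    /** Approximates the name part of the production using the Character helpers. */
    static boolean isIdentifierName(String s) {
        if (s.isEmpty() || !Character.isJavaIdentifierStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!Character.isJavaIdentifierPart(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    /** Applies the exclusion: a name that is not one of the excluded words. */
    static boolean isIdentifier(String s) {
        return isIdentifierName(s) && !EXCLUDED.contains(s);
    }
}
```

<p>With these definitions, <code>count</code> is accepted as an identifier, while <code>true</code> passes the name test but is rejected by the exclusion.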
Finally, a few nonterminal symbols are described by a descriptive phrase in roman type in cases where it would be impractical to list all the alternatives:<p>
<ul><pre>
<i>RawInputCharacter</i>:<br>
any Unicode character
</pre></ul>
<hr>
<!-- This inserts footnotes--><p>
<p>
<font size=-1>Java Language Specification (HTML generated by Suzette Pelouch on February 24, 1998)<br>
<i><a href="jcopyright.doc.html">Copyright © 1996 Sun Microsystems, Inc.</a>
All rights reserved</i>
<br>
Please send any comments or corrections to <a href="mailto:doug.kramer@sun.com">doug.kramer@sun.com</a>
</font>
</body></html>