<html><head><title>Tips for writing a good JavaCC lexical specification</title></head><body><h1 align="center"> Tips for writing a good JavaCC lexical specification</h1><hr><p>There are many ways to write the lexical specification for a grammar, but the performance of the generated token manager varies significantly depending on how you do it. Here are a few tips:</p><ul>
<li> Try to specify as many string literals as possible. These are recognized by a deterministic finite automaton (DFA), which is much faster than the nondeterministic finite automaton (NFA) needed to recognize other kinds of complex regular expressions. For example, to skip blanks/tabs/newlines,<pre> SKIP : { " " | "\t" | "\n" }</pre> is more efficient than <pre> SKIP : { &lt; ([" ", "\t", "\n"])+ &gt; }</pre> because the first case contains only string literals, so a DFA is generated, whereas the second case generates an NFA.<p></p></li>
<li> Try to use the pattern ~[] just by itself as much as possible. For example, <pre> MORE : { &lt; ~[] &gt; }</pre> is better than <pre> TOKEN : { &lt; (~[])+ &gt; }</pre> Of course, if your grammar dictates that one of these cannot be used, then you don't have a choice, but try to use &lt; ~[] &gt; as much as possible.<p></p></li>
<li> Specify all the string literals in order of increasing length, i.e., all shorter string literals before longer ones. This helps optimize the bit vectors needed for string literals.<p></p></li>
<li> Try to minimize the use of lexical states. When you do use them, try to move all your complex regular expressions into a single lexical state, leaving the others to recognize only simple string literals.<p></p></li>
<li> Try to use IGNORE_CASE judiciously. The best approach is to set this option at the grammar level. If that is not possible, then try to set it for *all* regular expressions in a lexical state.
There is a heavy performance penalty for setting IGNORE_CASE for some regular expressions and not for others in the same lexical state.<p></p></li>
<li> Try to SKIP as much as possible if you don't care about certain patterns. Here, you have to be a bit careful about EOF: seeing an EOF after a SKIP is fine, whereas seeing an EOF after a MORE is a lexical error.<p></p></li>
<li> Try to avoid specifying lexical actions with MORE specifications. Generally, every MORE eventually ends in a TOKEN (or SPECIAL_TOKEN), so you can perform the action there, at the TOKEN level, if possible.<p></p></li>
<li> Also try to avoid lexical actions and lexical state changes with SKIP specifications (especially for single-character SKIPs like " ", "\t", "\n" etc.). For such cases, a simple loop is generated to consume the skipped single characters. If a lexical action or state change is associated with the SKIP, it is not possible to do it this way.<p></p></li>
<li>Try to avoid having a choice of string literals for the same token, e.g.<pre> &lt; NONE : "\"none\"" | "\'none\'" &gt;</pre> Instead, define two different token kinds and use a nonterminal that chooses between them. The above example can be written as: <pre> &lt; NONE1 : "\"none\"" &gt; | &lt; NONE2 : "\'none\'" &gt; </pre> with a nonterminal called None() defined as: <pre> void None() : {} { &lt;NONE1&gt; | &lt;NONE2&gt; }</pre> This makes recognition much faster. Note, however, that if the choice is between two complex regular expressions, it is OK to keep the choice.<p></p></li>
</ul></body></html>