@code{enter_macro_context} also handles special macros like
@code{__LINE__}.  Although these macros expand to a single token which
cannot contain any further macros, for reasons of token spacing
(@pxref{Token Spacing}) and simplicity of implementation, cpplib
handles these special macros by pushing a context containing just that
one token.

The final thing that @code{enter_macro_context} does before returning
is to mark the macro disabled for expansion (except for special macros
like @code{__TIME__}).  The macro is re-enabled when its context is
later popped from the context stack, as described above.  This strict
ordering ensures that a macro is disabled whilst its expansion is
being scanned, but that it is @emph{not} disabled whilst any arguments
to it are being expanded.

@section Scanning the replacement list for macros to expand

The C standard states that, after any parameters have been replaced
with their possibly-expanded arguments, the replacement list is
scanned for nested macros.  Further, any identifiers in the
replacement list that are not expanded during this scan are never
again eligible for expansion in the future, if the reason they were
not expanded is that the macro in question was disabled.

Clearly this latter condition can only apply to tokens resulting from
argument pre-expansion.  Other tokens never have an opportunity to be
re-tested for expansion.  It is possible for identifiers that are
function-like macros to not expand initially but to expand during a
later scan.  This occurs when the identifier is the last token of an
argument (and therefore originally followed by a comma or a closing
parenthesis in its macro's argument list), and when it replaces its
parameter in the macro's replacement list, the subsequent token
happens to be an opening parenthesis (itself possibly the first token
of an argument).

It is important to note that when cpplib reads the last token of a
given context, that context still remains on the stack.
Only when looking for the @emph{next} token do we pop it off the stack
and drop to a lower context.  This makes backing up by one token easy,
but more importantly ensures that the macro corresponding to the current
context is still disabled when we are considering the last token of
its replacement list for expansion (or indeed expanding it).  As an
example, which illustrates many of the points above, consider

@smallexample
#define foo(x) bar x
foo(foo) (2)
@end smallexample

@noindent which fully expands to @samp{bar foo (2)}.  During pre-expansion
of the argument, @samp{foo} does not expand even though the macro is
enabled, since it has no following parenthesis [pre-expansion of an
argument only uses tokens from that argument; it cannot take tokens
from whatever follows the macro invocation].  This still leaves the
argument token @samp{foo} eligible for future expansion.  Then, when
re-scanning after argument replacement, the token @samp{foo} is
rejected for expansion, and marked ineligible for future expansion,
since the macro is now disabled.  It is disabled because the
replacement list @samp{bar foo} of the macro is still on the context
stack.

If instead the algorithm looked for an opening parenthesis first and
then tested whether the macro were disabled it would be subtly wrong.
In the example above, the replacement list of @samp{foo} would be
popped in the process of finding the parenthesis, re-enabling
@samp{foo} and expanding it a second time.

@section Looking for a function-like macro's opening parenthesis

Function-like macros only expand when immediately followed by a
parenthesis.  To do this cpplib needs to temporarily disable macros
and read the next token.
Unfortunately, because of spacing issues
(@pxref{Token Spacing}), there can be fake padding tokens in-between,
and if the next real token is not a parenthesis cpplib needs to be
able to back up that one token as well as retain the information in
any intervening padding tokens.

Backing up more than one token when macros are involved is not
permitted by cpplib, because in general it might involve issues like
restoring popped contexts onto the context stack, which are too hard.
Instead, searching for the parenthesis is handled by a special
function, @code{funlike_invocation_p}, which remembers padding
information as it reads tokens.  If the next real token is not an
opening parenthesis, it backs up that one token, and then pushes an
extra context just containing the padding information if necessary.

@section Marking tokens ineligible for future expansion

As discussed above, cpplib needs a way of marking tokens as
unexpandable.  Since the tokens cpplib handles are read-only once they
have been lexed, it instead makes a copy of the token and adds the
flag @code{NO_EXPAND} to the copy.

For efficiency and to simplify memory management by avoiding having to
remember to free these tokens, they are allocated as temporary tokens
from the lexer's current token run (@pxref{Lexing a line}) using the
function @code{_cpp_temp_token}.  The tokens are then re-used once the
current line of tokens has been read in.

This might sound unsafe.  However, token runs are not re-used at the
end of a line if it happens to be in the middle of a macro argument
list, and cpplib only wants to back up more than one lexer token in
situations where no macro expansion is involved, so the optimization
is safe.

@node Token Spacing
@unnumbered Token Spacing
@cindex paste avoidance
@cindex spacing
@cindex token spacing

First, consider an issue that only concerns the stand-alone
preprocessor: there needs to be a guarantee that re-reading its
preprocessed output results in an identical token stream.
Without taking special
measures, this might not be the case because of macro substitution.
For example:

@smallexample
#define PLUS +
#define EMPTY
#define f(x) =x=
+PLUS -EMPTY- PLUS+ f(=)
        @expansion{} + + - - + + = = =
@emph{not}
        @expansion{} ++ -- ++ ===
@end smallexample

One solution would be to simply insert a space between all adjacent
tokens.  However, we would like to keep space insertion to a minimum,
both for aesthetic reasons and because it causes problems for people who
still try to abuse the preprocessor for things like Fortran source and
Makefiles.

For now, just notice that when tokens are added to (or removed from, as
shown by the @code{EMPTY} example) the original lexed token stream, we
need to check for accidental token pasting.  We call this @dfn{paste
avoidance}.  Token addition and removal can only occur because of macro
expansion, but accidental pasting can occur in many places: both before
and after each macro replacement, each argument replacement, and
additionally each token created by the @samp{#} and @samp{##} operators.

Look at how the preprocessor gets whitespace output correct
normally.  The @code{cpp_token} structure contains a flags byte, and one
of those flags is @code{PREV_WHITE}.  This is flagged by the lexer, and
indicates that the token was preceded by whitespace of some form other
than a new line.  The stand-alone preprocessor can use this flag to
decide whether to insert a space between tokens in the output.

Now consider the result of the following macro expansion:

@smallexample
#define add(x, y, z) x + y +z;
sum = add (1,2, 3);
        @expansion{} sum = 1 + 2 +3;
@end smallexample

The interesting thing here is that the tokens @samp{1} and @samp{2} are
output with a preceding space, and @samp{3} is output without a
preceding space, but when lexed none of these tokens had that property.
Careful consideration reveals that @samp{1} gets its preceding
whitespace from the space preceding @samp{add} in the macro invocation,
@emph{not} from the replacement list.
@samp{2} gets its whitespace from the
space preceding the parameter @samp{y} in the macro replacement list,
and @samp{3} has no preceding space because parameter @samp{z} has none
in the replacement list.

Once lexed, tokens are effectively fixed and cannot be altered, since
pointers to them might be held in many places, in particular by
in-progress macro expansions.  So instead of modifying the two tokens
above, the preprocessor inserts a special token, which I call a
@dfn{padding token}, into the token stream to indicate that spacing of
the subsequent token is special.  The preprocessor inserts padding
tokens in front of every macro expansion and expanded macro argument.
These point to a @dfn{source token} from which the subsequent real token
should inherit its spacing.  In the above example, the source tokens are
@samp{add} in the macro invocation, and @samp{y} and @samp{z} in the
macro replacement list, respectively.

It is quite easy to get multiple padding tokens in a row, for example if
a macro's first replacement token expands straight into another macro.

@smallexample
#define foo bar
#define bar baz
[foo]
        @expansion{} [baz]
@end smallexample

Here, two padding tokens are generated, whose sources are the
@samp{foo} token between the brackets and the @samp{bar} token from
foo's replacement list, respectively.  Clearly the first padding token
is the one to use, so the output code should contain a rule that the
first padding token in a sequence is the one that matters.

But what happens when we leave a macro expansion?
Adjusting the above
example slightly:

@smallexample
#define foo bar
#define bar EMPTY baz
#define EMPTY
[foo] EMPTY;
        @expansion{} [ baz] ;
@end smallexample

As shown, now there should be a space before @samp{baz} and the
semicolon in the output.

The rules we decided above fail for @samp{baz}: we generate three
padding tokens, one per macro invocation, before the token @samp{baz}.
We would then have it take its spacing from the first of these, which
carries source token @samp{foo} with no leading space.

It is vital that cpplib get spacing correct in these examples since any
of these macro expansions could be stringified, where spacing matters.

So, this demonstrates that not just entering macro and argument
expansions, but leaving them requires special handling too.  I made
cpplib insert a padding token with a @code{NULL} source token when
leaving macro expansions, as well as after each replaced argument in a
macro's replacement list.  It also inserts appropriate padding tokens on
either side of tokens created by the @samp{#} and @samp{##} operators.
I expanded the rule so that, if we see a padding token with a
@code{NULL} source token, @emph{and} the source token remembered so far
has no leading space, then we behave as if we have seen no padding
tokens at all.  A quick check shows this rule will then get the above
example correct as well.

Now a relationship with paste avoidance is apparent: we have to be
careful about paste avoidance in exactly the same locations we have
padding tokens in order to get white space correct.  This makes
implementation of paste avoidance easy: wherever the stand-alone
preprocessor is fixing up spacing because of padding tokens, and it
turns out that no space is needed, it has to take the extra step to
check that a space is not needed after all to avoid an accidental paste.
The function @code{cpp_avoid_paste} advises whether a space is required
between two consecutive tokens.
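
As a rough illustration of the padding rule built up in this section,
the following C sketch decides whether a real token should be printed
with a leading space, given the padding tokens that precede it.  It is
not cpplib's code: @code{struct tok}, its fields and @code{wants_space}
are hypothetical names invented for the example, and only the
@code{PREV_WHITE} flag is taken from the text above.  Paste avoidance
is deliberately left out.

@verbatim
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a cpplib token: only the
   fields needed by the spacing rule are modelled.  */
#define PREV_WHITE 1u

struct tok
{
  bool is_padding;           /* true for a padding token */
  unsigned flags;            /* PREV_WHITE if preceded by whitespace */
  const struct tok *source;  /* padding only: source token, or NULL */
};

/* tokens[0 .. n-2] are the padding tokens seen since the last real
   token; tokens[n-1] is the real token about to be printed.  Return
   true if that token should get a leading space.  */
bool
wants_space (const struct tok *tokens, size_t n)
{
  const struct tok *source = NULL;

  assert (n > 0 && !tokens[n - 1].is_padding);

  for (size_t i = 0; i + 1 < n; i++)
    {
      assert (tokens[i].is_padding);

      /* The first padding token wins, unless a NULL-source padding
         token arrives while the remembered source has no leading
         space; then we act as if no padding had been seen.  */
      if (source == NULL
          || (!(source->flags & PREV_WHITE) && tokens[i].source == NULL))
        source = tokens[i].source;
    }

  /* With nothing remembered, the token supplies its own spacing.  */
  if (source == NULL)
    source = &tokens[n - 1];

  return (source->flags & PREV_WHITE) != 0;
}
```
@end verbatim

Fed the three padding tokens for @samp{foo}, @samp{bar} and
@samp{EMPTY}, followed by the @code{NULL}-source padding token emitted
on leaving @samp{EMPTY}, the sketch forgets @samp{foo} and falls back on
the leading space of @samp{baz} itself, which matches the
@samp{[ baz] ;} expansion shown earlier.
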
To avoid excessive spacing, @code{cpp_avoid_paste} tries
hard to only require a space if one is likely to be necessary, but for
reasons of efficiency it is slightly conservative and might recommend a
space where one is not strictly needed.

@node Line Numbering
@unnumbered Line numbering
@cindex line numbers

@section Just which line number anyway?

There are three reasonable requirements a cpplib client might have for
the line number of a token passed to it:

@itemize @bullet
@item
The source line it was lexed on.
@item
The line it is output on.  This can be different to the line it was
lexed on if, for example, there are intervening escaped newlines or
C-style comments.  For example:

@smallexample
foo /* @r{A long
comment} */ bar \
baz
@result{}
foo bar baz
@end smallexample

@item
If the token results from a macro expansion, the line of the macro name,
or possibly the line of the closing parenthesis in the case of