function-like macro expansion.
@end itemize

The @code{cpp_token} structure contains @code{line} and @code{col}
members. The lexer fills these in with the line and column of the first
character of the token. Consequently, but maybe unexpectedly, a token
from the replacement list of a macro expansion carries the location of
the token within the @code{#define} directive, because cpplib expands a
macro by returning pointers to the tokens in its replacement list. The
current implementation of cpplib assigns tokens created from built-in
macros and the @samp{#} and @samp{##} operators the location of the most
recently lexed token. This is because they are allocated from the
lexer's token runs, and because of the way the diagnostic routines infer
the appropriate location to report.

The diagnostic routines in cpplib display the location of the most
recently @emph{lexed} token, unless they are passed a specific line and
column to report. For diagnostics regarding tokens that arise from
macro expansions, it might also be helpful for the user to see the
original location in the macro definition that the token came from.
Since that is exactly the information each token carries, such an
enhancement could be made relatively easily in future.

The stand-alone preprocessor faces a similar problem when determining
the correct line to output the token on: the position attached to a
token is fairly useless if the token came from a macro expansion. All
tokens on a logical line should be output on its first physical line, so
the token's reported location is also wrong if it is part of a physical
line other than the first.

To solve these issues, cpplib provides a callback that is invoked
whenever it lexes a preprocessing token that starts a new logical line
other than a directive.
It passes this token (which may be a @code{CPP_EOF} token indicating the
end of the translation unit) to the callback routine, which can then use
the line and column of this token to produce correct output.

@section Representation of line numbers

As mentioned above, cpplib stores with each token the line number that
it was lexed on. In fact, this number is not the number of the line in
the source file, but instead bears more resemblance to the number of the
line in the translation unit.

The preprocessor maintains a monotonically increasing line count, which
is incremented at every newline character (and also at the end of any
buffer that does not end in a newline). Since a line number of zero is
useful to indicate certain special states and conditions, this variable
starts counting from one.

This variable therefore uniquely enumerates each line in the translation
unit. With some simple infrastructure, it is straightforward to map
from this to the original source file and line number pair, saving space
whenever line number information needs to be saved. The code that
implements this mapping lies in the files @file{line-map.c} and
@file{line-map.h}.

Command-line macros and assertions are implemented by pushing a buffer
containing the right-hand side of an equivalent @code{#define} or
@code{#assert} directive. Some built-in macros are handled similarly.
Since these are all processed before the first line of the main input
file, that first line is typically assigned a number closer to twenty
than to one.

@node Guard Macros
@unnumbered The Multiple-Include Optimization
@cindex guard macros
@cindex controlling macros
@cindex multiple-include optimization

Header files are often of the form

@smallexample
#ifndef FOO
#define FOO
@dots{}
#endif
@end smallexample

@noindent
to prevent the compiler from processing them more than once.
The preprocessor notices such header files, so that if the header file
appears in a subsequent @code{#include} directive and @code{FOO} is
defined, then it is ignored and it doesn't preprocess or even re-open
the file a second time. This is referred to as the @dfn{multiple
include optimization}.

Under what circumstances is such an optimization valid? If the file
were included a second time, it can only be optimized away if that
inclusion would result in no tokens to return, and no relevant
directives to process. Therefore the current implementation imposes
requirements and makes some allowances as follows:

@enumerate
@item
There must be no tokens outside the controlling @code{#if}-@code{#endif}
pair, but whitespace and comments are permitted.

@item
There must be no directives outside the controlling directive pair, but
the @dfn{null directive} (a line containing nothing other than a single
@samp{#} and possibly whitespace) is permitted.

@item
The opening directive must be of the form

@smallexample
#ifndef FOO
@end smallexample

or

@smallexample
#if !defined FOO     [equivalently, #if !defined(FOO)]
@end smallexample

@item
In the second form above, the tokens forming the @code{#if} expression
must have come directly from the source file; no macro expansion may
have been involved. This is because macro definitions can change, and
tracking whether or not a relevant change has been made is not worth the
implementation cost.

@item
There can be no @code{#else} or @code{#elif} directives at the outer
conditional block level, because they would probably contain something
of interest to a subsequent pass.
@end enumerate

First, when pushing a new file on the buffer stack,
@code{_stack_include_file} sets the controlling macro @code{mi_cmacro} to
@code{NULL}, and sets @code{mi_valid} to @code{true}. This indicates
that the preprocessor has not yet encountered anything that would
invalidate the multiple-include optimization.
As the next few paragraphs explain, this combination of values
effectively marks top-of-file.

When about to return a token that is not part of a directive,
@code{_cpp_lex_token} sets @code{mi_valid} to @code{false}. This
enforces the constraint that tokens outside the controlling conditional
block invalidate the optimization.

The @code{do_if}, when appropriate, and @code{do_ifndef} directive
handlers pass the controlling macro to the function
@code{push_conditional}. cpplib maintains a stack of nested conditional
blocks, and after processing every opening conditional this function
pushes an @code{if_stack} structure onto the stack. In this structure
it records the controlling macro for the block, provided there is one
and we're at top-of-file (as described above). If an @code{#elif} or
@code{#else} directive is encountered, the controlling macro for that
block is cleared to @code{NULL}. Otherwise, it survives until the
@code{#endif} closing the block, upon which @code{do_endif} sets
@code{mi_valid} to @code{true} and stores the controlling macro in
@code{mi_cmacro}.

@code{_cpp_handle_directive} clears @code{mi_valid} when processing any
directive other than an opening conditional or the null directive.
Together with the requirements that a controlling macro is recorded only
at top-of-file, and that it survives to be copied into @code{mi_cmacro}
by @code{do_endif} only if there is no @code{#else} or @code{#elif},
this ensures that the optimization is enabled only when there are no
directives outside the main conditional block.

Note that whilst we are inside the conditional block, @code{mi_valid} is
likely to be reset to @code{false}, but this does not matter since
the closing @code{#endif} restores it to @code{true} if appropriate.

Finally, since @code{_cpp_lex_direct} pops the file off the buffer stack
at @code{EOF} without returning a token, if the @code{#endif} directive
was not followed by any tokens, @code{mi_valid} is @code{true} and
@code{_cpp_pop_file_buffer} remembers the controlling macro associated
with the file.
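
The bookkeeping described above can be modelled in a few lines of C.
The following is a simplified sketch, not cpplib's code: only
@code{mi_valid} and @code{mi_cmacro} come from the text, while
@code{struct mi_state}, its @code{block_cmacro} and @code{depth} fields,
and all the @code{mi_*} functions are hypothetical. Each function
stands for one event from the preceding paragraphs, and the caller is
assumed to classify tokens and directives itself. Tracking the nesting
depth is a simplifying assumption so that only the outermost
conditional can supply a controlling macro.

@smallexample
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of the multiple-include bookkeeping.  Only
   mi_valid and mi_cmacro come from the text; the other names are
   hypothetical, and this is not cpplib's implementation.  */
struct mi_state
@{
  bool mi_valid;             /* nothing yet defeats the optimization */
  const char *mi_cmacro;     /* controlling macro once #endif is seen */
  const char *block_cmacro;  /* candidate macro of the outermost block */
  int depth;                 /* conditional nesting depth */
@};

/* New file: mi_valid true with mi_cmacro NULL means top-of-file.  */
void
mi_push_file (struct mi_state *s)
@{
  s->mi_valid = true;
  s->mi_cmacro = NULL;
  s->block_cmacro = NULL;
  s->depth = 0;
@}

/* A token returned outside any directive.  */
void
mi_token (struct mi_state *s)
@{
  s->mi_valid = false;
@}

/* An opening conditional.  CMACRO is "FOO" for #ifndef FOO or
   #if !defined FOO, and NULL for any other condition.  It is kept
   only for the outermost block and only at top-of-file.  */
void
mi_open_conditional (struct mi_state *s, const char *cmacro)
@{
  if (s->depth++ == 0)
    s->block_cmacro = (s->mi_valid && s->mi_cmacro == NULL) ? cmacro : NULL;
@}

/* Any directive other than an opening conditional or the null
   directive, such as #define.  */
void
mi_other_directive (struct mi_state *s)
@{
  s->mi_valid = false;
@}

/* #else or #elif: also forgets the outermost block's macro.  */
void
mi_else (struct mi_state *s)
@{
  s->mi_valid = false;
  if (s->depth == 1)
    s->block_cmacro = NULL;
@}

/* #endif: restore mi_valid if the outermost block kept its macro.  */
void
mi_endif (struct mi_state *s)
@{
  assert (s->depth > 0);
  s->mi_valid = false;
  if (--s->depth == 0 && s->block_cmacro != NULL)
    @{
      s->mi_valid = true;
      s->mi_cmacro = s->block_cmacro;
    @}
@}

/* End of file: the macro to remember, or NULL if not eligible.  */
const char *
mi_pop_file (const struct mi_state *s)
@{
  return s->mi_valid ? s->mi_cmacro : NULL;
@}
@end smallexample

For a file consisting of @code{#ifndef FOO}, @code{#define FOO}, some
tokens and @code{#endif}, the calls @code{mi_push_file},
@code{mi_open_conditional} with @code{"FOO"}, @code{mi_other_directive},
@code{mi_token} and @code{mi_endif} leave @code{mi_pop_file} returning
the @code{"FOO"} pointer; one more @code{mi_token} call, modelling a
token after the @code{#endif}, makes it return @code{NULL}.
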
Subsequent calls to @code{stack_include_file} result in no buffer being
pushed if the controlling macro is defined, effecting the optimization.

A quick word on how we handle the

@smallexample
#if !defined FOO
@end smallexample

@noindent
case. @code{_cpp_parse_expr} and @code{parse_defined} take steps to see
whether the three stages @samp{!}, @samp{defined-expression} and
@samp{end-of-directive} occur in order in a @code{#if} expression. If
so, they return the guard macro to @code{do_if} in the variable
@code{mi_ind_cmacro}, and otherwise set it to @code{NULL}.
@code{enter_macro_context} sets @code{mi_valid} to @code{false}, so if a
macro was expanded whilst parsing any part of the expression, then the
top-of-file test in @code{push_conditional} fails and the optimization
is turned off.

@node Files
@unnumbered File Handling
@cindex files

Fairly obviously, the file handling code of cpplib resides in the file
@file{files.c}. It takes care of the details of file searching,
opening, reading and caching, for both the main source file and all the
headers it recursively includes.

The basic strategy is to minimize the number of system calls. On many
systems, the basic @code{open ()} and @code{fstat ()} system calls can
be quite expensive. For every @code{#include}-d file, we need to try
all the directories in the search path until we find a match. Some
projects, such as glibc, pass twenty or thirty include paths on the
command line, so this can rapidly become time-consuming.

For a header file we have not encountered before, we have little choice
but to do this. However, it is often the case that the same headers are
repeatedly included, and in these cases we try to avoid repeating the
filesystem queries whilst searching for the correct file.

For each file we try to open, we store the constructed path in a splay
tree. This path first undergoes simplification by the function
@code{_cpp_simplify_pathname}.
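
Simplification of this kind can be done purely lexically. As a minimal
sketch, assuming @samp{/}-separated paths, the hypothetical function
@code{simplify_path} below (it is not @code{_cpp_simplify_pathname})
rewrites a path in place: it collapses runs of slashes, drops @samp{.}
components, cancels each @samp{@var{dir}/..} pair and keeps any leading
@samp{..} of a relative path. Because it never consults the file system,
it can disagree with the kernel when @var{dir} is a symbolic link.

@smallexample
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Lexically simplify the '/'-separated PATH in place: collapse runs of
   slashes, drop "." components and cancel each "dir/.." pair.  A
   hypothetical helper for illustration, not cpplib's function.  */
void
simplify_path (char *path)
@{
  assert (path != NULL);
  if (path[0] == '\0')
    return;

  int absolute = (path[0] == '/');
  char *base = path + absolute;  /* first byte after a leading '/' */
  char *out = base;              /* next byte to write */
  const char *in = base;         /* next byte to read; never behind out */
  size_t kept = 0;               /* ordinary components written so far */

  while (*in != '\0')
    @{
      while (*in == '/')         /* skip one or more separators */
        in++;
      if (*in == '\0')
        break;

      const char *end = in;
      while (*end != '\0' && *end != '/')
        end++;
      size_t len = (size_t) (end - in);
      int is_dot = (len == 1 && in[0] == '.');
      int is_dotdot = (len == 2 && in[0] == '.' && in[1] == '.');

      if (is_dot)
        ;                        /* "." names the same directory */
      else if (is_dotdot && kept > 0)
        @{
          /* Cancel the last ordinary component and its separator.  */
          char *start = out;
          while (start > base && start[-1] != '/')
            start--;
          out = (start > base) ? start - 1 : base;
          kept--;
        @}
      else if (is_dotdot && absolute)
        ;                        /* "/.." is the root itself */
      else
        @{
          if (out != base)
            *out++ = '/';
          memmove (out, in, len);
          out += len;
          if (!is_dotdot)
            kept++;              /* a leading ".." of a relative path stays */
        @}
      in = end;
    @}

  if (out == base && !absolute)
    *out++ = '.';                /* an empty relative path means "." */
  *out = '\0';
@}
@end smallexample
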
For example,
@file{/usr/include/bits/../foo.h} is simplified to
@file{/usr/include/foo.h} before we enter it in the splay tree and try
to @code{open ()} the file. CPP will then find subsequent uses of
@file{foo.h}, even as @file{/usr/include/foo.h}, in the splay tree and
save system calls.

Further, it is likely the file contents have also been cached, saving a
@code{read ()} system call. We don't bother caching the contents of
header files that are re-inclusion protected, and whose re-inclusion
macro is defined when we leave the header file for the first time. If
the host supports it, we try to map suitably large files into memory,
rather than reading them in directly.

The include paths are internally stored on a null-terminated
singly-linked list, starting with the @code{"header.h"} directory search
chain, which then links into the @code{<header.h>} directory chain.
Files included with the @code{<foo.h>} syntax start the lookup directly
in the second half of this chain. However, files included with the
@code{"foo.h"} syntax start at the beginning of the chain, but with one
extra directory prepended. This is the directory of the current file,
that is, the one containing the @code{#include} directive.
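
The search order just described can be sketched as a walk along a
linked list. In this sketch, which is not cpplib's code,
@code{struct search_dir}, @code{search_chain}, @code{find_include} and
the @code{exists} callback (standing in for an @code{open ()} attempt)
are all hypothetical. It assumes the quote chain already links into the
bracket chain, and takes the directory of the including file to be
everything before the last @samp{/} of its path. The @code{demo_exists}
stub replaces the file system so that the example is self-contained.

@smallexample
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One entry in a null-terminated, singly-linked search chain.  */
struct search_dir
@{
  const char *name;
  struct search_dir *next;
@};

/* Stands in for an open () attempt on PATH; nonzero on success.  */
typedef int (*exists_fn) (const char *path);

/* Try each directory from CHAIN onwards; on success leave the full
   path in BUF and return 1.  */
static int
search_chain (const struct search_dir *chain, const char *fname,
              exists_fn exists, char *buf, size_t size)
@{
  for (const struct search_dir *d = chain; d != NULL; d = d->next)
    @{
      snprintf (buf, size, "%s/%s", d->name, fname);
      if (exists (buf))
        return 1;
    @}
  return 0;
@}

/* Resolve FNAME as included from CURRENT_FILE.  QUOTE_CHAIN is assumed
   to link into BRACKET_CHAIN.  The "..." form first tries the including
   file's directory: everything before the last '/' of CURRENT_FILE.  */
int
find_include (const char *fname, int angle_brackets,
              const char *current_file,
              struct search_dir *quote_chain,
              struct search_dir *bracket_chain,
              exists_fn exists, char *buf, size_t size)
@{
  assert (fname != NULL && current_file != NULL);
  if (angle_brackets)
    return search_chain (bracket_chain, fname, exists, buf, size);

  char dir[4096];
  const char *slash = strrchr (current_file, '/');
  if (slash != NULL)
    snprintf (dir, sizeof dir, "%.*s", (int) (slash - current_file),
              current_file);
  else
    strcpy (dir, ".");

  /* Prepend the current file's directory for this lookup only.  */
  struct search_dir here = @{ dir, quote_chain @};
  return search_chain (&here, fname, exists, buf, size);
@}

/* Demonstration stub: pretend exactly these files exist.  */
static int
demo_exists (const char *path)
@{
  static const char *const files[] = @{
    "/usr/include/stdio.h",
    "/proj/src/util.h",
  @};
  for (size_t i = 0; i < sizeof files / sizeof files[0]; i++)
    if (strcmp (path, files[i]) == 0)
      return 1;
  return 0;
@}
@end smallexample

With the stub above and the chain @file{/proj/inc}, @file{/usr/local/include},
@file{/usr/include}, @code{#include "util.h"} in @file{/proj/src/main.c}
resolves to @file{/proj/src/util.h}, whereas @code{#include <util.h>} is
not found because that lookup starts in the bracket half of the chain.
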
Prepending the including file's directory in this way is handled by the
function @code{search_from}.

Note that a header included with a directory component, such as
@code{#include "mydir/foo.h"} and opened as
@file{/usr/local/include/mydir/foo.h}, will have the complete path minus
the basename @samp{foo.h} as the current directory.

Enough information is stored in the splay tree that CPP can immediately
tell whether it can skip the header file because of the multiple include
optimization, whether the file didn't exist or couldn't be opened for
some reason, or whether the header was flagged not to be re-used, as it
is with the obsolete @code{#import} directive.

For the benefit of MS-DOS filesystems with an 8.3 filename limitation,
CPP offers the ability to treat various include file names as aliases
for the real header files with shorter names. The map from one to the
other is found in a special file called @samp{header.gcc}, stored in the
command line (or system) include directories to which the mapping
applies. Such a directory may be higher up the directory tree than the
directory that contains the header itself.

@node Concept Index
@unnumbered Concept Index

@printindex cp

@bye