Just how this program will be run---whether automatically for
old source (perhaps as the default for @file{.f} files?)---is not
yet determined.
In the meantime, it might as well be implemented as a typical UNIX pipe.

It should accept a @samp{-fline-length-@var{n}} option,
with the default line length set to 72.

When the text it strips off the end of a line is not blank
(that is, not just spaces and tabs),
it should insert an additional comment line
(beginning with @samp{!},
so it works for both fixed-form and free-form files)
containing the text,
following the stripped line.
The inserted comment should have a prefix of some kind,
TBD, that distinguishes the comment as representing stripped text.
Users could use that to @code{sed} out such lines, if they wished---it
seems silly to provide a command-line option to delete information
when it can be so easily filtered out by another program.

(This inserted comment should be designed to ``fit in'' well
with whatever the Fortran community is using these days for
preprocessor, translator, and other such products, like OpenMP.
What that's all about, and how @code{g77} can elegantly fit its
special comment conventions into it all, is TBD as well.
We don't want to reinvent the wheel here, but if there turn out
to be too many conflicting conventions, we might have to invent
one that looks nothing like the others, but which offers their
host products a better infrastructure in which to fit and coexist
peacefully.)

@code{g77stripcard} probably shouldn't do any tab expansion or other
fancy stuff.
People can use @code{expand} or other pre-filtering if they like.
The idea here is to keep each stage quite simple, while providing
excellent performance for ``normal'' code.

(Code with junk beyond column 72 is not really ``normal'',
as it comes from a card-punch heritage,
and will be increasingly hard for tomorrow's Fortran programmers to read.)

@node lex.c
@subsection lex.c

To help make the lexer simple, fast, and easy to maintain,
while also having @code{g77} generally encourage
Fortran programmers
to write simple, maintainable, portable code by maximizing the
performance of compiling that kind of code:

@itemize @bullet
@item
There'll be just one lexer, for both fixed-form and free-form source.

@item
It'll care about the form only when handling the first 7 columns of
text, stuff like spaces between strings of alphanumerics, and
how lines are continued.

Some other distinctions will be handled by subsequent phases,
so at least one of them will have to know which form is involved.

For example, @samp{I = 2 . 4} is acceptable in fixed form,
and works in free form as well given the implementation @code{g77}
presently uses.
But the standard requires a diagnostic for it in free form,
so the parser has to be able to recognize that
the lexemes aren't contiguous
(information the lexer @emph{does} have to provide)
and that free-form source is being parsed,
so it can provide the diagnostic.

The @code{g77} lexer doesn't try to gather @samp{2 . 4} into a single lexeme.
Otherwise, it'd have to know a whole lot more about how to parse Fortran,
or subsequent phases (mainly parsing) would have two paths through
lots of critical code---one to handle the lexemes @samp{2}, @samp{.},
and @samp{4} in sequence, another to handle the lexeme @samp{2.4}.

@item
It won't worry about line lengths
(beyond the first 7 columns for fixed-form source).

That is, once it starts parsing the ``statement'' part of a line
(column 7 for fixed-form, column 1 for free-form),
it'll keep going until it finds a newline,
rather than ignoring everything past a particular column
(72 or 132).

The implication here is that there shouldn't @emph{be}
anything past that last column, other than whitespace or
commentary, because users using typical editors
(or viewing output as typically printed)
won't necessarily know just where the last column is.

Code that has ``garbage'' beyond the last column
(almost certainly only fixed-form code with a punched-card legacy,
such as code using columns 73-80 for ``sequence numbers'')
will have to be
run through @code{g77stripcard} first.

Also, keeping track of the maximum column position while also watching out
for the end of a line @emph{and} while reading from a file
just makes things slower.
Since a file must be read, and watching for the end of the line
is necessary (unless the typical input file was preprocessed to
include the necessary number of trailing spaces),
dropping the tracking of the maximum column position
is the only way to reduce the complexity of the pertinent code
while maintaining high performance.

@item
ASCII encoding is assumed for the input file.

Code written in other character sets will have to be converted first.

@item
Tabs (ASCII code 9)
will be converted to spaces via the straightforward
approach.

Specifically, a tab is converted to between one and eight spaces
as necessary to reach column @var{n},
where dividing @samp{(@var{n} - 1)} by eight
results in a remainder of zero.

@item
Linefeeds (ASCII code 10)
mark the ends of lines.

@item
A carriage return (ASCII code 13)
is accepted if it immediately precedes a linefeed,
in which case it is ignored.

Otherwise, it is rejected (with a diagnostic).

@item
Any characters other than the above
that are not part of the GNU Fortran Character Set
(@pxref{Character Set})
are rejected with a diagnostic.

This includes backspaces, form feeds, and the like.

(It might make sense to allow a form feed in column 1
as long as that's the only character on a line.
It certainly wouldn't seem to cost much in terms of performance.)

@item
The end of the input stream (EOF)
ends the current line.

@item
The distinction between uppercase and lowercase letters
will be preserved.

It will be up to subsequent phases to decide to fold case.

Current plans are to permit any casing for Fortran (reserved) keywords
while preserving casing for user-defined names.
(This might not be made the default for @file{.f} files, though.)

Preserving case seems necessary to provide more direct access
to facilities outside of @code{g77}, such as to C or Pascal code.

Names of
intrinsics will probably be matchable in any case.
However, there probably won't be any option to require
a particular mixed-case appearance of intrinsics
(as there was for @code{g77} prior to version 0.6),
because that's painful to maintain,
and probably nobody uses it.

(How @samp{external SiN; r = sin(x)} would be handled is TBD.
I think old @code{g77} might already handle that pretty elegantly,
but whether we can cope with allowing the same fragment to reference
a @emph{different} procedure, even with the same interface,
via @samp{s = SiN(r)}, needs to be determined.
If we can't, we need to make sure that when code introduces
a user-defined name, any intrinsic matching that name
using a case-insensitive comparison
is ``turned off''.)

@item
Backslashes in @code{CHARACTER} and Hollerith constants
are not allowed.

This avoids the confusion introduced by some Fortran compiler vendors
providing C-like interpretation of backslashes,
while others provide straight-through interpretation.

Some kind of lexical construct (TBD) will be provided to allow
flagging of a @code{CHARACTER}
(but probably not a Hollerith)
constant that permits backslashes.
It'll necessarily be a prefix, such as:

@smallexample
PRINT *, C'This line has a backspace \b here.'
PRINT *, F'This line has a straight backslash \ here.'
@end smallexample

Further, command-line options might be provided to specify that
one prefix or the other is to be assumed as the default
for @code{CHARACTER} constants.

However, it seems more helpful for @code{g77} to provide a program
that prefixes all constants
(or just those containing backslashes)
with the desired designation,
so printouts of code can be read
without knowing the compile-time options used when compiling it.

If such a program is provided
(let's name it @code{g77slash} for now),
then a command-line option to @code{g77} should not be provided.
(Though, given that it'll be easy to implement, it might be hard
to resist user requests for it ``to compile faster than if we
have to invoke another
filter''.)

This program would take a command-line option to specify the
default interpretation of backslashes,
affecting which prefix it uses for constants.

@code{g77slash} probably should automatically convert Hollerith
constants that contain backslashes
to the appropriate @code{CHARACTER} constants.
Then @code{g77} wouldn't have to define a prefix syntax for Hollerith
constants specifying whether they want C-style or straight-through
backslashes.
@end itemize

The above implements nearly exactly what is specified by
@ref{Character Set},
and
@ref{Lines},
except it also provides automatic conversion of tabs
and ignoring of newline-related carriage returns.

It also effects the ``pure visual'' model,
by which is meant that a user viewing his code
in a typical text editor
(assuming it's not preprocessed via @code{g77stripcard} or similar)
doesn't need any special knowledge
of whether spaces on the screen are really tabs,
whether lines end immediately after the last visible non-space character
or after a number of spaces and tabs that follow it,
or whether the last line in the file is ended by a newline.

Most editors don't make these distinctions,
the ANSI FORTRAN 77 standard doesn't require them to,
and it permits a standard-conforming compiler
to define a method for transforming source code to
``standard form'' however it wants.

So, GNU Fortran defines it such that users have the best chance
of having the code be interpreted the way it looks on the screen
of the typical editor.

(Fancy editors should @emph{never} be required to correctly read code
written in classic two-dimensional-plaintext form.
By correct reading I mean ability to read it, book-like, without
mistaking text ignored by the compiler for program code and vice versa,
and without having to count beyond the first several columns.
The vague meaning of ASCII TAB, among other things, complicates
this somewhat, but as long as ``everyone'', including the editor,
other tools, and printer, agrees about the every-eighth-column convention,
the GNU Fortran ``pure
visual'' model meets these requirements.
Any language or user-visible source form
requiring special tagging of tabs,
the ends of lines after spaces/tabs,
and so on, is broken by this definition.
Fortunately, Fortran @emph{itself} is not broken,
even if most vendor-supplied defaults for their Fortran compilers @emph{are}
in this regard.)

Further, this model provides a clean interface
to whatever preprocessors or code-generators are used
to produce input to this phase of @code{g77}.
Mainly, they need not worry about long lines.

@node sta.c
@subsection sta.c

@node stb.c
@subsection stb.c

@node expr.c
@subsection expr.c

@node stc.c
@subsection stc.c

@node std.c
@subsection std.c

@node ste.c
@subsection ste.c

@node Gotchas (Transforming)
@subsection Gotchas (Transforming)

This section is not about transforming ``gotchas'' into something else.
It is about the weirder aspects of transforming Fortran,
however that's defined,
into a more modern, canonical form.

@subsubsection Multi-character Lexemes

Each lexeme carries with it a pointer to where it appears in the source.

To provide the ability for diagnostics to point to column numbers,
in addition to line numbers and names,
lexemes that represent more than one (significant) character
in the source code need, generally,
to provide pointers to where each @emph{character} appears in the source.

This provides the ability to properly identify the precise location
of the problem in code like

@smallexample
SUBROUTINE X
END
BLOCK DATA X
END
@end smallexample

which, in fixed-form source, would result in single lexemes
consisting of the strings @samp{SUBROUTINEX} and @samp{BLOCKDATAX}.
(The problem is that @samp{X} is defined twice,
so a pointer to the @samp{X} in the second definition,
as well as a follow-up pointer to the corresponding pointer in the first,
would be preferable to pointing to the beginnings of the statements.)

This need also arises when parsing (and diagnosing) @code{FORMAT}
statements.

Further, it arises when diagnosing
@code{FMT=} specifiers that contain
constants
(or partial constants, or even propagated constants!)
in I/O statements, as in:

@smallexample
PRINT '(I2, 3HAB)', J
@end smallexample

(A pointer to the beginning of the prematurely-terminated Hollerith
constant, and/or to the close parenthesis, is preferable to a pointer
to the open parenthesis or the apostrophe that precedes it.)

Multi-character lexemes, which would seem to naturally include
at least digit strings, alphanumeric strings, @code{CHARACTER}
constants, and Hollerith constants, therefore need to provide
location information on each character.
(Maybe Hollerith constants don't, but it's unnecessary to except them.)

The question then arises, what about @emph{other} multi-character lexemes,
such as @samp{**} and @samp{//},
and Fortran 90's @samp{(/}, @samp{/)}, @samp{::}, and so on?

Turns out there's a need to identify the location of the second character
of these two-character lexemes.
For example, in @samp{I(/J) = K}, the slash needs to be diagnosed
as the problem, not the open parenthesis.
Similarly, it is preferable to diagnose the second slash in
@samp{I = J // K} rather than the first, given the implicit typing
rules, which would result in the compiler disallowing the attempted
concatenation of two integers.
(Though, since that's more of a semantic issue,
it's not @emph{that} much preferable.)

Even sequences that could be parsed as digit strings could use location info,
for example, to diagnose the @samp{9} in the octal constant @samp{O'129'}.
(This probably will be parsed as a character string,
to be consistent with the parsing of @samp{Z'129A'}.)

To avoid the hassle of recording the location of the second character,
while also preserving the general rule that each significant character
is distinctly pointed to by the lexeme that contains it,
it's best to simply not have any fixed-size lexemes
larger than one character.

This new design is expected to make checking for two
@samp{*} lexemes in a row much easier than the old design,
so this is not much of a sacrifice.
It probably
makes the lexer much easier to implement
than it makes the parser harder.

@subsubsection Space-padding Lexemes

Certain lexemes need to be padded with virtual spaces when the
end of the line (or file) is encountered.

This is necessary in fixed form, to handle lines that don't
extend to column 72, assuming that's the line length in effect.

@subsubsection Bizarre Free-form Hollerith Constants

Last I checked, the Fortran 90 standard actually required the compiler
to silently accept something like

@smallexample
FORMAT ( 1 2 Htwelve chars )
@end smallexample