htuml2txt.lex

From "A good tool for reading source code on Linux" · LEX code · 105 lines in total


/********
 * $Id: htuml2txt.lex,v 1.1 2000/05/25 18:07:05 golda Exp $
 * $Log: htuml2txt.lex,v $
 * Revision 1.1  2000/05/25 18:07:05  golda
 * Added Christian's changes to allow dynamic filters.  I believe this has only been tested on Linux
 * systems.  --GV
 *
 * Revision 1.5  1999/11/06 21:25:07  cvogler
 * - Fixed bug that did not recognize the end of a comment correctly.
 *
 * Revision 1.4  1999/11/06 06:55:08  cvogler
 * - Added support for &gt; and &lt; (greater than, and less than).
 * - Fixed problems with the matching rules for non-spacing tags that
 *   caused linefeeds to be incorrectly suppressed. As a result, jumping
 *   to line numbers from webglimpse searches did not work.
 *
 *
 * htuml2text.lex
 *
 * A faster HTML filter for WebGlimpse than htuml2txt.pl. I found that
 * the spawning of all the perl processes by glimpse was way too expensive
 * to be practical. In particular, searching 2000 files for a frequently
 * occurring term took more than 30 seconds on a PII-400/Linux 2.2.5
 * machine. Rewriting the filter as a set of lex rules reduced the search
 * time to 5 seconds, which is on par with the simple html2txt filter.
 *
 * Suggested options for compiling on i386/Linux with egcs 1.1.2/flex 2.5.4:
 * flex -F -8 htuml2txt.lex
 * gcc -O3 -fomit-frame-pointer -o htuml2txt lex.yy.c -lfl
 *
 * Note:    For a smaller, slightly slower executable, omit the -F switch in
 *          the call to flex.
 *
 * Caution: The -8 switch MUST be specified if -f or -F is specified!
 *
 * Note:    It is also necessary to edit .glimpse_filters in the
 *          WebGlimpse database directories.
 *
 * Suggested options for compiling with AT&T-style lex:
 * lex htuml2txt.lex
 * cc -O -o htuml2txt lex.yy.c -ll
 *
 * Written  on 5/16/1999 by Christian Vogler
 * Send bug reports and suggestions to cvogler@gradient.cis.upenn.edu.
 ******/

STRING           \"([^\"\n\\]|\\\")*\"
WHITE            [\ \t]

/* HTML tags that are to be eliminated altogether, without even a */
/* substitution with a space */
A                [aA]
B                [bB]
I                [iI]
EM               [eE][mM]
FONT             [fF][oO][nN][tT]
STRONG           [sS][tT][rR][oO][nN][gG]
BIG              [bB][iI][gG]
SUP              [sS][uU][pP]
SUB              [sS][uU][bB]
U                [uU]
STRIKE           [sS][tT][rR][iI][kK][eE]
STYLE            [sS][tT][yY][lL][eE]
NSPTAGS          ({A}|{B}|{I}|{EM}|{FONT}|{STRONG}|{BIG}|{SUP}|{SUB}|{U}|{STRIKE}|{STYLE})

/* These allocate the necessary space to make AT&T lex work. */
/* flex ignores them. */
%e 4000
%p 10000
%n 2000

/* treat inside of HTML comments and tags specially, to ensure that */
/* everything inside them is eliminated, even if they contain quotes */
%s COMMENT
%s TAG
%s BEGINTAG

%%

<COMMENT>[^\-\"\n\r]+                   {/* This ruleset eats up all */}
<COMMENT>-+[^\-\>\"\n\r]+               {/* HTML comments */}
<COMMENT>-\>                            {/* none */}
<COMMENT>{STRING}                       {/* none */}
<COMMENT>-{2,}\>                        BEGIN(INITIAL);

<TAG>[^\"\>\r\n]+                       {/* This ruleset discards all */}
<TAG>{STRING}                           {/* HTML tags */}
<TAG>\>                                 BEGIN(INITIAL);

<BEGINTAG>{WHITE}+                      {/* eat whitespace to find tag name */}
<BEGINTAG>!--                           BEGIN(COMMENT); /* HTML comment */
<BEGINTAG>\/                            {/* eat slash in tags */}
<BEGINTAG>{NSPTAGS}                     BEGIN(TAG); /* tag to be eliminated altogether */
<BEGINTAG>\>                            { fputc(' ', yyout); BEGIN(INITIAL);  /* whoa. Empty tag?!? Replace with space */ };
<BEGINTAG>[A-Za-z0-9]+                  |
<BEGINTAG>[^\r\n]                       { fputc(' ', yyout); BEGIN(TAG); /* all else is a tag to be replaced with a space */ }

<INITIAL>\<                             BEGIN(BEGINTAG); /* tag that must be analyzed further (comment, spacing tag, non-spacing tag) */
<INITIAL>&nbsp;                         fputc(' ', yyout); /* replace special */
<INITIAL>&#161;                         fputc('
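
The start conditions used in the rules above (INITIAL, BEGINTAG, TAG, COMMENT) form a small state machine. The following is a rough hand-written sketch of that idea in plain C. It is not the scanner that flex generates from this file. The function name strip_tags and the helper is_nsp_tag are hypothetical, quote handling inside tags is simplified to a single flag, quoted strings inside comments are not treated specially, and only the &nbsp; entity is handled because the remaining entity rules are cut off in the listing.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* States named after the start conditions in the lex source. */
enum state { INITIAL, BEGINTAG, TAG, COMMENT };

/* Tag names listed in NSPTAGS: dropped without leaving a space behind. */
static const char *const nsp_tags[] = {
    "a", "b", "i", "em", "font", "strong",
    "big", "sup", "sub", "u", "strike", "style"
};

/* Hypothetical helper: case-insensitive match of name[0..len) against NSPTAGS. */
static int is_nsp_tag(const char *name, size_t len)
{
    size_t i, k;
    for (i = 0; i < sizeof nsp_tags / sizeof nsp_tags[0]; i++) {
        if (strlen(nsp_tags[i]) != len)
            continue;
        for (k = 0; k < len; k++)
            if (tolower((unsigned char)name[k]) != nsp_tags[i][k])
                break;
        if (k == len)
            return 1;
    }
    return 0;
}

/* Hypothetical function: copies `in` to `out` without HTML tags and comments.
 * `out` must hold at least strlen(in) + 1 bytes. Returns the output length. */
size_t strip_tags(const char *in, char *out)
{
    enum state st = INITIAL;
    size_t n = 0;      /* bytes written so far */
    int dashes = 0;    /* consecutive '-' seen inside a comment */
    int in_quote = 0;  /* inside a double-quoted attribute value */

    assert(in != NULL && out != NULL);

    while (*in != '\0') {
        char c = *in;

        /* Line breaks are copied in every state so line numbers stay stable. */
        if (c == '\n' || c == '\r') {
            out[n++] = c;
            in_quote = 0;
            dashes = 0;
            in++;
            continue;
        }

        switch (st) {
        case INITIAL:
            if (c == '<') {
                st = BEGINTAG;
                in++;
            } else if (strncmp(in, "&nbsp;", 6) == 0) {
                out[n++] = ' ';
                in += 6;
            } else {
                out[n++] = c;
                in++;
            }
            break;

        case BEGINTAG:
            if (c == ' ' || c == '\t' || c == '/') {
                in++;                    /* skip blanks and the closing-tag slash */
            } else if (strncmp(in, "!--", 3) == 0) {
                st = COMMENT;
                dashes = 0;
                in += 3;
            } else if (c == '>') {
                out[n++] = ' ';          /* empty tag becomes a space */
                st = INITIAL;
                in++;
            } else {
                size_t len = 0;          /* length of the tag name, if any */
                while (isalnum((unsigned char)in[len]))
                    len++;
                if (!is_nsp_tag(in, len))
                    out[n++] = ' ';      /* spacing tag: replace with a space */
                st = TAG;
                in += (len > 0) ? len : 1;
            }
            break;

        case TAG:
            if (c == '"')
                in_quote = !in_quote;
            else if (c == '>' && !in_quote)
                st = INITIAL;
            in++;
            break;

        case COMMENT:
            if (c == '-') {
                dashes++;
            } else {
                if (c == '>' && dashes >= 2)
                    st = INITIAL;        /* "-->" closes the comment */
                dashes = 0;
            }
            in++;
            break;
        }
    }
    out[n] = '\0';
    return n;
}
```

With this sketch, a non-spacing tag such as <b> disappears without a trace, while any other tag such as <br> leaves a single space, which mirrors the split between the NSPTAGS rule and the catch-all BEGINTAG rules. Copying line breaks in every state reflects the revision 1.4 note about keeping line numbers usable for WebGlimpse searches.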

