📄 htuml2txt.lex

/********
 * $Id: htuml2txt.lex,v 1.1 2000/05/25 18:07:05 golda Exp $
 * $Log: htuml2txt.lex,v $
 * Revision 1.1  2000/05/25 18:07:05  golda
 * Added Christian's changes to allow dynamic filters.  I believe this has only been tested on Linux
 * systems.  --GV
 *
 * Revision 1.5  1999/11/06 21:25:07  cvogler
 * - Fixed bug that did not recognize the end of a comment correctly.
 *
 * Revision 1.4  1999/11/06 06:55:08  cvogler
 * - Added support for &gt; and &lt; (greater than, and less than).
 * -  Fixed problems with the matching rules for non-spacing tags that
 *    caused linefeeds to be incorrectly suppressed. As a result, jumping
 *    to line numbers from webglimpse searches did not work.
 *
 *
 * htuml2text.lex
 *
 * A faster HTML filter for WebGlimpse than htuml2txt.pl. I found that
 * the spawning of all the perl processes by glimpse was way too expensive
 * to be practical. In particular, searching 2000 files for a frequently
 * occurring term took more than 30 seconds on a PII-400/Linux 2.2.5
 * machine. Rewriting the filter as a set of lex rules reduced the search
 * time to 5 seconds, which is on par with the simple html2txt filter.
 *
 * Suggested options for compiling on i386/Linux with egcs 1.1.2/flex 2.5.4:
 * flex -F -8 htuml2txt.lex
 * gcc -O3 -fomit-frame-pointer -o htuml2txt lex.yy.c -lfl
 *
 * Note:    For a smaller, slightly slower executable, omit the -F switch in
 *          the call to flex.
 *
 * Caution: The -8 switch MUST be specified if -f or -F is specified!
 *
 * Note:    It is also necessary to edit .glimpse_filters in the
 *          WebGlimpse database directories.
 *
 * Suggested options for compiling with AT&T-style lex:
 * lex htuml2txt.lex
 * cc -O -o htuml2txt lex.yy.c -ll
 *
 * Written  on 5/16/1999 by Christian Vogler
 * Send bug reports and suggestions to cvogler@gradient.cis.upenn.edu.
 ******/

STRING           \"([^\"\n\\]|\\\")*\"
WHITE            [\ \t]

/* HTML tags that are to be eliminated altogether, without even a */
/* substitution with a space */
A                [aA]
B                [bB]
I                [iI]
EM               [eE][mM]
FONT             [fF][oO][nN][tT]
STRONG           [sS][tT][rR][oO][nN][gG]
BIG              [bB][iI][gG]
SUP              [sS][uU][pP]
SUB              [sS][uU][bB]
U                [uU]
STRIKE           [sS][tT][rR][iI][kK][eE]
STYLE            [sS][tT][yY][lL][eE]
NSPTAGS          ({A}|{B}|{I}|{EM}|{FONT}|{STRONG}|{BIG}|{SUP}|{SUB}|{U}|{STRIKE}|{STYLE})

/* These allocate the necessary space to make AT&T lex work. */
/* flex ignores them. */
%e 4000
%p 10000
%n 2000

/* treat inside of HTML comments and tags specially, to ensure that */
/* everything inside them is eliminated, even if they contain quotes */
%s COMMENT
%s TAG
%s BEGINTAG

%%

<COMMENT>[^\-\"\n\r]+                   {/* This ruleset eats up all */}
<COMMENT>-+[^\-\>\"\n\r]+               {/* HTML comments */}
<COMMENT>-\>                            {/* none */}
<COMMENT>{STRING}                       {/* none */}
<COMMENT>-{2,}\>                        BEGIN(INITIAL);

<TAG>[^\"\>\r\n]+                       {/* This ruleset discards all */}
<TAG>{STRING}                           {/* HTML tags */}
<TAG>\>                                 BEGIN(INITIAL);

<BEGINTAG>{WHITE}+                      {/* eat whitespace to find tag name */}
<BEGINTAG>!--                           BEGIN(COMMENT); /* HTML comment */
<BEGINTAG>\/                            {/* eat slash in tags */}
<BEGINTAG>{NSPTAGS}                     BEGIN(TAG); /* tag to be eliminated altogether */
<BEGINTAG>\>                            { fputc(' ', yyout); BEGIN(INITIAL);  /* whoa. Empty tag?!? Replace with space */ };
<BEGINTAG>[A-Za-z0-9]+                  |
<BEGINTAG>[^\r\n]                       { fputc(' ', yyout); BEGIN(TAG); /* all else is a tag to be replaced with a space */ }

<INITIAL>\<                             BEGIN(BEGINTAG); /* tag that must be analyzed further (comment, spacing tag, non-spacing tag) */
<INITIAL>&nbsp;                         fputc(' ', yyout); /* replace special */
<INITIAL>&#161;                         fputc('