htmlparser.txt

来自「基于minigui的浏览器. 这是最新版本.」· 文本代码 · 共 117 行

TXT

117 行

 October 2001, --Jcid Last update: Dec 2004                        ---------------                        THE HTML PARSER                        ---------------   Dillo's  parser is more than just a HTML parser, it does XHTMLand  plain  text  also.  It  has  parsing 'modes' that define itsbehaviour while working:   typedef enum {     DILLO_HTML_PARSE_MODE_INIT,     DILLO_HTML_PARSE_MODE_STASH,     DILLO_HTML_PARSE_MODE_STASH_AND_BODY,     DILLO_HTML_PARSE_MODE_BODY,     DILLO_HTML_PARSE_MODE_VERBATIM,     DILLO_HTML_PARSE_MODE_PRE   } DilloHtmlParseMode;   The  parser  works  upon a token-grained basis, i.e., the datastream is parsed into tokens and the parser is fed with them. Theprocess  is  simple:  whenever  the  cache  has new data, it getspassed to Html_write, which groups data into tokens and calls theappropriate functions for the token type (TAG, SPACE or WORD).   Note:   when  in  DILLO_HTML_PARSE_MODE_VERBATIM,  the  parserdoesn't  try  to  split  the  data stream into tokens anymore, itsimply collects until the closing tag.------TOKENS------  * A chunk of WHITE SPACE     --> Html_process_space  * TAG                        --> Html_process_tag    The tag-start is defined by two adjacent characters:      first : '<'      second: ALPHA | '/' | '!' | '?'      Note: comments are discarded ( <!-- ... --> )    The tag's end is not as easy to find, nor to deal with!:    1)  The  HTML  4.01  sec.  3.2.2 states that "Attribute/value    pairs appear before the final '>' of an element's start tag",    but it doesn't define how to discriminate the "final" '>'.    2) '<' and '>' should be escaped as '&lt;' and '&gt;' inside    attribute values.    3)  The XML SPEC for XHTML states:      AttrValue ::== '"' ([^<&"] | Reference)* '"'   |                     "'" ([^<&'] | Reference)* "'"    Current  parser  honors  the XML SPEC.    As  it's  a  common  mistake  for human authors to mistype or    forget  one  of  the  quote  marks of an attribute value; the    parser   solves  the  problem  with  a  look-ahead  technique    (otherwise  the  parser  could  skip significative amounts of    well written HTML).  * WORD                       --> Html_process_word    A  word is anything that doesn't start with SPACE, and that's    outside  of  a  tag, up to the first SPACE or tag start.    SPACE = ' ' | \n | \r | \t | \f | \v-----------------THE PARSING STACK -----------------  The parsing state of the document is kept in a stack:  struct _DilloHtml {     [...]     DilloHtmlState *stack;     gint stack_top; /* Index to the top of the stack [0 based] */     gint stack_max;     [...]  };  struct _DilloHtmlState {    char *tag;    DwStyle *style, *table_cell_style;    DilloHtmlParseMode parse_mode;    DilloHtmlTableMode table_mode;    gint list_level;    gint list_number;    DwWidget *page, *table;    gint32  current_bg_color;  };  Basically,  when a TAG is processed, a new state is pushed intothe  'stack'  and  its  'style'  is  set  to  reflect the desiredappearance (details in DwStyle.txt).  That way, when a word is processed later (added to the Dw), allthe information is within the top state.  Closing TAGs just pop the stack.

htmlparser.txt - 源码说明

本页面展示了「基于minigui的浏览器. 这是最新版本.」中的 htmlparser.txt 源码文件，采用文本编程语言编写，共 117 行代码。您可以在线阅读完整代码内容，也可以返回资源详情页下载完整源码包进行本地学习和开发。

虫虫下载站收录了大量与minigui相关的技术资源，包括源代码、技术文档、电路图等，是电子工程师和嵌入式开发者的专业学习平台。

⌨️ 快捷键说明

复制代码Ctrl + C

搜索代码Ctrl + F

全屏模式F11

增大字号Ctrl + =

减小字号Ctrl + -

显示快捷键?