htmlcxx - html and css APIs for C++
-----------------------------------

Description
===========

htmlcxx is a simple non-validating CSS1 and HTML parser for C++.
Although there are several other HTML parsers available, htmlcxx has some
characteristics that make it unique:

- STL-like navigation of the DOM tree, using the excellent tree.hh library
  from Kasper Peeters
- It is possible to reproduce exactly, character by character, the
  original document from the parse tree
- Bundled CSS parser
- Optional parsing of attributes
- C++ code that looks like C++ (not so true anymore)
- Offsets of tags/elements in the original document are stored in the
  nodes of the DOM tree

The parsing policies of htmlcxx were designed to mimic the behavior of
Mozilla Firefox (http://www.mozilla.org), so you should expect parse trees
similar to those created by Firefox. However, unlike Firefox, htmlcxx does
not insert elements that are missing from your HTML. Therefore, serializing
the DOM tree gives exactly the same bytes contained in the original HTML
document.

News for version 0.7.3
======================

Added utility code to escape/decode URLs as defined by RFC 2396.
Added a new SAX interface. The API was slightly broken to support the new
SAX interface :-(.
Added Visual Studio 2003 projects for the WIN32 port.

Examples
========

Using htmlcxx is quite simple. Take a look at this example:

-----------------------------------------------------------------------
  #include <htmlcxx/html/ParserDom.h>
  ...

  //Parse some html code
  string html = "<html><body>hey</body></html>";
  HTML::ParserDom parser;
  tree<HTML::Node> dom = parser.parseTree(html);

  //Print whole DOM tree
  cout << dom << endl;

  //Dump all links in the tree
  tree<HTML::Node>::iterator it = dom.begin();
  tree<HTML::Node>::iterator end = dom.end();
  for (; it != end; ++it)
  {
      //tagName() keeps the case used in the source, so check both <a> and <A>
      if (it->tagName() == "A" || it->tagName() == "a")
      {
          it->parseAttributes();
          //attribute() returns a (found, value) pair
          cout << it->attribute("href").second << endl;
      }
  }

  //Dump all text of the document
  it = dom.begin();
  end = dom.end();
  for (; it != end; ++it)
  {
      if ((!it->isTag()) && (!it->isComment()))
      {
          cout << it->text();
      }
  }
  cout << endl;
-------------------------------------------------

The htmlcxx application
=======================

htmlcxx is the name of both the library and the utility application that
comes with this package. Although htmlcxx (the application) is mostly
useless for programming, you can use it to easily see how htmlcxx (the
library) would parse your HTML code. Just install it and try htmlcxx -h.

Downloads
=========

Use the project page at SourceForge: http://sf.net/projects/htmlcxx

License Stuff
=============

The code is now under the LGPL. This was our initial intention, and it is
now possible thanks to the author of tree.hh, who allowed us to use it
under the LGPL only for HTML::Node template instances. Check
http://www.fsf.org or the COPYING file in the distribution for details
about the LGPL license. The URI parsing code is a derivative work of the
Apache web server URI parsing routines. Check
www.apache.org/licenses/LICENSE-2.0 or the ASF-2.0 file in the
distribution for details.

----------------------------------------
Enjoy!

Davi de Castro Reis - <davi (a) users sf net>
Robson Braga Araújo - <braga (a) users sf net>

Last Updated: Thu Mar 24 00:56:05 2005
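The Description above says htmlcxx stores the offset of every tag/element in the original document and can reproduce the input character by character, and the 0.7.3 news mentions a new SAX interface. The following is a minimal, self-contained sketch of why those two ideas fit together. It is not htmlcxx's implementation or API: `SaxHandler`, `scan` and `Rebuilder` are hypothetical names, and the tokenizer is deliberately naive (tags end at the first `>`, no attribute parsing, no special handling of `<script>`). Each callback receives the byte range it covers, and because the ranges tile the input with no gaps, copying them back in order yields the original bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical SAX-style interface (NOT htmlcxx's ParserSax): every event
// reports the offset and length of the bytes it covers in the input.
struct SaxHandler {
    virtual ~SaxHandler() {}
    virtual void foundTag(std::size_t offset, std::size_t length) = 0;
    virtual void foundComment(std::size_t offset, std::size_t length) = 0;
    virtual void foundText(std::size_t offset, std::size_t length) = 0;
};

// Naive scanner: comments run to "-->", tags to the first '>', text to the
// next '<'. Unterminated constructs run to the end of the input, so the
// reported ranges always cover the whole input without gaps or overlaps.
void scan(const std::string &html, SaxHandler &handler) {
    std::size_t pos = 0;
    while (pos < html.size()) {
        std::size_t stop;
        if (html.compare(pos, 4, "<!--") == 0) {
            std::size_t close = html.find("-->", pos + 4);
            stop = (close == std::string::npos) ? html.size() : close + 3;
            handler.foundComment(pos, stop - pos);
        } else if (html[pos] == '<') {
            std::size_t close = html.find('>', pos + 1);
            stop = (close == std::string::npos) ? html.size() : close + 1;
            handler.foundTag(pos, stop - pos);
        } else {
            std::size_t next = html.find('<', pos);
            stop = (next == std::string::npos) ? html.size() : next;
            handler.foundText(pos, stop - pos);
        }
        pos = stop;  // every branch consumes at least one byte
    }
}

// Example handler: copies each reported range back out of the source (the
// result must equal the input) and collects the text runs separately.
struct Rebuilder : SaxHandler {
    const std::string &source;
    std::string rebuilt;
    std::string text;
    int tags;
    int comments;
    explicit Rebuilder(const std::string &src)
        : source(src), tags(0), comments(0) {}
    void foundTag(std::size_t offset, std::size_t length) override {
        ++tags;
        rebuilt.append(source, offset, length);
    }
    void foundComment(std::size_t offset, std::size_t length) override {
        ++comments;
        rebuilt.append(source, offset, length);
    }
    void foundText(std::size_t offset, std::size_t length) override {
        text.append(source, offset, length);
        rebuilt.append(source, offset, length);
    }
};
```

For the README's sample string `<html><body>hey</body></html>`, `scan` reports four tags and one text run (`hey`), and `rebuilt` compares equal to the input. A real parser additionally has to build the tree and cope with attributes and malformed markup, which this sketch ignores.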

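The 0.7.3 news also mentions utility code to escape and decode URLs as defined by RFC 2396. The sketch below shows only the percent-encoding rule itself; `uriEscape` and `uriDecode` are hypothetical names, not htmlcxx's functions. It assumes component-level escaping: bytes in RFC 2396's "unreserved" set (ASCII letters, digits and `-_.!~*'()`) are kept, every other byte becomes `%` plus two uppercase hex digits, and a `%` that is not followed by two hex digits is copied through unchanged when decoding.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: RFC 2396 "unreserved" = alphanum | mark,
// where mark = - _ . ! ~ * ' ( ). Plain ASCII checks avoid locale effects.
static bool isUnreserved(unsigned char c) {
    bool alnum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                 (c >= '0' && c <= '9');
    return alnum ||
           std::string("-_.!~*'()").find(static_cast<char>(c)) != std::string::npos;
}

// Hypothetical: escape every byte outside the unreserved set as %XX.
std::string uriEscape(const std::string &in) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        if (isUnreserved(c)) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];    // high nibble
            out += hex[c & 0x0F];  // low nibble
        }
    }
    return out;
}

// Value of one hex digit, or -1 if the character is not a hex digit.
static int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical: decode %XX sequences (either hex case); malformed escapes
// such as a trailing "%" or "%zz" are kept verbatim.
std::string uriDecode(const std::string &in) {
    std::string out;
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            int hi = hexValue(in[i + 1]);
            int lo = hexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;  // skip the two hex digits
                continue;
            }
        }
        out += in[i];
    }
    return out;
}
```

For instance, `uriEscape("a b/c")` gives `a%20b%2Fc`, and `uriDecode` turns it back into `a b/c`. Whether reserved characters such as `/` or `?` should be escaped depends on which URI component is being built; this sketch escapes them all, which is only appropriate for data placed inside a single component.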