
📄 html2text.1

📁 A program that converts HTML to TXT files
.\" $Id: html2text.1,v 1.20 1999/12/20 19:40:36 arno Exp $
.TH HTML2TEXT 1 "November 1999" "Version 1.2.1"
.SH NAME
html2text \- an advanced HTML\-to\-text converter
.SH SYNOPSIS
.B html2text \-help
.br
.B html2text \-version
.br
.B html2text
[
.B \-unparse
|
.B \-check
] [
.B \-debug\-scanner
] [
.B \-debug\-parser
] [
.B \-rcfile
.I path
] [
.B \-style
(
.B compact
|
.B pretty
)
] [
.B \-width
.I width
] [
.B \-o
.I output\-file
] [
.B \-nobs
] [
.IR input\-url " ..."
]
.SH DESCRIPTION
.B Html2text
reads HTML 3.2 documents from the
.IR input\-url s,
formats each into a stream of ASCII characters and writes the result to
standard output (or into
.IR output\-file ,
if the
.B \-o
command line option is used). It also accepts syntactically incorrect input
and attempts to interpret it "reasonably".
.PP
Documents that are specified by a URL that begins with "http:" are retrieved
with the Hypertext Transfer Protocol. URLs that begin with "file:" and URLs
that do not contain a colon specify local files. All other URLs are
invalid.
.PP
If no
.IR input\-url s
are specified on the command line,
.B html2text
reads from standard input. A dash as the
.I input\-url
is an alternate way to specify standard input.
.PP
The way that
.B html2text
formats the HTML documents is controlled by formatting properties read from
an RC file.
.B Html2text
attempts to read
.B $HOME/.html2textrc
(or the file specified by the
.B \-rcfile
command line option); if that file cannot be read,
.B html2text
attempts to read
.BR /etc/html2textrc .
If no RC file can be read (or if the RC file does not override all formatting
properties), then "reasonable" defaults are assumed. The RC file format is
described in
.IR html2textrc (4).
.SS OPTIONS
.TP
.B \-help
Print command line summary.
.TP
.B \-version
Print program version.
.TP
.B \-unparse
Instead of formatting the parsed document, generate HTML code. (This generated
HTML code is always syntactically correct.) If
.B html2text
has problems parsing a syntactically incorrect HTML document, this option
may help you understand what
.B html2text
thinks that the original HTML code means.
.TP
.B \-check
The HTML document(s) is/are only parsed and not processed otherwise. In this
mode of operation,
.B html2text
will report on parse errors and scan errors, which it does not in other modes
of operation. Notice that parse and scan errors are not fatal for
.BR html2text ,
but may cause misinterpretation of the HTML code and/or portions of the
document being swallowed.
.TP
.B \-debug\-scanner
While scanning the HTML document(s),
.B html2text
reports on each lexical token scanned.
.TP
.B \-debug\-parser
While parsing the HTML document(s),
.B html2text
reports on the tokens being shifted, rules being applied, etc.
.TP
.BI \-rcfile " path"
Attempt to read
.I path
instead of
.BR $HOME/.html2textrc .
.TP
.BR \-style " ( " compact " | " pretty " )"
Style
.B pretty
changes some of the default values of the formatting parameters documented in
.IR html2textrc (4).
To find out which formatting parameter defaults are changed, and how, check
.BR pretty.style .
.TP
.BI \-width " width"
By default,
.B html2text
formats the HTML document(s) for a screen width of 79 characters. If your
terminal has a width other than 80 characters (or if you want to get an idea how
.B html2text
deals with large tables and different terminal widths) you may want to specify
a different
.IR width .
.TP
.BI \-o " output\-file"
Write the output to
.I output\-file
instead of standard output. A dash as the
.I output\-file
is an alternate way to specify the standard output.
.TP
.B \-nobs
By default,
.B html2text
renders underlined letters with sequences like "underscore-backspace-character"
and boldface letters like "character-backspace-character", which works fine
when the output is piped into
.IR more (1),
.IR less (1),
or similar. For other applications, it may be desirable not to render
character attributes with such backspace sequences, which can be specified
with this command line option.
.SH SEE ALSO
.IR html2textrc (4),
.IR more (1),
.IR less (1)
.br
.I http://www.w3c.org
for the HTML 3.2 Reference Specification
.br
.I http://www.gmrs.de
for updates and additional information
.SH BUGS
.B Html2text
makes a considerable effort to parse syntactically incorrect input, but is
not always as successful as other HTML processors. If you are able
to correct the HTML source code, you may want to use
.B html2text \-unparse
and
.B html2text \-check
to find out what exactly
.BR html2text 's
problem is.
.PP
.B Html2text
understands all HTML 3.2 constructs, but can render only part of them
due to the limitations of the text output format. However,
.B html2text
attempts to provide good substitutes for the elements it cannot render, e.g.
it prints
.B [image.gif]
at a position where
.B "<IMG SRC=image.gif>"
appears. There are still components and attributes that could
possibly be rendered in better ways.
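
The backspace sequences described under the -nobs option can be illustrated with a short Python sketch. The function names underline, bold and strip_overstrikes are hypothetical and are not part of html2text; the sketch only mimics the encoding the manual describes and shows one way such sequences might be removed from output that was produced without -nobs.

```python
import re


def underline(text: str) -> str:
    """Encode each character as underscore-backspace-character."""
    return "".join(f"_\b{ch}" for ch in text)


def bold(text: str) -> str:
    """Encode each character as character-backspace-character."""
    return "".join(f"{ch}\b{ch}" for ch in text)


def strip_overstrikes(text: str) -> str:
    """Remove every 'X<backspace>' pair, keeping the character after it."""
    return re.sub(r".\x08", "", text)


# Hypothetical sample: an underlined word followed by a boldface word.
sample = underline("Title") + " " + bold("note")
plain = strip_overstrikes(sample)

print(repr(sample))  # shows the raw backspace (\x08) sequences
print(plain)
```

Pagers such as more(1) and less(1) interpret these sequences as underlining and boldface, while a plain text editor or a grep pattern sees the raw backspace characters. That difference is why the manual offers -nobs for other applications.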
