
wget.pod

A free tool for automatically downloading files from the network

another directory.

Note that only at the end of the download can Wget know which links have
been downloaded.  Because of that, the work done by B<-k> will be
performed at the end of all the downloads.

=item B<-K>

=item B<--backup-converted>

When converting a file, back up the original version with a B<.orig>
suffix.  Affects the behavior of B<-N>.

=item B<-m>

=item B<--mirror>

Turn on options suitable for mirroring.  This option turns on recursion
and time-stamping, sets infinite recursion depth and keeps FTP
directory listings.  It is currently equivalent to
B<-r -N -l inf --no-remove-listing>.
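
For instance, given that equivalence, the following two invocations
behave identically (B<example.com> is a placeholder host):

        wget -m http://example.com/
        wget -r -N -l inf --no-remove-listing http://example.com/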

=item B<-p>

=item B<--page-requisites>

This option causes Wget to download all the files that are necessary to
properly display a given HTML page.  This includes such things as
inlined images, sounds, and referenced stylesheets.

Ordinarily, when downloading a single HTML page, any requisite documents
that may be needed to display it properly are not downloaded.  Using
B<-r> together with B<-l> can help, but since Wget does not
ordinarily distinguish between external and inlined documents, one is
generally left with "leaf documents" that are missing their
requisites.

For instance, say document F<1.html> contains an C<E<lt>IMGE<gt>> tag
referencing F<1.gif> and an C<E<lt>AE<gt>> tag pointing to external
document F<2.html>.  Say that F<2.html> is similar but that its
image is F<2.gif> and it links to F<3.html>.  Say this
continues up to some arbitrarily high number.

If one executes the command:

        wget -r -l 2 http://<site>/1.html

then F<1.html>, F<1.gif>, F<2.html>, F<2.gif>, and
F<3.html> will be downloaded.  As you can see, F<3.html> is
without its requisite F<3.gif> because Wget is simply counting the
number of hops (up to 2) away from F<1.html> in order to determine
where to stop the recursion.  However, with this command:

        wget -r -l 2 -p http://<site>/1.html

all the above files I<and> F<3.html>'s requisite F<3.gif>
will be downloaded.  Similarly,

        wget -r -l 1 -p http://<site>/1.html

will cause F<1.html>, F<1.gif>, F<2.html>, and F<2.gif>
to be downloaded.  One might think that:

        wget -r -l 0 -p http://<site>/1.html

would download just F<1.html> and F<1.gif>, but unfortunately
this is not the case, because B<-l 0> is equivalent to
B<-l inf>---that is, infinite recursion.  To download a single HTML
page (or a handful of them, all specified on the command-line or in a
B<-i> URL input file) and its (or their) requisites, simply leave off
B<-r> and B<-l>:

        wget -p http://<site>/1.html

Note that Wget will behave as if B<-r> had been specified, but only
that single page and its requisites will be downloaded.  Links from that
page to external documents will not be followed.  Actually, to download
a single page and all its requisites (even if they exist on separate
websites), and make sure the lot displays properly locally, this author
likes to use a few options in addition to B<-p>:

        wget -E -H -k -K -p http://<site>/<document>

To finish off this topic, it's worth knowing that Wget's idea of an
external document link is any URL specified in an C<E<lt>AE<gt>> tag, an
C<E<lt>AREAE<gt>> tag, or a C<E<lt>LINKE<gt>> tag other than C<E<lt>LINK
REL="stylesheet"E<gt>>.

=item B<--strict-comments>

Turn on strict parsing of HTML comments.  The default is to terminate
comments at the first occurrence of B<--E<gt>>.

According to specifications, HTML comments are expressed as SGML
I<declarations>.  A declaration is special markup that begins with
B<E<lt>!> and ends with B<E<gt>>, such as B<E<lt>!DOCTYPE ...E<gt>>, that
may contain comments between a pair of B<--> delimiters.  HTML
comments are "empty declarations", SGML declarations without any
non-comment text.  Therefore, B<E<lt>!--foo--E<gt>> is a valid comment, and
so is B<E<lt>!--one-- --two--E<gt>>, but B<E<lt>!--1--2--E<gt>> is not.

On the other hand, most HTML writers don't perceive comments as anything
other than text delimited with B<E<lt>!--> and B<--E<gt>>, which is not
quite the same.  For example, something like B<E<lt>!------------E<gt>>
works as a valid comment as long as the number of dashes is a multiple
of four (!).  If not, the comment technically lasts until the next
B<-->, which may be at the other end of the document.  Because of
this, many popular browsers completely ignore the specification and
implement what users have come to expect: comments delimited with
B<E<lt>!--> and B<--E<gt>>.

Until version 1.9, Wget interpreted comments strictly, which resulted in
missing links in many web pages that displayed fine in browsers, but had
the misfortune of containing non-compliant comments.  Beginning with
version 1.9, Wget has joined the ranks of clients that implement
"naive" comments, terminating each comment at the first occurrence of
B<--E<gt>>.

If, for whatever reason, you want strict comment parsing, use this
option to turn it on.

=back

=head2 Recursive Accept/Reject Options

=over 4

=item B<-A> I<acclist> B<--accept> I<acclist>

=item B<-R> I<rejlist> B<--reject> I<rejlist>

Specify comma-separated lists of file name suffixes or patterns to
accept or reject.  Note that if any of the wildcard characters, B<*>,
B<?>, B<[> or B<]>, appear in an element of I<acclist> or I<rejlist>,
it will be treated as a pattern, rather than a suffix.

=item B<-D> I<domain-list>

=item B<--domains=>I<domain-list>

Set domains to be followed.  I<domain-list> is a comma-separated list
of domains.  Note that it does I<not> turn on B<-H>.

=item B<--exclude-domains> I<domain-list>

Specify the domains that are I<not> to be followed.

=item B<--follow-ftp>

Follow FTP links from HTML documents.  Without this option,
Wget will ignore all the FTP links.

=item B<--follow-tags=>I<list>

Wget has an internal table of HTML tag / attribute pairs that it
considers when looking for linked documents during a recursive
retrieval.  If a user wants only a subset of those tags to be
considered, however, he or she should specify such tags in a
comma-separated I<list> with this option.

=item B<--ignore-tags=>I<list>

This is the opposite of the B<--follow-tags> option.  To skip
certain HTML tags when recursively looking for documents to download,
specify them in a comma-separated I<list>.

In the past, this option was the best bet for downloading a single page
and its requisites, using a command-line like:

        wget --ignore-tags=a,area -H -k -K -r http://<site>/<document>

However, the author of this option came across a page with tags like
C<E<lt>LINK REL="home" HREF="/"E<gt>> and came to the realization that
specifying tags to ignore was not enough.  One can't just tell Wget to
ignore C<E<lt>LINKE<gt>>, because then stylesheets will not be downloaded.
Now the best bet for downloading a single page and its requisites is the
dedicated B<--page-requisites> option.

=item B<--ignore-case>

Ignore case when matching files and directories.  This influences the
behavior of the B<-R>, B<-A>, B<-I>, and B<-X> options, as well as
globbing implemented when downloading from FTP sites.  For example, with
this option, B<-A *.txt> will match B<file1.txt>, but also
B<file2.TXT>, B<file3.TxT>, and so on.
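
For instance, a recursive retrieval that accepts C<.txt> files whatever
the case of the suffix might look like this (B<example.com> and the
B</docs/> path are placeholders):

        wget -r --ignore-case -A '*.txt' http://example.com/docs/

The quotes keep the shell from expanding B<*.txt> before Wget sees it.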

=item B<-H>

=item B<--span-hosts>

Enable spanning across hosts when doing recursive retrieving.

=item B<-L>

=item B<--relative>

Follow relative links only.  Useful for retrieving a specific home page
without any distractions, not even those from the same hosts.

=item B<-I> I<list>

=item B<--include-directories=>I<list>

Specify a comma-separated list of directories you wish to follow when
downloading.  Elements of I<list> may contain wildcards.

=item B<-X> I<list>

=item B<--exclude-directories=>I<list>

Specify a comma-separated list of directories you wish to exclude from
download.  Elements of I<list> may contain wildcards.

=item B<-np>

=item B<--no-parent>

Do not ever ascend to the parent directory when retrieving recursively.
This is a useful option, since it guarantees that only the files
I<below> a certain hierarchy will be downloaded.

=back

=head1 FILES

=over 4

=item B</usr/local/etc/wgetrc>

Default location of the I<global> startup file.

=item B<.wgetrc>

User startup file.

=back

=head1 BUGS

You are welcome to submit bug reports via the GNU Wget bug tracker (see
E<lt>B<http://wget.addictivecode.org/BugTracker>E<gt>).

Before actually submitting a bug report, please try to follow a few
simple guidelines.

=over 4

=item 1.

Please try to ascertain that the behavior you see really is a bug.  If
Wget crashes, it's a bug.  If Wget does not behave as documented,
it's a bug.  If things work strangely, but you are not sure about the
way they are supposed to work, it might well be a bug, but you might
want to double-check the documentation and the mailing lists.

=item 2.

Try to repeat the bug in as simple circumstances as possible.  E.g. if
Wget crashes while downloading B<wget -rl0 -kKE -t5 --no-proxy
http://yoyodyne.com -o /tmp/log>, you should try to see if the crash is
repeatable, and if it will occur with a simpler set of options.  You
might even try to start the download at the page where the crash
occurred to see if that page somehow triggered the crash.

Also, while I will probably be interested to know the contents of your
F<.wgetrc> file, just dumping it into the debug message is probably
a bad idea.  Instead, you should first try to see if the bug repeats
with F<.wgetrc> moved out of the way.  Only if it turns out that
F<.wgetrc> settings affect the bug, mail me the relevant parts of
the file.

=item 3.

Please start Wget with the B<-d> option and send us the resulting
output (or relevant parts thereof).  If Wget was compiled without
debug support, recompile it---it is I<much> easier to trace bugs
with debug support on.

Note: please make sure to remove any potentially sensitive information
from the debug log before sending it to the bug address.  The
C<-d> option won't go out of its way to collect sensitive information,
but the log I<will> contain a fairly complete transcript of Wget's
communication with the server, which may include passwords and pieces
of downloaded data.  Since the bug address is publicly archived, you
may assume that all bug reports are visible to the public.
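
A minimal way to capture such a log is to combine B<-d> with the
B<-o> I<logfile> option shown in the example above (the URL here is a
placeholder):

        wget -d -o /tmp/wget-debug.log http://example.com/page.html

Skim F</tmp/wget-debug.log> for passwords or other sensitive data
before attaching it to your report.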

=item 4.

If Wget has crashed, try to run it in a debugger, e.g. C<gdb `which
wget` core> and type C<where> to get the backtrace.  This may not
work if the system administrator has disabled core files, but it is
safe to try.

=back

=head1 SEE ALSO

This is B<not> the complete manual for GNU Wget.  For more complete
information, including more detailed explanations of some of the
options, and a number of commands available for use with F<.wgetrc>
files and the B<-e> option, see the GNU Info entry for F<wget>.

=head1 AUTHOR

Originally written by Hrvoje Niksic E<lt>hniksic@xemacs.orgE<gt>.
Currently maintained by Micah Cowan E<lt>micah@cowan.nameE<gt>.

=head1 COPYRIGHT

Copyright (c) 1996--2008 Free Software Foundation, Inc.

Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission notice
are preserved on all copies.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with no
Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.  A
copy of the license is included in the section entitled "GNU Free
Documentation License".
