@c wget.texi
@cindex spider
@item --spider
When invoked with this option, Wget will behave as a Web @dfn{spider},
which means that it will not download the pages, just check that they
are there.  For example, you can use Wget to check your bookmarks:

@example
wget --spider --force-html -i bookmarks.html
@end example

This feature needs much more work for Wget to get close to the
functionality of real web spiders.

@cindex timeout
@item -T seconds
@itemx --timeout=@var{seconds}
Set the network timeout to @var{seconds} seconds.  This is equivalent
to specifying @samp{--dns-timeout}, @samp{--connect-timeout}, and
@samp{--read-timeout}, all at the same time.

Whenever Wget connects to or reads from a remote host, it checks for a
timeout and aborts the operation if the time expires.  This prevents
anomalous occurrences such as hanging reads or infinite connects.  The
only timeout enabled by default is a 900-second timeout for reading.
Setting timeout to 0 disables checking for timeouts.

Unless you know what you are doing, it is best not to set any of the
timeout-related options.

@cindex DNS timeout
@cindex timeout, DNS
@item --dns-timeout=@var{seconds}
Set the DNS lookup timeout to @var{seconds} seconds.  DNS lookups that
don't complete within the specified time will fail.  By default, there
is no timeout on DNS lookups, other than that implemented by system
libraries.

@cindex connect timeout
@cindex timeout, connect
@item --connect-timeout=@var{seconds}
Set the connect timeout to @var{seconds} seconds.  TCP connections that
take longer to establish will be aborted.  By default, there is no
connect timeout, other than that implemented by system libraries.

@cindex read timeout
@cindex timeout, read
@item --read-timeout=@var{seconds}
Set the read (and write) timeout to @var{seconds} seconds.  Reads that
take longer will fail.  The default value for read timeout is 900
seconds.

@cindex bandwidth, limit
@cindex rate, limit
@cindex limit bandwidth
@item --limit-rate=@var{amount}
Limit the download speed to @var{amount} bytes per second.
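For instance, a rate-limited download might be started like this (the
@sc{url} is illustrative):

@example
wget --limit-rate=20k http://fly.srk.fer.hr/
@end example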
Amount may be expressed in bytes, kilobytes with the @samp{k} suffix, or
megabytes with the @samp{m} suffix.  For example,
@samp{--limit-rate=20k} will limit the retrieval rate to 20KB/s.  This
kind of thing is useful when, for whatever reason, you don't want Wget
to consume the entire available bandwidth.

Note that Wget implements the limiting by sleeping the appropriate
amount of time after a network read that took less time than specified
by the rate.  Eventually this strategy causes the TCP transfer to slow
down to approximately the specified rate.  However, it may take some
time for this balance to be achieved, so don't be surprised if limiting
the rate doesn't work well with very small files.

@cindex pause
@cindex wait
@item -w @var{seconds}
@itemx --wait=@var{seconds}
Wait the specified number of seconds between the retrievals.  Use of
this option is recommended, as it lightens the server load by making the
requests less frequent.  Instead of in seconds, the time can be
specified in minutes using the @code{m} suffix, in hours using @code{h}
suffix, or in days using @code{d} suffix.

Specifying a large value for this option is useful if the network or the
destination host is down, so that Wget can wait long enough to
reasonably expect the network error to be fixed before the retry.

@cindex retries, waiting between
@cindex waiting between retries
@item --waitretry=@var{seconds}
If you don't want Wget to wait between @emph{every} retrieval, but only
between retries of failed downloads, you can use this option.  Wget will
use @dfn{linear backoff}, waiting 1 second after the first failure on a
given file, then waiting 2 seconds after the second failure on that
file, up to the maximum number of @var{seconds} you specify.  Therefore,
a value of 10 will actually make Wget wait up to (1 + 2 + ...
+ 10) = 55
seconds per file.

Note that this option is turned on by default in the global
@file{wgetrc} file.

@cindex wait, random
@cindex random wait
@item --random-wait
Some web sites may perform log analysis to identify retrieval programs
such as Wget by looking for statistically significant similarities in
the time between requests.  This option causes the time between requests
to vary between 0 and 2 * @var{wait} seconds, where @var{wait} was
specified using the @samp{--wait} option, in order to mask Wget's
presence from such analysis.

A recent article in a publication devoted to development on a popular
consumer platform provided code to perform this analysis on the fly.
Its author suggested blocking at the class C address level to ensure
automated retrieval programs were blocked despite changing DHCP-supplied
addresses.

The @samp{--random-wait} option was inspired by this ill-advised
recommendation to block many unrelated users from a web site due to the
actions of one.

@cindex proxy
@item -Y on/off
@itemx --proxy=on/off
Turn proxy support on or off.  The proxy is on by default if the
appropriate environment variable is defined.

For more information about the use of proxies with Wget, @xref{Proxies}.

@cindex quota
@item -Q @var{quota}
@itemx --quota=@var{quota}
Specify download quota for automatic retrievals.  The value can be
specified in bytes (default), kilobytes (with @samp{k} suffix), or
megabytes (with @samp{m} suffix).

Note that quota will never affect downloading a single file.  So if you
specify @samp{wget -Q10k ftp://wuarchive.wustl.edu/ls-lR.gz}, all of the
@file{ls-lR.gz} will be downloaded.  The same goes even when several
@sc{url}s are specified on the command-line.
However, quota is
respected when retrieving either recursively, or from an input file.
Thus you may safely type @samp{wget -Q2m -i sites}---download will be
aborted when the quota is exceeded.

Setting quota to 0 or to @samp{inf} unlimits the download quota.

@cindex DNS cache
@cindex caching of DNS lookups
@item --dns-cache=off
Turn off caching of DNS lookups.  Normally, Wget remembers the addresses
it looked up from DNS so it doesn't have to repeatedly contact the DNS
server for the same (typically small) set of addresses it retrieves
from.  This cache exists in memory only; a new Wget run will contact DNS
again.

However, in some cases it is not desirable to cache host names, even for
the duration of a short-running application like Wget.  For example,
some HTTP servers are hosted on machines with dynamically allocated IP
addresses that change from time to time.  Their DNS entries are updated
along with each change.  When Wget's download from such a host gets
interrupted by IP address change, Wget retries the download, but (due to
DNS caching) it contacts the old address.  With the DNS cache turned
off, Wget will repeat the DNS lookup for every connect and will thus get
the correct dynamic address every time---at the cost of additional DNS
lookups where they're probably not needed.

If you don't understand the above description, you probably won't need
this option.

@cindex file names, restrict
@cindex Windows file names
@item --restrict-file-names=@var{mode}
Change which characters found in remote URLs may show up in local file
names generated from those URLs.  Characters that are @dfn{restricted}
by this option are escaped, i.e. replaced with @samp{%HH}, where
@samp{HH} is the hexadecimal number that corresponds to the restricted
character.

By default, Wget escapes the characters that are not valid as part of
file names on your operating system, as well as control characters that
are typically unprintable.
This option is useful for changing these
defaults, either because you are downloading to a non-native partition,
or because you want to disable escaping of the control characters.

When mode is set to ``unix'', Wget escapes the character @samp{/} and
the control characters in the ranges 0--31 and 128--159.  This is the
default on Unix-like OS'es.

When mode is set to ``windows'', Wget escapes the characters @samp{\},
@samp{|}, @samp{/}, @samp{:}, @samp{?}, @samp{"}, @samp{*}, @samp{<},
@samp{>}, and the control characters in the ranges 0--31 and 128--159.
In addition to this, Wget in Windows mode uses @samp{+} instead of
@samp{:} to separate host and port in local file names, and uses
@samp{@@} instead of @samp{?} to separate the query portion of the file
name from the rest.  Therefore, a URL that would be saved as
@samp{www.xemacs.org:4300/search.pl?input=blah} in Unix mode would be
saved as @samp{www.xemacs.org+4300/search.pl@@input=blah} in Windows
mode.  This mode is the default on Windows.

If you append @samp{,nocontrol} to the mode, as in
@samp{unix,nocontrol}, escaping of the control characters is also
switched off.  You can use @samp{--restrict-file-names=nocontrol} to
turn off escaping of control characters without affecting the choice of
the OS to use as file name restriction mode.
@end table

@node Directory Options, HTTP Options, Download Options, Invoking
@section Directory Options

@table @samp
@item -nd
@itemx --no-directories
Do not create a hierarchy of directories when retrieving recursively.
With this option turned on, all files will get saved to the current
directory, without clobbering (if a name shows up more than once, the
filenames will get extensions @samp{.n}).

@item -x
@itemx --force-directories
The opposite of @samp{-nd}---create a hierarchy of directories, even if
one would not have been created otherwise.  E.g.
@samp{wget -x
http://fly.srk.fer.hr/robots.txt} will save the downloaded file to
@file{fly.srk.fer.hr/robots.txt}.

@item -nH
@itemx --no-host-directories
Disable generation of host-prefixed directories.  By default, invoking
Wget with @samp{-r http://fly.srk.fer.hr/} will create a structure of
directories beginning with @file{fly.srk.fer.hr/}.  This option disables
such behavior.

@cindex cut directories
@item --cut-dirs=@var{number}
Ignore @var{number} directory components.  This is useful for getting a
fine-grained control over the directory where recursive retrieval will
be saved.

Take, for example, the directory at
@samp{ftp://ftp.xemacs.org/pub/xemacs/}.  If you retrieve it with
@samp{-r}, it will be saved locally under
@file{ftp.xemacs.org/pub/xemacs/}.  While the @samp{-nH} option can
remove the @file{ftp.xemacs.org/} part, you are still stuck with
@file{pub/xemacs}.  This is where @samp{--cut-dirs} comes in handy; it
makes Wget not ``see'' @var{number} remote directory components.  Here
are several examples of how @samp{--cut-dirs} option works.

@example
@group
No options        -> ftp.xemacs.org/pub/xemacs/
-nH               -> pub/xemacs/
-nH --cut-dirs=1  -> xemacs/
-nH --cut-dirs=2  -> .

--cut-dirs=1      -> ftp.xemacs.org/xemacs/
...
@end group
@end example

If you just want to get rid of the directory structure, this option is
similar to a combination of @samp{-nd} and @samp{-P}.  However, unlike
@samp{-nd}, @samp{--cut-dirs} does not lose with subdirectories---for
instance, with @samp{-nH --cut-dirs=1}, a @file{beta/} subdirectory will
be placed to @file{xemacs/beta}, as one would expect.

@cindex directory prefix
@item -P @var{prefix}
@itemx --directory-prefix=@var{prefix}
Set directory prefix to @var{prefix}.  The @dfn{directory prefix} is the
directory where all other files and subdirectories will be saved to,
i.e. the top of the retrieval tree.
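For example, a recursive retrieval can be redirected to a hypothetical
@file{/tmp/mirror} directory like this:

@example
wget -r -P /tmp/mirror http://fly.srk.fer.hr/
@end example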
The default is @samp{.} (the
current directory).
@end table

@node HTTP Options, FTP Options, Directory Options, Invoking
@section HTTP Options

@table @samp
@cindex .html extension
@item -E
@itemx --html-extension
If a file of type @samp{application/xhtml+xml} or @samp{text/html} is
downloaded and the URL does not end with the regexp
@samp{\.[Hh][Tt][Mm][Ll]?}, this option will cause the suffix
@samp{.html} to be appended to the local filename.  This is useful, for
instance, when you're mirroring a remote site that uses @samp{.asp}
pages, but you want the mirrored pages to be viewable on your stock
Apache server.  Another good use for this is when you're downloading
CGI-generated materials.  A URL like
@samp{http://site.com/article.cgi?25} will be saved as
@file{article.cgi?25.html}.

Note that filenames changed in this way will be re-downloaded every time
you re-mirror a site, because Wget can't tell that the local
@file{@var{X}.html} file corresponds to remote URL @samp{@var{X}} (since
it doesn't yet know that the URL produces output of type
@samp{text/html} or @samp{application/xhtml+xml}).  To prevent this
re-downloading, you must use @samp{-k} and @samp{-K} so that the
original version of the file will be saved as @file{@var{X}.orig}
(@pxref{Recursive Retrieval Options}).

@cindex http user
@cindex http password
@cindex authentication
@item --http-user=@var{user}
@itemx --http-passwd=@var{password}
Specify the username @var{user} and password @var{password} on an
@sc{http} server.  According to the type of the challenge, Wget will
encode them using either the @code{basic} (insecure) or the
@code{digest} authentication scheme.

Another way to specify username and password is in the @sc{url} itself
(@pxref{URL Format}).  Either method reveals your password to anyone who
bothers to run @code{ps}.  To prevent the passwords from being seen,
store them in @file{.wgetrc} or @file{.netrc}, and make sure to protect
those files from other users with @code{chmod}.
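For example, a @file{.wgetrc} containing passwords can be made
unreadable to other users like this:

@example
chmod 600 ~/.wgetrc
@end example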
If the passwords are
really important, do not leave them lying in those files either---edit
the files and delete them after Wget has started the download.

For more information about security issues with Wget, @xref{Security
Considerations}.

@cindex proxy
@cindex cache
@item -C on/off
@itemx --cache=on/off
When set to off, disable server-side cache.  In this case, Wget will
send the remote server an appropriate directive (@samp{Pragma:
no-cache}) to get the file from the remote service, rather than
returning the cached version.  This is especially useful for retrieving
and flushing out-of-date documents on proxy servers.

Caching is allowed by default.

@cindex cookies
@item --cookies=on/off
When set to off, disable the use of cookies.  Cookies are a mechanism
for maintaining server-side state.  The server sends the client a cookie
using the @code{Set-Cookie} header, and the client responds with the