wget-pavuk.howto

|-----------------------------------------------------------------------|
| * adjusting of local directory layout of downloaded files             |
| -nd, --no-directories            | -fnrules F "*" "%n"                |
| -x, --force-directories          | this is the default for pavuk      |
| -nH, --no-host-directories       | -base_level 2                      |
|                                  | or -fnrules F "*" "%d/%n"          |
| -P, --directory-prefix=PREFIX    | -cdir PREFIX                       |
| --cut-dirs=NUMBER                | -base_level NUMBER+1               |
|-----------------------------------------------------------------------|
| * authorization for accessing documents                               |
| --http-user=USER                 | -auth_name USER                    |
| --http-passwd=PASS               | -auth_passwd PASS                  |
|-----------------------------------------------------------------------|
| * caching of documents on proxy server                                |
| -C, --cache=on/off               | -cache -nocache                    |
|-----------------------------------------------------------------------|
| * workaround for buggy HTTP servers with bad Content-Length header    |
| --ignore-length                  | -check_size/-nocheck_size          |
|-----------------------------------------------------------------------|
| * adding custom fields to HTTP request header                         |
| --header=STRING                  | -httpad STRING                     |
|-----------------------------------------------------------------------|
| * proxy authorization                                                 |
| --proxy-user=USER                | -http_proxy_user USER              |
| --proxy-passwd=PASS              | -http_proxy_pass PASS              |
|-----------------------------------------------------------------------|
| * storing HTTP response headers                                       |
| -s, --save-headers               | pavuk never stores HTTP responses  |
|                                  | within documents; it provides      |
|                                  | -store_info, to store this         |
|                                  | information in separate files in   |
|                                  | .pavuk_info directory              |
|-----------------------------------------------------------------------|
| * spoofing of User-Agent: request field                               |
| -U, --user-agent=AGENT           | -identity AGENT                    |
|-----------------------------------------------------------------------|
| * preserving of symbolic links when transferring through FTP          |
| --retr-symlinks                  | -preserve_slinks/-nopreserve_slinks|
|-----------------------------------------------------------------------|
| * globbing in FTP transfers                                           |
| -g, --glob=on/off                | no exactly similar functionality   |
|                                  | pavuk allows specifying wildcard   |
|                                  | patterns and regular expressions   |
|                                  | with options                       |
|                                  | -pattern, -rpattern, -skip_pattern |
|                                  | -skip_rpattern, -url_pattern,      |
|                                  | -url_rpattern, -skip_url_pattern   |
|                                  | -skip_url_rpattern                 |
|                                  | these patterns are applied to all  |
|                                  | URLs, not just FTP URLs            |
|-----------------------------------------------------------------------|
| * setting of FTP data connection type                                 |
| --passive-ftp                    | -ftp_active/-ftp_passive           |
|-----------------------------------------------------------------------|
| * recursing through WWW                                               |
| -r, --recursive                  | this is the default pavuk behaviour|
|-----------------------------------------------------------------------|
| * limiting recursion level                                            |
| -l, --level=NUMBER               | -lmax NUMBER                       |
| 0 unlimited                      | 0 unlimited                        |
|-----------------------------------------------------------------------|
| * prefetching files to proxy                                          |
| --delete-after                   | -mode dontstore                    |
|-----------------------------------------------------------------------|
| * converting links in HTML documents                                  |
| -k, --convert-links              | default pavuk behaviour; it is     |
|                                  | possible to set how links are      |
|                                  | converted. By default pavuk        |
|                                  | maintains consistency of links     |
|                                  | inside HTML documents, so all      |
|                                  | links are always valid; when a     |
|                                  | document is stored to the local    |
|                                  | tree, pavuk rewrites the links in  |
|                                  | all HTML documents which point to  |
|                                  | it.                                |
|                                  | the default behaviour can be       |
|                                  | changed with options:              |
|                                  | -all_to_local, -sel_to_local,      |
|                                  | -all_to_remote                     |
|-----------------------------------------------------------------------|
| * mirroring                                                           |
| -m, --mirror                     | -mode sync                         |
|-----------------------------------------------------------------------|
| * directory listings handling                                         |
| -nr, --dont-remove-listing       | -store_index/-nostore_index        |
|                                  | by default pavuk converts all      |
|                                  | directory listings to HTML files   |
|                                  | and stores these files.            |
|-----------------------------------------------------------------------|
| * allowing/disallowing of specified suffixes (file extensions)        |
| -A, --accept=LIST                | -asfx LIST                         |
| -R, --reject=LIST                | -dsfx LIST                         |
|-----------------------------------------------------------------------|
| * allowing/disallowing of specified domains                           |
| -D, --domains=LIST               | -adomain LIST                      |
| --exclude-domains=LIST           | -ddomain LIST                      |
|-----------------------------------------------------------------------|
| * following of relative URLs only                                     |
| -L, --relative                   | no similar option                  |
|-----------------------------------------------------------------------|
| * enabling of transfers from FTP servers                              |
| --follow-ftp                     | -FTP/-noFTP                        |
|-----------------------------------------------------------------------|
| * spanning to other servers                                           |
| -H, --span-hosts                 | this is the default pavuk behaviour|
|                                  | it is possible to change it with   |
|                                  | -dont_leave_site/-leave_site       |
|-----------------------------------------------------------------------|
| * allowing/disallowing of specified directories (prefixes)            |
| -I, --include-directories=LIST   | -aprefix LIST                      |
| -X, --exclude-directories=LIST   | -dprefix LIST                      |
|-----------------------------------------------------------------------|
| * DNS lookup                                                          |
| -nh, --no-host-lookup            | no similar option                  |
|                                  | pavuk always caches DNS requests   |
|                                  | in a local hash table              |
|-----------------------------------------------------------------------|
| * disable ascending to parent directories                             |
| -np, --no-parent                 | -dont_leave_dir/-leave_dir         |
|-----------------------------------------------------------------------|

I wrote this document by looking at the "wget --help" output and
occasionally checking the info documentation of wget. The version of wget
I have is "GNU Wget 1.5.3", as distributed with RedHat 6.2.

If you find any bugs/mistakes/typos/... please tell me and I will try to
correct them.
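
As an illustration of how the option table above can be used, here is a
minimal sketch of a translator from long-form wget options to pavuk
arguments. It is not part of wget or pavuk: the names wget_to_pavuk,
VALUE_OPTIONS and FLAG_OPTIONS are hypothetical, and only rows with a
direct one-to-one mapping are covered. Picking -ftp_passive for
--passive-ftp and -dont_leave_dir for --no-parent is an assumption based
on the option names, since the table lists each of those as a pair.

```python
# Hypothetical helper, not part of wget or pavuk. It covers only a subset
# of the rows in the option table.

# wget options that take a value -> pavuk option taking the same value
VALUE_OPTIONS = {
    "--http-user": "-auth_name",
    "--http-passwd": "-auth_passwd",
    "--proxy-user": "-http_proxy_user",
    "--proxy-passwd": "-http_proxy_pass",
    "--header": "-httpad",
    "--user-agent": "-identity",
    "--level": "-lmax",
    "--directory-prefix": "-cdir",
    "--accept": "-asfx",
    "--reject": "-dsfx",
    "--domains": "-adomain",
    "--exclude-domains": "-ddomain",
    "--include-directories": "-aprefix",
    "--exclude-directories": "-dprefix",
}

# wget flags without a value -> list of pavuk arguments
FLAG_OPTIONS = {
    "--mirror": ["-mode", "sync"],
    "--delete-after": ["-mode", "dontstore"],
    "--passive-ftp": ["-ftp_passive"],   # assumed member of the listed pair
    "--no-parent": ["-dont_leave_dir"],  # assumed member of the listed pair
    "--recursive": [],          # default pavuk behaviour, nothing to add
    "--convert-links": [],      # default pavuk behaviour
    "--force-directories": [],  # default pavuk behaviour
    "--span-hosts": [],         # default pavuk behaviour
}


def wget_to_pavuk(wget_args):
    """Translate wget arguments of the form --name or --name=value.

    Returns (pavuk_args, untranslated), where untranslated holds the wget
    options this sketch has no mapping for.
    """
    pavuk_args = []
    untranslated = []
    for arg in wget_args:
        name, sep, value = arg.partition("=")
        if name == "--cut-dirs" and sep:
            # The table maps --cut-dirs=NUMBER to -base_level NUMBER+1.
            pavuk_args += ["-base_level", str(int(value) + 1)]
        elif name in VALUE_OPTIONS and sep:
            pavuk_args += [VALUE_OPTIONS[name], value]
        elif name in FLAG_OPTIONS and not sep:
            pavuk_args += FLAG_OPTIONS[name]
        elif not arg.startswith("-"):
            pavuk_args.append(arg)  # URLs are passed through unchanged
        else:
            untranslated.append(arg)
    return pavuk_args, untranslated


if __name__ == "__main__":
    args, rest = wget_to_pavuk(
        ["--mirror", "--cut-dirs=2", "--http-user=joe", "--relative",
         "http://example.com/docs/"]
    )
    print("pavuk", " ".join(args))
    print("no pavuk equivalent in this sketch:", rest)
```

Options without a similar pavuk option in the table, such as --relative,
end up in the second list so they can be reviewed by hand.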
