.\" pavuk.1.in
.TP
.I -send_if_range/-nosend_if_range
Send the \fBIf-Range:\fR header in HTTP requests. I found out that some HTTP
servers (greetings, MS :-)) send different \fBETag:\fR
fields in different responses for the same, unchanged
document. This causes problems when pavuk attempts to reget a
document from such a server: pavuk remembers the old ETag value and
uses it in following requests for this document.
If the server compares it with the new ETag value and they differ,
it will refuse to send only part of the document and will start the download
from scratch.
.TP
.I -ssl_version $v
Set the required SSL protocol version for SSL communication.
\fB$v\fR is one of ssl2, ssl23, ssl3 or tls1.
This option is available only when compiled with SSL support.
Default is ssl23.
.TP
.I -unique_sslid/-nounique_sslid
Use this option if you want to use a unique \fBSSL ID\fR for all
SSL sessions. The default pavuk behavior is to negotiate a new session
ID for each connection.
This option is available only when compiled with SSL support.
.TP
.I -use_http11/-nouse_http11
This option switches between the HTTP/1.0 and HTTP/1.1 protocols
used with HTTP servers. HTTP/1.1 is currently not the default
because its implementation is very fresh and not 100% tested. Even so,
using HTTP/1.1 is highly recommended, because it is faster than HTTP/1.0
and uses less network bandwidth for initiating connections. In a future
version I will make HTTP/1.1 the default.
.TP
.I -local_ip $addr
Use this option when you want to use a specific network interface
for communication with other hosts. This option is suitable for multihomed
hosts with several network interfaces.
The address should be entered as a regular
IP address or as a host name.
.TP
.I -identity $str
This option allows you to specify the content of the \fBUser-Agent:\fR field of the HTTP request.
This is useful when scripts on a remote server return different documents for the same
URL for different browsers, or when an HTTP server refuses to serve documents
to Web robots like pavuk. By default pavuk sends the string
\fBpavuk/$VERSION\fR in the \fBUser-Agent:\fR field.
.TP
.I -auto_referer/-noauto_referer
This option forces pavuk to send the HTTP \fBReferer:\fR header field with starting URLs.
The content of this field will be the URL itself. Using this option is required
when the remote server checks the Referer: field.
By default pavuk won't send the Referer: field with starting URLs.
.TP
.I -referer/-noreferer
This option enables or disables the transmission of the HTTP \fBReferer:\fR
header field. By default pavuk sends the Referer: field.
.TP
.I -httpad $str
In some cases you may want to add user-defined fields to HTTP/HTTPS requests.
This option is exactly for this purpose. In \fB$str\fR you can directly specify
the content of the additional header. If you specify only the raw header, it will be used
only for starting requests. When you want to use this header with each request
while crawling, prefix the header with the \fB+\fR character.
.TP
.I -del_after/-nodel_after
This option allows you to delete FILES from the REMOTE server when the download has
finished properly. By default this option is off.
.TP
.I -FTPlist/-noFTPlist
When the -FTPlist option is used, pavuk retrieves the content of FTP
directories with the FTP command \fBLIST\fR instead of \fBNLST\fR.
The same listing is then
retrieved as with the "ls -l" UNIX command.
This option is required if you need to preserve the permissions of remote files or
you need to preserve symbolic links.
Pavuk supports wide listings on FTP servers with regular \fBBSD\fR or \fBSYSV\fR style "ls -l"
directory listings, on FTP servers with the \fBEPFL\fR listing format, \fBVMS\fR style listings,
\fBDOS/Windows\fR style listings and the \fBNovell\fR listing format.
The default pavuk behavior is to use NLST for FTP directory listings.
.TP
.I -ftp_list_options $str
Some FTP servers require extra options to the LIST or NLST FTP commands
to show all files and directories properly. Be sure not to use any extra
options that reformat the output of the listing. Especially useful is the \fB-a\fR
option, which forces the FTP server to also show dot files and directories;
with broken WuFTP servers it also helps to produce full directory listings,
not just files.
.TP
.I -fix_wuftpd/-nofix_wuftpd
This option is the result of several attempts to get the
\fB-remove_old\fR option working properly with the WuFTPd server when the \fB-ftplist\fR option is
used. The problem is that the FTP command LIST on WuFTPd doesn't complain when asked
to list a nonexistent directory, and indicates success in the FTP response code.
When you activate this option, pavuk uses an extra FTP command (STAT -d dir)
to check whether the directory really exists. Don't use this option unless
you are sure that you really need it!
.SH Authentication
.sp
.TP
.I -auth_file $file
File where you have stored authentication information for access
to some service. For the file structure see the \fBFILES\fR section below.
.TP
.I -auth_name $user
If you use this parameter, the program performs authentication with each HTTP
access to a document. Use this only if you know that only one HTTP server can be
accessed, or use the \fB-asite\fR option to specify the site for which you use
authentication.
Otherwise your auth parameters will be sent to each accessed
HTTP server.
.TP
.I -auth_passwd $passwd
The value of this parameter is used as the password for authentication.
.TP
.I -auth_scheme {1/2/3/4/user/Basic/Digest/NTLM}
This parameter specifies the authentication scheme to use.
.br
.B 1 or user
means the
.B user
authentication scheme is used as defined in HTTP/1.0 or HTTP/1.1.
Password and user name are sent unencoded.
.br
.B 2 or Basic
means the
.B Basic
authentication scheme is used as defined in HTTP/1.0.
Password and user name are sent BASE64 encoded.
.br
.B 3 or Digest
means the
.B Digest
access authentication scheme based on MD5 checksums, as defined in RFC2069.
.br
.B 4 or NTLM
means the
.B NTLM
proprietary access authentication scheme used by Microsoft IIS or Proxy servers.
When you use this scheme, you must also specify the NT or LM domain with the option \fB-auth_ntlm_domain\fR.
This scheme is supported only when compiled with the OpenSSL or libdes libraries.
.TP
.I -auth_ntlm_domain $str
NT or LM domain used for authorization against the HTTP server when the NTLM authentication scheme is required.
This option is available only when compiled with the OpenSSL or libdes libraries.
.TP
.I -auth_reuse_nonce/-noauth_reuse_nonce
When using the HTTP Digest access authentication scheme, reuse the first received nonce
value in subsequent requests.
By default pavuk negotiates a nonce for each request.
.TP
.I -ssl_key_file $file
File with the public key for the SSL certificate (learn more from the SSLeay or OpenSSL documentation).
This option is available only when compiled with SSL support (you need the SSLeay or OpenSSL libraries and development headers).
.TP
.I -ssl_cert_file $file
Certificate file in PEM format (learn more from the SSLeay or OpenSSL documentation).
This option is available only when compiled with SSL support (you need the SSLeay or OpenSSL libraries and development headers).
.TP
.I -ssl_cer_passwd $str
Password used to generate the certificate (learn more from the SSLeay or OpenSSL documentation).
This option is available only when compiled with SSL support (you need the SSLeay or OpenSSL libraries and development headers).
.TP
.I -nss_cert_dir $dir
Config directory for NSS (Netscape SSL implementation) certificates. Usually
~/.netscape (created by Netscape Communicator/Navigator) or a profile directory
below ~/.mozilla (created by the Mozilla browser). The directory should contain the
\fBcert7.db\fR and \fBkey3.db\fR files. If you use neither Mozilla nor
Netscape, you must create these files with the utilities distributed with the NSS
libraries.
Pavuk opens the certificate database read-only.
This option is available only when pavuk is compiled with SSL support
provided by the Netscape NSS SSL implementation.
.TP
.I [-nss_accept_unknown_cert/-nonss_accept_unknown_cert]
By default pavuk rejects connections to an SSL server whose certificate is not
stored in the local certificate database (set by the \fB-nss_cert_dir\fR option).
You must explicitly force pavuk to allow connections to servers with unknown
certificates.
This option is available only when pavuk is compiled with SSL support
provided by the Netscape NSS SSL implementation.
.TP
.I [-nss_domestic_policy/-nss_export_policy]
Selects the set of ciphers allowed or disabled by USA export rules.
This option is available only when pavuk is compiled with SSL support
provided by the Netscape NSS SSL implementation.
.TP
.I -from $email
This parameter is used as the password when accessing an anonymous FTP server, or is
optionally inserted into the \fBFrom\fR field of the HTTP request. If not specified,
pavuk derives it from the \fBUSER\fR environment variable and the site
hostname.
.TP
.I -send_from/-nosend_from
This option enables or disables sending the user identification
entered in the \fB-from\fR
option as the anonymous FTP password and in the \fBFrom:\fR field of the HTTP request.
By default this option is off.
.TP
.I -ftp_login_handshake $host $handshake
When you need to use a nonstandard login procedure for some FTP servers,
you can use this option to change the default pavuk login procedure. To allow
more flexibility, you can assign the login procedure to a specific server or to
all. When \fI$host\fR is specified as an empty string (\fI""\fR), the attached
login procedure is assigned to all FTP servers except those that have their
own login procedures assigned.
In the \fI$handshake\fR parameter you can specify the exact
login procedure as FTP commands followed by the expected FTP response
codes, delimited with backslash (\fI\\\fR) characters.
.br
For example, this is the
default login procedure when logging in to a regular FTP server without going
through a proxy server: \fIUSER %u\\331\\PASS %p\\230\fR. There are two
commands followed by two response codes. After the USER command pavuk expects
FTP response code 331, and after the PASS command pavuk expects
FTP response code 230 from the server. In FTP commands you can use the following macros, which will be
replaced by the respective values:
.sp
.br
\fI%u\fR - user name used to access FTP server
.br
\fI%p\fR - password used to access FTP server
.br
\fI%U\fR - user name used to access FTP proxy server
.br
\fI%P\fR - password used to access FTP proxy server
.br
\fI%h\fR - hostname of FTP server
.br
\fI%s\fR - port number on which FTP server listens
.SH Site/Domain/Port Limitation Options
.sp
.TP
.I -asite $list
Specify a comma-separated list of allowed sites on which referenced documents
are stored.
.TP
.I -dsite $list
Specify a comma-separated list of disallowed sites. This parameter is the
opposite of the previous one. If both are used, the last occurrence
takes effect.
.TP
.I -adomain $list
Specify a comma-separated list of allowed domains on which referenced documents
are stored.
.TP
.I -ddomain $list
Specify a comma-separated list of disallowed domains. This parameter is the
opposite of the previous one. If both are used, the last occurrence
takes effect.
.TP
.I -aport $list
In \fB$list\fR you can write a comma-separated list of ports from which you
allow documents to be downloaded.
.TP
.I -dport $list
This option is the opposite of the previous option. It is used to specify
denied ports. If both the \fB-aport\fR and \fB-dport\fR options are used, the
last occurrence takes effect and all other occurrences are
ignored.
.SH Limitation Document properties
.sp
.TP
.I -amimet $list
List of comma-separated allowed MIME types.
You can also use wildcard patterns with this option.
.TP
.I -dmimet $list
List of comma-separated disallowed MIME types. You can also use wildcard patterns with this option.
This parameter is the opposite of the previous one. If both are used, the last occurrence takes effect.
.TP
.I -maxsize $nr
Maximum allowed size of a document.
This option is applied only when pavuk is able to detect the size of the document before starting the transfer.
The default value is 0, which means this limit isn't applied.
.TP
.I -minsize $nr
Minimum allowed size of a document.
This option is applied only when pavuk is able to detect the size of the document before starting the transfer.
The default value is 0, which means this limit isn't applied.
.TP
.I -newer_than $time
Allow transfer only of documents with a modification time newer than specified in the parameter $time.
The format of $time is: YYYY.MM.DD.hh:mm.
To apply this option pavuk must be able to detect the modification time of the document.
.TP
.I -older_than $time
Allow transfer only of documents with a modification time older than specified in the parameter $time.
The format of $time is: YYYY.MM.DD.hh:mm.
To apply this option pavuk must be able to detect the modification time of the document.
.TP
.I -noCGI/-CGI
This switch prevents the transfer of dynamically generated parametric documents
through the CGI interface. Such documents are detected by the occurrence of the \fB?\fR character inside the URL.
The default pavuk behavior is to allow transfer of URLs with query strings.
.TP
.I -alang $list
This allows you to specify an ordered, comma-separated list of preferred natural
languages. This option works only with the HTTP and HTTPS protocols, using the
\fBAccept-Language:\fR MIME field.
.TP
.I -acharset $list
This option allows you to enter a comma-separated list of preferred encodings of
transferred documents.
This works only with HTTP and HTTPS URLs and only if such
document encodings are available on the destination server.
.br
.B example: -acharset iso-8859-2,windows-1250,utf8
.SH Limitation Document name
.sp
.TP
.I -asfx $list
This parameter allows you to specify a set of suffixes used to restrict the selection
of documents which will be processed.
.TP
.I -dsfx $list
Set of suffixes used to restrict the selection of documents.
This one is the inverse of the previous option. The two options exclude each other.
.TP
.I -aprefix $list, -dprefix $list
These two options allow you to specify a set of allowed or disallowed prefixes
of documents. The two options exclude each other.
.TP
.I -pattern $pattern
This option allows you to specify a wildcard pattern for documents. All documents
are tested against this pattern.
.TP
.I -rpattern $reg_exp
This is the same as the previous option, but uses regular expressions.
Available only on platforms which have a supported RE implementation.
.TP
.I -skip_pattern $pattern
This option allows you to specify a wildcard pattern for documents that should be skipped.
All documents are tested against this pattern.
.TP
.I -skip_rpattern $reg_exp
This is the same as the previous option, but uses regular expressions.
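.PP
The difference between the wildcard options (\fB-pattern\fR, \fB-skip_pattern\fR)
and their regular-expression counterparts can be sketched in a few lines of Python.
This is only an illustration, not pavuk's implementation: it assumes shell-style
wildcards as provided by Python's fnmatch module, it assumes the pattern is tested
against the whole URL, and the helper names (matches_wildcard, matches_regex, select)
are hypothetical.
.nf

```python
import fnmatch
import re


def matches_wildcard(url: str, pattern: str) -> bool:
    """Shell-style match: '*' matches any run of characters, '?' exactly one.

    Assumption: the pattern must cover the whole URL, as fnmatch does.
    """
    return fnmatch.fnmatchcase(url, pattern)


def matches_regex(url: str, reg_exp: str) -> bool:
    """Regular-expression match; here the expression may match anywhere in the URL."""
    return re.search(reg_exp, url) is not None


def select(urls, pattern=None, skip_pattern=None):
    """Keep URLs that match `pattern` (if given) and do not match `skip_pattern`."""
    kept = []
    for url in urls:
        if pattern is not None and not matches_wildcard(url, pattern):
            continue  # fails the allow pattern
        if skip_pattern is not None and matches_wildcard(url, skip_pattern):
            continue  # hit by the skip pattern
        kept.append(url)
    return kept


if __name__ == "__main__":
    urls = [
        "http://example.com/index.html",
        "http://example.com/logo.gif",
        "http://example.com/old/index.html",
    ]
    print(select(urls, pattern="*.html", skip_pattern="*/old/*"))
    print([u for u in urls if matches_regex(u, "[.]gif$")])
```

.fi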