.\" pavuk.1.in

.I Javascript support
.br
.I Cookie
.br
.I HTML rewriting engine tuning options
.br
.I Filename/URL Conversion Option
.br
.I Other Options
.br
.SH Mode
.sp
.TP
.I -mode {normal, linkupdate, sync, singlepage, singlereget, resumeregets, dontstore, reminder, ftpdir}
Set the operation mode.
.br
.B normal
- retrieves documents recursively
.br
.B linkupdate
- updates remote URLs in local HTML documents to local URLs if these URLs
exist in the local tree
.br
.B sync
- synchronizes remote documents with the local tree (if the local copy of
a document is older than the remote one, the document is retrieved again;
otherwise nothing happens)
.br
.B singlepage
- the URL is retrieved as one page with all inline objects (pictures, sounds ...);
this mode is now obsoleted by the \fB-singlepage\fR option
.br
.B resumeregets
- pavuk scans the local tree for files that were not retrieved fully
and retrieves them again (using a partial get if possible)
.br
.B singlereget
- gets the URL until it is retrieved in full
.br
.B dontstore
- transfers the page from the server, but doesn't store it in the local tree.
This mode is suitable for fetching pages that are held in a local
proxy/cache server.
.br
.B reminder
- used to inform the user about changed documents
.br
.B ftpdir
- used to list the contents of FTP directories
.br
.sp
The default operation mode is
.B normal
mode.
.SH Help
.sp
.TP
.I -h
Print a long, verbose help message.
.TP
.I -v
Show version information and the compile-time configuration.
.SH Indicate/Logging/Interface options
.sp
.TP
.I -quiet
Don't show any messages on the screen.
.TP
.I -verbose
Force output messages to be shown on the screen (default).
.TP
.I -progress/-noprogress
Show retrieval progress while running in the terminal (by default progress is off).
.TP
.I -stime/-nostime
Show the start and end time of the transfer (by default this information is not shown).
.TP
.I -xmaxlog $nr
Maximum number of log lines in the Log widget. 0 means unlimited.
This option is available only when compiled with the GTK+ GUI.
(default value is 0)
.TP
.I -logfile $file
File where all produced messages are stored.
.TP
.I -unique_log/-nounique_log
When the log file specified with the option
.B -logfile
is already used by another process, try to generate a new unique name
for the log file (by default this option is turned off).
.TP
.I -slogfile $file
File to store short logs in. This file contains one line of
information per processed document. It is meant to be used in
connection with any sort of script to produce statistics, to
validate links on your website, or to generate simple sitemaps.
Multiple pavuk processes can use this file concurrently without
overwriting each other's entries.
Record structure:
.br
.sp
.RS
.nf
- \fBPID\fR of pavuk process
- \fBTIME\fR current time
- \fBCOUNTER\fR in the format current/total number of URLs
- \fBSTATUS\fR contains the type of the error: FATAL, ERR,
  WARN or OK
- \fBERRCODE\fR is the number code of the error
  (see errcode.h in pavuk sources)
- \fBURL\fR of the document
- \fBPARENTURL\fR first parent document of this URL
  (when it doesn't have a parent - [none])
- \fBFILENAME\fR is the name of the local file the
  document is saved under
- \fBSIZE\fR size of requested document if known
- \fBDOWNLOAD_TIME\fR time taken to download this
  document in format seconds.milliseconds
- \fBHTTPRESP\fR contains the first line of the HTTP server
  response
.fi
.RE
.TP
.I -language $str
Native language that pavuk should use for communication with its
user (works only when there is a message catalog for that language).
\fBGNU gettext\fR support (for message internationalization) must also be
compiled in. The default language is taken from your NLS environment variables.
.TP
.I -gui_font $font
Font used in the GUI.
To list the available X fonts, use the
.B xlsfonts
command.
This option is available only when compiled with GTK+ GUI support.
.SH Netli options
.sp
.TP
.I -[no]read_css
Enable or disable fetching objects mentioned in style sheets.
.TP
.I -[no]verify
Enable or disable verifying server certificates in SSL mode.
.TP
.I -tlogfile $file
Turn on Netli logging with output to the specified file.
.TP
.I -trelative {object | program}
Make Netli timings relative to the start of the first object or the program.
.TP
.I -transparent_proxy FQDN[:port]
When processing a URL, send the original URL, but send it to the IP address
at FQDN.
.TP
.I -transparent_ssl_proxy FQDN[:port]
When processing an HTTPS URL, send the original URL, but send it to the IP address
at FQDN.
.TP
.I -sdemo
Output in sdemo compatible format. This is only used by sdemo. (For
now it simply means output '-1' rather than '*' when measurements are
invalid.)
.TP
.I -noencode
Do not escape characters that are "unsafe" in URLs.
.SH Special start
.sp
.TP
.I -X
Start the program with the X Window interface (if compiled with support for GTK+).
By default pavuk starts without the GUI and behaves as a regular command-line
tool.
.TP
.I -runX
When used together with the \fB-X\fR option, pavuk starts processing
URLs immediately after the GUI window is launched. Without the \fB-X\fR
option, this option has no effect.
Only available when compiled with GTK+ support.
.TP
.I -bg/-nobg
This option allows pavuk to detach from its terminal and run in
background mode. Pavuk will then not output any messages to the terminal.
If you want to see messages, you have to use the
.B -logfile
option to specify a file where messages will be written.
By default pavuk runs in the foreground.
.TP
.I -check_bg/-nocheck_bg
Normally, programs sent into the background after being run in the
foreground continue to output messages to the terminal. If this
option is activated, pavuk checks whether it is running as a background job
and will not write any messages to the terminal in this case.
After
it becomes a foreground job again, it will start writing messages to the
terminal in the normal way. This option is available only when your
system supports retrieving terminal info via the \fBtc*()\fR functions.
.TP
.I -prefs/-noprefs
When you turn this option on, pavuk preserves all settings when exiting, and
when you run pavuk with the GUI again, all settings are restored.
The settings are stored in the
.B ~/.pavuk_prefs
file. By default pavuk won't restore its options when started.
This option is available only when compiled with GTK+.
.TP
.I -schedule $time
Execute pavuk at the time specified as the parameter. The format of the
$time parameter is YYYY.MM.DD.hh.mm. You need properly configured
scheduling with the \fBat\fR command on your system to use this option.
If the default configuration of the scheduling command (at -f %f %t %d.%m.%Y)
doesn't work on your system, try adjusting it with the \fB-sched_cmd\fR option.
.TP
.I -reschedule $nr
Execute pavuk periodically with a period of $nr hours. You need properly
configured scheduling with the \fBat\fR command on your system to use this option.
.TP
.I -sched_cmd $str
Command to use for scheduling. Pavuk explicitly supports scheduling with
\fBat\fR. $str should contain regular characters and macros, escaped by the \fB%\fR character.
Supported macros are:
.br
.in +3
.B %f - for script filename
.br
.B %t - for time (in format HH:MM)
.br
- all macros as supported by the \fBstrftime()\fR function
.in -3
.TP
.I -urls_file $file
If you use this option, pavuk will read URLs from $file before it
starts processing. In this file, each URL needs to be on a separate
line. After the last URL, a single dot \fB.\fR followed by a LF
(line-feed) character denotes the end. Pavuk will start processing
right after all URLs have been read. If \fB$file\fR is given as the \fB-\fR
character, standard input will be read.
.TP
.I -store_info/-nostore_info
This option causes pavuk to store information about each document
in a separate file in the \fB.pavuk_info\fR
directory.
This file is used to store the original URL from which
the document was downloaded. For files that are downloaded via the
HTTP or HTTPS protocols, the whole HTTP response header is stored
there. It is recommended to use this option when you are using options
that change the default layout of the local document tree, because
this info file helps pavuk to map the local filename to the
URL. This option is also very useful when different URLs have the
same filename in the local tree. When this occurs, pavuk detects
it using the info files and prefixes the local name with
numbers. By default, storing this extra information is disabled.
.TP
.I -info_dir $dir
This option sets the location of a separate directory for storing the
info files created when the \fB-store_info\fR option is used. This is useful
when you don't want to mix the info files with regular document files in
the destination directory. The structure of the info files is preserved;
they are just stored in a different directory.
.TP
.I -request $req
With this option you can specify extended information for starting URLs,
such as query data for \fBPOST\fR or \fBGET\fR requests.
The current syntax of this option is:
\fBURL:["]$url["] [METHOD:["]{GET|POST}["]] [ENCODING:["]{u|m}["]]
[FIELD:["]variable=value["]] [FILE:["]variable=filename["]]
[LNAME:["]local_filename["]]\fR
.sp
.RS
.nf
- \fBURL:\fR specifies request URL
- \fBMETHOD:\fR specifies request method for URL and is
  one of \fIGET\fR or \fIPOST\fR.
- \fBENCODING:\fR specifies encoding for request body data.
    \fBm\fR is for \fImultipart/form-data\fR encoding
    \fBu\fR is for \fIapplication/x-www-form-urlencoded\fR
    encoding
- \fBFIELD:\fR specifies field of request data in format
    \fBvariable=value\fR.
    For encoding of special characters
    in \fIvariable\fR and \fIvalue\fR you can use the same encoding
    as is used in \fBapplication/x-www-form-urlencoded\fR
    encoding.
- \fBFILE:\fR specifies special field of query, which is
    used to specify file for \fBPOST\fR based file upload.
- \fBLNAME:\fR specifies localname for this request
.fi
.RE
When you need to use special characters inside the \fBFIELD:\fR and \fBFILE:\fR
fields of the request specification, you should use the
\fBapplication/x-www-form-urlencoded\fR encoding of characters. This means
that all non-ASCII characters, the quote character ("), the space character ( ),
the ampersand character (&), the percent character (%) and the equals character (=)
should be encoded in the form \fB%xx\fR, where \fBxx\fR is the hexadecimal
representation of the ASCII value of the character. For example, the \fI%\fR character
should be encoded as \fI%25\fR.
.TP
.I -formdata $req
This option gives you the chance to specify contents for HTML forms found while
traversing the document tree. The current syntax of this option is the same as
for the \fB-request\fR option, but
\fBENCODING:\fR and \fBMETHOD:\fR are meaningless in this option's semantics.
In \fBURL:\fR you have to specify the HTML form action URL, which will be matched
against action URLs found in processed HTML documents. If pavuk finds an action URL
that matches the one supplied in the \fB-formdata\fR option, it constructs a
\fBGET\fR or \fBPOST\fR request from the data supplied in this option and from the default
form field values supplied in the HTML document. Values supplied on the command line
take precedence over those supplied in the HTML file.
.TP
.I -nthreads $nr
With this option you can specify how many concurrent threads will
download documents.
By default pavuk runs 3 concurrent downloading threads.
This option is available only when pavuk is compiled to support multithreading.
.TP
.I -immesg/-noimmesg
By default, when running multiple downloading threads, pavuk buffers
all output messages in a memory buffer and flushes the buffered data only when a thread
finishes processing one document. With this option you can change this
behavior to see messages immediately as they are produced. It is useful only
when you want to debug special cases in a multithreading environment.
This option is available only when pavuk is compiled to support multithreading.
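The character encoding described under \fB-request\fR and the file layout described under \fB-urls_file\fR can be prepared by a small helper script. The following Python sketch is only an illustration of the rules as they are worded above; it has not been checked against pavuk itself. The function names are hypothetical, the use of UTF-8 for non-ASCII characters is an assumption (the text above does not name a character set), and the example URLs are placeholders.

```python
# Hypothetical helpers that follow the wording of the -request and
# -urls_file descriptions; none of these names are part of pavuk.

# Characters named in the -request text: quote, space, ampersand,
# percent and equals. Non-ASCII bytes are encoded as well.
SPECIAL_BYTES = b'" &%='


def encode_request_value(text, charset="utf-8"):
    """Encode the listed characters as %xx (assumption: UTF-8 bytes)."""
    out = []
    for byte in text.encode(charset):
        if byte >= 0x80 or byte in SPECIAL_BYTES:
            out.append("%{:02X}".format(byte))
        else:
            out.append(chr(byte))
    return "".join(out)


def build_request_spec(url, method="GET", fields=None, encoding=None):
    """Assemble a URL:/METHOD:/ENCODING:/FIELD: string for -request."""
    parts = ['URL:"{}"'.format(url), "METHOD:{}".format(method)]
    if encoding is not None:
        parts.append("ENCODING:{}".format(encoding))
    for name, value in (fields or {}).items():
        # The '=' between variable and value stays literal; only the
        # variable and the value themselves are encoded.
        parts.append('FIELD:"{}={}"'.format(
            encode_request_value(name), encode_request_value(value)))
    return " ".join(parts)


def urls_file_body(urls):
    """One URL per line, then a single dot followed by a line feed."""
    return "".join(url + "\n" for url in urls) + ".\n"


if __name__ == "__main__":
    print(build_request_spec(
        "http://www.example.com/search", "POST", {"q": "a b&c"}, "u"))
    print(urls_file_body(
        ["http://www.example.com/", "http://www.example.org/"]), end="")
```

The string returned by build_request_spec() would be passed as the single argument of \fB-request\fR, and the text returned by urls_file_body() could be written to a file named with \fB-urls_file\fR or piped to standard input when \fB-\fR is given as the file name.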
