This is a public beta release of Pavuk.
Pavuk means spider in Slovak.

What this program does:

-recursive retrieval of HTTP, HTTP over SSL, FTP, FTP over SSL and Gopher documents
-supports HTTP/1.1 with persistent connections
-supports HTTP POST requests
-can automatically fill in forms from HTML documents and make POST or GET requests based on user input and form content
-synchronizes retrieved local copies of documents with the remote originals
-partial content retrieval from servers which support it (FTP and HTTP/1.1)
-automatically follows moved documents
-supports the robots exclusion standard via "robots.txt" and <META NAME=robots ..>
-supports HTTP, FTP, Gopher and SSL proxy servers
-supports HTTP authentication (user, Basic, Digest, NTLM)
-shows the document tree
-has an interface to the "at" command for scheduling
-has an optional GTK+ user interface
-may be built with or without the X Window user interface
-can handle setup files (scenarios)
-supports NLS via GNU gettext (so messages are translatable to many native languages)
-can be used for fetching documents into a proxy/cache server (-mode dontstore)
-supports HTTP cookies
-supports SOCKS (4/5) proxies
-you can run as many instances of pavuk as you like in the same tree without any loss of data, because pavuk locks documents while processing them
-you can limit the transfer rate over the network (speed throttling)
-has a powerful mechanism for mapping URLs to local filenames (-fnrules)
-can check whether a URL has been modified and can send the list of modified URLs to any script
-can load files from a Netscape or MSIE browser cache directory
-can filter advertisement banners
-can automatically turn output of messages to the terminal on or off when running in the foreground or background
-can run user postprocessing scripts for each downloaded document
-can run user scripts to decide whether links from the current HTML document should be downloaded
-optionally can generate statistical reports from a download, usable for WEB site link checking
-supports different FTP directory listing formats (SYSV, BSD, EPFL, NOVELL, VMS, DOS/WINDOWS)
-multithreading support
-supports multiple HTTP proxies with round robin scheduling
-has simple support for JavaScript using URL patterns
-supports persistent HTTP/1.0 proxy connections
-has simple JavaScript bindings to allow much more flexible conditions for excluding URLs from transfer
....

License:

Everything is licensed under GPL. See COPYING and COPYING.LIB.

Ported to:

-Linux (x86, ppc) (gcc)
	Single threaded: supported, works well
	Multi threaded: supported, works well with glibc2&LinuxThreads,
			glibc1&LinuxThreads and glibc1&pcthreads
	GUI: Gtk+
-Digital Unix 3.2 (alpha) (cc and gcc), 4.0 (alpha) (cc and egcs)
	Single threaded: supported, works well
	Multi threaded: supported, works well
	GUI: Gtk+
-Ultrix 4.4 (mips) (cc, egcs) (need to use bash instead of the default sh to
	run the configure script)
	Single threaded: supported, works well
	Multi threaded: not supported, I don't know of any POSIX threads
			implementation for Ultrix
	GUI: Gtk+
-NetBSD (sparc, mips) (gcc)
	latest versions not tested
-NetBSD-1.4.2 (x86) (gcc)
	Single threaded: supported, works well
	Multi threaded: not supported, I don't know of any POSIX threads
			implementation for this version of NetBSD
	GUI: Gtk+ (not tested)
-OpenBSD-2.7 (x86) (gcc)
	Single threaded: supported, works well
	Multi threaded: supported, works well (not much tested)
	GUI: Gtk+ (not tested)
-FreeBSD-4.0 (x86) (gcc)
	Single threaded: supported, works well
	Multi threaded: works, sometimes I get the kernel error
			"microuptime() went backwards"
	GUI: Gtk+ (need to pass the option --with-gtk-config=gtk12-config to
	     the configure script)
-WIN32 (x86) (gcc + cygwin POSIX emulation layer)
	Single threaded: supported, works well
	Multi threaded: doesn't work, because the POSIX threads implementation is
			not finished yet
	GUI: win32 Gtk+ for cygwin
-Solaris 7 (x86) (gcc)
	Single threaded: supported, works well
	Multi threaded: supported, works well with POSIX threads
	GUI: Gtk+
-QNX RtP (x86) (gcc)
	Single threaded: supported, works well
	Multi threaded: supported, works well with libc POSIX threads
	GUI: Gtk+ in XPhoton
-BeOS 5 PE (x86) (gcc)
	Single threaded: supported, works except document locking
	Multi threaded: not supported, missing POSIX threads
	GUI: not supported
-also several other UNIXes, Mac OS X Server and OS/2 are reported to work

Author:

Ondrejicka Stefan

Home page:

http://pavuk.sourceforge.net/

If you find any bugs, please send me a report and/or a fix. If you have any
suggestions, ideas or questions, please contact the developers.
