📄 changelog

  option -fnrules) (thanks to James Feeney for the base idea)
* implemented optional saving of info files for each document (each info file
  contains the source URL of the document, and documents downloaded via
  HTTP/HTTPS also have their whole HTTP header)
* repaired parsing of standalone CSS files
* if storing of info files is enabled and you change the default local tree
  layout (with -fnrules or -base_level or -tr_* options), URLs will now never
  overlap
* new option -all_to_local used to force rewriting of all URLs in an HTML
  document to point to the expected location
* new reminder mode for checking whether any URL was modified in a given
  period
* code cleanups
* new option -sel_to_local used to force rewriting of all URLs in an HTML
  document which comply with the limits to point to the expected location
* many corrections in messages (thanks to Colin Marquardt)
* repaired bug in removing the BASE tag from HTML code; it is now not removed,
  but commented out (thanks to Jan Tomasek for the bug report and idea)
* added icons to OK && Cancel buttons in Gtk interface (GTK+ only)
* changed all GtkList widgets to GtkCList
* added Clear & Modify buttons to each editlist dialog (GTK+ only)
* you can now optionally change pixmaps for buttons from the pavukrc file
  (see all Btn*Icon*: statements)
* fixed bug in ftp directory translation to HTML when using passwords with
  FTP URL
* finally I fixed that bug which randomly puts trash into pattern options in
  the GUI interface.
  strtok() is a really bad function :-(
* fstatfs emulation on SYSV systems using fstatvfs
* better detection of header files where fstatfs is declared
* repaired Seg Fault when using cookies (thanks to Andrew Hall)
* added more icons to GTK+ dialogs (thanks to Frederic Toussaint)
* each dialog window can be closed with the Esc key (GTK+1.2 only)
* each menu entry can now have a shortcut assigned (GTK+1.2 only)
* make uninstall now works well (thanks to Colin Marquardt)
* option -lmax now works properly with inline objects
  (thanks to Bernd Lutkenhoner)
* removed old_buttons
* updated German message catalog (thanks to Colin Marquardt); if you
  speak German, please check it and report possible errors to Colin
* new option -check_cookie for enabling checking whether a cookie is set for
  the domain from which it comes
* fixed bug in cookie handling code
* collections of button icons for pavuk in button_icons/
* slightly fixed URL redirection code for nonabsolute URLs
* fixed detection of the base URL of a document for documents whose URL has a
  search string
* new French message catalog (many thanks to Frederic Toussaint); if you
  speak French, please check it and report possible corrections to the author
* updated Czech message catalog (thanks to Petr Cech)

version 0.9pl20 (Sep 29 1999)
---------------
* new option -all_to_remote used to leave all links inside an HTML document
  pointing to the remote location (proposed by Diego Antona Archilla)
* fixed incompatibility with GTK+-1.0
* for starting HTTP URLs, pavuk now optionally sends the URL itself as the
  Referer: field, see option -auto_referer (proposed by Sergey Taranenko)
* fixed segfault in cookie modification code
* numbering of documents with overlapping local names for different URLs
* new, better HTML tag handling routines
* removed a lot of memory leaks
* URL downloading order strategies implemented (idea by Sergey Taranenko)
* replaced GtkText widget with GtkCList widget in log window
* limiting of the log length now works in GTK+ interface
* fetching files from Netscape browser cache directory
  (great idea by Sergey Taranenko)
* new Spanish message catalog by Javier Comeron

version 0.9pl21 (Oct 13 1999)
---------------
* support for removing advertisement banners from HTML pages
  (base idea by Mika Joukainen)
* timestamps are written to the regular log file when starting and ending
  the log (proposed by Jan Tomasek)
* support for Bell V8 implementation of regular expressions (as used in
  cygwin)
* fixed SegFault which occurs when loading scenarios while downloading is in
  progress (thanks to Sergey Taranenko)
* authorization info editor (only for GTK+ GUI)
* new option -check_bg/-nocheck_bg used to detect whether we run as a
  background job; if so, don't write any messages to the screen
* fixed some errors in Xt interface
* fixed bug when stdout isn't flushed before _exit()
  (thanks to Szabolcs Szakacsits)
* new option -send_if_range/-nosend_if_range. This option should be used when
  the HTTP server supports reget, but sometimes generates a different Etag
  field for an unchanged document (if the Etag and If-Range fields differ,
  reget will start from the beginning of the file)
* locking of log file
* optional numbering of log file when log file is locked (option -unique_log)
  (proposed by Sergey Taranenko)
* several message fixes (thanks to Colin Marquardt)
* running of a post-processing command after successful download of a
  document, see option -post_cmd (proposed by Sergey Taranenko)
* counting of fatal errors
* fixed core dump in lfname structure cleanup when using fnmatch patterns
  (thanks to Kevin Gamiel's report)
* fixed bug which causes some broken links
* fixed bug which causes problems when compiling the Xt version of the
  interface with support for loading files from Netscape browser cache
  (thanks to Niraj Sachdeva)
* portability to HPUX solved (thanks to Niraj Sachdeva)
* fixed bugs and oddities in sync mode code (thanks to Szabolcs Szakacsits)
* fixed typo which causes problems using mode linkupdate from command line
  (thanks to Szabolcs Szakacsits)
* fixed bug when using -store_info: pavuk leaves some of the lock files
  open, which causes a Too many open files error (thanks to Dawit Yimam)
* significant speedup of sync mode
* some internationalization fixes (thanks to Javier Comeron)
* several bug fixes in local name assigning code (when using -fnrules option)
* fixed possible problems with timeout detection in GTK+ interface
* it is now possible to specify a template for the scheduling command
  (look for -sched_cmd option)
* fixed bad behavior with "" urls inside HTML documents
* fixed bug in URL parsing when the URL contains both anchor and searchstr

version 0.9pl22 (Nov ?? 1999)
------------
* fixed portability to systems which don't declare h_errno
* got rid of all dirty strtok()s (I hope without mistakes)
* removed all configuration environment values !!!!!!!!
* fixed problems with loading files from NS cache on big endian machines
* more properties for URL displayed in URL tree preview (GTK only)
* added UI configuration for -stime option
* fixed some bugs in handling of the base URL of a document in HTML parser
  (thanks to Laurent Salles' report)
* fixed functionality of -min_size option (thanks to Frank Baumgart)
* fixed segfault when running user condition script (thanks to Frank Baumgart)
* added support for BSD regular expressions
* added support for GNU regular expressions
* started debug levels implementation
* selection of SSL client methods version implemented, option -ssl_version
  (thanks to Ian's idea)
* handling of &amp; and &#38; inside URLs (thanks to Matt's note)
* fixed typo in configure script which causes misconfiguration in some cases
* fixed handling of URLs with \n \r \t characters
* repaired handling of nonblocking IOs (thanks to Szabolcs Szakacsits'
  solution)
* fixed buggy behaviour of get_abs_file_path() function
* optional unique SSL ID with all SSL sessions (thanks to Jeff Roberson's
  howto)
* added handling of starting urls in form server:[port]/...
* added new Append URL dialog for appending URLs while downloading is in
  progress (GTK only)
* added proxy authorization with CONNECT request
* fixed handling of \ and " characters inside quoted strings
* added new option -httpad to be able to add some user defined HTTP headers
  to HTTP requests
* implemented statistical reports for downloading progress (can be saved to
  file - -statfile option, or previewed inside GTK UI window)
* fixed limits checking (prefix,postfix,patterns) for HTTP URLs with search
  string part
* changed debug mode controlling with -debug_level option
* new WIN32 specific option -ewait, to let the user control whether the
  console will disappear after pavuk has finished (proposed by Jan Tomasek)
* started writing NEWS document, so that users can briefly learn the new
  pavuk features in particular pavuk versions without reading the huge
  ChangeLog file
* new possibility to save URL tree structure from URL tree preview dialog
  window (GTK+-1.2 only)
* .pavuk_info directories are now omitted when scanning local document tree
  in linkupdate, resumeregets and local tree based sync mode
* fixed pavuk's behavior of option -check_bg on systems where getpgrp() needs
  a PID parameter

version 0.9pl23 (Dec 20 1999)
---------------
* huge internal rewrite, changed handling of some globals - big step to
  MT version, cleanup of internal algorithms
* implemented new mode (ftpdir) for listing contents of FTP directories
  (proposed by Niraj Sachdeva)
* added new macro %m (domain name) to -fnrules option
* changed handling of encoded documents - now only HTML and plain text
  documents are decoded, all others will be stored encoded
* fixed corruption of cookies.txt file after user break
* completely changed handling of refresh META tag - broken in several
  previous releases
* fixed portability to FreeBSD (thanks to Holdrich Kristian)
* new options -aip_pattern & -dip_pattern for specifying allowed IP
  addresses with regular patterns (proposed by Samuel Laker)
* fixed bug in setting option -debug_level to "all" (thanks to Andreas Mohr)
* fixed logging in to nonanonymous FTP servers through HTTP gateway proxy
  (thanks to Andreas Mohr)
* new option -site_level for limiting how many site levels to leave from
  starting site
* TOS settings for FTP data and control connection
* introduced new protocol FTPS for making SSL connection to FTP servers
  with SSL support
* if you set the environment variable PAVUKRC_FILE, pavuk will read this
  file as the user pavukrc file instead of the ~/.pavukrc file (proposed by
  Andreas Mohr)
* fixed SSL reading function, which could in some cases cause loss of data
  at end of file or a hang in select()
* fixed problems with makealldirs() on WIN32 platform
* added additional information (size, processing time) to structured log
  file (proposed by Dave Becket)
* fixed problems with restarting in GUI interfaces
* fixed problem with URLs with slashes at end of query string (thanks to
  Dave Becket's report)
* fixed problem with naming of local copies of FTP directories when
  downloading through HTTP gateway
* added new HTML tag for URL processing CSOBJ/HT
* added new URL schemes for processing (tel,fax,modem,sms - from IETF drafts)
* automatic handling of unsafe characters inside filenames (now handled only
  Windows - \:*?"<>|) (proposed by Jan Tomasek)
* configure script now detects if msgfmt supports --statistics option
  (proposed by Dave Becket)
* fixed hangup after blocking locking inside document read loop
* implemented much cleaner blocking locking
* fixed several odd behaviours when generating localname of document
* implemented simple adjusting of too long filenames
* partially implemented HTTP/1.1 protocol with persistent connections !!!
* new options -use_http11/-nouse_http11 for enabling or disabling HTTP/1.1
  protocol support
* many many bug fixes
* extended URL based sync mode.
  Now you can specify a subdirectory which
  contains mirrored documents (with option -subdir); that directory is
  scanned beforehand for documents, and after URL based synchronization is
  finished pavuk starts checking URLs from the local tree which were not
  checked in URL based synchronization.
* got rid of most of the unsafe static buffers
* support for deflate encoding method via zlib
* handling of 1xx HTTP response codes
* slightly changed behaviour with -site_level & -leave_level when processing
  moved URLs
* more automatic scan for OpenSSL || SSLeay libraries location
* fixed bug which causes segfault if BASE URL is unknown or unsupported
  (thanks to Jeff Roberson's report)
* applied patch from Jeff Roberson which enables use of a specified local
  network interface for communication (useful for multihomed hosts);
  uses new option -local_ip
* thanks to Colin Marquardt, improved quality of manual
* fixed linkupdate to work properly again (thanks to Jaydeep Desai's report)

version 0.9pl24 (Feb 09 2000)
---------------
* implemented parsing of VMS style FTP directory listings
* solved problems with FTP control connections when pavuk breaks data
  transfer before it is finished
* rewrote URL parser from scratch - it is now cleaner, more easily
  extensible, faster and with lower memory footprint, and I hope conformant
  with RFC 2396
* new routine for comparing URLs based on url structure instead of URL
  string - means faster and with lower memory footprint
* slightly better internal handling of query strings
* fixed segfault with decoding nonHTML documents
* fixed handling of FTP list processing on FTP servers which don't include
  "total xxx" line on top of directory listing
* added support for parsing old style BSD directory listings
* removed some random memory leaks introduced in previous release
* fixed closing of several unhandled HTTP/1.1 persistent connections with
  remaining unrequired data
* fixed again handling of moved URLs with -leave_level option
* fixed ftpdir mode behaviour with some HTTP gateways for FTP (for example
  Squid) (thanks to Niraj Sachdeva)
* implemented HTTP POST requests (see option -request)
* implemented parsing of DOS/Windows style FTP directory listings
* fixed handling of oddly detected persistent connections when using HTTP/1.0
  and talking to HTTP/1.1 server which doesn't respond with Connection: close
  header
* fixed "Zero size" possible error reporting only for cases when we don't know
  exact size or size is non zero
* implemented dialog for editing HTML forms (GTK+ only)
* new option -hash_size for performance tuning when mirroring large amount
