changelog

  of URLs
* now supports FTP URLs as defined in RFC (ftp://serv.dom/path for a path
  relative to the login directory and ftp://serv.dom//path for an absolute
  path from the FTP server root directory)
* changed behavior when doing FTP directory listings (CWD path + NLST/LIST
  changed to NLST/LIST /path)
* rejection of UNIX special files (sockets, devices, fifos) in FTP directory
  listings
* fixed segfault on empty FTP directory listings
* fixed segfault in document info storing code
* rewritten document locking routine, because of possible race conditions and
  errors in previous implementation
* enhancement for -fnrules option, which allows much higher flexibility in
  local name assignment to document (undocumented and not well tested yet)
* fixed nonfunctional -store_name option
* fixed h_errno test in configure script, to work on SYSV systems (thanks to
  Marc Chantome)
* implemented dropping of URLs to URL Append dialog
* implemented option to be able to follow downloading process inside
  URL tree preview window (GTK+-1.2 only) (proposed by Francois RicharC)
* fixed odd behavior of FTP URL parser on WIN32 platform with FTP URLs in
  form ftp://ftp.server.dom//absolute/path/...
* fixed bug in new FTP directory processing routines when listing directories
  on MS FTP servers (thanks to LE FAUCHEUR Frederic)
* fixed bug in routine which computes the difference between GMT and local
  time (on some platforms localtime() and gmtime() return the same statically
  allocated buffer for the result)
* updated Properties view in URL Tree preview to show POST request info
* support for inserting POST request inside URL tree from Form editor
  dialog
* repaired URL parser to support URLs in form http://www.server.dom?xxxx
  http://www.server.dom#xxx
* fixed possible segfault in FTP code, which may occur when pavuk is not
  able to establish data connection
* fixed bugs in scenario saving code (thanks to Peter Erbak, Bill Miller)
* fixed cookies handling with moved documents

version 0.9pl25 (Mar ?? 1999)
---------------
* got rid of all Xt GUI code
* fixed bug in code which handles filesystem unsafe characters in Win32
* fixed bug in sync mode which stops crawling when starting document is
  up to date (thanks to Dave Becket)
* fixed minor bug in handling of ; character inside URL
* implemented support for multiple HTTP proxy servers with intelligent round
  robin scheduling
* fixed segfault when using ftp/gopher HTTP gateway and cookies are enabled
  for sending
* fixed bug in url_compare() function which gave bad results when comparing
  URLs with different schemes (thanks to Niraj Sachdeva)
* fixed uninitialized HOME environment variable checking (thanks to Andreas
  Mohr)
* added check for db_185.h to configure script when looking for Berkeley DB1
  header files (thanks to Roar Bergheim)
* fixed checking of start/end time limits in sync mode (thanks to Peter Thalman)
* fixed segfault with moved robots.txt files (thanks to Bill Miller)
* fixed bug in function filename_to_url() which causes odd behavior mostly
  in sync mode (thanks to Peter Thalman)
* fixed HTTP proxy Digest authorization code
* added possibility to use authinfo file to store proxy authorization
  information
* implemented optional multithreading support (currently only the console
  version works; the GTK version needs some further changes and testing)
* changed URL encoding/decoding handling, now user must enter properly
  encoded URLs
* several simplification changes in Makefile.am files (thanks to aldomel)
* fixes to configure.in script and Makefile.in files to get
  'make distcheck' working (thanks to aldomel)
* simplified recomputation of GMT time from local time on systems with
  tm_gmtoff inside struct tm (thanks to Robert Brennecke)
* corrected pavuk behaviour when -request contains some unpredictable request
  specifications (thanks to aldomel)
* fixed compilation with --disable-tree
* fixed SSL read/write errors handling (thanks to Jeff Roberson)
* split gui code into more modules
* fixed segfault when trying to preview document properties in URL tree
  preview dialog
* fixed scheduling from UI
* slightly changed statusbar in UI
* zillion miscellaneous changes to get working GUI with multithreading
* workaround HP-UX NAME_MAX/PATH_MAX settings to disable automatic adjusting
  of long filenames to 14/255 limits (thanks to Niraj Sachdeva)
* got -store_name option working again (thanks to Orestes Sanchez Benavente
  and Jan Tomasek)
* fixed possible problems with reading and writing via SSL on nonblocking
  sockets.
* fixed functionality of -local_ip option when you change it in GUI
* fixed rewriting of URLs in HTML form action tags
* optimized header files dependencies - faster compilation
* removed minor memory leaks in HTML forms processing code
* corrected parsing of FTP response to PASV command to be able to cooperate
  with publicfile FTP server (thanks to Felix von Leitner)
* fixed implementation of html_tag_co_elem() function
* implemented the ability to fill HTML forms noninteractively when a matching
  form is found (many thanks to Jeff Roberson for the idea and first
  implementation)
* implemented dumping of documents to any supplied file descriptor (thanks to
  Honza Tomasek)
* corrected pavuk process exit value computation (redirected documents are
  not counted as failed yet) (thanks to Thomas Coppock)
* fixed bug in function url_to_absolute_url() which causes bad behaviour with
  URLs ending with -index_name. (thanks to Antoine Martin)
* --------- released testing version 0.9pl25c
* implemented code for saving session data to ~/.pavuk_keys in GTK interface
* corrected handling of multiline lists in HTML form filling dialog
* corrected several bugs in HTML forms parsing code
* fixed hangup on exit when using language switching from GUI menu
* fixed possible segfault when HTTP server responds with improper response
* --------- released testing version 0.9pl25d
* added several sample identity strings to combobox in GUI
* added files for integration to Gnome menu
* fixed bug with -fnrules F ... caused by FNM_PATHNAME flag passed to
  fnmatch() with some libc implementations (thanks to Nicolay Mausz)
* corrected bad behaviour of function get_abs_file_path_oss() which expanded
  relative paths to absolute paths the wrong way
* changed behaviour of 'Load scenario' which now resets configuration before
  loading scenario and added new function 'Add scenario' which behaves the
  same as 'Load scenario' did before
* fixed bug introduced in 0.9pl25a which damages url structure and causes
  cycling of download and hangups or segfaults on exit
* adjusted NS cache directory access routines to be safe when accessing from
  multiple threads
* ---------- released testing version 0.9p25e
* fixed segfault caused by wrong call to tl_str_concat() in doc_download()
* fixed GUI compilation without NLS support (thanks to Gabor Z. Papp)
* fixed Toggle toolbar functionality
* minor corrections in Makefiles (thanks to Petr Cech)
* fixed pavuk.spec file to properly build RPMs
* updated Slovak, Czech, Spanish message catalogs (thanks to all authors)

version 0.9pl26 (Aug 31 2000)
---------------
* added new Italian message catalog by Antonio Fragola
* updated German message catalog (thanks to Colin Marquardt)
* fixed sending of HTTP Content-type: request header with POST requests
* implemented optional deleting of remote FTP documents after successful
  transfer (idea by Gabor Z. Papp)
* you can now optionally disable the numbering of overlaying documents to
  achieve unique name using option -nounigue_name (idea by Nicolay Mausz)
* added patch from Nicolay Mausz which implements new rmpar function in
  -fnrules option syntax
* fixed bug in SSL reading code which raises error when session was properly
  closed on the other side (thanks to Martijn van Oosterhout's patch)
* fixed cooperation with SSL FTP servers which indicate successful switch to
  SSL mode with 234 response code (thanks to Martijn van Oosterhout's patch)
* fixed opening of FTP data connections. Old code could cause deadlocks in
  communication with some proxy servers. (thanks to Martijn van Oosterhout)
* fixed typo in config.h which prevented compilation on HP-UX (thanks to Niraj
  Sachdeva)
* ---------- released testing version 0.9p26a
* better checking for pthreads support in configure script
* added option --with-gtk-config to configure script, to allow easier
  configuration on systems with such weird renaming of libs/scripts as
  on FreeBSD
* added handling of HTTP server response fields Content-Location:,
  Content-Base:, Base: for setting base URL of document (thanks to Robo
  Dobozy)
* warning Zero lenght ... will no longer appear with HTTP documents which
  don't contain Content-Length: response field
* fixed total document size computation of partially transferred documents
  if server doesn't provide Content-Length: header but only Content-Range:
* fixed broken robots.txt parser
* support for extended robots.txt standard with new Allow: statement
* -request option was extended to allow specifying in the request also the
  destination filename of the document in the local filesystem
* -debug_level user now also shows the filename where the document is stored
* fixed bug in robots.c when host name field in robots structure was
  deallocated without discarding data when restarting
* added MT locking of robots data; the lack of locking could cause
  unpredictable segfaults
* now it is possible to enter empty values for form data in POST request
  specification dialog
* form editor dialog now properly extracts also hidden fields
* corrected handling of HTTP response code 303 with POST requests, now pavuk
  correctly redirects to GET request as it should
* ---------- released testing version 0.9p26b
* added support for PCRE regular expressions in -*rpattern options and in
  -fnrules option
* -amime -dmime options now also accept wildcard patterns
* added TLSv1 support for HTTPS/FTPS communication
* added new option in configure script --with-regex, which allows selecting
  the preferred regular expression type (one of
  none/auto/posix/gnu/v8/bsd/pcre)
* fixed compilation error in lfname.c when none of the supported regular
  expression types was configured
* enabled substring substitution in -lfname option when using Bell V8 regular
  expressions and regsub() function is available (cygwin b20 doesn't export it)
* added new option -dump_urlsfd to enable outputting URLs from downloaded HTML
  documents to selected file descriptor - usable for scripting
* adjusted filenames handling in WIN32 version to support new style of mapping
  win32 paths to POSIX paths in newer cygwin-1.x.y versions
* corrected comparing of URLs in -formdata option (thanks to Jeff Roberson)
* ---------- released testing version 0.9pl26c
* fixed seg-fault on parsing supported URLs with missing scheme dependent
  part of URL string (thanks to Marc Tooley).
* fixed problem with sleep() implementations which use SIGALRM for wake up
  in multithreaded version (thanks to Antoine Martin)
* new option -dont_leave_site_enter_dir/-leave_site_enter_dir which allows
  limiting leaving of the directory which we entered first on the site
* enabled option -store_name to work also in other modes than just singlepage
* wrote small document wget-pavuk.HOWTO for wget users who are starting to
  use pavuk
* updated manual page
* -h option now works properly when -bg option is also used (thanks to
  Artem Frolov)
* attempt to work around signal handling inconsistency in multithreading
  environment (thanks to Antoine Martin)
* define DB_LIBRARY_COMPATIBILITY_API in nscache.c before including db_185.h
  to force reading 1.8x Berkeley DB format with 3.xx library
* updated Slovak message catalog
* ---------- released testing version 0.9pl26d
* fixed problems with frozen threads on Solaris when starting download (thanks
  to Antoine Martin)
* added call to FreeConsole when running pavuk with -bg option on Win32
  systems (thanks to Andreas Mohr)
* added some gdk_flush() calls to status list modification code to force
  better updates
* added new option -singlepage/-nosinglepage to overcome limits of -mode
  singlepage (thanks to Joël Savignon)
* now in sync mode the size of documents downloaded over HTTP is also checked
  (thanks to Raun Nohavitza)
* added check for ssize_t type, without it pavuk won't compile on Ultrix
* ---------- released testing version 0.9pl26e
* added support for using network paths on WIN32 with cygwin-1.1 =<
* fixed broken -dont_leave_site_dir option
* added commandline passwords hiding feature (thanks to Steven Haryanto)
* fixed behaviour of -dont_leave_site_dir with moved site enter URLs
* updated German and Spanish translations (thanks to Javier and Colin)

version 0.9pl27 (Dec 13 2000)
---------------
* fixed infinite loop bug when both -store_name && -request options are used
  (thanks to Matthew)
* added new menu to GUI for selecting starting URLs from opened documents
  inside Netscape
* fixed bug which caused nearly all HTML documents to be reloaded in sync mode
  because of size comparison
* fixed bug in parsing FnameRules: scenario field (thanks to Le Faucheur
  Frederic)
* fixed freeze on scenario loading from GUI in multithreaded version (thanks
  to Le Faucheur Frederic)
* query strings from HTTP/HTTPS URLs are now not decoded when generating
  local names
* new naming convention for local documents downloaded via POST request
  name#query (thanks to mda)
* fixed bug which causes hangs or segfaults when using -formdata option,
  because of double-freeing a memory chunk (thanks to Matthew)
* added two new patterns (<script , <style) to routine for guessing HTML files
* fixed dumping of wrong ENCODING: fields in -formdata, -request infos to
  oscenario file (thanks to Matthew)
* ---------- released testing version 0.9pl27a
* -disable_html_tag all or -enable_html_tag all now works to disable/enable
  all HTML tags
* fixed fast spawning loop in multithreaded version caused by bad use of
  pthread_cond_timedwait() (thanks to Bjorn R. Bjornsson)
* fixed progress display bug showing size in bytes instead of kilobytes
  (thanks to Andreas Mohr)
* fixed bug in FTP code when pavuk opened the data connection twice for
  directory listings (thanks to Raun Nohavitza)
* fixed stupid bug when pavuk uses short int type instead of unsigned short
