
📄 manual.txt

Harvest is a robot that downloads HTML web pages.
  3.1.1.  Hardware

  A good machine for running a typical Harvest server will have a
  reasonably fast processor, 1-2 GB of free disk, and 128 MB of RAM.  A
  slower CPU will work but it will slow down the Harvest server.  More
  important than CPU speed, however, is memory size.  Harvest uses a
  number of processes, some of which provide needed ``plumbing'' (e.g.,
  search.cgi), and some of which improve performance (e.g., the
  glimpseserver process).  If you do not have enough memory, your system
  will page too much, which drastically reduces performance.  The other
  factor affecting RAM usage is how much data you are trying to index in
  a Harvest Broker.  The more data, the more disk I/O will be performed
  at query time, and the more RAM it will take to provide a reasonably
  sized disk buffer pool.

  The amount of disk you'll need depends on how much data you want to
  index in a single Broker.  (It is possible to distribute your index
  over multiple Brokers if it gets too large for one disk.)  A good rule
  of thumb is that you will need about 10% as much disk to hold the
  Gatherer and Broker databases as the total size of the data you want
  to index.  The actual space needs will vary depending on the type of
  data you are indexing.  For example, PostScript achieves a much higher
  indexing space reduction than HTML, because so much of the PostScript
  data (such as page positioning information) is discarded when building
  the index.

  3.1.2.  Platforms

  To run a Harvest server, you need a UNIX-like operating system.

  3.1.3.  Software

  To use Harvest, you need the following software packages:

  o  All Harvest servers require: Perl v5.0 or higher.

  o  The Harvest Broker and Gatherer require: GNU gzip v1.2.4 or higher.

  o  The Harvest Broker requires: HTTP server.

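  The run-time requirements listed above can be checked before
  installing.  The following is a minimal sketch in POSIX sh and is not
  part of Harvest: the function name check_tools is hypothetical, and it
  only tests whether a command is found on the PATH, not whether its
  version meets the stated minimum.

```shell
#!/bin/sh
# check_tools: hypothetical helper, not shipped with Harvest.
# Prints "missing: NAME" for each command that is not found on the PATH
# and returns non-zero if any command is missing.
# Versions are NOT checked; compare them by hand against the list above.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

  For a run-time check one might call it as check_tools perl gzip.  The
  HTTP server is left out because its command name depends on the
  server you use.
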
  To build Harvest from the source distribution you may need to install
  one or more of the following software packages:

  o  Compiling Harvest requires: GNU gcc v2.5.8 or higher.

  o  Compiling the Harvest Broker requires: flex v2.4.7 or higher and
     bison v1.22 or higher.

  The sources for gcc, gzip, flex, and bison are available at the GNU
  FTP server <ftp://ftp.gnu.org/>.

  3.2.  Requirements for Harvest Users

  Anyone with a web browser (e.g., Internet Explorer, Lynx, Mozilla,
  Netscape, Opera) can access and use Harvest servers.

  3.3.  Retrieving and Installing the Harvest Software

  3.3.1.  Distribution types

  Currently we offer only a source distribution of Harvest.  The source
  distribution contains all of the source code for the Harvest software.
  There are no binary distributions of Harvest.

  You can retrieve the Harvest source distributions from the Harvest
  download site prdownloads.sourceforge.net/harvest/.

  3.3.2.  Harvest components

  Harvest components are in the components directory.  To use a
  component, follow the instructions included in the desired component
  directory.

  3.3.3.  User-contributed software

  There is a collection of unsupported user-contributed software in the
  contrib directory.  If you would like to contribute some software,
  please send email to lee@arco.de <mailto:lee@arco.de>.

  3.4.  Building the Source Distribution

  The source distribution can be extracted in any directory.  The
  following command will extract the gnu-zipped source archive:

               % gzip -dc harvest-x.y.z.tar.gz | tar xf -

  For archives compressed with bzip2, use:

               % bzip2 -dc harvest-x.y.z.tar.bz2 | tar xf -

  Harvest uses GNU's autoconf package to perform needed configuration at
  installation time.
  If you want to override the default installation location of
  /usr/local/harvest, change the ``prefix'' variable when invoking
  ``configure''.  If desired, you may edit src/common/include/config.h
  before compiling to change various Harvest compile-time limits and
  variables.  To compile the source tree, type make.

  For example, to build and install the entire Harvest system into the
  /usr/local/harvest directory, type:

               % ./configure
               % make
               % make install

  You may see some compiler warning messages, which you can ignore.

  Building the entire Harvest distribution will take a few minutes on a
  reasonably fast machine.  The compiled source tree takes approximately
  25 megabytes of disk space.

  Later, after the installed software is working, you can remove the
  compiled code (``.o'' files) and other intermediate files by typing
  make clean.  If you want to remove the configure-generated Makefiles,
  type make distclean.

  3.5.  Additional installation for the Harvest Broker

  3.5.1.  Checking the installation for HTTP access

  The Broker interacts with your HTTP server in a number of ways.  You
  should make sure that the HTTP server can properly access the files it
  needs.  In many cases, the HTTP server will run under a different
  userid than the owner of the Harvest files.

  First, make sure the HTTP server userid can read the query.html files
  in each broker directory.  Second, make sure the HTTP server userid
  can access and execute the CGI programs in $HARVEST_HOME/cgi-bin/.
  The search.cgi script reads files from the $HARVEST_HOME/cgi-bin/lib/
  directory, so check that as well.  Finally, check the files in
  $HARVEST_HOME/lib/.  Some of the CGI Perl scripts require ``include''
  files in this directory.

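  The access checks above can be partly automated.  The sketch below, in
  POSIX sh, is not part of Harvest.  It rests on several assumptions:
  check_http_access is a hypothetical name, broker directories are
  assumed to live under $HARVEST_HOME/brokers/, and the HTTP server
  userid is assumed to be neither the owner nor in the group of the
  Harvest files, so that only the ``other'' permission bits matter.  It
  prints each file that such a userid could not read (or, for CGI
  programs, read and execute).  Empty output means no problem was found.
  Directory permissions are not checked.

```shell
#!/bin/sh
# check_http_access: hypothetical helper, not shipped with Harvest.
# Usage: check_http_access /path/to/harvest_home
# Assumption: the web server userid falls into the "other" class, so we
# test the world-readable (004) and world-executable (001) mode bits.
check_http_access() {
    home=$1

    # CGI programs must be readable and executable (octal 005).
    # The lib subdirectory is pruned here; it only needs read access.
    if [ -d "$home/cgi-bin" ]; then
        find "$home/cgi-bin" -type d -name lib -prune \
            -o -type f ! -perm -005 -print
    fi

    # Support files read by the CGI scripts must be readable (octal 004).
    for dir in "$home/cgi-bin/lib" "$home/lib"; do
        if [ -d "$dir" ]; then
            find "$dir" -type f ! -perm -004 -print
        fi
    done

    # query.html in each broker directory (assumed layout: brokers/NAME/).
    for q in "$home"/brokers/*/query.html; do
        if [ -f "$q" ]; then
            find "$q" ! -perm -004 -print
        fi
    done
    return 0
}
```

  If the server userid owns the files or shares their group, the octal
  masks would need to be adjusted accordingly.
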
  3.5.2.  Required modifications to your HTTP server

  The Harvest Broker requires that an HTTP server is running, and that
  the HTTP server ``knows'' about the Broker's files.  Below are some
  examples of how to configure various HTTP servers to work with the
  Harvest Broker.

  3.5.3.  Apache httpd

  Requires a ScriptAlias and an Alias entry in httpd.conf, e.g.:

               ScriptAlias /Harvest/cgi-bin/ Your-HARVEST_HOME/cgi-bin/
               Alias /Harvest/ Your-HARVEST_HOME/

  WARNING: The ScriptAlias entry must appear before the Alias entry.

  Additionally, it might be necessary to configure Apache httpd to
  follow symbolic links.  To do this, add the following to your
  httpd.conf:

          <Directory Your-HARVEST_HOME>
                  Options FollowSymLinks
          </Directory>

  3.5.4.  Other HTTP servers

  Install the HTTP server and modify its configuration file so that the
  /Harvest directory points to $HARVEST_HOME.  You will also need to
  configure your HTTP server so that it knows that the directory
  /Harvest/cgi-bin contains valid CGI programs.  If the default
  behaviour of your HTTP server is not to follow symbolic links, you
  will need to configure it so that it will follow symbolic links in the
  /Harvest directory.

  3.6.  Upgrading versions of the Harvest software

  3.6.1.  Upgrading from version 1.6 to version 1.8

  You cannot install version 1.8 on top of version 1.6.  For example,
  the change from version 1.6 to version 1.8 included some
  reorganization of the executables, and hence simply installing version
  1.8 on top of version 1.6 would cause you to use old executables in
  some cases.

  To upgrade from Harvest version 1.6 to 1.8, do:

  1. Move your old installation to a temporary location.

  2. Install the new version as directed by the release notes.

  3. Then, for each Gatherer and Broker that you were running under the
     old installation, migrate the server into the new installation.

     Gatherers:
        you need to move the Gatherer's directory into
        $HARVEST_HOME/gatherers.  Section ``RootNode specifications''
        describes the Gatherer workload specifications if you want to
        modify your Gatherer's configuration file.

     Brokers:
        rebuild your broker by using CreateBroker and merge in any
        customizations you have made to your old Broker.

  3.6.2.  Upgrading from version 1.5 to version 1.6

  There are no known incompatibilities between versions 1.5 and 1.6.

  3.6.3.  Upgrading from version 1.4 to version 1.5

  You cannot install version 1.5 on top of version 1.4.  For example,
  the change from version 1.4 to version 1.5 included some
  reorganization of the executables, and hence simply installing version
  1.5 on top of version 1.4 would cause you to use old executables in
  some cases.

  To upgrade from Harvest version 1.4 to 1.5, do:

  1. Move your old installation to a temporary location.

  2. Install the new version as directed by the release notes.

  3. Then, for each Gatherer and Broker that you were running under the
     old installation, migrate the server into the new installation.

     Gatherers:
        you need to move the Gatherer's directory into
        $HARVEST_HOME/gatherers.  Section ``RootNode specifications''
        describes the Gatherer workload specifications if you want to
        modify your Gatherer's configuration file.

     Brokers:
        you need to move the Broker's directory into
        $HARVEST_HOME/brokers.
        Remove any .glimpse_* files from your Broker's directory and
        use the admin.html interface to force a full-index.  You may
        want, however, to rebuild your broker by using CreateBroker so
        that you can use the updated query.html and related files.

  3.6.4.  Upgrading from version 1.3 to version 1.4

  There are no known incompatibilities between versions 1.3 and 1.4.

  3.6.5.  Upgrading from version 1.2 to version 1.3

  Version 1.3 is mostly backwards compatible with 1.2, with the
  following exception:

  Harvest 1.3 uses Glimpse 3.0.  The .glimpse_* files in the broker
  directory created with Harvest 1.2 (Glimpse 2.0) are incompatible.
  After installing Harvest 1.3 you should:

  1. Shut down any running brokers.

  2. Execute rm .glimpse_* in each broker directory.

  3. Restart your brokers with RunBroker.

  4. Force a full-index from the admin.html interface.

  3.6.6.  Upgrading from version 1.1 to version 1.2

  There are a few incompatibilities between Harvest version 1.1 and
  version 1.2.

  o  The Gatherer has improved incremental gathering support which is
     incompatible with version 1.1.  To update your existing Gatherer,
     change into the Gatherer's Data-Directory (usually the data
     subdirectory), and run the following commands:

          % set path = ($HARVEST_HOME/lib/gatherer $path)
          % cd data
          % rm -f INDEX.gdbm
          % mkindex

  This should create the INDEX.gdbm and MD5.gdbm files in the current
  directory.
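
  The commands above use csh syntax (set path = ...).  For users of a
  Bourne-style shell, the same steps might be written as in the
  following sketch.  It is not part of Harvest: the function name
  rebuild_gatherer_index and its two arguments are hypothetical, and
  the final check simply tests for the two files the manual says
  mkindex should create.

```shell
#!/bin/sh
# rebuild_gatherer_index: hypothetical helper, not shipped with Harvest.
# Usage: rebuild_gatherer_index HARVEST_HOME DATA_DIRECTORY
# Runs in a subshell so the caller's PATH and working directory are
# left unchanged.
rebuild_gatherer_index() {
    harvest_home=$1
    data_dir=$2
    (
        # sh equivalent of: set path = ($HARVEST_HOME/lib/gatherer $path)
        PATH="$harvest_home/lib/gatherer:$PATH"
        export PATH
        cd "$data_dir" || exit 1
        rm -f INDEX.gdbm
        mkindex || exit 1
        # Succeed only if both expected files now exist.
        [ -f INDEX.gdbm ] && [ -f MD5.gdbm ]
    )
}
```

  A call could look like rebuild_gatherer_index "$HARVEST_HOME"
  /path/to/gatherer/data, where the second argument is the Gatherer's
  Data-Directory.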
