============================================
Sphider - a lightweight search engine in PHP
Version 1.2.x

Installation and usage instructions
Ando Saabas 2005
============================================

------------
Documentation
------------

   1. Installation
   2. Indexing options
   3. Customizing
   4. Indexing from the command line
   5. Indexing pdf and doc files
   6. Keeping pages from being indexed
      * Robots.txt
      * Must include / must not include string list
      * Ignoring links
      * Ignoring parts of a page


1. Installation

1. Unpack the files and copy them to the server, for example to /home/youruser/public_html/sphider (referred to below as
[path_of_sphider]).

2. On the server, create a MySQL database to hold the Sphider data.

a) at the command prompt, type the following to log into MySQL:
mysql -u <your username> -p
Enter your password when prompted.

b) in MySQL, type:
CREATE DATABASE sphider_db;

Of course, you can use some other name for the database instead of sphider_db.

c) Use exit to quit MySQL.
For more information on how to create a database and grant the necessary permissions, see mysql.com.

3. In the include directory, edit the connect.php file and change $database, $mysql_user, $mysql_password and $mysql_host to the correct
values (if you don't know what $mysql_host should be, it should probably stay as it is - 'localhost').
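
For reference, a minimal connect.php might look like the sketch below (the variable names come from these instructions; the values are placeholders to replace with your own):

<?php
// Database connection settings for Sphider
$mysql_host     = 'localhost';   // MySQL server; usually 'localhost'
$mysql_user     = 'youruser';    // your MySQL user name
$mysql_password = 'secret';      // your MySQL password
$database       = 'sphider_db';  // the database created in step 2
?>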

4. Open the install.php script (in the admin directory) in your browser; it will create the tables necessary for Sphider to operate.

Alternatively, the tables can be created by hand using the tables.sql script in the sql directory of the Sphider distribution. At the
prompt, type
mysql -u <your username> -p sphider_db < [path_of_sphider]/sql/tables.sql

5. In the admin directory, edit auth.php to change the administrator user name and password (the default values are 'admin' and 'admin').
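
As an illustration only (the variable names below are assumptions - keep whatever names your auth.php already uses, and change just the values):

<?php
// Administrator credentials; replace the default 'admin'/'admin' pair.
$admin_user     = 'admin';   // change this
$admin_password = 'admin';   // and this
?>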

6. Open admin/admin.php in your browser and start indexing.

7. index.php is the default search page.


2. Indexing options
Full: Indexing continues until there are no further (permitted) links to follow.
To depth: Indexes to a given depth, where depth means how many "clicks" away a page can be from the starting page. Depth 0 means that
only the starting page is indexed, depth 1 indexes the starting page and all pages linked from it, and so on.
Reindex: By checking this checkbox, indexing is forced even if the page has already been indexed.
Spider can leave domain: By default, Sphider never leaves a given domain, so links from domain.com pointing to domain2.com are not
followed. By checking this option, Sphider can leave the domain; in that case it is highly advisable to define proper must include /
must not include string lists to prevent the spider from going too far.
Must include / must not include: See section 6 for an explanation.


3. Customizing

If you want to change the default behaviour of Sphider, edit conf.php in the include directory. To customize how results are displayed,
change the index_head.inc and index_foot.inc files, and the search.css file. The list of file types that are not checked for indexing is
given in admin/ext.txt. The list of common words that are not indexed is given in include/common.txt.


4. Indexing from the command line

It is possible to spider webpages from the command line, using the syntax:

php spider.php <url> full|<num> [reindex]

For example, to fully spider and index http://www.domain.com/test.html, type
php spider.php http://www.domain.com/test.html full

If you want to reindex the same url, but only to depth 3, type
php spider.php http://www.domain.com/test.html 3 reindex 

There is also the "all" option, which indexes/reindexes all sites in the database, using the spidering option given for each particular 
site:
php spider.php all
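
The "all" option combines well with cron if you want reindexing to run on a schedule. A sketch of a crontab entry (the location of spider.php is an assumption - adjust the path to match your install):

# Reindex all sites nightly at 3:00
0 3 * * * cd [path_of_sphider]/admin && php spider.php all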

5. Indexing pdf and doc files
Pdf and doc files can be indexed via external binaries. Download and install
pdftotext (http://www.foolabs.com/xpdf/download.html) and catdoc (http://www.45.free.net/~vitus/ice/catdoc/) and set their locations
(paths) in conf.php (note that under Windows, you should not use spaces in the executable's path). Additionally, set the $index_pdf and
$index_doc parameters to 1 in conf.php.
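
A sketch of the relevant conf.php lines ($index_pdf and $index_doc are named above; the two path variable names are assumptions - use the ones your conf.php already defines):

<?php
// Enable indexing of pdf and doc files via the external binaries.
$index_pdf      = 1;
$index_doc      = 1;
$pdftotext_path = '/usr/local/bin/pdftotext';  // no spaces in the path under Windows
$catdoc_path    = '/usr/local/bin/catdoc';
?>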

6. Keeping pages from being indexed
* Robots.txt
The most common way to prevent pages from being indexed is to use the robots.txt standard, either by putting a robots.txt file into the
root directory of the server, or by adding the necessary meta tags into the page headers.
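
For example, a robots.txt in the server root that keeps all robots out of a /private/ directory (the directory name is just an illustration):

User-agent: *
Disallow: /private/

Alternatively, a meta tag in the <head> section of an individual page can keep that page out of the index:

<meta name="robots" content="noindex,nofollow">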

* Must include / must not include string list
A powerful option Sphider supports is defining a must include / must not include string list for a site (click on Advanced options in the
Index screen for this). Any url not containing a string in the 'must include' list is ignored, as are urls containing strings in the
'must not include' list. The strings in a list should be separated by a newline (enter). For example, to prevent a forum on your site from
being indexed, you might add www.yoursite.com/forum to the "must not include" list; all urls containing that string will then be ignored
and won't be indexed. Perl-style regular expressions are also supported in place of literal strings: every string starting with '*' is
treated as a regular expression, so that '*/[a]+/' denotes a string with one or more a's in it.
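
For example, a "must not include" list combining the literal string and the regular expression from above (one entry per line) would look like:

www.yoursite.com/forum
*/[a]+/

Any url containing www.yoursite.com/forum, or matching the pattern (one or more a's), is skipped.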

* Ignoring links
Sphider respects the rel="nofollow" attribute in <a href..> tags, so for example the link foo.html in <a href="foo.html" rel="nofollow">
is ignored.

* Ignoring parts of a page
Sphider includes an option to exclude parts of pages from being indexed. This can, for example, be used to prevent search result flooding
when certain keywords appear in a certain part of most pages (like a header, footer or menu). Any part of a page between
<!--sphider_noindex--> and <!--/sphider_noindex--> tags is not indexed; links inside it are still followed.
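
For instance, to keep a repeated page footer out of the index (the markup inside the tags is just an illustration):

<!--sphider_noindex-->
<div class="footer">
Copyright text and menu links placed here are not indexed,
but the links themselves are still followed.
</div>
<!--/sphider_noindex-->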
