📄 indexer.conf
########################################################################
#NetErrorDelayTime <time>
# Specify document processing delay time if a network error has occurred.
# For <time> format see description of Period above.
# Default value is one day.
#NetErrorDelayTime 1d

########################################################################
#Robots yes/no
# Allows/disallows using robots.txt and <META NAME="robots">
# exclusions. Use "no", for example, for link validation of your server(s).
# Command may be used several times before "Server" command and
# takes effect till the end of config file or till next Robots command.
# Default value is "yes".
#Robots yes

########################################################################
#DetectClones yes/no
# Allow/disallow clone detection and elimination. If allowed, indexer will
# detect the same documents under different locations, such as
# mirrors, and will index only one document from the group of
# such equal documents. "DetectClones yes" also reduces space usage.
# Default value is "yes".
DetectClones yes

########################################################################
# Document sections.
#
# Format is:
#
#   Section <string> <number> <maxlen> [clone] [sep] [{expr} {repl}]
#
# where <string> is a section name and <number> is section ID
# between 0 and 255. Use 0 if you don't want to index some of
# these sections. It is better to use different section IDs
# for different document parts. In this case, at search
# time you'll be able to give a different weight to each part
# or even disallow some sections.
# <maxlen> argument contains the maximum length of the section
# which will be stored in the database.
# "clone" is an optional parameter describing whether this
# section should affect clone detection.
# It can be "DetectClone" or "cdon", or "NoDetectClone" or "cdoff".
# By default, url.* section values are not taken into account
# for clone detection, while any other sections take part
# in clone detection.
# "sep" is an optional argument to specify a separator between
# parts of the same section. It is a space character by default.
# "expr" and "repl" can be used to extract user defined sections,
# for example pieces of text between the given tags. "expr" is
# a regular expression, "repl" is a replacement with $1, $2, etc
# meta-characters designating the matches of "expr".

# Standard HTML sections: body, title

Section body 1 256
Section title 2 128

# META tags
# For example <META NAME="KEYWORDS" CONTENT="xxxx">
#

Section meta.keywords 3 128
Section meta.description 4 128

# HTTP headers example, let's store "Server" HTTP header
#
#
#Section header.server 5 64

# Document's URL parts

Section url.file 6 0
Section url.path 7 0
Section url.host 8 0
Section url.proto 9 0

# CrossWords

Section crosswords 10 0

#
# If you use CachedCopy for smart excerpts (see below),
# please keep Charset section active.
#
Section Charset 11 32
Section Content-Type 12 64
Section Content-Language 13 16

# Uncomment the following lines if you want tag attributes
# to be indexed
#Section attribute.alt 14 128
#Section attribute.label 15 128
#Section attribute.summary 16 128
#Section attribute.title 17 128
#Section attribute.face 27 0

# Uncomment the following lines if you want to use NewsExtensions
# You may add any Newsgroups header to be indexed and stored in urlinfo table
#Section References 18 0
#Section Message-ID 19 0
#Section Parent-ID 20 0

# Uncomment the following lines if you want to index MP3 tags.
#Section MP3.Song 21 128
#Section MP3.Album 22 128
#Section MP3.Artist 23 128
#Section MP3.Year 24 128

# Comment this line out if you don't want to store "cached copies"
# to generate smart excerpts at search time.
# Don't forget to keep "Charset" section active if you use cached copies.
# NOTE: 3.2.18 has limits for CachedCopy size, 32000 for
# Ibase and 15000 for Mimer. Other databases do not have limits.
# If indexer fails with 'string too long' error message then reduce
# this number. This will be fixed in future versions.
#
Section CachedCopy 25 64000

# A user defined section example.
# Extract text between <h1> and </h1> tags:
#Section h1 26 128 "<h1>(.*)</h1>" $1

########################################################################
#<IndexIf|NoIndexIf> [Match|NoMatch] [NoCase|Case] [String|Regex] <Section> <arg> [<arg> ... ]
# Use this to allow documents whose sections match the given argument
# to be indexed (IndexIf) or not indexed (NoIndexIf).
# First three optional parameters describe the type of comparison.
# Default values are Match, NoCase, String.
# See also Allow/Disallow.
# You may use several arguments for one IndexIf/NoIndexIf command.
# You may use this command any number of times.
# Takes global effect for config file.
# Example1: Don't index a document if its Body section contains "porno"
#NoIndexIf Body *porno*
#
# Example2: Index only those documents with Title section containing "reference"
#IndexIf Title *reference*
#NoIndexIf Title *
#

########################################################################
#Index yes/no
# "Index no" prevents indexer from storing words into database.
# Useful, for example, for link validation.
# Can be set multiple times before "Server" command and
# takes effect till the end of config file or till next Index command.
# Default value is "yes".
#Index yes

########################################################################
#RemoteCharset <charset>
# <charset> is the default character set for the server in next "Server"
# command(s).
# This is required only for "bad" servers that do not send information
# about charset in header: "Content-type: text/html; charset=some_charset"
# and do not have <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=some_charset">
# Can be set before every "Server" command and
# takes effect till the end of config file or till next RemoteCharset command.
# Default value
# is iso-8859-1 (latin1).
RemoteCharset iso-8859-1

########################################################################
#ProxyAuthBasic login:passwd
# Use HTTP proxy basic authorization.
# Can be used before every "Server" command and
# takes effect only for the next one "Server" command!
# It should also come before the "Proxy" command.
# Examples:
#ProxyAuthBasic somebody:something

########################################################################
#Proxy your.proxy.host[:port]
# Use proxy rather than connect directly.
# One can index ftp servers when using proxy.
# Default port value if not specified is 3128 (Squid).
# If proxy host is not specified, direct connect will be used.
# Can be set before every "Server" command and
# takes effect till the end of config file or till next Proxy command.
# If no "Proxy" command is specified, indexer will use direct connect.
#
# Examples:
# Proxy on atoll.anywhere.com, port 3128:
#Proxy atoll.anywhere.com
#
# Proxy on lota.anywhere.com, port 8090:
#Proxy lota.anywhere.com:8090
#
# Disable proxy (direct connect):
#Proxy

########################################################################
#AuthBasic login:passwd
# Use basic HTTP authorization.
# Can be set before every "Server" command and
# takes effect only for the next one Server command!
# Examples:
#AuthBasic somebody:something
#
# If you have password protected directory(ies), but whole server is open, use:
#AuthBasic login1:passwd1
#Server http://my.server.com/my/secure/directory1/
#AuthBasic login2:passwd2
#Server http://my.server.com/my/secure/directory2/
#Server http://my.server.com/

##############################################################
# Mirroring parameters commands.
#
# You may specify a path to root directory to enable site mirroring
#MirrorRoot /path/to/mirror
#
# You may also specify a root directory for mirrored documents' headers;
# indexer will then store HTTP headers to local disk too.
#MirrorHeadersRoot /path/to/headers
#
# MirrorPeriod <time>
# You may specify period during which earlier mirrored files will
# be used while indexing instead of real downloading.
# It is very useful when you do some experiments with mnoGoSearch
# indexing the same hosts and do not want much traffic from/to Internet.
# If MirrorHeadersRoot is not specified and headers are not stored
# to local disk then default Content-Types given in AddType commands
# will be used.
# Default value of the MirrorPeriod is -1, which means
# "do not use mirrored files".
#
# For <time> format see Period command description above.
#
# The command below will force using local copies for one day:
#MirrorPeriod 1d

########################################################################
#ServerWeight <number>
# Server weight for Popularity Rank calculation.
# Default value is 1.
#ServerWeight 1

########################################################################
#PopRankSkipSameSite yes|no
# Skip links from same site for Popularity Rank calculation.
# Default value is "no".
#PopRankSkipSameSite yes

########################################################################
#PopRankFeedBack yes|no
# Calculate site weights before Popularity Rank calculation.
# Default value is "no".
#PopRankFeedBack yes

########################################################################
#Server [Method] [SubSection] <URL> [alias]
# This is the main command of the indexer.conf file. It's used
# to describe web-space you want to index. It also inserts
# given URL into database to use it as a start point.
# You may use the "Server" command once for each server, or part of
# a server, that you want to index.
#
# "Method" is an optional parameter which can take one of the following values:
# Allow, Disallow, CheckOnly, HrefOnly, CheckMP3, CheckMP3Only, Skip.
#
# "SubSection" is an optional parameter to specify server's subsection,
# i.e. a part of Server command argument.
# It can take the following values:
# "page" describes web space which consists of one page with address <URL>.
# "path" describes all documents which are under the same path with <URL>.
# "site" describes all documents from the same host with <URL>.
# "world" means "any document".
# Default value is "path".
#
# To index whole server "localhost":
#Server http://localhost/
#
# You can also specify some path to index subdirectory only:
#Server http://localhost/subdir/
#
# To specify only one page:
#Server page http://localhost/path/main.html
#
# To index whole server but giving non-root page as a start point:
#Server site http://localhost/path/main.html
#
#
# You can also specify optional parameter "alias". This example will
# index server "http://www.mnogosearch.org/" directly from disk instead of
# fetching from HTTP server:
#Server http://www.mnogosearch.org/ file:///home/httpd/www.mnogosearch.org/
#
Server http://your.domain.com/indexer_login.php

########################################################################
#Realm [Method] [CmpType] [Match|NoMatch] <arg> [alias]
# It works almost like "Server" command but takes a regular expression or
# a string with wildcards as its argument and does not insert any URL into
# database for indexing. To insert URLs into database use URL command (see
# below).
#
# "Method" is an optional parameter which can take one of the following
# values: Allow, Disallow, CheckOnly, HrefOnly, CheckMP3, CheckMP3Only
# with Allow as a default value.
#
# "CmpType" is an optional parameter to specify comparison type and can
# take either String or RegExp value with String as a default value.
#
# For example, if you want to index all HTTP sites in ".ru" domain, use:
#Realm http://*.ru/*
# The same using "Regex" match:
#Realm Regex ^http://.*\.ru/
# Another example. Use this command to index everything outside the .com domain:
#Realm NoMatch http://*.com/*
#
# Optional "alias" argument allows very complicated URL rewriting,
# more powerful than the other aliasing mechanisms.
# Take a look into alias.txt
# for "alias" argument usage explanation.

########################################################################
#URL http://localhost/path/to/page.html
# This command inserts given URL into database. This is useful to add
# several entry points to one server. Has no effect if the URL is already
# in the database. When inserting, indexer does not execute any checks
# and this URL may be deleted at first indexing attempt if URL has no
# corresponding Server command or is disallowed by rules given in
# Allow/Disallow commands.
#
# This command will add /main/index.html page:
#URL http://localhost/main/index.html

UseCookie yes
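
########################################################################
# Worked example (an illustrative sketch, not part of the stock file).
# The host example.com, its paths and the login:passwd pair are
# hypothetical placeholders. The commands and their scoping rules are
# the ones described in the comments above. Uncomment and adapt to use.
#
# Index one whole host, starting from a non-root page, plus a password
# protected directory on it. AuthBasic applies only to the next Server
# command, so it is placed directly before the protected directory:
#AuthBasic login:passwd
#Server http://example.com/secure/
#Server site http://example.com/docs/index.html
#
# Realm does not insert a start URL into the database, so it is paired
# here with a URL command that supplies the entry point:
#Realm http://*.example.com/*
#URL http://www.example.com/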