About:
=====

This is a prototype of a bibliometric ranking for Harvest. It returns
the number of links pointing to a document, which can be used to
evaluate the importance or relevance of that document.

This isn't optimized yet. See the todo list at the end of this
document for what still needs to be evaluated before it yields the
best results.

How to use:
==========

Check the configurable sections of create_db.pl and count_link.pl and
set the variables there to customize them, if necessary. Edit the
broker path in glimpseindex.

create_db.pl extracts URL references from the broker's SOIF files and
builds a reversed link list, a list of the URLs in the broker, a cache
database, and a cleaned-up reversed link list that contains only URLs
that can actually be found in the broker. It takes about an hour for
about 200,000 objects on my test machine.

Once you have built your database, you can query it with count_link.pl,
like this:

  # count_link.pl http://www.your.site.com/test/data.html

to get the number of documents pointing to this document.

Replace $HARVEST_HOME/lib/broker/glimpseindex with the glimpseindex in
this directory, and copy create_db.pl, normal.pm and count_link.pl to
the $HARVEST_HOME/lib/broker directory to build the necessary database.
Then modify search.cgi's ranking function to query this database.

Scripts:
=======

- create_db.pl:   creates the databases.
- count_link.pl:  returns the number of URLs pointing to a link.

Additional Files:
================

- glimpseindex:   replacement for glimpseindex in
                  $HARVEST_HOME/lib/broker/glimpseindex.
- normal.pm:      normalizes URLs.
- The test directory contains some regression tests for various functions.

Todo:
====

- Currently only direct links to a document contribute to the ranking
  information.
  It may be useful to check whether indirect hits should also
  influence the ranking.
- Implement a weight for a link pointing to a document, and a weight
  for a document from a site pointing to a document.
- Modify search.cgi to use count_link.pl.

kjl/29dec2002
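
What create_db.pl and count_link.pl do can be sketched in a few lines. The Python below is a minimal illustration, not the Perl scripts shipped here. It assumes a SOIF object starts with a line "@FILE { URL", followed by "Name{size}:<TAB>value" attributes and a closing "}", that the url-references attribute holds whitespace-separated URLs that are already normalized (the README assigns normalization to normal.pm), and that the text is ASCII so character counts equal byte counts. The names parse_soif, build_reverse_links, count_links and soif_object are hypothetical, and the cache database is left out. Links are stored in sets, so several links from one document count once, matching "the number of documents pointing to this document".

```python
import re
from collections import defaultdict

# Assumed SOIF layout: "@TYPE { URL" on one line, then "Name{size}:<TAB>value"
# attributes (size = length of the value), then a closing "}" line.
OBJECT_START = re.compile(r"@(\w+)\s*\{\s*(\S+)[ \t]*\n")
ATTR_START = re.compile(r"([\w-]+)\{(\d+)\}:\t")


def parse_soif(text):
    """Yield (url, attributes) for each SOIF object in text (hypothetical parser)."""
    pos = 0
    while True:
        obj = OBJECT_START.search(text, pos)
        if obj is None:
            return
        url, pos = obj.group(2), obj.end()
        attrs = {}
        # Read attributes until the closing "}" (or anything else) is reached.
        while (attr := ATTR_START.match(text, pos)) is not None:
            name, size = attr.group(1).lower(), int(attr.group(2))
            start = attr.end()
            attrs[name] = text[start:start + size]  # value length comes from {size}
            pos = start + size
            if text.startswith("\n", pos):
                pos += 1
        yield url, attrs


def build_reverse_links(objects):
    """Map each target URL to the set of broker URLs that link to it.

    Targets that are not themselves in the broker are dropped, mirroring the
    README's cleaned-up reversed link list.
    """
    known = set()
    reverse = defaultdict(set)
    for url, attrs in objects:
        known.add(url)
        for target in attrs.get("url-references", "").split():
            reverse[target].add(url)
    return {target: sources for target, sources in reverse.items() if target in known}


def count_links(reverse, url):
    """Number of distinct documents pointing to url (0 if none)."""
    return len(reverse.get(url, ()))


def soif_object(url, attrs):
    """Build one SOIF object string; used here only to create sample input."""
    lines = [f"@FILE {{ {url}\n"]
    for name, value in attrs.items():
        lines.append(f"{name}{{{len(value)}}}:\t{value}\n")
    lines.append("}\n")
    return "".join(lines)


if __name__ == "__main__":
    broker = "".join([
        soif_object("http://a.example/",
                    {"url-references": "http://a.example/b.html\nhttp://elsewhere.example/"}),
        soif_object("http://a.example/b.html", {"url-references": "http://a.example/"}),
        soif_object("http://c.example/",
                    {"url-references": "http://a.example/b.html\nhttp://a.example/b.html"}),
    ])
    reverse = build_reverse_links(parse_soif(broker))
    print(count_links(reverse, "http://a.example/b.html"))  # 2: a.example/ and c.example/
    print(count_links(reverse, "http://elsewhere.example/"))  # 0: not in the broker
```

In the real setup the input is the broker's SOIF files and the results are written to on-disk databases rather than kept in memory.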
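
normal.pm is described only as normalizing URLs, so its actual rules are not documented here. Normalization matters for the link counts because the same document may be referenced as HTTP://Host:80/page#top in one place and http://host/page in another. The following is a minimal Python sketch under assumed rules (lowercase scheme and host, drop the default port, drop the fragment, use "/" for an empty path); normalize_url is a hypothetical name, and normal.pm may apply different or additional rules.

```python
from urllib.parse import urlsplit, urlunsplit

# Default ports that can be dropped without changing the target (assumed rule).
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url):
    """Return a canonical form of url under a few assumed normalization rules.

    Sketch only: user info is discarded, and IPv6 literals are not handled.
    """
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # urlsplit already lowercases hostname
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(scheme):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = parts.path or "/"
    # Fragments (#...) only address a position inside the page, so drop them.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


if __name__ == "__main__":
    print(normalize_url("HTTP://WWW.Your.Site.com:80/test/data.html#top"))
    # http://www.your.site.com/test/data.html
```

Applying the same function to both the object URLs and every url-references entry is what lets differently written references to one page be counted together.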