simulator.py

From "Harvestman (latest version)" · Python code · 58 lines

# -- coding: utf-8
""" Simulator plugin for HarvestMan. This
plugin changes the behaviour of HarvestMan
to only simulate crawling without actually
downloading anything.

Author: Anand B Pillai <abpillai at gmail dot com>

Created Feb 7 2007  Anand B Pillai <abpillai at gmail dot com>

Copyright (C) 2007 Anand B Pillai

"""

__version__ = '2.0 b1'
__author__ = 'Anand B Pillai'

from harvestman.lib import hooks
from harvestman.lib.common.common import *
from harvestman.lib.common.macros import CONNECTOR_DATA_MODE_INMEM

def save_url(self, urlobj):

    # For simulation, we need to modify the behaviour
    # of save_url function in HarvestManUrlConnector class.
    # This is achieved by injecting this function as a plugin
    # Note that the signatures of both functions have to
    # be the same.
    url = urlobj.get_full_url()
    self.connect(urlobj, True, self._cfg.retryfailed)

    return 6

def apply_plugin():
    """ All plugin modules need to define this method """

    # This method is expected to perform the following steps.
    # 1. Register the required hook function
    # 2. Get the config object and set/override any required settings
    # 3. Print any informational messages.

    # The first step is required, the last two are of course optional
    # depending upon the required application of the plugin.

    cfg = objects.config
    cfg.simulate = True
    cfg.localise = 0
    hooks.register_plugin_function('connector:save_url_plugin', save_url)
    # Turn off caching, since no files are saved
    cfg.pagecache = 0
    # Turn off header dumping, since no files are saved
    cfg.urlheaders = 0
    # For simulator, we need in-mem data mode
    # since files are never saved!
    cfg.datamode = CONNECTOR_DATA_MODE_INMEM
    logconsole('Simulation mode turned on. Crawl will be simulated and no files will be saved.')
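
The comments in save_url describe replacing a connector method by injecting a plugin function with the same signature. The following is a minimal, self-contained sketch of that idea in plain Python. It is not HarvestMan's implementation: the Connector class, the HOOKABLE table, and this register_plugin_function (including how it interprets the 'connector:save_url_plugin' key) are hypothetical stand-ins, and the return value 6 simply mirrors the plugin above without assuming what it means.

```python
# Hypothetical sketch of a plugin hook; none of these names come from HarvestMan.

class Connector:
    """Toy connector: save_url normally fetches a URL and records it as saved."""

    def __init__(self):
        self.fetched = []  # URLs that were requested
        self.saved = []    # URLs that were "written to disk"

    def connect(self, url):
        self.fetched.append(url)

    def save_url(self, url):
        self.connect(url)
        self.saved.append(url)
        return 0


# Assumed lookup table from hook target names to classes.
HOOKABLE = {"connector": Connector}


def register_plugin_function(key, func):
    """Replace a method named by 'target:method_plugin' with func.

    Assumption: the part before ':' names the class and the '_plugin'
    suffix is stripped to obtain the method name.
    """
    target, hook = key.split(":")
    method = hook[:-len("_plugin")] if hook.endswith("_plugin") else hook
    setattr(HOOKABLE[target], method, func)


def simulated_save_url(self, url):
    # Same signature as Connector.save_url, but nothing is saved.
    self.connect(url)
    return 6


register_plugin_function("connector:save_url_plugin", simulated_save_url)

conn = Connector()
result = conn.save_url("http://example.com/")
```

Because the replacement is installed on the class, every connector instance picks up the simulated behaviour, which is why the two functions need matching signatures.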
