📄 jobo.xml

<?xml version="1.0"?>

<!DOCTYPE JoBo SYSTEM "jobo.dtd">

<JoBo>

 <Robot>
  <!--  <AgentName>JoBo (http://www.matuschek.net/jobo.html)</AgentName> -->
  <StartReferer>http://www.matuschek.net/jobo.html</StartReferer>
  <IgnoreRobotsTxt>false</IgnoreRobotsTxt>
  <SleepTime>5</SleepTime>
  <MaxDepth>2</MaxDepth>
  <WalkToOtherHosts>false</WalkToOtherHosts>
  <Bandwidth>0</Bandwidth>
  <!-- <MaxDocumentAge>30</MaxDocumentAge> -->
  <AllowWholeHost>true</AllowWholeHost>
  <AllowWholeDomain>false</AllowWholeDomain>
  <AllowCaching>true</AllowCaching>  
  <FlexibleHostCheck>false</FlexibleHostCheck>

  <!-- Proxy configuration <Proxy>proxy.myprovider.com:80</Proxy>  -->


  <!-- robot is allowed to visit these URLs more than once  -->
  <!-- (useful for forms with different parameter sets)    -->
  <VisitMany>http://www.matuschek.net</VisitMany>


  <!-- form handler -->
  <FormHandler url="http://www.matuschek.net/cgi-bin/test-cgi">
   <FormField name="i" value="1"/>
   <FormField name="j" value="2"/>
   <FormField name="k" value="3"/>
  </FormHandler>
 </Robot>

 <DownloadRuleSet>
   <DownloadRule allow="true" mimeType="*/*"/>
 </DownloadRuleSet>
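
 <!-- Illustrative sketch only, not part of the original configuration.
      The rule set above accepts every MIME type. A more selective set
      could look like the block below. It reuses only the "allow" and
      "mimeType" attributes seen above, and it ASSUMES that rules are
      evaluated top to bottom with the first match winning, which this
      file does not confirm. Check jobo.dtd and the JoBo documentation
      before relying on it.

 <DownloadRuleSet>
   <DownloadRule allow="false" mimeType="image/*"/>
   <DownloadRule allow="true" mimeType="*/*"/>
 </DownloadRuleSet>
 -->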

 <URLCheck>
   <RegExpRule allow="true" pattern="." />
 </URLCheck>
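
 <!-- Illustrative sketch only, not part of the original configuration.
      The pattern "." above matches any non-empty URL, so every URL
      passes the check. A narrower check could look like the block
      below. It ASSUMES that patterns are unanchored regular expressions
      and that rules are evaluated top to bottom with the first match
      winning; neither is confirmed by this file. The host name is taken
      from the settings above, and the "/private/" path is hypothetical.

 <URLCheck>
   <RegExpRule allow="false" pattern="/private/" />
   <RegExpRule allow="true" pattern="^http://www\.matuschek\.net/" />
 </URLCheck>
 -->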

 <LocalizeLinks>false</LocalizeLinks>
 <StoreCGI>true</StoreCGI>

</JoBo>

