
📄 site.h

📁 Larbin Internet spider indexing system
// Larbin
// Sebastien Ailleret
// 08-02-00 -> 08-02-00

// This is the new structure of a site
// It includes a fifo of waiting urls

#ifndef SITE_H
#define SITE_H

#include <pthread.h>
#include <time.h>

#include "types.h"
#include "global.h"
#include "xutils/url.h"

#define defSize 10

/** This class is intended to make sure the sum of the
 * sizes of the fifo included in the different sites
 * are not too big
 */
class Interval {
 private:
  /** Position in the interval */
  uint pos;
  /** Size of the interval */
  uint size;
  /** Condition to wait if empty */
  pthread_cond_t nonFull;
  /** Only one thread should manipulate an Interval at a time */
  pthread_mutex_t lock;

 public:
  /** Constructor */
  Interval (uint size);
  /** Destructor */
  ~Interval ();
  /** Ask the permission to put an url */
  void putOne ();
  /** How many urls can we put
   * block until at least one is possible
   */
  uint putAll ();
  /** Warn an url has been retrieved */
  void getOne ();
};

/** this struct is the memory of what we know of a site
 * it is stored in a big cache (hashtable siteList in global)
 */
class Site {
 private:
  /* Only one thread should manipulate a site at a time
   * putUrl must use it when reading or writing in, out, tab or inFifo
   * others must use it when reading or writing in, out, tab or inFifo
   */
  pthread_mutex_t lock;
  /* name of the site */
  char *name;
  /* port of the site */
  uint port;
  /** internet addr of this server */
  sockaddr_in *addr;
  /* Date of dns call and robots.txt fetch */
  time_t lastUpdate;
  /* date of last access : avoid rapid fire */
  time_t lastAccess;
  /** Urls waiting for being fetched */
  url **tab;
  /** Size of the tab of urls */
  uint size;
  /** where is the tab filled */
  uint in, out;
  /** Is this Site in okSites or dnsSites (i.e. has something to fetch)
   */
  bool inFifo;
  /** connect to this server using connection conn
   * return >0 in case of success (connecting or connected), 0 otherwise
   */
  char getFds (Connexion *conn);
  /** We have an url on the good site : Connect it */
  void goodSite (url *u);
  /** we've got a good dns answer
   * get the robots.txt
   */
  void dnsOK (sockaddr_in *saddr);
  /** Cannot get the inet addr
   */
  void dnsErr ();
  /** Delete the old identity of the site */
  void newId ();
  /** test if a file can be fetched thanks to the robots.txt */
  bool testRobots(char *file);
  /** Get an url from the fifo
   * resize tab if too big
   * the lock must be set when calling this method
   */
  url *getUrl ();

 public:
  /** Constructor : init mutex */
  Site ();
  /** Destructor : never used */
  ~Site ();
  /* forbidden paths : given by robots.txt */
  Vector<char> *forbidden;
  /** Put an url in the fifo
   * If there are too many, put it back in UrlsInternal
   */
  void putUrl (url *u);
  /** Put a priority url in the fifo
   */
  void putPriorityUrl (url *u);
  /** fetch the first page in the fifo
   * never perform dns calls
   */
  void fetchNonBlock ();
  /** Init a new dns query
   */
  void newQuery (uint *nbCalls);
  /** The dns query ended with success
   */
  void dnsAns (adns_answer *ans);
  /** try to connect to a site
   * and ask for a file
   */
  void connectUrl (Connexion *conn, url *u);
  /** try to connect to a site
   * and ask for a file
   */
  void connectThisUrl (Connexion *conn, url *u);
  /** After a fetch, decide whether or not the site must be
   * put in okSites or dnsSites
   */
  void putInFifo ();
};

#endif // SITE_H
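
The header only declares Interval; its implementation is not shown here. As a rough illustration of the behaviour its comments describe (a shared budget of url slots, with producers blocking while the budget is exhausted), here is a minimal sketch. It is not Larbin's code: the class name IntervalSketch and the used() accessor are hypothetical, std::mutex and std::condition_variable are swapped in for the pthread primitives, and the assumption that putAll reserves every free slot at once is a guess drawn from its comment.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for Interval: a counter bounded by size_.
class IntervalSketch {
 public:
  explicit IntervalSketch(unsigned size) : pos_(0), size_(size) {
    assert(size > 0);  // a zero-sized interval would block forever
  }

  // Reserve one slot, blocking while the interval is full.
  void putOne() {
    std::unique_lock<std::mutex> guard(lock_);
    nonFull_.wait(guard, [this] { return pos_ < size_; });
    ++pos_;
  }

  // Reserve every free slot (assumed semantics), blocking until at
  // least one is free. Returns the number of slots reserved.
  unsigned putAll() {
    std::unique_lock<std::mutex> guard(lock_);
    nonFull_.wait(guard, [this] { return pos_ < size_; });
    unsigned granted = size_ - pos_;
    pos_ = size_;
    return granted;
  }

  // Release one slot and wake one waiting producer.
  void getOne() {
    std::lock_guard<std::mutex> guard(lock_);
    if (pos_ > 0) --pos_;
    nonFull_.notify_one();
  }

  // Hypothetical accessor, for inspection only: slots currently reserved.
  unsigned used() const {
    std::lock_guard<std::mutex> guard(lock_);
    return pos_;
  }

 private:
  unsigned pos_;                     // slots currently reserved
  const unsigned size_;              // total budget
  mutable std::mutex lock_;          // guards pos_
  std::condition_variable nonFull_;  // signalled when a slot frees up
};
```

In a crawler shaped like this header, a thread adding an url to a Site's fifo would presumably call putOne() first, and the thread that fetches the url would call getOne() afterwards, so the total number of queued urls across all sites stays under the budget.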
