
www::robotrules.3

Implementation code for the DDNS protocol module in the network portion of a video surveillance system; corrections are welcome.
.\" Automatically generated by Pod::Man 2.16 (Pod::Simple 3.05).\".\" Standard preamble:.\" ========================================================================.de Sh \" Subsection heading.br.if t .Sp.ne 5.PP\fB\\$1\fR.PP...de Sp \" Vertical space (when we can't use .PP).if t .sp .5v.if n .sp...de Vb \" Begin verbatim text.ft CW.nf.ne \\$1...de Ve \" End verbatim text.ft R.fi...\" Set up some character translations and predefined strings.  \*(-- will.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left.\" double quote, and \*(R" will give a right double quote.  \*(C+ will.\" give a nicer C++.  Capital omega is used to do unbreakable dashes and.\" therefore won't be available.  \*(C` and \*(C' expand to `' in nroff,.\" nothing in troff, for use with C<>..tr \(*W-.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'.ie n \{\.    ds -- \(*W-.    ds PI pi.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\"  diablo 12 pitch.    ds L" "".    ds R" "".    ds C` "".    ds C' ""'br\}.el\{\.    ds -- \|\(em\|.    ds PI \(*p.    ds L" ``.    ds R" '''br\}.\".\" Escape single quotes in literal strings from groff's Unicode transform..ie \n(.g .ds Aq \(aq.el       .ds Aq '.\".\" If the F register is turned on, we'll generate index entries on stderr for.\" titles (.TH), headers (.SH), subsections (.Sh), items (.Ip), and index.\" entries marked with X<> in POD.  Of course, you'll have to process the.\" output yourself in some meaningful fashion..ie \nF \{\.    de IX.    tm Index:\\$1\t\\n%\t"\\$2"...    nr % 0.    rr F.\}.el \{\.    de IX...\}.\".\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2)..\" Fear.  Run.  Save yourself.  No user-serviceable parts..    \" fudge factors for nroff and troff.if n \{\.    ds #H 0.    ds #V .8m.    ds #F .3m.    ds #[ \f1.    ds #] \fP.\}.if t \{\.    ds #H ((1u-(\\\\n(.fu%2u))*.13m).    ds #V .6m.    ds #F 0.    ds #[ \&.    ds #] \&.\}.    \" simple accents for nroff and troff.if n \{\.    ds ' \&.    ds ` \&.    ds ^ \&.    ds , \&.    ds ~ ~.    ds /.\}.if t \{\.    ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u".    ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u'.    ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u'.    ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u'.    ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u'.    ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u'.\}.    \" troff and (daisy-wheel) nroff accents.ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V'.ds 8 \h'\*(#H'\(*b\h'-\*(#H'.ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#].ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H'.ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u'.ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#].ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#].ds ae a\h'-(\w'a'u*4/10)'e.ds Ae A\h'-(\w'A'u*4/10)'E.    \" corrections for vroff.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u'.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u'.    \" for low resolution devices (crt and lpr).if \n(.H>23 .if \n(.V>19 \\{\.    ds : e.    ds 8 ss.    ds o a.    ds d- d\h'-1'\(ga.    ds D- D\h'-1'\(hy.    ds th \o'bp'.    ds Th \o'LP'.    ds ae ae.    
    The following methods are provided:

    $rules = WWW::RobotRules->new($robot_name)
        This is the constructor for WWW::RobotRules objects. The first
        argument given to new() is the name of the robot.

    $rules->parse($robot_txt_url, $content, $fresh_until)
        The parse() method takes as arguments the URL that was used to
        retrieve the /robots.txt file, and the contents of the file.

    $rules->allowed($uri)
        Returns TRUE if this robot is allowed to retrieve this URL.

    $rules->agent([$name])
        Get/set the agent name. NOTE: Changing the agent name will clear
        the robots.txt rules and expire times out of the cache.
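    The NOTE on agent() is easy to observe directly. In this sketch
    (again with made-up names and inline rules), a URL that is disallowed
    for the current agent stops being blocked once the agent name changes,
    because the cached rules are discarded:

     use strict;
     use warnings;
     use WWW::RobotRules;

     my $rules = WWW::RobotRules->new('OldBot/1.0');
     $rules->parse('http://example.com/robots.txt',
                   "User-agent: OldBot\nDisallow: /archive/\n");

     # Blocked while we identify as OldBot.
     print $rules->allowed('http://example.com/archive/a.html')
         ? "allowed\n" : "disallowed\n";    # disallowed

     # Renaming the agent clears the cached rules and expire times, so
     # the same check no longer blocks until robots.txt is re-parsed.
     $rules->agent('NewBot/2.0');
     print $rules->allowed('http://example.com/archive/a.html')
         ? "allowed\n" : "disallowed\n";    # allowed (no fresh rules cached)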
ROBOTS.TXT
    The format and semantics of the "/robots.txt" file are as follows
    (this is an edited abstract of
    <http://www.robotstxt.org/wc/norobots.html>):

    The file consists of one or more records separated by one or more
    blank lines. Each record contains lines of the form

      <field-name>: <value>

    The field name is case insensitive. Text after the '#' character on a
    line is ignored during parsing. This is used for comments. The
    following <field-names> can be used:

    User-Agent
        The value of this field is the name of the robot the record is
        describing access policy for. If more than one User-Agent field is
        present the record describes an identical access policy for more
        than one robot. At least one field needs to be present per record.
        If the value is '*', the record describes the default access
        policy for any robot that has not matched any of the other
        records.

        The User-Agent fields must occur before the Disallow fields. If a
        record contains a User-Agent field after a Disallow field, that
        constitutes a malformed record. This parser will assume that a
        blank line should have been placed before that User-Agent field,
        and will break the record into two. All the fields before the
        User-Agent field will constitute a record, and the User-Agent
        field will be the first field in a new record.

    Disallow
        The value of this field specifies a partial URL that is not to be
        visited. This can be a full path, or a partial path; any URL that
        starts with this value will not be retrieved.

ROBOTS.TXT EXAMPLES
    The following example "/robots.txt" file specifies that no robots
    should visit any URL starting with "/cyberworld/map/" or "/tmp/":

      User-agent: *
      Disallow: /cyberworld/map/ # This is an infinite virtual URL space
      Disallow: /tmp/ # these will soon disappear

    This example "/robots.txt" file specifies that no robots should visit
    any URL starting with "/cyberworld/map/", except the robot called
    "cybermapper":

      User-agent: *
      Disallow: /cyberworld/map/ # This is an infinite virtual URL space

      # Cybermapper knows where to go.
      User-agent: cybermapper
      Disallow:

    This example indicates that no robots should visit this site further:

      # go away
      User-agent: *
      Disallow: /

    This is an example of a malformed robots.txt file:

      # robots.txt for ancientcastle.example.com
      # I've locked myself away.
      User-agent: *
      Disallow: /
      # The castle is your home now, so you can go anywhere you like.
      User-agent: Belle
      Disallow: /west-wing/ # except the west wing!
      # It's good to be the Prince...
      User-agent: Beast
      Disallow:

    This file is missing the required blank lines between records.
    However, the intention is clear.

SEE ALSO
    LWP::RobotUA, WWW::RobotRules::AnyDBM_File
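    For crawling code, the LWP::RobotUA module listed above wraps this
    whole fetch-parse-check cycle in an ordinary LWP user agent. A minimal
    sketch, assuming libwww-perl is installed; the robot name, contact
    address, and target URL are placeholders:

     use strict;
     use warnings;
     use LWP::RobotUA;

     # LWP::RobotUA fetches and obeys robots.txt automatically, using
     # WWW::RobotRules underneath. Name and e-mail are placeholders.
     my $ua = LWP::RobotUA->new('MyBot/1.0', 'me@example.com');
     $ua->delay(1/60);   # minimum delay between requests, in minutes

     my $response = $ua->get('http://example.com/');
     if ($response->is_success) {
         print $response->decoded_content;
     }
     else {
         # A URL disallowed by robots.txt is refused locally with a 403
         # response mentioning robots.txt, without contacting the server.
         print "Failed: ", $response->status_line, "\n";
     }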
