=item This is an edited extract from a URI specification:
The printability requirement has been met by specifying a safe set of
characters, and a general escaping scheme for encoding "unsafe"
characters. This "safe" set is suitable, for example, for use in
electronic mail. This is the canonical form of a URI.
There is a conflict between the need to be able to represent many
characters including spaces within a URI directly, and the need to be
able to use a URI in environments which have limited character sets
or in which certain characters are prone to corruption. This conflict
has been resolved by use of an hexadecimal escaping method which may
be applied to any characters forbidden in a given context. When URLs
are moved between contexts, the set of characters escaped may be
enlarged or reduced unambiguously. The canonical form for URIs has
all white spaces encoded.
=item Notes:
A URL string I<must>, by definition, consist of escaped
components. Complete URLs are always escaped.
The components of a URL string must be I<individually> escaped. Each
component of a URL may have separate requirements regarding what
must be escaped, and those requirements also depend on the URL
scheme.
Never escape an already escaped component string.
=back
This implementation expects an escaped URL string to be passed to
new() and will return a fully escaped URL string from as_string()
and full_path().
Individual components can be manipulated in unescaped or escaped
form. The following methods return/accept unescaped strings:
scheme path
user params
password query
host frag
port
The following methods return/accept partial I<escaped> strings:
netloc eparams
epath equery
I<Partial escaped> means that only reserved characters
(i.e. ':', '@', '/', ';', '?', '=', '&' in addition to '%', '.' and '#')
need to be escaped when they are to be treated as normal characters.
I<Fully escaped> means that all unsafe characters are escaped. Unsafe
characters are all control characters (%00-%1F and %7F), all 8-bit
characters (%80-%FF) as well
as '{', '}', '|', '\', '^', '[', ']', '`', '"', '<' and '>'.
Note that the character '~' is B<not> considered
unsafe by this library as it is common practice to use it to reference
personal home pages, but it is still unsafe according to RFC 1738.
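The "fully escaped" rule above can be illustrated with a minimal sketch. The helper name C<escape_unsafe> is hypothetical and is not part of URI::URL; it assumes a byte string as input, and it also escapes the space character on the assumption that the canonical form has all white space encoded. The '%' character is deliberately left alone so that existing %XX sequences are not escaped a second time.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of URI::URL: escape the characters that
# the text lists as unsafe (controls, space, 8-bit bytes and a few
# punctuation characters). '%' and '~' are left untouched.
sub escape_unsafe {
    my ($str) = @_;
    $str =~ s/([\x00-\x20\x7F-\xFF"<>\\^`{|}\[\]])/sprintf("%%%02X", ord($1))/ge;
    return $str;
}

print escape_unsafe("/a b/<x>%2F~me"), "\n";   # prints "/a%20b/%3Cx%3E%2F~me"
```

Because the existing "%2F" passes through unchanged, running the function on its own output gives the same string, which is consistent with the rule never to escape an already escaped component.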
=head1 ADDING NEW URL SCHEMES
New URL schemes or alternative implementations for existing schemes
can be added to your own code. To create a new scheme class use code
like:
package MYURL::foo;
@ISA = (URI::URL::implementor()); # inherit from generic scheme
The 'URI::URL::implementor()' function call with no parameters returns
the name of the class which implements the generic URL scheme
behaviour (typically C<URI::URL::_generic>). All hierarchical schemes
should be derived from this class.
Your class can then define overriding methods (e.g., new(), _parse())
as required.
To register your new class as the implementor for a specific scheme
use code like:
URI::URL::implementor('x-foo', 'MYURL::foo');
Any new URL created for scheme 'x-foo' will be implemented by your
C<MYURL::foo> class. Existing URLs will not be affected.
=head1 FUNCTIONS
=over 3
=item $url = URI::URL->new( $url_string [, $base_url] )
This is the object constructor. It will create a new URI::URL object,
initialized from the URL string.
=item $url = URI::URL->newlocal($path);
Returns a URL object that denotes a path within the local filesystem.
Paths not starting with '/' are interpreted relative to the current
working directory. This constructor always returns an absolute 'file'
URL.
=item $url = url($url_string [, $base_url])
Alternative constructor function. The url() function is exported by
the URI::URL module and is easier both to type and read than calling
C<< URI::URL->new >> directly. Useful for constructs like this:
$h = url($str)->host;
This function is just a wrapper for URI::URL->new.
=item URI::URL::strict($bool)
If strict is true then we croak on errors. The function returns the
previous value.
=item URI::URL::implementor([$scheme, [$class]])
Use this function to get or set the implementor class for a scheme.
Returns '' if the specified scheme is not supported. Returns the generic
URL class if no scheme is specified.
=back
=head1 METHODS
This section describes the methods available for a URI::URL object.
Note that some URL schemes will disallow some of these methods and
will croak if they are used. Some URL schemes add additional methods
that are described in the sections to follow.
Attribute access methods marked with (*) can take an optional argument
to set the value of the attribute, and they always return the old
value.
=over 3
=item $url->abs([$base])
The abs() method attempts to return a new absolute URI::URL object
for a given URL. In order to convert a relative URL into an absolute
one, a I<base> URL is required. You can associate a default base with a
URL either by passing a I<base> to the new() constructor when a
URI::URL is created or using the base() method on the object later.
Alternatively you can specify a one-off base as a parameter to the
abs() method.
The rel() method will do the opposite transformation.
Some older parsers used to allow the scheme name to be present in the
relative URL if it was the same as the base URL scheme. RFC1808 says
that this should be avoided, but you can enable this old behaviour by
setting the $URI::URL::ABS_ALLOW_RELATIVE_SCHEME variable to a TRUE
value. The difference is demonstrated by the following examples:
url("http:foo")->abs("http://host/a/b") ==> "http:foo"
local $URI::URL::ABS_ALLOW_RELATIVE_SCHEME = 1;
url("http:foo")->abs("http://host/a/b") ==> "http://host/a/foo"
You can also have the abs() method discard any surplus ".."
segments in the relative URL by setting
$URI::URL::ABS_REMOTE_LEADING_DOTS to a TRUE value. The difference is
demonstrated by the following examples:
url("../../../foo")->abs("http://host/a/b") ==> "http://host/../../foo"
local $URI::URL::ABS_REMOTE_LEADING_DOTS = 1;
url("../../../foo")->abs("http://host/a/b") ==> "http://host/foo"
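The handling of surplus ".." segments can be sketched in plain Perl. This is an illustration only, not the module's own code: the function name C<merge_path> and its third argument are hypothetical, it works on paths rather than complete URLs, and it does not treat "." segments specially.

```perl
use strict;
use warnings;

# Hypothetical helper: resolve $rel_path against $base_path. Surplus
# '..' segments (those that would climb above the root) are kept unless
# $drop_leading_dots is true.
sub merge_path {
    my ($base_path, $rel_path, $drop_leading_dots) = @_;
    (my $dir = $base_path) =~ s{[^/]*\z}{};    # keep base up to its last '/'
    my @out;
    for my $seg (split m{/}, $dir . $rel_path, -1) {
        if ($seg eq '..') {
            if (@out > 1 && $out[-1] ne '..') {
                pop @out;                      # climb one directory level
            }
            elsif (!$drop_leading_dots) {
                push @out, '..';               # nothing left to climb: keep it
            }
            next;
        }
        push @out, $seg;
    }
    return join '/', @out;
}

print merge_path("/a/b", "../../../foo", 0), "\n";   # prints "/../../foo"
print merge_path("/a/b", "../../../foo", 1), "\n";   # prints "/foo"
```

The two printed paths correspond to the path parts of the two results shown in the examples above.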
=item $url->as_string
Returns a string representing the URL in its canonical form. All
unsafe characters will be escaped. This method is overloaded as the
perl "stringify" operator, which means that URLs can be used as
strings in many contexts.
=item $url->base (*)
Get/set the base URL associated with the current URI::URL object. The
base URL matters when you call the abs() method.
=item $url->clone
Returns a copy of the current URI::URL object.
=item $url->crack
Returns a 9-element array with the following content:
0: $url->scheme *)
1: $url->user
2: $url->password
3: $url->host
4: $url->port
5: $url->epath
6: $url->eparams
7: $url->equery
8: $url->frag
All elements except I<scheme> will be undefined if the corresponding
URL part is not available.
B<Note:> The scheme (first element) returned by crack will always be
defined. This is different from what $url->scheme returns, since
it will return I<undef> for relative URLs.
=item $url->default_port
Returns the default port number for the URL scheme that the URI::URL
belongs to.
=item $url->eparams (*)
Get/set the URL parameters in escaped form.
=item $url->epath (*)
Get/set the URL path in escaped form.
=item $url->eq($other_url)
Compares two URLs to decide whether they match. The comparison rules
vary for different parts of the URLs; scheme and
netloc comparison is case-insensitive, and escaped chars match their
%XX encoding unless they are "reserved" or "unsafe".
=item $url->equery (*)
Get/set the URL query string in escaped form.
=item $url->full_path
Returns the string "/path;params?query". This is the string that is
passed to a remote server in order to access the document.
=item $url->frag (*)
Get/set the fragment (unescaped)
=item $url->host (*)
Get/set the host (unescaped)
=item $url->netloc (*)
Get/set the network location in escaped form. Setting the network
location will affect 'user', 'password', 'host' and 'port'.
=item $url->params (*)
Get/set the URL parameters (unescaped)
=item $url->password (*)
Get/set the password (unescaped)
=item $url->path (*)
Get/set the path (unescaped). This method will croak if any of the
path components in the return value contain the "/" character. You
should use the epath() method to be safe.
=item $url->path_components (*)
Get/set the path using a list of unescaped path components. The
return value will lose the distinction between '.' and '%2E'. When
setting a value, a '.' is converted to be a literal '.' and is
therefore encoded as '%2E'.
=item $url->port (*)
Get/set the network port (unescaped)
=item $url->rel([$base])
Return a relative URL if possible. This is the opposite of what the
abs() method does. For instance:
url("http://www.math.uio.no/doc/mail/top.html",
"http://www.math.uio.no/doc/linux/")->rel
will return a relative URL with path set to "../mail/top.html" and
with the same base as the original URL.
If the original URL already is relative or the scheme or netloc does
not match the base, then a copy of the original URL is returned.
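The path calculation behind rel() can be sketched as follows. This is a simplified illustration under stated assumptions, not the module's implementation: the function name C<relative_path> is hypothetical, both arguments are absolute paths that already share the same scheme and netloc, and the last segment of the base is treated as a file name (or as empty when the base ends in '/').

```perl
use strict;
use warnings;

# Hypothetical helper: express $path relative to the directory of $base.
sub relative_path {
    my ($path, $base) = @_;
    my @p = split m{/}, $path, -1;
    my @b = split m{/}, $base, -1;
    pop @b;                       # last base segment is not a directory
    my $file = pop @p;            # last path segment is the target itself
    # Strip the directory prefix that both paths have in common.
    while (@p && @b && $p[0] eq $b[0]) {
        shift @p;
        shift @b;
    }
    # One '..' per remaining base directory, then the remaining path.
    return join '/', ('..') x @b, @p, $file;
}

print relative_path("/doc/mail/top.html", "/doc/linux/"), "\n";
# prints "../mail/top.html"
```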
=item $url->print_on(*FILEHANDLE);
Prints a verbose presentation of the contents of the URL object to
the specified file handle (default STDERR). Mainly useful for
debugging.
=item $url->scheme (*)
Get/set the scheme for the URL.
=item $url->query (*)
Get/set the query string (unescaped). This method will croak if the
string returned contains both '+' and '%2B' or '=' together with '%3D'
or '%26'. You should use the equery() method to be safe.
=item $url->user (*)
Get/set the URL user name (unescaped)
=back
=head1 HTTP METHODS
For I<http> URLs you may also access the query string using the
keywords() and the query_form() methods. Both will croak if the query
is not of the correct format. The encodings look like this:
word1+word2+word3.. # keywords
key1=val1&key2=val2... # query_form
Note: These functions do not return the old value when they are used
to set a value of the query string.
=over 3
=item $url->keywords (*)
The keywords() method returns a list of unescaped strings. The method
can also be used to set the query string by passing in the keywords as
individual arguments to the method.
=item $url->query_form (*)
The query_form() method returns a list of unescaped key/value pairs.
If you assign the return value to a hash you might lose some values
if a key is repeated (which is allowed).
This method can also be used to set the query string of the URL like this:
$url->query_form(foo => 'bar', foo => 'baz', equal => '=');
If the value part of a key/value pair is a reference to an array, then
it will be converted to separate key/value pairs for each value. This
means that these two calls are equal:
$url->query_form(foo => 'bar', foo => 'baz');
$url->query_form(foo => ['bar', 'baz']);
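The expansion of array references into repeated keys can be sketched in plain Perl. The helpers C<escape_component> and C<build_query> are hypothetical and are not URI::URL functions; the escaping rule used here (every character outside a small safe set becomes %XX, including the space) is an assumption chosen for the illustration and may differ from what the module emits.

```perl
use strict;
use warnings;

# Hypothetical helper: %XX-escape everything outside a small safe set.
sub escape_component {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-_.!~*'()])/sprintf("%%%02X", ord($1))/ge;
    return $s;
}

# Hypothetical helper: build a "key1=val1&key2=val2" string from a list
# of key/value pairs.
sub build_query {
    my @pairs = @_;
    my @out;
    while (@pairs) {
        my ($key, $val) = splice(@pairs, 0, 2);
        # An array reference expands to one key=value pair per element.
        for my $v (ref($val) eq 'ARRAY' ? @$val : $val) {
            push @out, escape_component($key) . '=' . escape_component($v);
        }
    }
    return join '&', @out;
}

print build_query(foo => 'bar', foo => 'baz', equal => '='), "\n";
# prints "foo=bar&foo=baz&equal=%3D"
print build_query(foo => ['bar', 'baz']), "\n";
# prints "foo=bar&foo=baz"
```

Taking a list rather than a hash keeps repeated keys and their order, which is why the two calling styles produce the same string.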
=back
=head1 FILE METHODS
The I<file> URLs implement the local_path() method that returns a path
suitable for access to files within the current filesystem. These
methods can B<not> be used to set the path of the URL.
=over 3
=item $url->local_path
This method is really just an alias for one of the methods below
depending on what system you run on.
=item $url->unix_path
Returns a path suitable for use on a Unix system. This method will
croak if any of the path segments contains a "/" or a NULL character.
=item $url->dos_path
Returns a path suitable for use on an MS-DOS or MS-Windows system.
=item $url->mac_path
Returns a path suitable for use on a Macintosh system.
=item $url->vms_path
Returns a path suitable for use on a VMS system. VMS is a trademark
of Digital.
=back
=head1 GOPHER METHODS
These methods access the parts that are specific to gopher URLs.
These methods access different parts of the $url->path.
=over 3
=item $url->gtype (*)
=item $url->selector (*)
=item $url->search (*)
=item $url->string (*)
=back
=head1 NEWS METHODS
=over 3
=item $url->group (*)
=item $url->article (*)
=back
=head1 WAIS METHODS
These methods access the parts that are specific to wais URLs.
These methods access different parts of the $url->path.
=over 3
=item $url->database (*)
=item $url->wtype (*)
=item $url->wpath (*)
=back
=head1 MAILTO METHODS
=over 3
=item $url->address (*)
The mail address can also be accessed with the netloc() method.
=back
=head1 WHAT A URL IS NOT
URL objects do not, and should not, know how to 'get' or 'put' the
resources they specify locations for, any more than a postal address
'knows' anything about the postal system. The actual access/transfer
should be achieved by some form of transport agent class (see
L<LWP::UserAgent>). The agent class can use the URL class, but should
not be a subclass of it.
=head1 AUTHORS / ACKNOWLEDGMENTS
This module is (distantly) based on the C<wwwurl.pl> code in the
libwww-perl distribution developed by Roy Fielding
<fielding@ics.uci.edu>, as part of the Arcadia project at the
University of California, Irvine, with contributions from Brooks
Cutter.
Gisle Aas <aas@sn.no>, Tim Bunce <Tim.Bunce@ig.co.uk>, Roy Fielding
<fielding@ics.uci.edu> and Martijn Koster <m.koster@webcrawler.com>
(in English alphabetical order) have collaborated on the complete
rewrite for Perl 5, with input from other people on the libwww-perl
mailing list.
If you have any suggestions, bug reports, fixes, or enhancements, send
them to the libwww-perl mailing list at <libwww-perl@ics.uci.edu>.
=head1 COPYRIGHT
Copyright 1995-1997 Gisle Aas.
Copyright 1995 Martijn Koster.
This program is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.
=cut