=head1 NAME

perlport - Writing portable Perl


=head1 DESCRIPTION

Perl runs on a variety of operating systems.  While most of them share
a lot in common, they also have their own very particular and unique
features.

This document is meant to help you to find out what constitutes portable
Perl code, so that once you have made your decision to write portably,
you know where the lines are drawn, and you can stay within them.

There is a tradeoff between taking full advantage of B<a> particular type
of computer, and taking advantage of a full B<range> of them.  Naturally,
as you make your range bigger (and thus more diverse), the common
denominators drop, and you are left with fewer areas of common ground in
which you can operate to accomplish a particular task.  Thus, when you
begin attacking a problem, it is important to consider which part of the
tradeoff curve you want to operate under.  Specifically, you must decide
whether the task you are coding needs the full generality of being
portable, or whether it is sufficient to just get the job done.  This is
the hardest choice to be made.  The rest is easy, because
Perl provides lots of choices, whichever way you want to approach your
problem.

Looking at it another way, writing portable code is usually about
willfully limiting your available choices.  Naturally, it takes discipline
to do that.

Be aware of two important points:

=over 4

=item Not all Perl programs have to be portable

There is no reason why you should not use Perl as a language to glue Unix
tools together, or to prototype a Macintosh application, or to manage the
Windows registry.  If it makes no sense to aim for portability for one
reason or another in a given program, then don't bother.

=item The vast majority of Perl B<is> portable

Don't be fooled into thinking that it is hard to create portable Perl
code.  It isn't.  Perl tries its level-best to bridge the gaps between
what's available on different platforms, and all the means available to
use those features.  Thus almost all Perl code runs on any machine
without modification.  But there I<are> some significant issues in
writing portable code, and this document is entirely about those issues.

=back

Here's the general rule: When you approach a task that is commonly done
using a whole range of platforms, think in terms of writing portable
code.  That way, you don't sacrifice much by way of the implementation
choices you can avail yourself of, and at the same time you can give
your users lots of platform choices.  On the other hand, when you have to
take advantage of some unique feature of a particular platform, as is
often the case with systems programming (whether for Unix, Windows,
S<Mac OS>, VMS, etc.), consider writing platform-specific code.

When the code will run on only two or three operating systems, then you
may only need to consider the differences of those particular systems.
The important thing is to decide where the code will run, and to be
deliberate in your decision.

The material below is separated into three main sections: main issues of
portability (L<"ISSUES">), platform-specific issues (L<"PLATFORMS">), and
builtin perl functions that behave differently on various ports
(L<"FUNCTION IMPLEMENTATIONS">).

This information should not be considered complete; it includes possibly
transient information about idiosyncrasies of some of the ports, almost
all of which are in a state of constant evolution.  Thus this material
should be considered a perpetual work in progress
(E<lt>IMG SRC="yellow_sign.gif" ALT="Under Construction"E<gt>).




=head1 ISSUES

=head2 Newlines

In most operating systems, lines in files are separated with newlines.
Just what is used as a newline may vary from OS to OS.  Unix
traditionally uses C<\012>, one kind of Windows I/O uses C<\015\012>,
and S<Mac OS> uses C<\015>.

Perl uses C<\n> to represent the "logical" newline, where what
is logical may depend on the platform in use.  In MacPerl, C<\n>
always means C<\015>.  In DOSish perls, C<\n> usually means C<\012>, but
when accessing a file in "text" mode, STDIO translates it to (or from)
C<\015\012>.

Due to the "text" mode translation, DOSish perls have limitations
in using C<seek> and C<tell> when a file is being accessed in "text"
mode.  Specifically, if you stick to C<seek>-ing to locations you got
from C<tell> (and no others), you are usually free to use C<seek> and
C<tell> even in "text" mode.  In general, using C<seek> or C<tell> or
other file operations that count bytes instead of characters, without
considering the length of C<\n>, may be non-portable.  If you use
C<binmode> on a file, however, you can usually use C<seek> and C<tell>
with arbitrary values quite safely.
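
As a minimal sketch of the C<binmode> advice above, the following
writes two records, remembers an offset obtained from C<tell>, and
later returns to it with C<seek>.  The file name F<records.dat> is only
a placeholder chosen for illustration.

    use strict;
    use warnings;

    my $file = 'records.dat';           # hypothetical file name
    local $/ = "\012";                  # records end in LF on every platform

    open(my $out, '>', $file) or die "Can't write $file: $!";
    binmode($out);                      # turn off "text" mode translation
    print $out "first\012";
    my $offset = tell($out);            # byte offset of the second record
    print $out "second\012";
    close($out) or die "Can't close $file: $!";

    open(my $in, '<', $file) or die "Can't read $file: $!";
    binmode($in);
    seek($in, $offset, 0) or die "Can't seek in $file: $!";
    my $record = <$in>;                 # "second\012"
    close($in);
    unlink($file);

Because both handles are in binary mode and the records are terminated
with an explicit C<\012>, the offset counts the same bytes on every
platform.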

A common misconception in socket programming is that C<\n> eq C<\012>
everywhere.  When using protocols such as common Internet protocols,
C<\012> and C<\015> are called for specifically, and the values of
the logical C<\n> and C<\r> (carriage return) are not reliable.

    print SOCKET "Hi there, client!\r\n";      # WRONG
    print SOCKET "Hi there, client!\015\012";  # RIGHT

[NOTE: this does not necessarily apply to communications that are
filtered by another program or module before sending to the socket; the
most popular EBCDIC webserver, for instance, accepts C<\r\n> and
translates those characters, along with all other characters in text
streams, from EBCDIC to ASCII.]

However, using C<\015\012> (or C<\cM\cJ>, or C<\x0D\x0A>) can be tedious
and unsightly, as well as confusing to those maintaining the code.  As
such, the C<Socket> module supplies the Right Thing for those who want it.

    use Socket qw(:DEFAULT :crlf);
    print SOCKET "Hi there, client!$CRLF";     # RIGHT

When reading I<from> a socket, remember that the default input record
separator (C<$/>) is C<\n>, whereas robust code should accept either
C<\012> or C<\015\012> as the end of a line.  A loop like this one
relies on the platform's C<\n> alone:

    while (<SOCKET>) {
        # ...
    }

Better:

    use Socket qw(:DEFAULT :crlf);
    local($/) = LF;      # not needed if $/ is already \012

    while (<SOCKET>) {
        s/$CR?$LF/\n/;   # not sure if socket uses LF or CRLF, OK
    #   s/\015?\012/\n/; # same thing
    }

And this example is actually better than the previous one even for Unix
platforms, because now any C<\015>'s (C<\cM>'s) are stripped out
(and there was much rejoicing).


=head2 Numbers: Endianness and Width

Different CPUs store integers and floating point numbers in different
orders (called I<endianness>) and widths (32-bit and 64-bit being the
most common).  This affects your programs if they attempt to transfer
numbers in binary format from a CPU architecture to another over some
channel: either 'live' via network connections or storing the numbers
to secondary storage such as a disk file.

Conflicting storage orders make an utter mess out of the numbers: if a
little-endian host (Intel, Alpha) stores 0x12345678 (305419896 in
decimal), a big-endian host (Motorola, MIPS, Sparc, PA) reads it as
0x78563412 (2018915346 in decimal).  To avoid this problem in network
(socket) connections, use the C<pack()> and C<unpack()> formats C<"n">
and C<"N">, the "network" orders, which are guaranteed to be portable.
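
A short sketch of the C<"N"> format, reusing the value 0x12345678 from
the paragraph above; the variable names are illustrative only.

    use strict;
    use warnings;

    my $value = 0x12345678;

    # "N" packs an unsigned 32-bit integer in network (big-endian) order,
    # so the four bytes are the same whatever the host's native order.
    my $wire = pack("N", $value);
    print unpack("H8", $wire), "\n";    # prints 12345678

    # The receiver unpacks with the same format to recover the value.
    my $copy = unpack("N", $wire);
    printf "0x%08x\n", $copy;           # prints 0x12345678

The C<"n"> format works the same way for unsigned 16-bit values.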

Different widths can cause truncation even between platforms of equal
endianness: the platform of shorter width loses the upper parts of the
number.  There is no good solution for this problem except to avoid
transferring or storing raw binary numbers.

One can circumvent both these problems in two ways: either
transfer and store numbers always in text format, instead of raw
binary, or consider using modules like C<Data::Dumper> (included in
the standard distribution as of Perl 5.005) and C<Storable>.
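
Both approaches might look roughly like this.  The hash contents and the
file name F<reading.sto> are made up for the example; C<nstore> is the
C<Storable> function that writes in network order.

    use strict;
    use warnings;
    use Storable qw(nstore retrieve);

    my %reading = (id => 305419896, ratio => 0.25);

    # Option 1: plain text, one "key=value" pair per line.
    my $text = join "", map { "$_=$reading{$_}\012" } sort keys %reading;

    # Option 2: Storable's nstore() writes integers in network order.
    my $file = 'reading.sto';           # hypothetical file name
    nstore(\%reading, $file) or die "Can't store to $file: $!";
    my $copy = retrieve($file);
    print "$copy->{id}\n";              # prints 305419896
    unlink($file);

The text form is the easiest to inspect and to read from other
languages; C<Storable> is more convenient for nested Perl data.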

=head2 Files

Most platforms these days structure files in a hierarchical fashion.
So, it is reasonably safe to assume that any platform supports the
notion of a "path" to uniquely identify a file on the system.  Just
how that path is actually written, differs.

While they are similar, file path specifications differ between Unix,
Windows, S<Mac OS>, OS/2, VMS, S<RISC OS> and probably others.  Unix,
for example, is one of the few OSes that has the idea of a single root
directory.

VMS, Windows, and OS/2 can work similarly to Unix with C</> as path
separator, or in their own idiosyncratic ways (such as having several
root directories and various "unrooted" device files such as NIL: and
LPT:).

S<Mac OS> uses C<:> as a path separator instead of C</>.

S<RISC OS> perl can emulate Unix filenames with C</> as path
separator, or go native and use C<.> for path separator and C<:> to
signal filing systems and disc names.

As with the newline problem above, there are modules that can help.  The
C<File::Spec> modules provide methods to do the Right Thing on whatever
platform happens to be running the program.

    use File::Spec;
    chdir(File::Spec->updir());        # go up one directory
    $file = File::Spec->catfile(
        File::Spec->curdir(), 'temp', 'file.txt'
    );
    # on Unix and Win32, './temp/file.txt'
    # on Mac OS, ':temp:file.txt'

C<File::Spec> is available in the standard distribution, as of version
5.004_05.

In general, production code should not have file paths hardcoded; making
them user supplied or from a configuration file is better, keeping in mind
that file path syntax varies on different machines.

This is especially noticeable in scripts like Makefiles and test suites,
which often assume C</> as a path separator for subdirectories.

Also of use is C<File::Basename>, from the standard distribution, which
splits a pathname into pieces (base filename, full path to directory,
and file suffix).
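
For illustration, here is one way C<fileparse> from C<File::Basename>
could be used on a path built with C<File::Spec>.  The names F<temp>
and F<file.txt> are placeholders.

    use strict;
    use warnings;
    use File::Spec;
    use File::Basename qw(fileparse);

    my $path = File::Spec->catfile('temp', 'file.txt');

    # The second argument is a pattern describing the suffix to split off.
    my ($base, $dir, $suffix) = fileparse($path, qr/\.[^.]*/);

    print "base=$base dir=$dir suffix=$suffix\n";
    # on Unix: base=file dir=temp/ suffix=.txt

The directory part comes back in the native syntax of the platform, so
it is best treated as opaque and passed back to C<File::Spec> rather
than taken apart by hand.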

Even when on a single platform (if you can call UNIX a single
platform), remember not to count on the existence or the contents of
system-specific files, like F</etc/passwd>, F</etc/sendmail.conf>, or
F</etc/resolv.conf>.  For example the F</etc/passwd> may exist but it
may not contain the encrypted passwords because the system is using
some form of enhanced security, or it may not contain all the
accounts because the system is using NIS.  If code does need to rely
on such a file, include a description of the file and its format in
the code's documentation, and make it easy for the user to override
the default location of the file.

Do not have two files of the same name with different case, like
F<test.pl> and F<Test.pl>, as many platforms have case-insensitive
filenames.  Also, try not to have non-word characters (except for C<.>)
in the names, and keep them to the 8.3 convention, for maximum
portability.

Likewise, if using C<AutoSplit>, try to keep the split functions to
8.3 naming and case-insensitive conventions; or, at the very least,
make it so the resulting files have a unique (case-insensitively)
first 8 characters.

Don't assume C<E<gt>> won't be the first character of a filename.  Always
use C<E<lt>> explicitly to open a file for reading:

    open(FILE, "<$existing_file") or die $!;


=head2 System Interaction

Not all platforms necessarily provide the notion of a command line.
These are usually platforms that rely on a Graphical User Interface (GUI)
for user interaction.  So a program requiring command lines might not work
everywhere.  But this is probably for the user of the program to deal
with.

Some platforms can't delete or rename files that are being held open by
the system.  Remember to C<close> files when you are done with them.
Don't C<unlink> or C<rename> an open file.  Don't C<tie> to or C<open> a
file that is already tied to or opened; C<untie> or C<close> first.

Don't open the same file more than once at a time for writing, as some
operating systems put mandatory locks on such files.

Don't count on a specific environment variable existing in C<%ENV>.
Don't count on C<%ENV> entries being case-sensitive, or even
case-preserving.

Don't count on signals.

Don't count on filename globbing.  Use C<opendir>, C<readdir>, and
C<closedir> instead.
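
A minimal sketch of replacing a glob with C<opendir>, C<readdir>, and
C<closedir>.  The F<.txt> suffix is only an example of a selection
criterion.

    use strict;
    use warnings;
    use File::Spec;

    my $dir = File::Spec->curdir();

    opendir(my $dh, $dir) or die "Can't open directory $dir: $!";
    # Select roughly the entries that a "*.txt" glob would have matched.
    my @text_files = sort grep { /\.txt\z/ } readdir($dh);
    closedir($dh);

    print "$_\n" for @text_files;

C<readdir> returns bare entry names, so use C<File::Spec-E<gt>catfile>
to join them to C<$dir> before opening them.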

Don't count on per-program environment variables, or per-program current
directories.


=head2 Interprocess Communication (IPC)

In general, don't directly access the system in code that is meant to be
portable.  That means, no C<system>, C<exec>, C<fork>, C<pipe>, C<``>,
C<qx//>, C<open> with a C<|>, nor any of the other things that make being
a Unix perl hacker worth being.

Commands that launch external processes are generally supported on
most platforms (though many of them do not support any type of forking),
but the problem with using them arises from what you invoke with them.
External tools are often named differently on different platforms, often
not available in the same location, often accept different arguments,
often behave differently, and often represent their results in a
platform-dependent way.  Thus you should seldom depend on them to produce
consistent results.

One especially common bit of Perl code is opening a pipe to sendmail:

    open(MAIL, '|/usr/lib/sendmail -t') or die $!;

This is fine for systems programming when sendmail is known to be
available.  But it is not fine for many non-Unix systems, and even
some Unix systems that may not have sendmail installed.  If a portable
solution is needed, see the C<Mail::Send> and C<Mail::Mailer> modules
in the C<MailTools> distribution.  C<Mail::Mailer> provides several
mailing methods, including mail, sendmail, and direct SMTP
(via C<Net::SMTP>) if a mail transfer agent is not available.

The rule of thumb for portable code is: Do it all in portable Perl, or
use a module (that may internally implement it with platform-specific
code, but expose a common interface).

The Unix System V IPC (C<msg*(), sem*(), shm*()>) is not available
even on all Unix platforms.

=head2 External Subroutines (XS)

XS code, in general, can be made to work with any platform; but dependent
libraries, header files, etc., might not be readily available or
portable, or the XS code itself might be platform-specific, just as Perl
code might be.  If the libraries and headers are portable, then it is
normally reasonable to make sure the XS code is portable, too.

There is a different kind of portability issue with writing XS
code: availability of a C compiler on the end-user's system.  C brings
with it its own portability issues, and writing XS code will expose you to
some of those.  Writing purely in perl is a comparatively easier way to
achieve portability.


=head2 Standard Modules

In general, the standard modules work across platforms.  Notable
exceptions are C<CPAN.pm> (which currently makes connections to external
programs that may not be available), platform-specific modules (like
C<ExtUtils::MM_VMS>), and DBM modules.

There is no one DBM module that is available on all platforms.
C<SDBM_File> and the others are generally available on all Unix and DOSish
ports, but not in MacPerl, where only C<NDBM_File> and C<DB_File> are
available.

The good news is that at least some DBM module should be available, and
C<AnyDBM_File> will use whichever module it can find.  Of course, then
the code needs to be fairly strict, dropping to the lowest common
denominator (e.g., not exceeding 1K for each record).
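
A sketch of tying a hash through C<AnyDBM_File>, assuming at least one
DBM module is installed.  The database name F<scratch_db> is
hypothetical, and the files actually created on disk depend on which
DBM module is picked.

    use strict;
    use warnings;
    use Fcntl;                  # for O_RDWR and O_CREAT
    use AnyDBM_File;

    my %db;
    tie(%db, 'AnyDBM_File', 'scratch_db', O_RDWR|O_CREAT, 0666)
        or die "Can't open scratch_db: $!";
    $db{greeting} = 'hello';    # keep records small for portability
    my $value = $db{greeting};
    untie %db;

Only plain string keys and values are stored here, which stays within
what the different DBM implementations have in common.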


=head2 Time and Date

The system's notion of time of day and calendar date is controlled in
widely different ways. Don't assume the timezone is stored in C<$ENV{TZ}>,
and even if it is, don't assume that you can control the timezone through
that variable.

Don't assume that the epoch starts at 00:00:00, January 1, 1970,
because that is OS-specific.  Better to store a date in an unambiguous
representation.  The ISO 8601 standard defines YYYY-MM-DD as the date
format.  A text representation (like C<1 Jan 1970>) can be easily
converted into an OS-specific value using a module like
C<Date::Parse>.  An array of values, such as those returned by
C<localtime>, can be converted to an OS-specific representation using
C<Time::Local>.
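
As an illustration, the following converts broken-down date fields to
the OS-specific time value and back to an ISO 8601 date.  It uses
C<timegm> and C<gmtime> rather than C<timelocal> and C<localtime> so
that the result does not depend on the local timezone; the date itself
is arbitrary.

    use strict;
    use warnings;
    use Time::Local qw(timegm);

    # Broken-down fields for 2000-02-29 12:00:00 UTC.
    # As with gmtime, the month is zero-based (0 = January).
    my ($sec, $min, $hour, $mday, $mon, $year) = (0, 0, 12, 29, 1, 2000);

    my $time = timegm($sec, $min, $hour, $mday, $mon, $year);

    # Round-trip through gmtime to get an ISO 8601 date back.
    my @f = gmtime($time);
    my $iso = sprintf("%04d-%02d-%02d", $f[5] + 1900, $f[4] + 1, $f[3]);
    print "$iso\n";             # prints 2000-02-29

The numeric value of C<$time> is never printed or stored; only the
text form leaves the program.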