📄 perlebcdic.pod
=head2 C RTL

The OS/390 C run time library provides _atoe() and _etoa() functions.

=head1 OPERATOR DIFFERENCES

The C<..> range operator treats certain character ranges with
care on EBCDIC machines.  For example the following array
will have twenty six elements on either an EBCDIC machine
or an ASCII machine:

    @alphabet = ('A'..'Z');   #  $#alphabet == 25

The bitwise operators such as & ^ | may return different results
when operating on string or character data in a perl program running
on an EBCDIC machine than when run on an ASCII machine.  Here is
an example adapted from the one in L<perlop>:

    # EBCDIC-based examples
    print "j p \n" ^ " a h";                      # prints "JAPH\n"
    print "JA" | "  ph\n";                        # prints "japh\n"
    print "JAPH\nJunk" & "\277\277\277\277\277";  # prints "japh\n";
    print 'p N$' ^ " E<H\n";                      # prints "Perl\n";

An interesting property of the 32 C0 control characters
in the ASCII table is that they can "literally" be constructed
as control characters in perl, e.g. C<(chr(0) eq "\c@")>
C<(chr(1) eq "\cA")>, and so on.  Perl on EBCDIC machines has been
ported to take "\c@" to chr(0) and "\cA" to chr(1) as well, but the
thirty three characters that result depend on which code page you are
using.  The table below uses the character names from the previous table
but with substitutions such as s/START OF/S.O./; s/END OF /E.O./;
s/TRANSMISSION/TRANS./; s/TABULATION/TAB./; s/VERTICAL/VERT./;
s/HORIZONTAL/HORIZ./; s/DEVICE CONTROL/D.C./; s/SEPARATOR/SEP./;
s/NEGATIVE ACKNOWLEDGE/NEG. ACK./;.  The POSIX-BC and 1047 sets are
identical throughout this range and differ from the 0037 set at only
one spot (21 decimal).  Note that the C<LINE FEED> character
may be generated by "\cJ" on ASCII machines but by "\cU" on 1047 or POSIX-BC
machines and cannot be generated as a C<"\c.letter."> control character on
0037 machines.  Note also that "\c\\" maps to two characters
not one.

    chr   ord  8859-1               0037                1047 && POSIX-BC
    ------------------------------------------------------------------------
    "\c?" 127  <DELETE>             "                   "                  ***><
    "\c@"   0  <NULL>               <NULL>              <NULL>             ***><
    "\cA"   1  <S.O. HEADING>       <S.O. HEADING>      <S.O. HEADING>
    "\cB"   2  <S.O. TEXT>          <S.O. TEXT>         <S.O. TEXT>
    "\cC"   3  <E.O. TEXT>          <E.O. TEXT>         <E.O. TEXT>
    "\cD"   4  <E.O. TRANS.>        <C1 28>             <C1 28>
    "\cE"   5  <ENQUIRY>            <HORIZ. TAB.>       <HORIZ. TAB.>
    "\cF"   6  <ACKNOWLEDGE>        <C1 6>              <C1 6>
    "\cG"   7  <BELL>               <DELETE>            <DELETE>
    "\cH"   8  <BACKSPACE>          <C1 23>             <C1 23>
    "\cI"   9  <HORIZ. TAB.>        <C1 13>             <C1 13>
    "\cJ"  10  <LINE FEED>          <C1 14>             <C1 14>
    "\cK"  11  <VERT. TAB.>         <VERT. TAB.>        <VERT. TAB.>
    "\cL"  12  <FORM FEED>          <FORM FEED>         <FORM FEED>
    "\cM"  13  <CARRIAGE RETURN>    <CARRIAGE RETURN>   <CARRIAGE RETURN>
    "\cN"  14  <SHIFT OUT>          <SHIFT OUT>         <SHIFT OUT>
    "\cO"  15  <SHIFT IN>           <SHIFT IN>          <SHIFT IN>
    "\cP"  16  <DATA LINK ESCAPE>   <DATA LINK ESCAPE>  <DATA LINK ESCAPE>
    "\cQ"  17  <D.C. ONE>           <D.C. ONE>          <D.C. ONE>
    "\cR"  18  <D.C. TWO>           <D.C. TWO>          <D.C. TWO>
    "\cS"  19  <D.C. THREE>         <D.C. THREE>        <D.C. THREE>
    "\cT"  20  <D.C. FOUR>          <C1 29>             <C1 29>
    "\cU"  21  <NEG. ACK.>          <C1 5>              <LINE FEED>        ***
    "\cV"  22  <SYNCHRONOUS IDLE>   <BACKSPACE>         <BACKSPACE>
    "\cW"  23  <E.O. TRANS. BLOCK>  <C1 7>              <C1 7>
    "\cX"  24  <CANCEL>             <CANCEL>            <CANCEL>
    "\cY"  25  <E.O. MEDIUM>        <E.O. MEDIUM>       <E.O. MEDIUM>
    "\cZ"  26  <SUBSTITUTE>         <C1 18>             <C1 18>
    "\c["  27  <ESCAPE>             <C1 15>             <C1 15>
    "\c\\" 28  <FILE SEP.>\         <FILE SEP.>\        <FILE SEP.>\
    "\c]"  29  <GROUP SEP.>         <GROUP SEP.>        <GROUP SEP.>
    "\c^"  30  <RECORD SEP.>        <RECORD SEP.>       <RECORD SEP.>      ***><
    "\c_"  31  <UNIT SEP.>          <UNIT SEP.>         <UNIT SEP.>        ***><

=head1 FUNCTION DIFFERENCES

=over 8

=item chr()

chr() must be given an EBCDIC code number argument to yield a desired
character return value on an EBCDIC machine.  For example:

    $CAPITAL_LETTER_A = chr(193);

=item ord()

ord() will return EBCDIC code number values on an EBCDIC machine.
For example:

    $the_number_193 = ord("A");

=item pack()

The c and C templates for pack() are dependent upon character set encoding.
Examples of usage on EBCDIC include:

    $foo = pack("CCCC",193,194,195,196);    # $foo eq "ABCD"
    $foo = pack("C4",193,194,195,196);      # same thing
    $foo = pack("ccxxcc",193,194,195,196);  # $foo eq "AB\0\0CD"

=item print()

One must be careful with scalars and strings that are passed to
print that contain ASCII encodings.  One common place
for this to occur is in the output of the MIME type header for
CGI script writing.  For example, many perl programming guides
recommend something similar to:

    print "Content-type:\ttext/html\015\012\015\012";
    # this may be wrong on EBCDIC

Under the IBM OS/390 USS Web Server, for example, you should instead
write that as:

    print "Content-type:\ttext/html\r\n\r\n"; # OK for DGW et alia

That is because the translation from EBCDIC to ASCII is done
by the web server in this case (such code will not be appropriate for
the Macintosh however).  Consult your web server's documentation for
further details.

=item printf()

The formats that can convert characters to numbers and vice versa
will be different from their ASCII counterparts when executed
on an EBCDIC machine.  Examples include:

    printf("%c%c%c",193,194,195);  # prints ABC

=item sort()

EBCDIC sort results may differ from ASCII sort results, especially for
mixed case strings.  This is discussed in more detail below.

=item sprintf()

See the discussion of printf() above.  An example of the use
of sprintf would be:

    $CAPITAL_LETTER_A = sprintf("%c",193);

=item unpack()

See the discussion of pack() above.

=back

=head1 REGULAR EXPRESSION DIFFERENCES

As of perl 5.005_03 letter range regular expressions such as
[A-Z] and [a-z] have been specially coded to not pick up gap
characters.  For example, characters such as E<ocirc> C<o WITH CIRCUMFLEX>
that lie between I and J would not be matched by the
regular expression range C</[H-K]/>.
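
The range behaviour described above can be checked with a short sketch such as the following.  It is only an illustration: the helper name C<range_matches> is hypothetical (it is not part of perl), and the sketch assumes single-octet characters throughout.  It lists every code point from 0 to 255 whose character is matched by a given class.

```perl
use strict;
use warnings;

# range_matches() is a hypothetical helper for this sketch only.
# It returns the code points (0..255) whose single-octet character
# is matched by the supplied regular expression.
sub range_matches {
    my ($re) = @_;
    return grep { chr($_) =~ $re } 0 .. 255;
}

my @hits = range_matches(qr/[H-K]/);

# Show how many characters the class picked up and which ones.
# On an ASCII machine this lists the four letters H I J K.
print scalar(@hits), " characters match [H-K]: ",
      join(" ", map { chr } @hits), "\n";
```

If the gap handling works as described, an EBCDIC perl should likewise report only the four letters, even though their code numbers are not contiguous there.
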
If you do want to match the alphabet gap characters in a single octet
regular expression, try matching the hex or octal code such
as C</\313/> on EBCDIC or C</\364/> on ASCII machines to
have your regular expression match C<o WITH CIRCUMFLEX>.

Another construct to be wary of is the inappropriate use of hex or
octal constants in regular expressions.  Consider the following
set of subs:

    sub is_c0 {
        my $char = substr(shift,0,1);
        $char =~ /[\000-\037]/;
    }

    sub is_print_ascii {
        my $char = substr(shift,0,1);
        $char =~ /[\040-\176]/;
    }

    sub is_delete {
        my $char = substr(shift,0,1);
        $char eq "\177";
    }

    sub is_c1 {
        my $char = substr(shift,0,1);
        $char =~ /[\200-\237]/;
    }

    sub is_latin_1 {
        my $char = substr(shift,0,1);
        $char =~ /[\240-\377]/;
    }

The above would be adequate if the concern was only with numeric code points.
However, the concern may be with characters rather than code points
and on an EBCDIC machine it may be desirable for constructs such as
C<if (is_print_ascii("A")) {print "A is a printable character\n";}> to print
out the expected message.
One way to represent the above collection
of character classification subs that is capable of working across the
four coded character sets discussed in this document is as follows:

    sub Is_c0 {
        my $char = substr(shift,0,1);
        if (ord('^')==94)  { # ascii
            return $char =~ /[\000-\037]/;
        }
        if (ord('^')==176) { # 37
            return $char =~
 /[\000-\003\067\055-\057\026\005\045\013-\023\074\075\062\046\030\031\077\047\034-\037]/;
        }
        if (ord('^')==95 || ord('^')==106) { # 1047 || posix-bc
            return $char =~
 /[\000-\003\067\055-\057\026\005\025\013-\023\074\075\062\046\030\031\077\047\034-\037]/;
        }
    }

    sub Is_print_ascii {
        my $char = substr(shift,0,1);
        $char =~ /[ !"\#\$%&'()*+,\-.\/0-9:;<=>?\@A-Z[\\\]^_`a-z{|}~]/;
    }

    sub Is_delete {
        my $char = substr(shift,0,1);
        if (ord('^')==94)  { # ascii
            return $char eq "\177";
        }
        else  {              # ebcdic
            return $char eq "\007";
        }
    }

    sub Is_c1 {
        my $char = substr(shift,0,1);
        if (ord('^')==94)  { # ascii
            return $char =~ /[\200-\237]/;
        }
        if (ord('^')==176) { # 37
            return $char =~
 /[\040-\044\025\006\027\050-\054\011\012\033\060\061\032\063-\066\010\070-\073\040\024\076\377]/;
        }
        if (ord('^')==95)  { # 1047
            return $char =~
 /[\040-\045\006\027\050-\054\011\012\033\060\061\032\063-\066\010\070-\073\040\024\076\377]/;
        }
        if (ord('^')==106) { # posix-bc
            return $char =~
 /[\040-\045\006\027\050-\054\011\012\033\060\061\032\063-\066\010\070-\073\040\024\076\137]/;
        }
    }

    sub Is_latin_1 {
        my $char = substr(shift,0,1);
        if (ord('^')==94)  { # ascii
            return $char =~ /[\240-\377]/;
        }
        if (ord('^')==176) { # 37
            return $char =~
 /[\101\252\112\261\237\262\152\265\275\264\232\212\137\312\257\274\220\217\352\372\276\240\266\263\235\332\233\213\267\270\271\253\144\145\142\146\143\147\236\150\164\161-\163\170\165-\167\254\151\355\356\353\357\354\277\200\375\376\373\374\255\256\131\104\105\102\106\103\107\234\110\124\121-\123\130\125-\127\214\111\315\316\313\317\314\341\160\335\336\333\334\215\216\337]/;
        }
        if (ord('^')==95)  { # 1047
            return $char =~
 /[\101\252\112\261\237\262\152\265\273\264\232\212\260\312\257\274\220\217\352\372\276\240\266\263\235\332\233\213\267\270\271\253\144\145\142\146\143\147\236\150\164\161-\163\170\165-\167\254\151\355\356\353\357\354\277\200\375\376\373\374\272\256\131\104\105\102\106\103\107\234\110\124\121-\123\130\125-\127\214\111\315\316\313\317\314\341\160\335\336\333\334\215\216\337]/;
        }
        if (ord('^')==106) { # posix-bc
            return $char =~
 /[\101\252\260\261\237\262\320\265\171\264\232\212\272\312\257\241\220\217\352\372\276\240\266\263\235\332\233\213\267\270\271\253\144\145\142\146\143\147\236\150\164\161-\163\170\165-\167\254\151\355\356\353\357\354\277\200\340\376\335\374\255\256\131\104\105\102\106\103\107\234\110\124\121-\123\130\125-\127\214\111\315\316\313\317\314\341\160\300\336\333\334\215\216\337]/;
        }
    }

Note however that only the C<Is_print_ascii()> sub is really independent
of coded character set.  Another way to write C<Is_latin_1()> would be
to use the characters in the range explicitly:

    sub Is_latin_1 {
        my $char = substr(shift,0,1);
        $char =~ /[牎ⅲぅΗī