machines and cannot be generated as a C<"\c.letter."> control character on
0037 machines.  Note also that "\c\\" maps to two characters
not one.

    chr     ord   8859-1               0037                 1047 && POSIX-BC
    ---------------------------------------------------------------------------------
    "\c?"   127   <DELETE>             "                    "                   ***><
    "\c@"     0   <NULL>               <NULL>               <NULL>              ***><
    "\cA"     1   <S.O. HEADING>       <S.O. HEADING>       <S.O. HEADING>
    "\cB"     2   <S.O. TEXT>          <S.O. TEXT>          <S.O. TEXT>
    "\cC"     3   <E.O. TEXT>          <E.O. TEXT>          <E.O. TEXT>
    "\cD"     4   <E.O. TRANS.>        <C1 28>              <C1 28>
    "\cE"     5   <ENQUIRY>            <HORIZ. TAB.>        <HORIZ. TAB.>
    "\cF"     6   <ACKNOWLEDGE>        <C1 6>               <C1 6>
    "\cG"     7   <BELL>               <DELETE>             <DELETE>
    "\cH"     8   <BACKSPACE>          <C1 23>              <C1 23>
    "\cI"     9   <HORIZ. TAB.>        <C1 13>              <C1 13>
    "\cJ"    10   <LINE FEED>          <C1 14>              <C1 14>
    "\cK"    11   <VERT. TAB.>         <VERT. TAB.>         <VERT. TAB.>
    "\cL"    12   <FORM FEED>          <FORM FEED>          <FORM FEED>
    "\cM"    13   <CARRIAGE RETURN>    <CARRIAGE RETURN>    <CARRIAGE RETURN>
    "\cN"    14   <SHIFT OUT>          <SHIFT OUT>          <SHIFT OUT>
    "\cO"    15   <SHIFT IN>           <SHIFT IN>           <SHIFT IN>
    "\cP"    16   <DATA LINK ESCAPE>   <DATA LINK ESCAPE>   <DATA LINK ESCAPE>
    "\cQ"    17   <D.C. ONE>           <D.C. ONE>           <D.C. ONE>
    "\cR"    18   <D.C. TWO>           <D.C. TWO>           <D.C. TWO>
    "\cS"    19   <D.C. THREE>         <D.C. THREE>         <D.C. THREE>
    "\cT"    20   <D.C. FOUR>          <C1 29>              <C1 29>
    "\cU"    21   <NEG. ACK.>          <C1 5>               <LINE FEED>         ***
    "\cV"    22   <SYNCHRONOUS IDLE>   <BACKSPACE>          <BACKSPACE>
    "\cW"    23   <E.O. TRANS. BLOCK>  <C1 7>               <C1 7>
    "\cX"    24   <CANCEL>             <CANCEL>             <CANCEL>
    "\cY"    25   <E.O. MEDIUM>        <E.O. MEDIUM>        <E.O. MEDIUM>
    "\cZ"    26   <SUBSTITUTE>         <C1 18>              <C1 18>
    "\c["    27   <ESCAPE>             <C1 15>              <C1 15>
    "\c\\"   28   <FILE SEP.>\         <FILE SEP.>\         <FILE SEP.>\
    "\c]"    29   <GROUP SEP.>         <GROUP SEP.>         <GROUP SEP.>
    "\c^"    30   <RECORD SEP.>        <RECORD SEP.>        <RECORD SEP.>       ***><
    "\c_"    31   <UNIT SEP.>          <UNIT SEP.>          <UNIT SEP.>         ***><

=head1 FUNCTION DIFFERENCES

=over 8

=item chr()

chr() must be given an EBCDIC code number argument to yield a desired
character return value on an EBCDIC machine.
For example:

    $CAPITAL_LETTER_A = chr(193);

=item ord()

ord() will return EBCDIC code number values on an EBCDIC machine.
For example:

    $the_number_193 = ord("A");

=item pack()

The c and C templates for pack() are dependent upon character set
encoding.  Examples of usage on EBCDIC include:

    $foo = pack("CCCC",193,194,195,196);
    # $foo eq "ABCD"
    $foo = pack("C4",193,194,195,196);
    # same thing

    $foo = pack("ccxxcc",193,194,195,196);
    # $foo eq "AB\0\0CD"

=item print()

One must be careful with scalars and strings that are passed to
print that contain ASCII encodings.  One common place
for this to occur is in the output of the MIME type header for
CGI script writing.  For example, many perl programming guides
recommend something similar to:

    print "Content-type:\ttext/html\015\012\015\012";
    # this may be wrong on EBCDIC

Under the IBM OS/390 USS Web Server or WebSphere on z/OS, for example,
you should instead write that as:

    print "Content-type:\ttext/html\r\n\r\n"; # OK for DGW et alia

That is because the translation from EBCDIC to ASCII is done
by the web server in this case (such code will not be appropriate for
the Macintosh however).  Consult your web server's documentation for
further details.

=item printf()

The formats that can convert characters to numbers and vice versa
will be different from their ASCII counterparts when executed
on an EBCDIC machine.  Examples include:

    printf("%c%c%c",193,194,195);  # prints ABC

=item sort()

EBCDIC sort results may differ from ASCII sort results, especially for
mixed case strings.  This is discussed in more detail below.

=item sprintf()

See the discussion of printf() above.  An example of the use
of sprintf would be:

    $CAPITAL_LETTER_A = sprintf("%c",193);

=item unpack()

See the discussion of pack() above.

=back

=head1 REGULAR EXPRESSION DIFFERENCES

As of perl 5.005_03, letter range regular expressions such as
[A-Z] and [a-z] have been specially coded to not pick up gap
characters.
For example, characters such as E<ocirc>
C<o WITH CIRCUMFLEX> that lie between I and J would not be matched by the
regular expression range C</[H-K]/>.  This works in
the other direction, too, if either of the range end points is
explicitly numeric: C<[\x89-\x91]> will match C<\x8e>, even
though C<\x89> is C<i> and C<\x91> is C<j>, and C<\x8e>
is a gap character from the alphabetic viewpoint.

If you do want to match the alphabet gap characters in a single octet
regular expression, try matching the hex or octal code such as C</\313/>
on EBCDIC or C</\364/> on ASCII machines to have your regular expression
match C<o WITH CIRCUMFLEX>.

Another construct to be wary of is the inappropriate use of hex or
octal constants in regular expressions.  Consider the following
set of subs:

    sub is_c0 {
        my $char = substr(shift,0,1);
        $char =~ /[\000-\037]/;
    }

    sub is_print_ascii {
        my $char = substr(shift,0,1);
        $char =~ /[\040-\176]/;
    }

    sub is_delete {
        my $char = substr(shift,0,1);
        $char eq "\177";
    }

    sub is_c1 {
        my $char = substr(shift,0,1);
        $char =~ /[\200-\237]/;
    }

    sub is_latin_1 {
        my $char = substr(shift,0,1);
        $char =~ /[\240-\377]/;
    }

The above would be adequate if the concern was only with numeric code
points.  However, the concern may be with characters rather than code
points, and on an EBCDIC machine it may be desirable for constructs such as
C<if (is_print_ascii("A")) {print "A is a printable character\n";}> to print out
the expected message.
One way to represent the above collection
of character classification subs that is capable of working across the
four coded character sets discussed in this document is as follows:

    sub Is_c0 {
        my $char = substr(shift,0,1);
        if (ord('^')==94)  { # ascii
            return $char =~ /[\000-\037]/;
        }
        if (ord('^')==176) { # 37
            return $char =~ /[\000-\003\067\055-\057\026\005\045\013-\023\074\075\062\046\030\031\077\047\034-\037]/;
        }
        if (ord('^')==95 || ord('^')==106) { # 1047 || posix-bc
            return $char =~ /[\000-\003\067\055-\057\026\005\025\013-\023\074\075\062\046\030\031\077\047\034-\037]/;
        }
    }

    sub Is_print_ascii {
        my $char = substr(shift,0,1);
        $char =~ /[ !"\#\$%&'()*+,\-.\/0-9:;<=>?\@A-Z[\\\]^_`a-z{|}~]/;
    }

    sub Is_delete {
        my $char = substr(shift,0,1);
        if (ord('^')==94)  { # ascii
            return $char eq "\177";
        }
        else  {              # ebcdic
            return $char eq "\007";
        }
    }

    sub Is_c1 {
        my $char = substr(shift,0,1);
        if (ord('^')==94)  { # ascii
            return $char =~ /[\200-\237]/;
        }
        if (ord('^')==176) { # 37
            return $char =~ /[\040-\044\025\006\027\050-\054\011\012\033\060\061\032\063-\066\010\070-\073\040\024\076\377]/;
        }
        if (ord('^')==95)  { # 1047
            return $char =~ /[\040-\045\006\027\050-\054\011\012\033\060\061\032\063-\066\010\070-\073\040\024\076\377]/;
        }
        if (ord('^')==106) { # posix-bc
            return $char =~ /[\040-\045\006\027\050-\054\011\012\033\060\061\032\063-\066\010\070-\073\040\024\076\137]/;
        }
    }

    sub Is_latin_1 {
        my $char = substr(shift,0,1);
        if (ord('^')==94)  { # ascii
            return $char =~ /[\240-\377]/;
        }
        if (ord('^')==176) { # 37
            return $char =~
              /[\101\252\112\261\237\262\152\265\275\264\232\212\137\312\257\274\220\217\352\372\276\240\266\263\235\332\233\213\267\270\271\253\144\145\142\146\143\147\236\150\164\161-\163\170\165-\167\254\151\355\356\353\357\354\277\200\375\376\373\374\255\256\131\104\105\102\106\103\107\234\110\124\121-\123\130\125-\127\214\111\315\316\313\317\314\341\160\335\336\333\334\215\216\337]/;
        }
        if (ord('^')==95)  { # 1047
            return $char =~
              /[\101\252\112\261\237\262\152\265\273\264\232\212\260\312\257\274\220\217\352\372\276\240\266\263\235\332\233\213\267\270\271\253\144\145\142\146\143\147\236\150\164\161-\163\170\165-\167\254\151\355\356\353\357\354\277\200\375\376\373\374\272\256\131\104\105\102\106\103\107\234\110\124\121-\123\130\125-\127\214\111\315\316\313\317\314\341\160\335\336\333\334\215\216\337]/;
        }
        if (ord('^')==106) { # posix-bc
            return $char =~
              /[\101\252\260\261\237\262\320\265\171\264\232\212\272\312\257\241\220\217\352\372\276\240\266\263\235\332\233\213\267\270\271\253\144\145\142\146\143\147\236\150\164\161-\163\170\165-\167\254\151\355\356\353\357\354\277\200\340\376\335\374\255\256\131\104\105\102\106\103\107\234\110\124\121-\123\130\125-\127\214\111\315\316\313\317\314\341\160\300\336\333\334\215\216\337]/;
        }
    }

Note however that only the C<Is_print_ascii()> sub is really independent
of coded character set.  Another way to write C<Is_latin_1()> would be
to use the characters in the range explicitly:

    sub Is_latin_1 {
        my $char = substr(shift,0,1);
        $char =~ /[牎ⅲぅΗī