all the possible code points, pack(\*(L"U\*(R",ord(\*(L"A\*(R")) would in \s-1EBCDIC\s0
equal \fIA with acute\fR or chr(101), and unpack(\*(L"U\*(R", \*(L"A\*(R") would equal
65, or \fInon-breaking space\fR, not 193, or ord \*(L"A\*(R".)
.Sh "Remaining Perl Unicode problems in \s-1EBCDIC\s0"
.IX Subsection "Remaining Perl Unicode problems in EBCDIC"
.IP "\(bu" 4
Many of the remaining problems seem to be related to case-insensitive matching:
for example, \f(CW\*(C`/[\ex{131}]/\*(C'\fR (\s-1LATIN\s0 \s-1SMALL\s0 \s-1LETTER\s0 \s-1DOTLESS\s0 I) does
not match \*(L"I\*(R" case-insensitively, as it should under Unicode.
(The match succeeds on ASCII-derived platforms.)
.IP "\(bu" 4
The extensions Unicode::Collate and Unicode::Normalize are not
supported under \s-1EBCDIC\s0; neither is the encoding pragma.
.Sh "Unicode and \s-1UTF\s0"
.IX Subsection "Unicode and UTF"
\&\s-1UTF\s0 is a Unicode Transformation Format.  \s-1UTF\-8\s0 is a Unicode conforming
representation of the Unicode standard that looks very much like \s-1ASCII\s0.
UTF-EBCDIC is an attempt to represent Unicode characters in an \s-1EBCDIC\s0
transparent manner.
.Sh "Using Encode"
.IX Subsection "Using Encode"
Starting from Perl 5.8 you can use the standard module Encode
to translate from \s-1EBCDIC\s0 to Latin\-1 code points.
.PP
.Vb 1
\&    use Encode \*(Aqfrom_to\*(Aq;
\&
\&    my %ebcdic = ( 176 => \*(Aqcp37\*(Aq, 95 => \*(Aqcp1047\*(Aq, 106 => \*(Aqposix\-bc\*(Aq );
\&
\&    # $a is in EBCDIC code points
\&    from_to($a, $ebcdic{ord \*(Aq^\*(Aq}, \*(Aqlatin1\*(Aq);
\&    # $a is ISO 8859\-1 code points
.Ve
.PP
and from Latin\-1 code points to \s-1EBCDIC\s0 code points
.PP
.Vb 1
\&    use Encode \*(Aqfrom_to\*(Aq;
\&
\&    my %ebcdic = ( 176 => \*(Aqcp37\*(Aq, 95 => \*(Aqcp1047\*(Aq, 106 => \*(Aqposix\-bc\*(Aq );
\&
\&    # $a is ISO 8859\-1 code points
\&    from_to($a, \*(Aqlatin1\*(Aq, $ebcdic{ord \*(Aq^\*(Aq});
\&    # $a is in EBCDIC code points
.Ve
.PP
For I/O it is suggested that you use the autotranslating features
of PerlIO; see perluniintro.
.PP
Since version 5.8 Perl
uses the new PerlIO I/O library.  This enables
you to use different encodings per \s-1IO\s0 channel.  For example you may use
.PP
.Vb 9
\&    use Encode;
\&    open($f, ">:encoding(ascii)", "test.ascii");
\&    print $f "Hello World!\en";
\&    open($f, ">:encoding(cp37)", "test.ebcdic");
\&    print $f "Hello World!\en";
\&    open($f, ">:encoding(latin1)", "test.latin1");
\&    print $f "Hello World!\en";
\&    open($f, ">:encoding(utf8)", "test.utf8");
\&    print $f "Hello World!\en";
.Ve
.PP
to get four files containing \*(L"Hello World!\en\*(R" in \s-1ASCII\s0, \s-1CP\s0 37 \s-1EBCDIC\s0,
\&\s-1ISO\s0 8859\-1 (Latin\-1) (in this example identical to \s-1ASCII\s0), and
UTF-EBCDIC (in this example identical to normal \s-1EBCDIC\s0), respectively.  See the
documentation of Encode::PerlIO for details.
.PP
As the PerlIO layer uses raw \s-1IO\s0 (bytes) internally, all this totally
ignores things like the type of your filesystem (\s-1ASCII\s0 or \s-1EBCDIC\s0).
.SH "SINGLE OCTET TABLES"
.IX Header "SINGLE OCTET TABLES"
The following tables list the \s-1ASCII\s0 and Latin 1 ordered sets including
the subsets: C0 controls (0..31), \s-1ASCII\s0 graphics (32..7e), delete (7f),
C1 controls (80..9f), and Latin\-1 (a.k.a. \s-1ISO\s0 8859\-1) (a0..ff).  In the
table, non-printing control character names as well as the Latin 1
extensions to \s-1ASCII\s0 have been labelled with character names roughly
corresponding to \fIThe Unicode Standard, Version 3.0\fR, albeit with
substitutions such as s/LATIN// and s/VULGAR// in all cases,
s/CAPITAL \s-1LETTER//\s0 in some cases, and s/SMALL \s-1LETTER\s0 ([A\-Z])/\el$1/
in some other cases (the \f(CW\*(C`charnames\*(C'\fR pragma names unfortunately do
not list explicit names for the C0 or C1 control characters).  The
\&\*(L"names\*(R" of the C1 control set (128..159 in \s-1ISO\s0 8859\-1) listed here are
somewhat arbitrary.  The differences between the 0037 and 1047 sets are
flagged with ***.  The differences between the 1047 and POSIX-BC sets
are flagged with ###.
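.PP
The *** and ### flags can in principle be recomputed instead of read off
the tables.  The following is only a sketch and not one of the original
recipes: it assumes an ASCII-derived platform and an Encode installation
that accepts the encoding names cp37, cp1047 and posix\-bc used under
\*(L"Using Encode\*(R".  The helper \f(CW\*(C`ebcdic_ords\*(C'\fR is a hypothetical name
introduced here for illustration.
.PP
.Vb 10
\&    use strict;
\&    use warnings;
\&    use Encode qw(encode);
\&
\&    # Encoding names as accepted by Encode (assumed to be installed).
\&    my @pages = ("cp37", "cp1047", "posix\-bc");
\&
\&    # Hypothetical helper: EBCDIC byte values of one Latin\-1 code point.
\&    sub ebcdic_ords {
\&        my ($latin1) = @_;
\&        return map { ord(encode($_, chr($latin1))) } @pages;
\&    }
\&
\&    for my $cp (0 .. 255) {
\&        my ($c37, $c1047, $bc) = ebcdic_ords($cp);
\&        my $flags = "";
\&        $flags .= " ***" if $c37 != $c1047;     # 0037 and 1047 differ
\&        $flags .= " ###" if $c1047 != $bc;      # 1047 and POSIX\-BC differ
\&        printf("%\-6d%\-6d%\-6d%\-6d%s\en", $cp, $c37, $c1047, $bc, $flags);
\&    }
.Ve
.PP
Comparing this output with the 0819, 0037, 1047 and POSIX-BC columns is a
quick consistency check; any disagreement more likely reflects the installed
Encode mapping files than an error in the tables.
.PP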
All \fIord()\fR numbers listed are decimal.  If you would rather see this table
listing octal values then run the table (that is, the pod version of this
document since this recipe may not work with a pod2_other_format translation)
through:
.IP "recipe 0" 4
.IX Item "recipe 0"
.PP
.Vb 2
\&    perl \-ne \*(Aqif(/(.{33})(\ed+)\es+(\ed+)\es+(\ed+)\es+(\ed+)/)\*(Aq \e
\&     \-e \*(Aq{printf("%s%\-9o%\-9o%\-9o%o\en",$1,$2,$3,$4,$5)}\*(Aq perlebcdic.pod
.Ve
.PP
If you want to retain the UTF-x code points then in script form you
might want to write:
.IP "recipe 1" 4
.IX Item "recipe 1"
.PP
.Vb 10
\&    open(FH,"<perlebcdic.pod") or die "Could not open perlebcdic.pod: $!";
\&    while (<FH>) {
\&        if (/(.{33})(\ed+)\es+(\ed+)\es+(\ed+)\es+(\ed+)\es+(\ed+)\e.?(\ed*)\es+(\ed+)\e.?(\ed*)/)  {
\&            if ($7 ne \*(Aq\*(Aq && $9 ne \*(Aq\*(Aq) {
\&                printf("%s%\-9o%\-9o%\-9o%\-9o%\-3o.%\-5o%\-3o.%o\en",$1,$2,$3,$4,$5,$6,$7,$8,$9);
\&            }
\&            elsif ($7 ne \*(Aq\*(Aq) {
\&                printf("%s%\-9o%\-9o%\-9o%\-9o%\-3o.%\-5o%o\en",$1,$2,$3,$4,$5,$6,$7,$8);
\&            }
\&            else {
\&                printf("%s%\-9o%\-9o%\-9o%\-9o%\-9o%o\en",$1,$2,$3,$4,$5,$6,$8);
\&            }
\&        }
\&    }
.Ve
.PP
If you would rather see this table listing hexadecimal values then
run the table through:
.IP "recipe 2" 4
.IX Item "recipe 2"
.PP
.Vb 2
\&    perl \-ne \*(Aqif(/(.{33})(\ed+)\es+(\ed+)\es+(\ed+)\es+(\ed+)/)\*(Aq \e
\&     \-e \*(Aq{printf("%s%\-9X%\-9X%\-9X%X\en",$1,$2,$3,$4,$5)}\*(Aq perlebcdic.pod
.Ve
.PP
Or, in order to retain the UTF-x code points in hexadecimal:
.IP "recipe 3" 4
.IX Item "recipe 3"
.PP
.Vb 10
\&    open(FH,"<perlebcdic.pod") or die "Could not open perlebcdic.pod: $!";
\&    while (<FH>) {
\&        if (/(.{33})(\ed+)\es+(\ed+)\es+(\ed+)\es+(\ed+)\es+(\ed+)\e.?(\ed*)\es+(\ed+)\e.?(\ed*)/)  {
\&            if ($7 ne \*(Aq\*(Aq && $9 ne \*(Aq\*(Aq) {
\&                printf("%s%\-9X%\-9X%\-9X%\-9X%\-2X.%\-6X%\-2X.%X\en",$1,$2,$3,$4,$5,$6,$7,$8,$9);
\&            }
\&            elsif ($7 ne \*(Aq\*(Aq) {
\&                printf("%s%\-9X%\-9X%\-9X%\-9X%\-2X.%\-6X%X\en",$1,$2,$3,$4,$5,$6,$7,$8);
\&            }
\&            else {
\&                printf("%s%\-9X%\-9X%\-9X%\-9X%\-9X%X\en",$1,$2,$3,$4,$5,$6,$8);
\&            }
\&        }
\&    }
\&
\&
\&                                                                      incomp\-  incomp\-
\&                                 8859\-1                               lete     lete
\&    chr                          0819     0037     1047     POSIX\-BC  UTF\-8    UTF\-EBCDIC
\&    \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
\&    <NULL>                       0        0        0        0         0        0
\&    <START OF HEADING>           1        1        1        1         1        1
\&    <START OF TEXT>              2        2        2        2         2        2
\&    <END OF TEXT>                3        3        3        3         3        3
\&    <END OF TRANSMISSION>        4        55       55       55        4        55
\&    <ENQUIRY>                    5        45       45       45        5        45
\&    <ACKNOWLEDGE>                6        46       46       46        6        46
\&    <BELL>                       7        47       47       47        7        47
\&    <BACKSPACE>                  8        22       22       22        8        22
\&    <HORIZONTAL TABULATION>      9        5        5        5         9        5
\&    <LINE FEED>                  10       37       21       21        10       21       ***
\&    <VERTICAL TABULATION>        11       11       11       11        11       11
\&    <FORM FEED>                  12       12       12       12        12       12
\&    <CARRIAGE RETURN>            13       13       13       13        13       13
\&    <SHIFT OUT>                  14       14       14       14        14       14
\&    <SHIFT IN>                   15       15       15       15        15       15
\&    <DATA LINK ESCAPE>           16       16       16       16        16       16
\&    <DEVICE CONTROL ONE>         17       17       17       17        17       17
\&    <DEVICE CONTROL TWO>         18       18       18       18        18       18
\&    <DEVICE CONTROL THREE>       19       19       19       19        19       19
\&    <DEVICE CONTROL FOUR>        20       60       60       60        20       60
\&    <NEGATIVE ACKNOWLEDGE>       21       61       61       61        21       61
\&    <SYNCHRONOUS IDLE>           22       50       50       50        22       50
\&    <END OF TRANSMISSION BLOCK>  23       38       38       38        23       38
\&    <CANCEL>                     24       24       24       24        24       24
\&    <END OF MEDIUM>              25       25       25       25        25       25
\&    <SUBSTITUTE>                 26       63       63       63        26       63
\&    <ESCAPE>                     27       39       39       39        27       39
\&    <FILE SEPARATOR>             28       28       28       28        28       28
\&    <GROUP SEPARATOR>            29       29       29       29        29       29
\&    <RECORD SEPARATOR>           30       30       30       30        30       30
\&    <UNIT SEPARATOR>             31       31       31       31        31       31
\&    <SPACE>                      32       64       64       64        32       64
\&    !                            33       90       90       90        33       90
\&    "                            34       127      127      127       34       127
\&    #                            35       123      123      123       35       123
\&    $                            36       91       91       91        36       91
\&    %                            37       108      108      108       37       108
\&    &                            38       80       80       80        38       80
\&    \*(Aq                            39       125      125      125       39       125
\&    (                            40       77       77       77        40       77
\&    )                            41       93       93       93        41       93
\&    *                            42       92       92       92        42       92
\&    +                            43       78       78       78        43       78
\&    ,                            44       107      107      107       44       107
\&    \-                            45       96       96       96        45       96
\&    .                            46       75       75       75        46       75
\&    /                            47       97       97       97        47       97
\&    0                            48       240      240      240       48       240
\&    1                            49       241      241      241       49       241
\&    2                            50       242      242      242       50       242
\&    3                            51       243      243      243       51       243
\&    4                            52       244      244      244       52       244
\&    5                            53       245      245      245       53       245
\&    6                            54       246      246      246       54       246
\&    7                            55       247      247      247       55       247
\&    8                            56       248      248      248       56       248
\&    9                            57       249      249      249       57       249
\&    :                            58       122      122      122       58       122
\&    ;                            59       94       94       94        59       94
\&    <                            60       76       76       76        60       76
\&    =                            61       126      126      126       61       126
\&    >                            62       110      110      110       62       110
\&    ?                            63       111      111      111       63       111
\&    @                            64       124      124      124       64       124
\&    A                            65       193      193      193       65       193
\&    B                            66       194      194      194       66       194
\&    C                            67       195      195      195       67       195
\&    D                            68       196      196      196       68       196
\&    E                            69       197      197      197       69       197
\&    F                            70       198      198      198       70       198
\&    G                            71       199      199      199       71       199
\&    H                            72       200      200      200       72       200
\&    I                            73       201      201      201       73       201
\&    J                            74       209      209      209       74       209
\&    K                            75       210      210      210       75       210
\&    L                            76       211      211      211       76       211
\&    M                            77       212      212      212       77       212
\&    N                            78       213      213      213       78       213
\&    O                            79       214      214      214       79       214
\&    P                            80       215      215      215       80       215
\&    Q                            81       216      216      216       81       216
\&    R                            82       217      217      217       82       217
\&    S                            83       226      226      226       83       226
\&    T                            84       227      227      227       84       227
\&    U                            85       228      228      228       85       228
\&    V                            86       229      229      229       86       229
\&    W                            87       230      230      230       87       230
\&    X                            88       231      231      231       88       231
\&    Y                            89       232      232      232       89       232
\&    Z                            90       233      233      233       90       233
\&    [                            91       186      173      187       91       173      *** ###