perlebcdic.1

Implementation code for the DDNS protocol module of the networking part of a video surveillance system; corrections are welcome.
.IX Header "REGULAR EXPRESSION DIFFERENCES"
As of perl 5.005_03, letter-range regular expressions such as [A\-Z] and
[a\-z] have been specially coded not to pick up gap characters.  For example,
characters such as o\*^ \f(CW\*(C`o WITH CIRCUMFLEX\*(C'\fR that lie between I and J
would not be matched by the regular expression range \f(CW\*(C`/[H\-K]/\*(C'\fR.
Conversely, if either of the range end points is explicitly numeric, the range
is taken by code point: \f(CW\*(C`[\ex89\-\ex91]\*(C'\fR will match \f(CW\*(C`\ex8e\*(C'\fR, even though
\f(CW\*(C`\ex89\*(C'\fR is \f(CW\*(C`i\*(C'\fR and \f(CW\*(C`\ex91\*(C'\fR is \f(CW\*(C`j\*(C'\fR, and
\f(CW\*(C`\ex8e\*(C'\fR is a gap character from the alphabetic viewpoint.
.PP
If you do want to match the alphabet gap characters in a single-octet regular
expression, try matching the hex or octal code, such as \f(CW\*(C`/\e313/\*(C'\fR on
\&\s-1EBCDIC\s0 or \f(CW\*(C`/\e364/\*(C'\fR on \s-1ASCII\s0 machines, to have your regular
expression match \f(CW\*(C`o WITH CIRCUMFLEX\*(C'\fR.
.PP
Another construct to be wary of is the inappropriate use of hex or
octal constants in regular expressions.
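.PP
Because such an octal constant names a code point rather than a character, a
script that has to run on both kinds of machine can choose the constant at run
time.  The following is a minimal sketch for the o WITH CIRCUMFLEX case above.
It assumes that ord('A') is 193 only on \s-1EBCDIC\s0 machines and that the
\&\s-1EBCDIC\s0 side uses code page 0037 or 1047, where that character is octal 313;
the helper names is_ebcdic() and has_o_circumflex() are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: 'A' is code point 65 on ASCII-based machines
# and 193 (0xC1) on EBCDIC machines.
sub is_ebcdic { return ord('A') == 193 }

# o WITH CIRCUMFLEX: octal 364 (0xF4) in ISO-8859-1, octal 313 (0xCB)
# in EBCDIC code pages 0037 and 1047 (assumed here).
my $o_circumflex = is_ebcdic() ? "\313" : "\364";

# Match the gap character by its code point instead of via a letter range.
sub has_o_circumflex {
    my ($s) = @_;
    return $s =~ /\Q$o_circumflex\E/ ? 1 : 0;
}

print has_o_circumflex("h${o_circumflex}tel"), "\n";
print has_o_circumflex("hotel"), "\n";
```

On an \s-1ASCII\s0-based perl the two calls print 1 and 0.  The same kind of
run-time test is what the ord('^') checks later in this section rely on.
.PP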
Consider the following set of subs:
.PP
.Vb 4
\&    sub is_c0 {
\&        my $char = substr(shift,0,1);
\&        $char =~ /[\e000\-\e037]/;
\&    }
\&
\&    sub is_print_ascii {
\&        my $char = substr(shift,0,1);
\&        $char =~ /[\e040\-\e176]/;
\&    }
\&
\&    sub is_delete {
\&        my $char = substr(shift,0,1);
\&        $char eq "\e177";
\&    }
\&
\&    sub is_c1 {
\&        my $char = substr(shift,0,1);
\&        $char =~ /[\e200\-\e237]/;
\&    }
\&
\&    sub is_latin_1 {
\&        my $char = substr(shift,0,1);
\&        $char =~ /[\e240\-\e377]/;
\&    }
.Ve
.PP
The above would be adequate if the concern were only with numeric code points.
However, the concern may be with characters rather than code points, and on an
\&\s-1EBCDIC\s0 machine it may be desirable for constructs such as
\&\f(CW\*(C`if (is_print_ascii("A")) {print "A is a printable character\en";}\*(C'\fR to print
out the expected message.  One way to represent the above collection
of character classification subs that is capable of working across the
four coded character sets discussed in this document is as follows:
.PP
.Vb 12
\&    sub Is_c0 {
\&        my $char = substr(shift,0,1);
\&        if (ord(\*(Aq^\*(Aq)==94)  { # ascii
\&            return $char =~ /[\e000\-\e037]/;
\&        }
\&        if (ord(\*(Aq^\*(Aq)==176) { # 37
\&            return $char =~ /[\e000\-\e003\e067\e055\-\e057\e026\e005\e045\e013\-\e023\e074\e075\e062\e046\e030\e031\e077\e047\e034\-\e037]/;
\&        }
\&        if (ord(\*(Aq^\*(Aq)==95 || ord(\*(Aq^\*(Aq)==106) { # 1047 || posix\-bc
\&            return $char =~ /[\e000\-\e003\e067\e055\-\e057\e026\e005\e025\e013\-\e023\e074\e075\e062\e046\e030\e031\e077\e047\e034\-\e037]/;
\&        }
\&    }
\&
\&    sub Is_print_ascii {
\&        my $char = substr(shift,0,1);
\&        $char =~ /[ !"\e#\e$%&\*(Aq()*+,\e\-.\e/0\-9:;<=>?\e@A\-Z[\e\e\e]^_\`a\-z{|}~]/;
\&    }
\&
\&    sub Is_delete {
\&        my $char = substr(shift,0,1);
\&        if (ord(\*(Aq^\*(Aq)==94)  { # ascii
\&            return $char eq "\e177";
\&        }
\&        else  {              # ebcdic
\&            return $char eq "\e007";
\&        }
\&    }
\&
\&    sub Is_c1 {
\&        my $char = substr(shift,0,1);
\&        if (ord(\*(Aq^\*(Aq)==94)  { # ascii
\&            return $char =~ /[\e200\-\e237]/;
\&        }
\&        if (ord(\*(Aq^\*(Aq)==176) { # 37
\&            return $char =~ /[\e040\-\e044\e025\e006\e027\e050\-\e054\e011\e012\e033\e060\e061\e032\e063\-\e066\e010\e070\-\e073\e004\e024\e076\e377]/;
\&        }
\&        if (ord(\*(Aq^\*(Aq)==95)  { # 1047
\&            return $char =~ /[\e040\-\e045\e006\e027\e050\-\e054\e011\e012\e033\e060\e061\e032\e063\-\e066\e010\e070\-\e073\e004\e024\e076\e377]/;
\&        }
\&        if (ord(\*(Aq^\*(Aq)==106) { # posix\-bc
\&            return $char =~
\&              /[\e040\-\e045\e006\e027\e050\-\e054\e011\e012\e033\e060\e061\e032\e063\-\e066\e010\e070\-\e073\e004\e024\e076\e137]/;
\&        }
\&    }
\&
\&    sub Is_latin_1 {
\&        my $char = substr(shift,0,1);
\&        if (ord(\*(Aq^\*(Aq)==94)  { # ascii
\&            return $char =~ /[\e240\-\e377]/;
\&        }
\&        if (ord(\*(Aq^\*(Aq)==176) { # 37
\&            return $char =~
\&              /[\e101\e252\e112\e261\e237\e262\e152\e265\e275\e264\e232\e212\e137\e312\e257\e274\e220\e217\e352\e372\e276\e240\e266\e263\e235\e332\e233\e213\e267\e270\e271\e253\e144\e145\e142\e146\e143\e147\e236\e150\e164\e161\-\e163\e170\e165\-\e167\e254\e151\e355\e356\e353\e357\e354\e277\e200\e375\e376\e373\e374\e255\e256\e131\e104\e105\e102\e106\e103\e107\e234\e110\e124\e121\-\e123\e130\e125\-\e127\e214\e111\e315\e316\e313\e317\e314\e341\e160\e335\e336\e333\e334\e215\e216\e337]/;
\&        }
\&        if (ord(\*(Aq^\*(Aq)==95)  { # 1047
\&            return $char =~
\&              /[\e101\e252\e112\e261\e237\e262\e152\e265\e273\e264\e232\e212\e260\e312\e257\e274\e220\e217\e352\e372\e276\e240\e266\e263\e235\e332\e233\e213\e267\e270\e271\e253\e144\e145\e142\e146\e143\e147\e236\e150\e164\e161\-\e163\e170\e165\-\e167\e254\e151\e355\e356\e353\e357\e354\e277\e200\e375\e376\e373\e374\e272\e256\e131\e104\e105\e102\e106\e103\e107\e234\e110\e124\e121\-\e123\e130\e125\-\e127\e214\e111\e315\e316\e313\e317\e314\e341\e160\e335\e336\e333\e334\e215\e216\e337]/;
\&        }
\&        if (ord(\*(Aq^\*(Aq)==106) { # posix\-bc
\&            return $char =~
\&              /[\e101\e252\e260\e261\e237\e262\e320\e265\e171\e264\e232\e212\e272\e312\e257\e241\e220\e217\e352\e372\e276\e240\e266\e263\e235\e332\e233\e213\e267\e270\e271\e253\e144\e145\e142\e146\e143\e147\e236\e150\e164\e161\-\e163\e170\e165\-\e167\e254\e151\e355\e356\e353\e357\e354\e277\e200\e340\e376\e335\e374\e255\e256\e131\e104\e105\e102\e106\e103\e107\e234\e110\e124\e121\-\e123\e130\e125\-\e127\e214\e111\e315\e316\e313\e317\e314\e341\e160\e300\e336\e333\e334\e215\e216\e337]/;
\&        }
\&    }
.Ve
.PP
Note, however, that only the \f(CW\*(C`Is_print_ascii()\*(C'\fR sub is really independent
of the coded character set.  Another way to write \f(CW\*(C`Is_latin_1()\*(C'\fR would be
to use the characters in the range explicitly:
.PP
.Vb 4
\&    sub Is_latin_1 {
\&        my $char = substr(shift,0,1);
\&        $char =~ /[\ \(r!\(ct\(Po\(Cs\(Ye\(bb\(sc\(ad\(co\(Of\(Fo\(no\%\(rg\(a-\(de\(+-\(S2\(S3\(aa\(mc\(ps\(pc\(ac\(S1\(Om\(Fc\(14\(12\(34\(r?A\*`A\*'A\*^A\*~A\*:A\*o\*(AEC\*,E\*`E\*'E\*^E\*:I\*`I\*'I\*^I\*:\*(D-N\*~O\*`O\*'O\*^O\*~O\*:\(muO\*/U\*`U\*'U\*^U\*:Y\*'\*(Th\*8a\*`a\*'a\*^a\*~a\*:a\*o\*(aec\*,e\*`e\*'e\*^e\*:i\*`i\*'i\*^i\*:\*(d-n\*~o\*`o\*'o\*^o\*~o\*:\(dio\*/u\*`u\*'u\*^u\*:y\*'\*(thy\*:]/;
\&    }
.Ve
.PP
That form, however, may run into trouble in network transit (due to the
presence of 8-bit characters) or on non-ISO-Latin character sets.
.SH "SOCKETS"
.IX Header "SOCKETS"
Most socket programming assumes \s-1ASCII\s0 character encodings in network
byte order.
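.PP
As an illustration of that assumption, the following minimal sketch frames a
text message for a socket: the text is converted to \s-1ISO\-8859\-1\s0 bytes with
the core Encode module and prefixed with a two-byte length packed in network
byte order by the "n" template of pack.  The framing layout and the helpers
frame_message() and unframe_message() are hypothetical, and the sketch assumes
an \s-1ASCII\s0-based perl; how Encode treats native \s-1EBCDIC\s0 strings should be
checked on the target system.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical framing: 2-byte big-endian ("network order") length,
# followed by the message text encoded as ISO-8859-1 bytes.
sub frame_message {
    my ($text) = @_;
    my $bytes = encode('iso-8859-1', $text);
    return pack('n', length $bytes) . $bytes;
}

# Inverse of frame_message(): read the length, then decode the payload.
sub unframe_message {
    my ($frame) = @_;
    my $len = unpack('n', $frame);
    return decode('iso-8859-1', substr($frame, 2, $len));
}

my $frame = frame_message("GET /~pvhp/");
printf "%d bytes on the wire\en", length $frame;
print unframe_message($frame), "\en";
```

Here the 11-character request line becomes a 13-byte frame, and the receiving
side recovers the original text.
.PP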
Exceptions can include \s-1CGI\s0 script writing under a
host web server, where the server may take care of translation for you.
Most host web servers convert \s-1EBCDIC\s0 data to \s-1ISO\-8859\-1\s0 or Unicode on
output.
.SH "SORTING"
.IX Header "SORTING"
One big difference between \s-1ASCII\s0-based character sets and \s-1EBCDIC\s0 ones
is the relative position of upper- and lower-case letters, and of the
letters compared to the digits.  If sorted on an \s-1ASCII\s0-based machine, the
two-letter abbreviation for a physician comes before the two-letter
abbreviation for drive; that is:
.PP
.Vb 2
\&    @sorted = sort(qw(Dr. dr.));  # @sorted holds (\*(AqDr.\*(Aq,\*(Aqdr.\*(Aq) on ASCII,
\&                                  # but (\*(Aqdr.\*(Aq,\*(AqDr.\*(Aq) on EBCDIC
.Ve
.PP
The property of lower-case letters sorting before upper-case letters in
\&\s-1EBCDIC\s0 carries over even to the Latin-1 \s-1EBCDIC\s0 pages such as 0037 and 1047.
An example would be that E\*: \f(CW\*(C`E WITH DIAERESIS\*(C'\fR (203) comes before
e\*: \f(CW\*(C`e WITH DIAERESIS\*(C'\fR (235) on an \s-1ASCII\s0 machine, but the latter (83)
comes before the former (115) on an \s-1EBCDIC\s0 machine.  (Astute readers will
note that the upper-case version of \*8 \f(CW\*(C`SMALL LETTER SHARP S\*(C'\fR is simply
\&\*(L"\s-1SS\s0\*(R" and that the upper-case version of y\*: \f(CW\*(C`y WITH DIAERESIS\*(C'\fR is
not in the 0..255 range but is at U+0178 in Unicode, or \f(CW"\ex{178}"\fR in a
Unicode-enabled Perl.)
.PP
The sort order will cause differences between results obtained on
\&\s-1ASCII\s0 machines versus \s-1EBCDIC\s0 machines.  What follows are some suggestions
on how to deal with these differences.
.Sh "Ignore \s-1ASCII\s0 vs. \s-1EBCDIC\s0 sort differences."
.IX Subsection "Ignore ASCII vs. EBCDIC sort differences."
This is the least computationally expensive strategy.
It may require some user education.
.Sh "\s-1MONO\s0 \s-1CASE\s0 then sort data."
.IX Subsection "MONO CASE then sort data."
To minimize the expense of mono-casing mixed-case text, try to apply
\&\f(CW\*(C`tr///\*(C'\fR towards the case most employed within the data.
If the data are primarily \s-1UPPERCASE\s0 non-Latin-1, then apply tr/a\-z/A\-Z/
and then \fIsort()\fR.  If the data are primarily lowercase non-Latin-1, then
apply tr/A\-Z/a\-z/ before sorting.  If the data are primarily \s-1UPPERCASE\s0
and include Latin-1 characters, then apply:
.PP
.Vb 3
\&    tr/a\-z/A\-Z/;
\&    tr/a\*`a\*'a\*^a\*~a\*:a\*o\*(aec\*,e\*`e\*'e\*^e\*:i\*`i\*'i\*^i\*:\*(d-n\*~o\*`o\*'o\*^o\*~o\*:o\*/u\*`u\*'u\*^u\*:y\*'\*(th/A\*`A\*'A\*^A\*~A\*:A\*o\*(AEC\*,E\*`E\*'E\*^E\*:I\*`I\*'I\*^I\*:\*(D-N\*~O\*`O\*'O\*^O\*~O\*:O\*/U\*`U\*'U\*^U\*:Y\*'\*(Th/;
\&    s/\*8/SS/g;
.Ve
.PP
then \fIsort()\fR.  Do note, however, that such Latin-1 manipulation does not
address the y\*: \f(CW\*(C`y WITH DIAERESIS\*(C'\fR character, which will remain at code
point 255 on \s-1ASCII\s0 machines, but 223 on most \s-1EBCDIC\s0 machines, where it
will sort to a place less than the \s-1EBCDIC\s0 numerals.  With a Unicode-enabled
Perl you might try:
.PP
.Vb 1
\&    tr/y\*:/\ex{178}/;
.Ve
.PP
The strategy of mono-casing data before sorting does not preserve the case
of the data and may not be acceptable for that reason.
.Sh "Convert, sort data, then re-convert."
.IX Subsection "Convert, sort data, then re-convert."
This is the most expensive proposition that does not employ a network
connection.
.Sh "Perform sorting on one type of machine only."
.IX Subsection "Perform sorting on one type of machine only."
This strategy can employ a network connection.  As such
it would be computationally expensive.
.SH "TRANSFORMATION FORMATS"
.IX Header "TRANSFORMATION FORMATS"
There are a variety of ways of transforming data with an intra character set
mapping that serve a variety of purposes.
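.PP
One such mapping, from native code points to Unicode code points, is also a way
to carry out the convert, sort, then re-convert strategy from the previous
section.  The sketch below assumes a perl that provides the core function
utf8::native_to_unicode() (the identity on \s-1ASCII\s0-based machines).  It sorts on
converted keys but returns the original strings, so the re-conversion step
disappears.  The helper names unicode_key() and portable_sort() are hypothetical.

```perl
use strict;
use warnings;

# Key whose characters are the Unicode code points of the native
# characters in $s, so that cmp compares in ASCII/Latin-1 order on
# both ASCII-based and EBCDIC perls.
sub unicode_key {
    my ($s) = @_;
    return join '', map { chr utf8::native_to_unicode(ord $_) } split //, $s;
}

# Sort on the converted keys (a Schwartzian transform) but hand back the
# original native strings, so nothing has to be converted back.
sub portable_sort {
    return map  { $_->[1] }
           sort { $a->[0] cmp $b->[0] }
           map  { [ unicode_key($_), $_ ] } @_;
}

print join(' ', portable_sort(qw(dr. Dr.))), "\en";
```

On an \s-1ASCII\s0-based perl this prints the \s-1ASCII\s0 order shown in the sorting
example above; the intent is that an \s-1EBCDIC\s0 perl prints the same order.
The cost is one key string per element, which is why this remains the most
expensive of the local strategies.
.PP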
Sorting was discussed in the previous section, and a few of the other more
popular mapping techniques are discussed next.
.Sh "\s-1URL\s0 decoding and encoding"
.IX Subsection "URL decoding and encoding"
Note that some URLs have hexadecimal \s-1ASCII\s0 code points in them in an
attempt to overcome character or protocol limitation issues.  For example, the
tilde character is not on every keyboard, hence a \s-1URL\s0 of the form:
.PP
.Vb 1
\&    http://www.pvhp.com/~pvhp/
.Ve
.PP
may also be expressed as either of:
.PP
.Vb 1
\&    http://www.pvhp.com/%7Epvhp/
\&
\&    http://www.pvhp.com/%7epvhp/
.Ve
.PP
where 7E is the hexadecimal \s-1ASCII\s0 code point for '~'.  Here is an example
of decoding such a \s-1URL\s0 under \s-1CCSID\s0 1047:
.PP
.Vb 10
\&    $url = \*(Aqhttp://www.pvhp.com/%7Epvhp/\*(Aq;
\&    # this array assumes code page 1047
\&    my @a2e_1047 = (
\&          0,  1,  2,  3, 55, 45, 46, 47, 22,  5, 21, 11, 12, 13, 14, 15,
\&         16, 17, 18, 19, 60, 61, 50, 38, 24, 25, 63, 39, 28, 29, 30, 31,
\&         64, 90,127,123, 91,108, 80,125, 77, 93, 92, 78,107, 96, 75, 97,
\&        240,241,242,243,244,245,246,247,248,249,122, 94, 76,126,110,111,
\&        124,193,194,195,196,197,198,199,200,201,209,210,211,212,213,214,
\&        215,216,217,226,227,228,229,230,231,232,233,173,224,189, 95,109,
\&        121,129,130,131,132,133,134,135,136,137,145,146,147,148,149,150,
\&        151,152,153,162,163,164,165,166,167,168,169,192, 79,208,161,  7,
\&         32, 33, 34, 35, 36, 37,  6, 23, 40, 41, 42, 43, 44,  9, 10, 27,
\&         48, 49, 2
File size: 62625 K
Uploaded by: qqwoshi
Category: Embedded Linux