📄 繁简体(gb=big5)字符串转化的java方式实现 zeal blog - 泽欧里的网络日志.htm
<DIV>
<SCRIPT language=Javascript
src="繁简体(GB=Big5)字符串转化的JAVA方式实现 Zeal Blog - 泽欧里的网络日志.files/netcollect.js"
type=text/javascript></SCRIPT>
</DIV>
<DIV id=bannerForLeftTop></DIV><STRONG>TrackBack: </STRONG>This feature is closed indefinitely.
If you quote this post, please credit the source and link back to it. <A id=track></A><STRONG> </STRONG><BR>
<DIV class=comment>
<DIV style="CLEAR: both"><B>47 comments:</B></DIV><A id=comm></A>
<DIV>- <B>xiaomin</B> (
<SCRIPT type=text/javascript>
<!--
var first = 'ma';
var second = 'il';
var third = 'to:';
var address = 'yuxiaomin_whu';
var domain = '163.com';
document.write('<a href="');
document.write(first+second+third);
document.write(address);
document.write('@');
document.write(domain);
document.write('" title="email xiaomin">');
document.write('email<\/a>');
// -->
</SCRIPT>
) on 2005-04-29 19:14</DIV>
<DIV class=commCont>The results are fine on Windows, but the conversion output is wrong on Linux. Do the two table files need to be regenerated?</DIV>
<DIV>- <B>zeal</B> on 2005-04-29 19:26</DIV>
<DIV class=commCont>You need to set an environment variable on Linux:<BR>LANG=zh_CN.GBK<BR>or recompile on Linux with the relevant parameter.</DIV>
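The advice above comes down to which charset the JVM treats as its platform default. As a minimal sketch (the class and method names are hypothetical and not part of the GB2Big5 download), the following Java shows how to inspect that default and how naming the charset explicitly removes the dependence on LANG:

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical helper, not part of the GB2Big5 download.
class DefaultCharsetSketch {
    // Encode with an explicitly named charset; the result does not depend on LANG.
    static byte[] encodeGbk(String s) {
        return s.getBytes(Charset.forName("GBK"));
    }

    public static void main(String[] args) {
        // On older JDKs this follows the locale (for example LANG=zh_CN.GBK on Linux).
        System.out.println("platform default: " + Charset.defaultCharset());

        String s = "\u4e2d\u6587"; // the two characters of "Chinese (language)"
        // getBytes() with no argument uses the platform default, so it can differ between machines.
        System.out.println("default bytes: " + Arrays.toString(s.getBytes()));
        System.out.println("explicit GBK:  " + Arrays.toString(encodeGbk(s)));
    }
}
```

On JDK 18 and later the default charset is UTF-8 by default regardless of LANG, so passing the charset explicitly is likely the more portable choice.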
<DIV>- <B>chenxg</B> (
<SCRIPT type=text/javascript>
<!--
var first = 'ma';
var second = 'il';
var third = 'to:';
var address = 'chenwpp';
var domain = '21cn.com';
document.write('<a href="');
document.write(first+second+third);
document.write(address);
document.write('@');
document.write(domain);
document.write('" title="email chenxg">');
document.write('email<\/a>');
// -->
</SCRIPT>
) on 2005-07-14 16:20</DIV>
<DIV class=commCont>A question: the output of the gb->big5 conversion is not Big5-encoded but Traditional characters in GBK. What is the cause?</DIV>
<DIV>- <B>zeal</B> (<A title=www.zeali.net/ href="http://www.zeali.net/"
target=_blank>link</A>) on 2005-07-14 22:56</DIV>
<DIV class=commCont>Really? What exactly did you do?</DIV>
<DIV>- <B>aman</B> (
<SCRIPT type=text/javascript>
<!--
var first = 'ma';
var second = 'il';
var third = 'to:';
var address = 'chi_kingman';
var domain = 'yahoo.com';
document.write('<a href="');
document.write(first+second+third);
document.write(address);
document.write('@');
document.write(domain);
document.write('" title="email aman">');
document.write('email<\/a>');
// -->
</SCRIPT>
) (<A title=aman.38.com href="http://aman.38.com/" target=_blank>link</A>) on
2005-08-04 11:51</DIV>
<DIV class=commCont>Can this be adapted for PHP?</DIV>
<DIV>- <B>zeal</B> (<A title=www.zeali.net/ href="http://www.zeali.net/"
target=_blank>link</A>) on 2005-08-04 11:53</DIV>
<DIV class=commCont>For a PHP version of the Traditional/Simplified conversion, see this post:<BR><A
href="http://www.zeali.net/blog/entry.php?id=55"
target=_blank>http://www.zeali.net/blog/entry.php?id=55</A></DIV>
<DIV>- <B>pengwenming</B> (
<SCRIPT type=text/javascript>
<!--
var first = 'ma';
var second = 'il';
var third = 'to:';
var address = 'pengwenming11';
var domain = 'sina.com';
document.write('<a href="');
document.write(first+second+third);
document.write(address);
document.write('@');
document.write(domain);
document.write('" title="email pengwenming">');
document.write('email<\/a>');
// -->
</SCRIPT>
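) on 2006-03-12 13:51</DIV>
<DIV
class=commCont>Converting Simplified to Traditional works fine, but converting Traditional to Simplified produces garbled text (my system is Simplified Chinese).<BR>The fix is to change this line in the big52gb(String
inStr) function in gb2big5.java:<BR>byte[] Text = new
String(inStr.getBytes(),"BIG5").getBytes("BIG5");<BR>to:<BR>byte[] Text =
inStr.getBytes("BIG5");<BR><BR>Thanks to zeal for sharing this so generously. If you have time, could you tell me where the two code tables can be obtained?<BR>Thanks in advance! Also, I'm curious whether you live in Hong Kong or Taiwan. Forgive my nosiness.</DIV>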
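To see why the two lines can behave differently, here is a small sketch. It assumes the platform default charset is GBK (as on a Simplified Chinese system) and passes that charset in explicitly so the result is reproducible. The class and method names are hypothetical and are not taken from gb2big5.java.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical illustration; not the code from gb2big5.java.
class Big5BytesSketch {
    static final Charset BIG5 = Charset.forName("Big5");

    // The replacement line: encode the (Unicode) Java string straight to Big5.
    static byte[] direct(String s) {
        return s.getBytes(BIG5);
    }

    // The original line, with the platform default charset passed in explicitly
    // instead of being picked up by the no-argument getBytes().
    static byte[] viaPlatform(String s, Charset platform) {
        return new String(s.getBytes(platform), BIG5).getBytes(BIG5);
    }

    public static void main(String[] args) {
        String s = "\u7e41\u9ad4"; // two Traditional characters ("Traditional form")
        System.out.println(Arrays.toString(direct(s)));
        // With a GBK default, the GBK bytes are misread as Big5, so the result differs.
        System.out.println(Arrays.toString(viaPlatform(s, Charset.forName("GBK"))));
        // With a Big5 default, the round trip preserves these characters and matches direct().
        System.out.println(Arrays.toString(viaPlatform(s, BIG5)));
    }
}
```

This only illustrates the case where inStr already holds correctly decoded Unicode text; whether that holds depends on how the caller obtained the string.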
<DIV>- <B>zeal</B> (<A title=www.zeali.net/ href="http://www.zeali.net/"
target=_blank>link</A>) on 2006-03-12 16:17</DIV>
<DIV class=commCont>The code tables are included in both download archives provided at the end of this post; just unpack them.<BR>P.S. I'm from Shanghai.</DIV>
<DIV>- <B>pengwenming</B> (
<SCRIPT type=text/javascript>
<!--
var first = 'ma';
var second = 'il';
var third = 'to:';
var address = 'pengwenming11';
var domain = 'sina.com';
document.write('<a href="');
document.write(first+second+third);
document.write(address);
document.write('@');
document.write(domain);
document.write('" title="email pengwenming">');
document.write('email<\/a>');
// -->
</SCRIPT>
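) on 2006-03-13 19:37</DIV>
<DIV
class=commCont>I didn't expect such a quick reply, thanks! Perhaps I wasn't clear: I meant to ask where you originally got the two code tables.<BR>Also, was your StreamConverter class written for "getting an input stream from a network connection and converting it to a byte array"?<BR>I noticed that you specifically handle the case where the stream blocks, and I wonder why.<BR>One doubt, though: even when input.available()
!= 0 holds,<BR>status =
input.read(buffer) can still block under the following conditions:<BR>1. the end of the file has not been reached<BR>2. there is not enough data available to fill the buffer<BR>Here is how I handle it; I hope we can discuss further:<BR>byte[]
buffer=new byte[in.available()];<BR>ByteArrayOutputStream bos=new
ByteArrayOutputStream();<BR>int
byteNum=in.read(buffer);<BR>while(byteNum!=-1){<BR>bos.write(buffer,0,byteNum);<BR>byteNum=in.read(buffer);<BR>}<BR>in.close();<BR>byte[]
sContent = bos.toByteArray();<BR>bos.close();<BR>Finally, I'm in Shanghai too. Small world!</DIV>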
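One caveat with sizing the buffer from in.available(): if it returns 0, the buffer has length zero, and InputStream.read(byte[]) then returns 0 rather than -1, so the loop would never end. A minimal variant with a fixed-size buffer might look like the sketch below; the class name, method name, and 4096-byte buffer size are illustrative choices, not part of StreamConverter.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper; not the StreamConverter class discussed in the thread.
class StreamBytesSketch {
    // Read the whole stream into a byte array, then close the stream.
    static byte[] readAll(InputStream in) throws IOException {
        byte[] buffer = new byte[4096]; // fixed size, never zero-length
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try {
            int n;
            // read() blocks until some data arrives and returns -1 only at end of stream
            while ((n = in.read(buffer)) != -1) {
                bos.write(buffer, 0, n);
            }
        } finally {
            in.close();
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3, 4, 5};
        byte[] copy = readAll(new ByteArrayInputStream(data));
        System.out.println(copy.length + " bytes read");
    }
}
```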
<DIV>- <B>zeal</B> (<A title=www.zeali.net/ href="http://www.zeali.net/"
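target=_blank>link</A>) on 2006-03-16 23:28</DIV>
<DIV
class=commCont>I see. The original source of the two code tables can no longer be traced; I found them through a Google search.<BR>You guessed right about StreamConverter: it was originally designed to read the content of a URL, and it is a util class in the common
library I wrote for my company's development framework. In GB2Big5 it is only used to turn an input stream into a byte array. The first version of GB2Big5 used java.io.RandomAccessFile to read the files. Later I wanted to package a jar with the code table files bundled as resources, and reading them through file io was not general enough, so I switched to using StreamConverter to convert the input stream obtained from getResourceAsStream.<BR>For historical reasons, the special handling of stream blocking exists to support reading network data in a multithreaded environment. For GB2Big5, the approach you posted is perfectly fine.</DIV>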
<DIV>- <B>pengwenming1</B> (
<SCRIPT type=text/javascript>
<!--
var first = 'ma';
var second = 'il';
var third = 'to:';
var address = 'pengwenming11';
var domain = 'sina.com';
document.write('<a href="');
document.write(first+second+third);
document.write(address);
document.write('@');
document.write(domain);
document.write('" title="email pengwenming1">');
document.write('email<\/a>');
// -->
</SCRIPT>
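) on 2006-03-18 21:02</DIV>
<DIV
class=commCont>Thanks for your reply! Over the past two days I used my spare time to study character set encodings properly and sorted out my previously muddled thinking.<BR>Your source code helped me understand a lot. Thank you!<BR>On the Traditional-to-Simplified issue: your code is for big5, and the line I provided is for Traditional text whose internal encoding is GBK.<BR>I may have many more questions for you in the future, and I hope you won't mind helping!</DIV>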
<DIV>- <B>pengwenming1</B> (
<SCRIPT type=text/javascript>
<!--
var first = 'ma';
var second = 'il';
var third = 'to:';
var address = 'pengwenming11';
var domain = 'sina.com';
document.write('<a href="');
document.write(first+second+third);
document.write(address);
document.write('@');
document.write(domain);
document.write('" title="email pengwenming1">');
document.write('email<\/a>');
// -->
</SCRIPT>
) on 2006-03-19 14:53</DIV>
<DIV class=commCont>I got that wrong: on the Traditional-to-Simplified issue, it should be the other way around.</DIV>
<DIV>- <B>shixingdong</B> (
<SCRIPT type=text/javascript>
<!--
var first = 'ma';
var second = 'il';
var third = 'to:';
var address = 'myjobcn';
var domain = '21cn.com';
document.write('<a href="');
document.write(first+second+third);
document.write(address);
document.write('@');
document.write(domain);
document.write('" title="email shixingdong">');
document.write('email<\/a>');
// -->
</SCRIPT>
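) on 2006-04-05 17:00</DIV>
<DIV
class=commCont>The output of the gb->big5 conversion is not Big5-encoded; in the browser it is Traditional text in GBK rather than Traditional BIG5.<BR><BR>Is there any solution?</DIV>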
<DIV>- <B>shixingdong</B> (
<SCRIPT type=text/javascript>
<!--
var first = 'ma';
var second = 'il';
var third = 'to:';
var address = 'myjobcn';
var domain = '21cn.com';
document.write('<a href="');
document.write(first+second+third);
document.write(address);
document.write('@');
document.write(domain);
document.write('" title="email shixingdong">');
document.write('email<\/a>');
// -->
</SCRIPT>
) on 2006-04-07 12:02</DIV>
<DIV class=commCont>Got it working. The only way was to convert the GB codes to Unicode, that is, to handle the transcoding in the BIG5 page.</DIV>
<DIV>- <B>eq1688</B> (
<SCRIPT type=text/javascript>
<!--
var first = 'ma';
var second = 'il';
var third = 'to:';
var address = 'eq1688';
var domain = 'sohu.com';
document.write('<a href="');
document.write(first+second+third);
document.write(address);
document.write('@');
document.write(domain);
document.write('" title="email eq1688">');
document.write('email<\/a>');
// -->
</SCRIPT>
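) on 2006-07-09 17:43</DIV>
<DIV class=commCont>Hello,
shixingdong<BR>"<BR>The output of the gb->big5 conversion is not Big5-encoded; in the browser it is Traditional text in GBK rather than Traditional BIG5.<BR><BR>Got it working. The only way was to convert the GB codes to Unicode, that is, to handle the transcoding in the BIG5 page."<BR><BR>How exactly did you handle the transcoding in the BIG5 page so that the output becomes Traditional BIG5?</DIV>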
<DIV>- <B>梦人</B> (
<SCRIPT type=text/javascript>
<!--
var first = 'ma';
var second = 'il';
var third = 'to:';
var address = 'wang.ss';
var domain = '163.com';
document.write('<a href="');
document.write(first+second+third);
document.write(address);
document.write('@');
document.write(domain);
document.write('" title="email 梦人">');
document.write('email<\/a>');
// -->
</SCRIPT>
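) on 2006-10-09 00:15</DIV>
<DIV
class=commCont>Hello, zeal. I'd like to use this to add a check: when converting between Simplified and Traditional Chinese, if the text being converted to Simplified (or Traditional) is already Simplified (or Traditional), it should be left unconverted. How can I do that?
Thanks!</DIV>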
<DIV>- <B>梦人</B> on 2006-10-09 15:58</DIV>
<DIV class=commCont>Thanks</DIV>
<DIV>- <B>zeal</B> (<A title=www.zeali.net/ href="http://www.zeali.net/"
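target=_blank>link</A>) on 2006-10-09 15:54</DIV>
<DIV class=commCont>To
梦人: Generally, detecting a character encoding means checking whether the two bytes of each character fall within the start and end ranges defined by a given encoding; if they do, the text is assumed to be in that encoding. In theory, however, different encodings may use the same byte ranges, so this method is not fully accurate. Only a large character set such as Unicode can uniquely identify which character a code represents.<BR>Below are the
range macro definitions for several common Chinese character sets from the PHP 4 source; you can use these ranges to implement your own encoding detection.<BR><BR>/* Support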
for Chinese(BIG5) characters */<BR>#define isbig5head(c) (0xa1<=(uchar)(c)
&& (uchar)(c)<=0xf9)<BR>#define isbig5tail(c) ((0x40<=(uchar)(c)
&& (uchar)(c)<=0x7e) || (0xa1<=(uchar)(c) &&
(uchar)(c)<=0xfe))<BR><BR>/* Support for Chinese(GB2312) characters