    <table border="0" cellspacing="0" cellpadding="3" width="100%"><tr><td>    <div align="center" id="bldcontent">      <a href="../default.htm"><img src="../images/opendocs.png" width="63" height="76" border="0"></a>      <br>      <div class="symbol">Your OpenSource Publisher&#153;</div>    </div>      </td></tr></table>    <div align="center" class="author">      	<a href="../products.lxp">Products</a>	&nbsp;|&nbsp;	<a href="../wheretobuy.lxp">Where to buy</a>	&nbsp;|&nbsp;	<a href="../bookstore.lxp">Retailers</a>	&nbsp;|&nbsp;	<a href="../faq.lxp">FAQ</a>	&nbsp;|&nbsp;        <a href="../writeforus.lxp">Write for Us.</a>        &nbsp;|&nbsp;        <a href="#contact">Contact Us.</a>  </div>    <table border="0" cellspacing="3" cellpadding="0" width="100%"><tr><td width="100%">      <div class="content">        <table border="0" cellspacing="2" cellpadding="0" width="100%"><tr><td width="100%">          <div align="center"><H4 CLASS="AUTHOR"><A NAME="AEN5">Boudewijn Rempt</A><br><a href="../../https@secure.linuxports.com/opendocs/default.htm"><img src=odpyqt125.png></a><br>ISBN: 0-97003300-4-4<br><a href="../../https@secure.linuxports.com/opendocs/default.htm">Available from bookstores everywhere or you can order it here.</a><p>You can download the source files for the book <a href="pyqtsrc.tgz">(code / eps) here.</a><hr></div>                    <HTML><HEAD><TITLE>Unicode strings</TITLE><METANAME="GENERATOR"CONTENT="Modular DocBook HTML Stylesheet Version 1.72"><LINKREL="HOME"TITLE="GUI Programming with Python: QT Edition"HREF="book1.htm"><LINKREL="UP"TITLE="String Objects in Python and Qt"HREF="c2029.htm"><LINKREL="PREVIOUS"TITLE="QCString &#8212; simple strings in PyQt"HREF="x2104.htm"><LINKREL="NEXT"TITLE="Python Objects and Qt Objects"HREF="c2341.htm"></HEAD><BODYCLASS="SECT1"BGCOLOR="#FFFFFF"TEXT="#000000"LINK="#0000FF"VLINK="#840084"ALINK="#0000FF"><DIVCLASS="NAVHEADER"><TABLESUMMARY="Header navigation 
table"WIDTH="100%"BORDER="0"CELLPADDING="0"CELLSPACING="0"><TR><THCOLSPAN="3"ALIGN="center">GUI Programming with Python: QT Edition</TH></TR><TR><TDWIDTH="10%"ALIGN="left"VALIGN="bottom"><A accesskey="P" href="index.lxp@lxpwrap=x2104_252ehtm.htm">Prev</A></TD><TDWIDTH="80%"ALIGN="center"VALIGN="bottom">Chapter 8. String Objects in Python and Qt</TD><TDWIDTH="10%"ALIGN="right"VALIGN="bottom"><A accesskey="N" href="index.lxp@lxpwrap=c2341_252ehtm.htm">Next</A></TD></TR></TABLE><HRALIGN="LEFT"WIDTH="100%"></DIV><DIVCLASS="SECT1"><H1CLASS="SECT1">Unicode strings</A></H1><DIVCLASS="SECT2"><H2CLASS="SECT2">Introduction to Unicode</A></H2><P>All text that is handled by computers must be encoded: every letter in a text has to be represented by a numeric value. For a long time, it was assumed that 7 bits would provide enough values to encode all necessary letters; this was the basis for the ASCII character set. However, as computers spread all over the world, it became clear that this was not enough. A whole host of different encodings was designed, ranging from the obscure (TISCII) to the pervasive (Latin-1). Of course, this leads to problems when you try to exchange texts. A Western European Latin-1 user cannot easily read a Russian KOI-8 text on his system. Another problem is that those small, one-byte, eight-bit character sets don't have room for useful extras, such as an extensive set of mathematical symbols. The solution has been to create a monster character set consisting of at least 65000 code points, including every possible character someone might want to use. This is ISO/IEC 10646. The Unicode standard (http://www.unicode.org) is the official implementation of ISO/IEC 10646.</P><P>Unicode is an essential feature of any modern application. 
Unicode is mandatory for every e-mail client, for instance, but also for all XML processing, web browsers, many modern programming languages, all Windows applications (such as Word), and KDE 2.0 translation files.</P><P>Unicode is not perfect, though. Some programmers, such as Jamie Zawinski of XEmacs and Netscape fame, lament the extra bytes that Unicode needs &#8212; two bytes for every character instead of one. Some Japanese experts oppose the unification of Chinese characters and Japanese characters. Japanese characters are historically derived from Chinese characters, and even their modern meaning is often identical, but there are some slight visual differences. These complainers are often very vociferous, but Unicode is the best solution we have for representing the wide variety of scripts humanity has invented.</P><P>There are a few other practical problems concerning Unicode. Since the character set is so very large, there are no fonts that include all characters. The best font available is Microsoft's Arial Unicode, which can be downloaded for free. The Unicode character set also includes interesting scripts such as Devanagari, a script where single letters combine to form complicated ligatures. The total number of Devanagari letters is fairly small, but the set of ligatures runs into the hundreds. Those ligatures are not defined in the character set, but have to be present in fonts. Scripts like Arabic or Burmese are even more complicated. For those scripts, special rendering engines have to be written in order to display a text correctly.</P><P>From version 3, Qt includes capable rendering engines for a number of scripts, such as Arabic, and promises to include more. 
With Qt 3, you can also combine several fonts to form a more complete set of characters, which means that you no longer have to use one monster font with tens of thousands of glyphs.</P><P>The next problem is inputting those texts. Even with remappable keyboards, it's still a monster job to support all scripts. Japanese, for instance, needs a special-purpose input mechanism with dictionary lookups that decide which combination of sounds must be represented using Kanji (Chinese-derived characters) or one of the two syllabic scripts, hiragana and katakana.</P><P>There are still more complications that have to do with sort order and bidirectional text (Hebrew going from right to left, Latin from left to right) &#8212; then there are the problems of determining which language the user prefers and which country he is in (I prefer to write in English, but have the dates show up in the Dutch format, for instance). All these problems have their bearing upon programming using Unicode, but are so complicated that a separate book should be written to deal with them.</P><P>However, both Python strings and Qt strings support Unicode &#8212; and both Python and Qt strings support conversion from Unicode to legacy character sets such as the widespread Latin-1, and vice versa. As said above, Unicode is a multi-byte encoding: that means that a single Unicode character is encoded using <SPAN><ICLASS="EMPHASIS">two</I></SPAN> bytes. Of course, this doubles memory requirements compared to single-byte character sets such as Latin-1. This can be circumvented by encoding Unicode using a variable number of bytes, known as UTF-8. 
In this scheme, Unicode characters that are equivalent to ASCII characters use just one byte, while other characters in the 16-bit range take two or three bytes. UTF-8 is a widespread standard, and both Qt and Python support it.</P><P>I'll first describe the pitfalls of working with Unicode from Python, and then bring in the Qt complications.</P></DIV><DIVCLASS="SECT2"><H2CLASS="SECT2">Python and Unicode</A></H2><P>Python actually distinguishes between Unicode strings and 'normal' strings &#8212; that is, strings where every byte represents one character. Plain Python strings are often used as character arrays representing immutable binary data. In fact, plain strings are semantically very similar to Java's byte array, or Qt's <TTCLASS="CLASSNAME">QByteArray</TT> class &#8212; they represent a simple sequence of bytes, where every byte <SPAN><ICLASS="EMPHASIS">may</I></SPAN> represent a character, but could also represent something quite different, not human-readable text at all.</P><P>Creating a Unicode string is a bootstrapping problem. Whether you use BlackAdder's Scintilla editor or another editor, it will probably not support Unicode input, so you cannot type Chinese characters directly. However, there are clever ways around this problem: you can either type hex codes, or construct your strings from other sources. In the third part of this book we will create a small but fully functional Unicode editor.</P><DIVCLASS="SECT3"><H3CLASS="SECT3">String literals</A></H3><P>You can create a Unicode string literal by prefixing the string with the letter <SPAN><ICLASS="EMPHASIS">u</I></SPAN>, or convert a plain string to Unicode with the <TTCLASS="FUNCTION">unicode</TT> function. You cannot, however, write Python code using anything but ASCII. 
If you look at the following script, you will notice that there is a function defined in Chinese characters (yin4shua1 means print) that tries to print the opening words of the Nala, a Sanskrit epic. Python cannot handle this, so all actual code must be in ASCII.</P><DIVCLASS="MEDIAOBJECT"><P><DIVCLASS="CAPTION"><P>A Python script written in Unicode.</P></DIV></P></DIV><P>Of course, it would be nice if we could at least type the strings directly in UTF-8, as shown in the next screenshot:</P><DIVCLASS="MEDIAOBJECT"><P><DIVCLASS="CAPTION"><P>A Python script with the strings written in Unicode.</P></DIV></P></DIV><P>Unfortunately, this won't work either. Hidden deep in the bowels of the Python startup process, a default encoding is set for all strings. This encoding is used to convert from Unicode whenever the Unicode string has to be handed to outside components that don't speak Unicode, such as <TTCLASS="FUNCTION">print</TT>. By default this is 7-bit ASCII. Running the script gives the following error:</P><PRECLASS="SCREEN">boudewijn@maldar:~/doc/opendoc/ch4 &#62; python unicode2.py
Traceback (most recent call last):
  File "unicode2.py", line 4, in ?
    nala()
  File "unicode2.py", line 2, in nala
    print u"啶啶膏ムう 啶班ぞ啶啶