<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<HTML><HEAD><TITLE>Article 3. Strings the OLE Way</TITLE></HEAD>
<!--DocHeaderStart-->
<BODY bgcolor="#ffffff" text="#000000" leftmargin=0 topmargin=0 link="#0080C0" vlink="gray">
<BR>
<table width=100% border=0 cellspacing=0 cellpadding=0>
<TR>
<TD WIDTH="10" VALIGN=TOP>&nbsp;</TD>
<TD VALIGN="TOP">


<!--DocHeaderEnd-->
<!-- This is a PANDA Generated HTML file. The source is a WinWord Document. -->
  
<P><h1>Article 3. Strings the OLE Way</h1>
<P>Bruce McKinney
<P><!--DATE-->April 18, 1996
<P><!--/DATE-->
<P><h2>Introduction</h2>
<P>The difference between Visual Basic&#174; strings and Visual C++&#174; strings is the difference between &quot;I'll do it&quot; and &quot;You do it.&quot; The C++ way is fine for who it's for, but there aren't many programmers around anymore who get a thrill out of allocating and destroying their own string buffers. In fact, most C++ class libraries (including the Microsoft&#174; Foundation Classes, or MFC) provide string classes that work more or less on the Basic model, which is similar to the model of Pascal and FORTRAN.
<P>When you manage an array of bytes (or an array of books or beer bottles or babies), there are two ways of maintaining the lengths. The marker system puts a unique marker at the end of the array. Everything up to the marker is valid. The count system adds a special array slot containing the number of elements. You have to update the count every time you resize the array. Both systems have their advantages and disadvantages. The marker system assumes you can find some unique value that will never appear in the array. The count system requires tedious bookkeeping to keep the count accurate. 
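<P>The two bookkeeping schemes can be sketched in a few lines of C++. This is only an illustration: <b>CountedString</b>, <b>marker_length</b>, <b>counted_length</b>, and <b>demo_lengths</b> are hypothetical names invented for this sketch, not part of any library. The sample data contains an embedded null so that the two systems disagree about the length.

```cpp
#include <cassert>
#include <cstddef>

// Marker system: everything up to the first null character is valid.
std::size_t marker_length(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') ++n;
    return n;
}

// Count system: the length travels with the data (hypothetical type).
struct CountedString {
    std::size_t length;   // must be updated whenever the data is resized
    const char* data;
};

std::size_t counted_length(const CountedString& s) {
    return s.length;      // no scan needed, and embedded nulls are legal
}

// Five bytes of data with a null in the middle.
void demo_lengths() {
    const char raw[] = "ab\0cd";
    CountedString counted = { 5, raw };
    assert(marker_length(raw) == 2);       // stops at the embedded null
    assert(counted_length(counted) == 5);  // trusts the stored count
}
```

<P>The marker version cannot represent the embedded null at all, while the counted version is only as good as the bookkeeping behind its <b>length</b> field.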
<P>The C language and most of its offspring use the marker system for storing strings, with the null character as the marker. All the other languages I know use the count system. You might argue that the majority indicates the better choice, but even if you buy that, C still gets the last laugh. Many of the leading operating systems of the world (all the flavors of Unix&#174;, Windows&#174;, and OS/2&#174;, for example) expect strings passed to the system to be null-terminated. As a result, languages such as Pascal and FORTRAN support a special null-terminated string type for passing strings to the operating system. Basic doesn't have a separate type for null-terminated strings, but it has features that make passing null-terminated strings easy. 
<P>As a language-independent standard, OLE can't afford to take sides. It must accommodate languages in which null is not a special character, but it must also be able to output null-terminated strings for its host operating system. More importantly, OLE recognizes that requiring the operating system to manage strings is inherently more stable and reliable in a future computing world where strings may be transferred across process, machine, and eventually Internet boundaries. I've been told that the name <i>BSTR</i> is a compression of Basic STRing, but in fact a BSTR looks a lot more like a Pascal string than like the strings Basic old-timers remember.
<P>In any case, C++ programmers have some unlearning to do when it comes to writing strings for OLE. But before you can get into BSTR details, you need to clearly understand the difference between Unicode&#174; and ANSI strings. 
<P><h2>Unicode vs. ANSI</h2>
<P>Stringwise, we are cursed to live in interesting times. The world according to Microsoft (and many other international companies) is moving from ANSI to Unicode characters, but the transition isn't exactly a smooth one. 
<P>Most of the Unicode confusion comes from the fact that we are in the midst of a comprehensive change in the way characters are represented. The old way uses the ANSI character set for the 256 values that fit in a single byte, but reserves some of those values as double-byte character prefixes so that non-ANSI character sets can be represented. This is very efficient for the cultural imperialists who got there first with Latin characters, but it's inefficient for those who use larger character sets. Unicode represents all characters in two bytes. This is inefficient for the cultural imperialists (although they still get the honor of claiming most of the first 128 characters with zero in the upper byte), but it's more efficient (and more fair) for the rest of the world. 
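<P>A small sketch makes the size trade-off concrete. It assumes a C++11 compiler and uses <b>char16_t</b> as a portable stand-in for the 16-bit Unicode character described here (on Windows, <b>wchar_t</b> plays that role). The helper names <b>data_bytes</b>, <b>upper_byte</b>, <b>lower_byte</b>, and <b>demo_widths</b> are invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Bytes of character data in a string literal, excluding the terminator.
template <typename Ch, std::size_t N>
constexpr std::size_t data_bytes(const Ch (&)[N]) {
    return (N - 1) * sizeof(Ch);
}

// Split a 16-bit character into its upper and lower bytes.
constexpr unsigned upper_byte(char16_t c) {
    return (static_cast<unsigned>(c) >> 8) & 0xFFu;
}
constexpr unsigned lower_byte(char16_t c) {
    return static_cast<unsigned>(c) & 0xFFu;
}

void demo_widths() {
    // The same three Latin letters take twice the space as 16-bit characters.
    assert(data_bytes("OLE") == 3);
    assert(data_bytes(u"OLE") == 6);

    // A character from the first 128 keeps its old value, with zero on top.
    assert(upper_byte(u'A') == 0x00);
    assert(lower_byte(u'A') == 0x41);

    // A character outside that range uses both bytes.
    assert(upper_byte(u'\u65E5') == 0x65);
}
```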
<P><h3>Different Views of Unicode</h3>
<P>Eventually, everybody will use Unicode, but nobody seems to agree on how to deal with the transition. 
<P><UL><LI> Windows 3.<i>x</i>--Doesn't know a Unicode from a dress code, and never will.
<LI> 16-bit OLE--Ditto.
<LI> Windows NT&#174;--Was written from the ground up first to do the right thing (Unicode) and secondly to be compatible (ANSI). All strings are Unicode internally, but Windows NT also completely supports ANSI by translating internal Unicode strings to ANSI strings at run time. Windows NT programs that use Unicode strings directly can be more efficient by avoiding frequent string translations, although Unicode strings take about twice as much data space. 
<LI> Windows 95--Uses ANSI strings internally. Furthermore, it doesn't support Unicode strings even indirectly in most contexts--with one big exception. 
<LI> 32-bit OLE--Was written from the ground up to do the right thing (Unicode) and doesn't do ANSI. The OLE string types--OLESTR and BSTR--are Unicode all the way. Any 32-bit operating system that wants to do OLE must have at least partial support for Unicode. Windows 95 has just enough Unicode support to make OLE work.
<LI> Visual Basic--The designers had to make some tough decisions about how they would represent strings internally. They might have chosen ANSI, because it's the common subset of Windows 95 and Windows NT, and converted to Unicode whenever they needed to deal with OLE. But since Visual Basic 4.0 is OLE inside and out, they chose Unicode as the internal format, despite potential incompatibilities with Windows 95. The Unicode choice caused many problems and inefficiencies both for the developers of Visual Basic and for Visual Basic developers--but the alternative would have been worse. 
<LI> The Real World--Most existing data files use ANSI. The .WKS, .DOC, .BAS, .TXT, and most other standard file formats use ANSI. If a system uses Unicode internally but needs to read from or write to common data formats, it must do Unicode-to-ANSI conversion. Someday there will be Unicode data file formats, but today they're pretty rare. </UL>What does this mean for you? It means you must make choices about any program you write:
<P><UL><LI> If you write using Unicode internally, your application will run only on Windows NT, but it will run faster. Everything is Unicode, inside and out. There are no string translations--except when you need to write string data to standard file formats that use ANSI. An application written this way won't be out-of-date when some future iteration of Windows 9<i>x</i> gets Unicode.
<LI> If you write using ANSI internally, your application will run on either Windows NT or Windows 95, but it will run slower under Windows NT because there are a lot of string translations going on in the background. An application written this way will someday be outdated when the whole world goes Unicode, but it may not happen in your lifetime.
</UL>The obvious choice for most developers is to use the ANSI version because it works right now for all 32-bit Windows platforms. But I'd like to urge you to take a little extra time to build <b>both</b> versions.
<P>If you choose to write your application using both ANSI and Unicode, Win32&#174; and the C run-time library both provide various types and macros to make it easier to create portable programs from the same source. To use them, define the symbol _UNICODE for your Unicode builds and the symbol _MBCS for your ANSI builds. The samples already have these settings for the Microsoft Developer Studio. 
<P><blockquote><b>Note:</b>     As far as this article is concerned, there is no difference between double-byte character strings--DBCS--and multi-byte character strings--MBCS. Similarly, &quot;wide character&quot; and &quot;Unicode&quot; are synonymous in the context of this article.</blockquote>
<P><h3>A WCHAR Is a wchar_t Is an OLECHAR </h3>
<P>Just in case you're not confused enough about ANSI and Unicode strings, everybody seems to have a different name for them. Furthermore, there's a third type of string called a single-byte character string (SBCS), which we will ignore in this article. 
<P>In the Win32 API, ANSI normally means MBCS. The Win32 string functions (<b>lstrlenA</b>, <b>lstrcpyA</b>, and so on) assume multi-byte character strings, as do the ANSI versions of all application programming interface (API) functions. You also get Unicode versions (<b>lstrlenW</b>, <b>lstrcpyW</b>). Unfortunately, these aren't implemented in Windows 95, so you can't use them on BSTRs. Finally, you get generic macro versions (<b>lstrlen</b>, <b>lstrcpy</b>) that depend on whether you define the symbol UNICODE.
<P>The C run-time library is even more flexible. For each string function, it supports a single-byte version (<b>strlen</b>), a multi-byte version (<b>_mbslen</b>), a wide-character version (<b>wcslen</b>), and a generic macro version (<b>_tcslen</b>) that depends on whether you define _UNICODE, _MBCS, or _SBCS. Notice that the C run-time library tests _UNICODE while Win32 tests UNICODE. We get around this by defining these to be equivalent in OLETYPE.H.
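<P>The generic macro mechanism can be imitated portably to show how one source file serves both builds. The sketch below does not use the real TCHAR.H definitions: <b>DEMO_UNICODE</b>, <b>demo_tchar</b>, <b>DEMO_T</b>, <b>demo_tcslen</b>, and <b>title_length</b> are hypothetical stand-ins for _UNICODE, TCHAR, _T, and <b>_tcslen</b>, chosen so that the example compiles on any standard C++ compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// One switch selects the character type, the literal prefix, and the
// string function, in the same spirit as the _UNICODE / _MBCS symbols.
#ifdef DEMO_UNICODE
typedef wchar_t demo_tchar;
#define DEMO_T(x) L##x              // wide literal
#define demo_tcslen std::wcslen     // wide-character length
#else
typedef char demo_tchar;
#define DEMO_T(x) x                 // narrow literal
#define demo_tcslen std::strlen     // single-byte length
#endif

// Generic code: nothing below mentions char or wchar_t directly.
std::size_t title_length() {
    const demo_tchar* title = DEMO_T("Strings the OLE Way");
    return demo_tcslen(title);
}

void demo_generic() {
    // The length in characters is the same in either build.
    assert(title_length() == 19);
}
```

<P>Compiling the same file with and without <b>DEMO_UNICODE</b> defined produces the wide and narrow variants. Only the storage per character changes, not the calling code.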
<P>Win32 provides the <b>MultiByteToWideChar</b> and <b>WideCharToMultiByte</b> functions for converting between ANSI and Unicode. The C run-time library provides the <b>mbstowcs</b> and <b>wcstombs</b> functions for the same purpose. The Win32 functions are more flexible, but not in any way that matters for this article. We'll use the simpler run-time versions. 
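<P>As a rough sketch of the run-time approach, the two wrappers below call <b>mbstowcs</b> and <b>wcstombs</b> with buffers sized for the worst case. The wrapper names <b>to_wide</b>, <b>to_multibyte</b>, and <b>demo_round_trip</b> are invented here, and returning an empty string on failure is a simplification made for this example. Both functions convert according to the current C locale, so in the default "C" locale only plain ASCII text is certain to round-trip.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Multi-byte to wide. A multi-byte string never produces more wide
// characters than it has bytes, so size() + 1 elements are enough.
std::wstring to_wide(const std::string& s) {
    std::vector<wchar_t> buf(s.size() + 1);
    std::size_t n = std::mbstowcs(buf.data(), s.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
        return std::wstring();          // invalid multi-byte sequence
    return std::wstring(buf.data(), n);
}

// Wide to multi-byte. MB_CUR_MAX is the largest number of bytes a single
// character can occupy in the current locale.
std::string to_multibyte(const std::wstring& w) {
    std::vector<char> buf(w.size() * MB_CUR_MAX + 1);
    std::size_t n = std::wcstombs(buf.data(), w.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
        return std::string();           // character not representable
    return std::string(buf.data(), n);
}

void demo_round_trip() {
    assert(to_wide("BSTR") == L"BSTR");
    assert(to_multibyte(L"BSTR") == "BSTR");
    assert(to_multibyte(to_wide("OLE")) == "OLE");
}
```

<P>Both run-time functions stop at the first null character, so a string with embedded nulls would be cut short by these wrappers.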
<P>Types also come in Unicode and ANSI versions, but to add to the confusion, OLE adds its own types to those provided by Win32 and ANSI C. Here are some of the types and type coercion macros you need to be familiar with: 
<P>
<TABLE WIDTH=87% BORDER=1 CELLPADDING=5 CELLSPACING=0>
<TR VALIGN=TOP BGCOLOR="#DDDDDD">
<TD><font face="Verdana, Arial, Helvetica, Sans-Serif" SIZE=2><b>Type</b></FONT></TD>
<TD><font face="Verdana, Arial, Helvetica, Sans-Serif" SIZE=2><b>Description</b></FONT></TD></TR>
<TR VALIGN=TOP>
<TD><font face="Verdana, Arial, Helvetica, Sans-Serif" SIZE=2>char</FONT></TD>
<TD><font face="Verdana, Arial, Helvetica, Sans-Serif" SIZE=2>An 8-bit signed character (an ANSI character).
</FONT></TD></TR>
<TR VALIGN=TOP>
<TD><font face="Verdana, Arial, Helvetica, Sans-Serif" SIZE=2>wchar_t</FONT></TD>
<TD><font face="Verdana, Arial, Helvetica, Sans-Serif" SIZE=2>A typedef to a 16-bit unsigned short (a Unicode character). </FONT></TD></TR>
<TR VALIGN=TOP>
<TD><font face="Verdana, Arial, Helvetica, Sans-Serif" SIZE=2>CHAR</FONT></TD>
<TD><font face="Verdana, Arial, Helvetica, Sans-Serif" SIZE=2>The Win32 version of char.