+++Date last modified: 05-Jul-1997
TITLE
class str
DESCRIPTION
A simple but highly useful C++ string class
FILES
str.h class str definition
str.cpp class str implementation
AUTHOR
David Nugent
CONTACT
FidoNet 3:632/348,
davidn@csource.pronet.com
PO Box 352,
Doveton, VIC, Australia
Voice +61-3-793-2728
STATUS
Donated to the public domain, no restrictions on any use
SYNOPSIS
class str is a simple yet powerful C++ string class. It
provides many conversions from built-in types to strings
and a large variety of manipulators, making it useful as
a stand-alone string class, for output formatting with
iostreams, for cheap copying, concatenation and splitting
operations, and as a general-purpose class for data
presentation and line-based parsing.
GENERAL STRUCTURE
class str is designed to be small, which makes it a
practical data type which can be used in large arrays
even on small memory systems.
In addition, a typical implementation produces a smaller
str still if "VIRTUAL_DESTRUCTOR" is not enabled. This
prevents the destructor str::~str() from being declared
virtual, avoiding generation of a vtable and vtable
pointer for the class. Without a vtable pointer, class
str on most implementations is the same size as a data
pointer, i.e. as cheap and small as a char*. The
disadvantage of a non-virtual destructor is that it
places some limitations on the use of derived classes.
These are not severe, and the main benefits of code reuse
remain available even without a virtual destructor. If a
derived class allocates resources, however, some care
should be taken to avoid upcasts where possible (in
particular, destroying the object through a pointer to
str), to ensure that the correct destructor is called.
These limitations are circumvented by defining
VIRTUAL_DESTRUCTOR to any non-zero value, with the
disadvantage that an object of class str will (usually)
be twice as big.
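The size trade-off can be checked directly with sizeof. The
following is a minimal sketch only: SlimStr and VirtualStr
are hypothetical stand-ins that mimic the layout of class
str with and without VIRTUAL_DESTRUCTOR, and the exact
overhead depends on the compiler.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for class str; only the object layout matters here.
struct SlimStr {            // as if VIRTUAL_DESTRUCTOR were not enabled
    char* ref;              // the single data member
    ~SlimStr() {}
};

struct VirtualStr {         // as if VIRTUAL_DESTRUCTOR were enabled
    char* ref;
    virtual ~VirtualStr() {}
};

// Extra bytes each object carries once the destructor is virtual.
std::size_t virtual_overhead() {
    assert(sizeof(VirtualStr) >= sizeof(SlimStr));
    return sizeof(VirtualStr) - sizeof(SlimStr);
}
```

On common implementations virtual_overhead() equals the size
of one pointer, which matters when many strings are held in
an array.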
Reference String
Class str itself contains a single data member, a pointer
to an internal "reference string". Reference strings
(embodied in class refstr) contain the actual string
data and provide a mechanism for cheap copy and
assignment. Instead of copying the data each time, more
than one str object is allowed to reference the same
physical data. Copying is delayed until one of the str
objects is modified, at which time a new refstr object is
created and copied from the old, and only that copy is
changed. In many situations the copy is never modified,
so physical copying of the string data never becomes
necessary, saving both execution time and memory.
The following example creates three str objects that share
the same reference string:
str string1("This is a string"); // const char * __ctor
str string2(string1); // str const & __ctor
str string3 = string1; // str const & __ctor
At this point, the relationship of these three objects and
the internal refstr is:
    string1
      refstr --.             refs   = 3
    string2     \            length = 16
      refstr ------ refstr1  size   = 32
    string3     /            data   = This is a string\0.....\0
      refstr --'
The reference container object contains a counter for the
number of times the object is referenced by the str
wrapper. Any attempt to modify a refstr via any string
object while the number of references exceeds 1 results
in the refstr first being copied and the original left
untouched apart from the reference count being
decremented. Modifications are made on the copy only,
which has a reference count of 1.
If, for example, string3 were to be modified by the
string " 3" being added (concatenated) to it, the diagram
would then look like:
    string1
      refstr --.
    string2     -- refstr1  2/16/32/This is a string\0.....\0
      refstr --'
    string3
      refstr ----- refstr2  1/18/32/This is a string 3\0...\0
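The copy-on-write behaviour described above can be sketched
as follows. This is an illustration under stated
assumptions, not the code in str.cpp: Rep and CowStr are
hypothetical stand-ins for refstr and str, and std::string
is used for storage to keep the sketch short.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for refstr: shared data plus a reference count.
struct Rep {
    int refs;
    std::string data;
    explicit Rep(const std::string& d) : refs(1), data(d) {}
};

// Hypothetical stand-in for str: a single pointer to a shared Rep.
class CowStr {
    Rep* rep_;
    void release() {
        assert(rep_->refs > 0);
        if (--rep_->refs == 0) delete rep_;
    }
    // Called before any modification: take a private copy if shared.
    void detach() {
        if (rep_->refs > 1) {
            Rep* copy = new Rep(rep_->data);
            --rep_->refs;          // the original keeps its other owners
            rep_ = copy;
        }
    }
public:
    CowStr(const char* s) : rep_(new Rep(s)) {}
    CowStr(const CowStr& o) : rep_(o.rep_) { ++rep_->refs; }
    CowStr& operator=(const CowStr& o) {
        ++o.rep_->refs;            // increment first so self-assignment is safe
        release();
        rep_ = o.rep_;
        return *this;
    }
    ~CowStr() { release(); }

    CowStr& append(const char* s) { detach(); rep_->data += s; return *this; }
    int refs() const { return rep_->refs; }
    const std::string& text() const { return rep_->data; }
};
```

Copying a CowStr only bumps the counter. The first append()
on a shared object is the point at which the data is
actually duplicated.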
Reference strings are variable-sized objects. Size
variations of refstr objects are handled via a placement
operator new, which causes an additional amount of memory
to be allocated for the string data itself. Members of
class refstr (refs, length, size, flags) coexist in
memory contiguous with the data itself, which is appended
to the end. Reference to and calculation of offsets of
the data itself is handled by wrapper functions in class
refstr.
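One way to build such a variable-sized object is a
class-specific operator new that takes an extra argument.
The sketch below assumes a hypothetical VarRep class and a
hypothetical Extra tag type; the member names and the
create() helper are illustrative and are not taken from
str.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

struct Extra { std::size_t bytes; };   // hypothetical tag carrying the data size

// Hypothetical stand-in for refstr: a fixed header followed by the data.
class VarRep {
    short refs_, length_, size_;
    VarRep(const char* s, short len, short size)
        : refs_(1), length_(len), size_(size) {
        std::memcpy(data(), s, static_cast<std::size_t>(len));
    }
public:
    // Placement form of operator new: room for the header plus the data.
    static void* operator new(std::size_t header, Extra extra) {
        return ::operator new(header + extra.bytes);
    }
    // Matching placement delete, used only if the constructor throws.
    static void operator delete(void* p, Extra) { ::operator delete(p); }
    static void operator delete(void* p) { ::operator delete(p); }

    static VarRep* create(const char* s, short size) {
        assert(s != 0);
        short len = static_cast<short>(std::strlen(s));
        if (size < len + 1) size = static_cast<short>(len + 1);
        return new (Extra{static_cast<std::size_t>(size)}) VarRep(s, len, size);
    }
    // The data starts immediately after the header members.
    char* data() { return reinterpret_cast<char*>(this + 1); }
    short refs() const { return refs_; }
    short length() const { return length_; }
    short size() const { return size_; }
};
```

A single allocation holds both the bookkeeping and the
characters, so one pointer in the wrapper is enough to
reach everything.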
The use of reference strings in this case provides a data
type which is almost always cheaper than using references
for parameter passing. For example, given two functions:
void hello1(str x);
void hello2(str & x);
str mystring("Hello world");
hello1(mystring);
hello2(mystring);
While the actual call to hello2() is slightly cheaper,
since it involves passing a reference only, the code
within hello1() does not have to deal with the additional
indirection. Also, the parameter passed to hello1() may
be freely modified without affecting the original string,
even though both variables (the original and the copy
passed on the stack) initially reference the same
physical reference string.
Pre-allocated size
Note that by default, strings have a pre-allocated
internal size of at least STDLEN, which is defined in
str.cpp, the implementation file. This can be overridden
as desired using the additional optional parameter for
most of the class constructors. As shipped, the default
size of the memory allocated for the data in a refstr
object, unless overridden by the constructor, is 32
bytes. This amount is automatically grown to accommodate
insertions and concatenations (see the internal function
_chksize() for details).
Preallocating memory for strings makes string
manipulation more efficient, since trivial modifications
do not require a reallocation.
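The growth policy used by _chksize() is not described here,
so the following is only a sketch of one plausible rule:
never allocate less than STDLEN, and round larger requests
up to a whole multiple of it. The function name grown_size
and the rounding rule itself are assumptions.

```cpp
#include <cassert>

const short STDLEN = 32;   // default stated in the text

// Buffer size to allocate for 'needed' bytes (assumed policy, see above):
// at least STDLEN, otherwise the next multiple of STDLEN.
short grown_size(short needed) {
    assert(needed >= 0);
    if (needed <= STDLEN) return STDLEN;
    return static_cast<short>(((needed + STDLEN - 1) / STDLEN) * STDLEN);
}
```

With a rule of this kind, appending a few characters to a
short string usually fits in the existing allocation.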
Conversions
To accommodate conversion of a str object to a C style
'string' (a variable length char array with a NUL
terminator), the data area of class refstr is kept at
least one byte larger than the amount required to contain
the exact string length. This makes the cost of adding
the NUL terminator on demand constant, since the refstr
object never has to grow just to accommodate it. Note
that although this additional byte is maintained, the
string is NOT NECESSARILY NUL TERMINATED, and that is
exactly why the c_str() member assures that it is.
Dealing with the data in this way, and maintaining a
separate length variable in class refstr, removes the
need to repeatedly scan the string to determine its
length, as is typical of a lot of C and C++ code which
uses char*'s. The member function to obtain the string
length is a very cheap operation.
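The spare byte and the stored length can be sketched as
below. LenBuf is a hypothetical fixed-size stand-in for the
refstr data area; only the names c_str() and c_ptr() come
from the class, and the bodies shown are assumptions about
how such members could behave.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the refstr data area.
class LenBuf {
    char  data_[32];    // room for 31 characters plus the spare byte
    short length_;
public:
    LenBuf(const char* s, short len) : length_(len < 31 ? len : 31) {
        assert(len >= 0);
        std::memcpy(data_, s, static_cast<std::size_t>(length_));
        data_[length_] = '#';   // deliberately not a NUL: none is assumed
    }
    short length() const { return length_; }      // stored, so no scan
    const char* c_ptr() const { return data_; }   // raw data, maybe unterminated
    const char* c_str() {                          // terminates on demand
        data_[length_] = '\0';
        return data_;
    }
};
```

Because the spare byte always exists, c_str() needs a single
store and never a reallocation.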
Conversion to char const *
Class str provides no automatic type conversion
operators, although these are a common feature of many
string classes. They were considered far too dangerous to
implement, as they can occasionally cause invalid memory
access - modification or reading of string data which no
longer 'exists'. Instead, this functionality was moved to
the member c_str(), which must be explicitly called and
yet still has a few caveats. See the notes under the
explanation of c_str() below.
Maximum string size
For compactness, the maximum size of a string is fixed at
32K, even on 32-bit systems. This is a deliberate design
decision that matches the intended use of this class.
Manipulation of larger buffers is best done with classes
designed for this; the algorithms incorporated in class
str are not at all optimised for large text buffer
manipulation.
Binary strings
Because class str is not dependent on a terminating NUL,
it can be used for manipulation of binary strings. Note,
however, that treating the result of c_str() as a C
string negates this advantage if the data contains a NUL.
A pointer to the string data may still be obtained by
c_str() or, since a terminating NUL is not needed, by
c_ptr(), which is slightly cheaper.
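The effect of an embedded NUL is easy to demonstrate with
any length-counted string. The sketch below uses
std::string purely as a stand-in, since it also stores its
length separately; it does not show the str interface.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// std::string is only a stand-in here for a length-counted string.
std::string make_binary() {
    std::string s("ab\0cd", 5);   // explicit length keeps the embedded NUL
    assert(s.size() == 5);
    return s;
}

// Length as seen by C code that relies on the terminator.
std::size_t c_view_length(const std::string& s) {
    return std::strlen(s.c_str());
}
```

The object holds five bytes, but C functions that scan for
the terminator stop at the first NUL and see only two.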
DESCRIPTION OF MEMBERS
PUBLIC INTERFACE
Class str was written primarily for practical use where
strings are most often used in other languages - data
presentation. After all, a string is not a computational
entity, but (usually) one which contains data that is
manipulated and presented in some way to a human, or at
least readable by machine or human.
Consequently, the class emphasises direct conversion of
built-in types to strings via a large number of
constructors. For conversion from other classes, it is
suggested that an "operator str() const" be implemented
for the class, allowing a string to be directly created
from it.
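A conversion operator of this kind might look as follows.
MiniStr is a hypothetical stand-in for class str, and Point
is an invented user class; neither is part of the
distribution.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for class str.
struct MiniStr {
    std::string text;
    MiniStr(const std::string& t) : text(t) {}
};

// A user class that can be turned into a string on request.
class Point {
    int x_, y_;
public:
    Point(int x, int y) : x_(x), y_(y) {}
    operator MiniStr() const {
        return MiniStr("(" + std::to_string(x_) + "," +
                       std::to_string(y_) + ")");
    }
};

// Takes a string by value; a Point converts implicitly at the call site.
std::size_t shown_length(MiniStr s) { return s.text.size(); }

bool conversion_demo() {
    Point p(3, 4);
    MiniStr s = p;                 // uses Point::operator MiniStr()
    assert(s.text == "(3,4)");
    return shown_length(p) == s.text.size();
}
```

The conversion lives in the user class, so the string class
itself does not need to know about it.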
Formatting output with class str, even when using
iostreams, is much easier than with the somewhat clumsy
iostream manipulators. Padding, filling and justification
functions are all provided and are easy, intuitive and
safe to use, even with temporaries.
Moreover, str's are perfect for formatting before
insertion into an ostream - some manipulators, such as
width (via setw()), operate only on the next insertion,
and formatting within a string variable ensures that one
entire object is inserted in one insertion, therefore
respecting the state of the stream. Similarly, stream
justification, fill and other characteristics do not need
to be saved, set and restored around insertion operations.
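The setw() behaviour mentioned above can be seen with
standard streams alone. In this sketch std::string stands
in for a str that has been formatted beforehand; the
function names are illustrative.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Width is applied to the first piece only: setw() resets after one insertion.
std::string piecewise() {
    std::ostringstream os;
    os << std::setw(8) << "id" << ":" << 42;
    return os.str();
}

// Build the whole field first, then insert it as a single object.
std::string as_one_field() {
    std::string field = "id:" + std::to_string(42);
    std::ostringstream os;
    os << std::setw(8) << field;
    return os.str();
}

bool widths_differ() {
    assert(as_one_field().size() == 8);
    return piecewise().size() != as_one_field().size();
}
```

Only the second form yields a field that is exactly eight
characters wide, because the width applies to the complete
string rather than to its first fragment.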
Constructors
class str defines a default constructor, providing
convenient support for allocation of arrays. All other
constructors provide some form of conversion, including
conversion of char*'s, all built-in integral types
(short, int, long and unsigned versions thereof), and all
forms of char. Note that char is not handled as an
integral type. Conversion of integral types allows
specification of a radix, so numeric conversions in bases
other than 10 are fully supported.
str (void);
Default constructor. Allocates a zero-length string
with an internal size determined by the pre-allocated
length. In almost all cases it is more efficient to
use one of the initialising constructors if possible.
str mystring;
str (char const * s, short len =-1);
str (unsigned char const * s, short len =-1);
str (signed char const * s, short len =-1);
These constructors provide conversion from C strings.
The length parameter allows extraction of only a
portion of a string (substring). The default value
of -1 assumes that the source string is NUL terminated.
str mystring("Hello world!");
str mystring = "Hello world!";
str mystring("Hello world!", 5); // Contains 'Hello' only
str (int val, int radix =10);
str (unsigned int val, int radix =10);
str (short val, int radix =10);
str (unsigned short val, int radix =10);
str (long val, int radix =10);
str (unsigned long val, int radix =10);
These provide automatic conversion for all integral
types with a default radix of 10. Negative values of
signed types will cause the resulting string to have
a leading '-' sign.
For a non-decimal radix, no indication of the
number's base is automatically inserted. If you wish
to use "0x" for hexadecimal numbers, for example, you
will need to use one of the insert() members. When
using a non-decimal radix, it is highly recommended
that one of the unsigned converters be used to
prevent generation of the sign prefix.
str mystring(10999); // result '10999'
str mystring(255,16); // result 'ff'
str mystring = -14587; // result '-14587'
str mystring = (unsigned)-1;// implementation dependent
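For readers who want to see what a radix conversion
involves, here is a minimal sketch. The function to_radix
is hypothetical and returns a std::string; it is not the
routine used by the constructors, and the accepted radix
range of 2 to 36 is an assumption.

```cpp
#include <cassert>
#include <string>

// Converts 'val' to text in the given radix, using lowercase digits.
std::string to_radix(long val, int radix = 10) {
    static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    assert(radix >= 2 && radix <= 36);   // assumed valid range
    const unsigned long base = static_cast<unsigned long>(radix);
    // Work on the unsigned magnitude so the most negative value is safe.
    unsigned long mag = val < 0 ? 0UL - static_cast<unsigned long>(val)
                                : static_cast<unsigned long>(val);
    std::string out;
    do {
        out.insert(out.begin(), digits[mag % base]);
        mag /= base;
    } while (mag != 0);
    if (val < 0) out.insert(out.begin(), '-');   // sign prefix, as in the text
    return out;
}
```

As the text notes, no base indicator is produced, so a
prefix such as "0x" has to be added by the caller.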
str (char c);
str (unsigned char c);
str (signed char c);
These constructors are not integral conversions but
convert a single character into a string with a
length of 1.
str mystring('g'); // 'g'
str mystring = 'k'; // 'k'
str (str const & s);
This is the copy constructor. Note that this causes
the new string to be initialised with the same refstr
as contained by the string being copied, and so is
the cheapest constructor available.
str mystring1 = "Hello world!";
str mystring2 = mystring1; // 'Hello world!'
str mystring3(mystring1); // 'Hello world!'
~str (void);
Class destructor. This will deallocate the contained
reference string if this is the only string which
references it; otherwise only the reference string's
'reference count' is decremented.
str & clear(void);
This member provides the ability to quickly clear the
contents of a string. Note that if the contained
reference string is not referenced by any other str
object, the length is reduced to zero but the string
size (size of the actual refstr) is untouched. This
makes the function suitable when using a string as a
temporary variable: it will grow to accommodate the
largest item placed into it and therefore not require
constant reallocation in a loop, for example, except
where the items are made progressively bigger.
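The reuse pattern that clear() enables can be sketched as
follows. Scratch is a hypothetical class that keeps a
length and an allocated size separately; it illustrates the
idea only and does not reproduce the str implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical scratch string: 'length' is the used part, 'size' the allocation.
class Scratch {
    std::vector<char> buf_;
    std::size_t length_;
public:
    Scratch() : buf_(32), length_(0) {}
    void assign(const char* s) {
        assert(s != 0);
        std::size_t n = std::strlen(s);
        if (n + 1 > buf_.size()) buf_.resize(n + 1);   // grow only when needed
        std::memcpy(buf_.data(), s, n);
        length_ = n;
    }
    Scratch& clear() { length_ = 0; return *this; }    // allocation is kept
    std::size_t length() const { return length_; }
    std::size_t size() const { return buf_.size(); }
};
```

Once the buffer has grown to fit the longest item, later
iterations that clear and refill it do not allocate again.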