Character Set Routines
----------------------

The character set routines let you deal with groups of characters as a set
rather than a string.  A set is an unordered collection of objects where
membership (presence or absence) is the only important quality.  The stdlib
set routines were designed to let you quickly check whether an ASCII character
is in a set, and to quickly add characters to a set or remove them from it.
These are the operations most commonly used on character sets.  The other
operations (like union, intersection, and difference) are useful but less
common.  Therefore, the data structure has been optimized for the membership
and add/delete operations at the slight expense of the others.

Character sets are implemented via bit vectors.  A "1" bit means that an item
is present in the set and a "0" bit means that the item is absent from the
set.  The most common implementation of a character set uses thirty-two
consecutive bytes, eight bits per byte, giving 256 bits (one bit for each
character in the character set).  While this makes certain operations (like
assignment, union, and intersection) fast and convenient, other operations
(membership, adding and removing items) run much slower.  Since these are the
more important operations, a different data structure is used to represent
sets.  A faster approach is to simply use a byte value for each item in the
set.  This offers a major advantage over the thirty-two byte scheme:
operations like membership are very fast (since all you have to do is index
into an array and test the resulting value).  It has two drawbacks:  first,
operations like set assignment, union, and difference require 256 operations
rather than thirty-two; second, it takes eight times as much memory.

The first drawback, speed, is of little consequence.  You will rarely use
the operations so affected, so the fact that they run a little slower hardly
matters.  Wasting 224 bytes per set is a problem, however, especially
if you have a lot of character sets.
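
The trade-off between these two conventional layouts can be sketched in C.
This is only an illustrative model, not stdlib code; the type and function
names (PackedSet, ByteSet, packed_member, byte_member, and so on) are
hypothetical.

```c
#include <stdint.h>

/* Packed layout: 32 bytes, one bit per character code. */
typedef struct { uint8_t bits[32]; } PackedSet;

/* Byte-per-item layout: 256 bytes, nonzero means "present". */
typedef struct { uint8_t present[256]; } ByteSet;

/* Packed access needs a byte index, a bit index, a shift, and a mask. */
void packed_add(PackedSet *s, uint8_t ch)
{
    s->bits[ch >> 3] |= (uint8_t)(1u << (ch & 7));
}

int packed_member(const PackedSet *s, uint8_t ch)
{
    return (s->bits[ch >> 3] >> (ch & 7)) & 1;
}

/* Byte-per-item access is a single indexed load or store. */
void byte_add(ByteSet *s, uint8_t ch)
{
    s->present[ch] = 1;
}

int byte_member(const ByteSet *s, uint8_t ch)
{
    return s->present[ch] != 0;
}
```

The byte-per-item form is 224 bytes larger per set, which is the waste
discussed above.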

The approach used here is to allocate 272 bytes.  The first eight bytes
contain the bit masks 1, 2, 4, 8, 16, 32, 64, and 128.  Each mask tells you
which bit in the following 264 bytes belongs to the corresponding set.  This
packs eight sets into 272 bytes (34 bytes per character set), giving almost
the speed of the 256-byte set with only two bytes of overhead per set.  In the
stdlib.a file there is a macro that lets you define a group of character
sets:  set.  The macro is used as follows:

	set set1, set2, set3, ... , set8

You must supply between one and eight labels in the operand field.  These are
the names of the sets you want to create.  The set macro automatically 
attaches these labels to the appropriate mask bytes in the set.  The actual
bit patterns for the set begin eight bytes later (from each label).  There-
fore, the byte corresponding to chr(0) is staggered by one byte for each
set (which explains the other eight bytes needed above and beyond the 256 
required for the set).  When using the set manipulation routines, you should
always pass the address of the mask byte (i.e., the seg/offset of one of the 
labels above) to the particular set manipulation routine you are using. 
Passing the address of the structure created with the macro above will 
reference only the first set in the group.
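
The layout described above can be modelled in C as follows.  This is a
sketch built only from the description in this section, not the library's
actual 8086 code; the names group_init, set_member, set_add, and set_remove
are hypothetical.  A set is identified by a pointer to its mask byte, just
as the routines expect in ES:DI.

```c
#include <stdint.h>
#include <string.h>

enum { GROUP_BYTES = 272, NUM_SETS = 8, DATA_OFFSET = 8 };

/* Fill in the eight mask bytes and clear all eight sets. */
void group_init(uint8_t *group)
{
    memset(group, 0, GROUP_BYTES);
    for (int k = 0; k < NUM_SETS; k++)
        group[k] = (uint8_t)(1u << k);
}

/* set points at a mask byte (group + k); its data starts 8 bytes later. */
int set_member(const uint8_t *set, uint8_t ch)
{
    return (set[DATA_OFFSET + ch] & set[0]) != 0;
}

void set_add(uint8_t *set, uint8_t ch)
{
    set[DATA_OFFSET + ch] |= set[0];
}

void set_remove(uint8_t *set, uint8_t ch)
{
    set[DATA_OFFSET + ch] &= (uint8_t)~set[0];
}
```

Because set k starts one byte later than set k-1 and uses a different bit,
two sets can share a data byte without disturbing each other.  For example,
chr(1) of the first set and chr(0) of the second set both live in byte 9 of
the group, in bits 0 and 1 respectively.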

Note that you can use the set operations for fast pattern matching
applications.  The set membership operation, for example, is much faster than
the strspan routine found in the string package.  Proper use of character sets
can produce a program that runs much faster than one built on the equivalent
string operations.
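
As an illustration of set-based pattern matching, the following C sketch
counts the leading characters of a string that belong to a set, using one
indexed test per character.  It models the layout described earlier and is
not part of stdlib; in_set and span_in_set are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Test one character: index 8 bytes past the mask byte, AND with the mask. */
int in_set(const uint8_t *set, unsigned char ch)
{
    return (set[8 + ch] & set[0]) != 0;
}

/* Count the leading characters of str that are members of set. */
size_t span_in_set(const uint8_t *set, const char *str)
{
    size_t n = 0;
    while (str[n] != '\0' && in_set(set, (unsigned char)str[n]))
        n++;
    return n;
}
```

The cost per character does not depend on how many characters the set
contains, which is where the speed advantage over scanning a second string
comes from.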


Note: there is a special include file in the INCLUDE directory, STDSETS.A,
which contains the bit definitions for eight commonly-used character sets:
Alpha (upper and lower case alphabetics), lower (lower case alphabetics),
upper (upper case alphabetics), digits ("0".."9"), xdigits (hexadecimal
digits: "0".."9", "a".."f", and "A".."F"), alphanum (upper/lower case alpha
and digits), whitespace (spaces, tabs, carriage returns, and linefeeds),
and delimiters (whitespace plus ",", ";", "<", ">", and "|").

If you want to use these standard character sets in your program you must
include the STDSETS.A file in an appropriate (data) segment.  Note that
including STDLIB.A or CHARSETS.A will not give you the standard sets.  You must
explicitly place an include STDSETS.A statement in your program to have access
to these sets.


Routine:  Createsets
--------------------

Category:             Character Set Routine

Registers on Entry:   no parameters passed

Registers on return:  ES:DI - pointer to eight sets

Flags affected:       Carry = 0 if no error. Carry = 1 if insufficient
		      memory to allocate storage for sets.

Example of Usage:
		      Createsets
		      jc      NoMemory
		      mov     word ptr SetPtr,   di
		      mov     word ptr SetPtr+2, es

Description:  Createsets allocates 272 bytes on the heap.   This is sufficient
	      room for eight character sets.  It then initializes the first
	      eight bytes of this storage with the proper mask values for
	      each set.  Location es:0[di] gets set to 1, location es:1[di]
	      gets 2, location es:2[di] gets 4, etc.  The Createsets routine
	      also initializes all of the sets to the empty set by clearing
	      all the bits to zero.

Include:              stdlib.a or charsets.a
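
The effect of Createsets can be modelled in C as below.  This is a sketch of
the behavior described above, not the routine's actual code.  The name
create_sets_model is hypothetical, and a NULL return stands in for the
carry flag being set.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate one 272-byte group: eight mask bytes, then eight empty sets. */
uint8_t *create_sets_model(void)
{
    uint8_t *group = calloc(272, 1);   /* zeroed, so every set is empty */
    if (group == NULL)
        return NULL;                   /* models Carry = 1 */
    for (int k = 0; k < 8; k++)
        group[k] = (uint8_t)(1u << k); /* masks 1, 2, 4, ..., 128 */
    return group;
}
```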


Routine:  EmptySet
------------------

Category:             Character Set Routine

Registers on Entry:   ES:DI - pointer to first byte of desired set

Registers on return:  None

Flags affected:	      None

Example of Usage:
		      les     di,  SetPtr
		      add     di,  3          ; Point at 4th set in group.
		      Emptyset


Description:  Emptyset clears the bits of a character set to zero
	      (thereby setting it to the empty set).  Upon entry, es:di must
	      point at the first byte (the mask byte) of the character set
	      you want to clear.  Note that this is the address returned by
	      Createsets only for the first set in the group.  The first
	      eight bytes of a character set structure are the mask bytes
	      of eight different sets, and the address of each mask byte
	      identifies its set.  ES:DI must point at one of these bytes
	      upon entry into Emptyset.

Include:              stdlib.a or charsets.a
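
A C model of this operation, assuming the 272-byte layout described earlier
(the name empty_set_model is hypothetical and this is not the library's
code):  only the bit selected by the set's mask byte is cleared, so the other
seven sets in the group are left untouched.

```c
#include <stdint.h>

/* set points at the mask byte of the set to clear (group + k, k = 0..7). */
void empty_set_model(uint8_t *set)
{
    uint8_t keep = (uint8_t)~set[0];   /* every bit except this set's */
    for (int ch = 0; ch < 256; ch++)
        set[8 + ch] &= keep;
}
```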


Routine:  Rangeset
------------------

Category:             Character Set Routine

Registers on entry:   ES:DI (contains the address of the first byte of the set)
		      AL    (contains the lower bound of the items)
		      AH    (contains the upper bound of the items)

Registers on return:  None

Flags affected:       None

Example of Usage:
		      les di, SetPtr
		      add di, 4               ; Point at 5th set in group.
		      mov al, 'A'
		      mov ah, 'Z'
		      rangeset


Description:  Rangeset adds a range of characters to a set.  ES:DI points
	      at the set, AL holds the lower bound of the range, and AH
	      holds the upper bound (AH has to be greater than AL;
	      otherwise, there will be an error).

Include:              stdlib.a or charsets.a
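
The following C sketch models the effect of Rangeset on the layout described
earlier.  The name range_set_model is hypothetical.  How the library reports
a reversed range is not stated here, so returning -1 is an assumption of this
model, as is accepting a range of a single character.

```c
#include <stdint.h>

/* Add every character from lo through hi to the set at mask byte `set`. */
int range_set_model(uint8_t *set, uint8_t lo, uint8_t hi)
{
    if (lo > hi)
        return -1;                      /* assumption: reject reversed range */
    for (int ch = lo; ch <= hi; ch++)   /* int counter avoids wrap at 255 */
        set[8 + ch] |= set[0];
    return 0;
}
```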


Routine:  Addstr (l)
--------------------

Category:             Character Set Routine

Registers on Entry:   ES:DI- pointer to first byte of desired set
		      DX:SI- pointer to string to add to set (Addstr only)
		      CS:RET-pointer to string to add to set (Addstrl only)

Registers on Return:  None

Flags Affected:       None

Example of Usage:
		      les     di, SetPtr
		      add     di, 1           ;Point at 2nd set in group.
		      mov     dx, seg CharStr ;Pointer to string
		      lea     si, CharStr     ; chars to add to set.
		      addstr                  ;Union in these characters.
;
		      les     di, SetPtr      ;Point at first set in group.
		      addstrl
		      db      "AaBbCcDdEeFf0123456789",0
;


Description:  Addstr lets you add a group of characters to a set by
	      specifying a string containing the characters you want in
	      the set.  To Addstr you pass a pointer to a zero-terminated
	      string in dx:si.  Addstr will add (union) each character
	      from this string into the set.

	      Addstrl works the same way except you pass the string as
	      a literal string constant in the code stream rather than
	      via DX:SI.

Include:              stdlib.a or charsets.a
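
The union performed by Addstr can be modelled in C as follows, assuming the
272-byte layout described earlier.  The name add_str_model is hypothetical
and the sketch does not cover Addstrl, whose string lives in the code stream.

```c
#include <stdint.h>

/* Union each character of a zero-terminated string into the set. */
void add_str_model(uint8_t *set, const char *str)
{
    for (; *str != '\0'; str++)
        set[8 + (unsigned char)*str] |= set[0];
}
```

Adding a character that is already present has no effect, so the same string
can be added more than once without changing the set.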


Routine:  Rmvstr (l)
--------------------


Category:             Character Set Routine


Registers on entry:   ES:DI contains the address of first byte of a set
		      DX:SI contains the address of string to be removed
			     from a set (Rmvstr only)
		      CS:RET pointer to string to remove from set (Rmvstrl only)


Registers on return:  None


Flags affected:       None


Example of Usage:
		      les 	di, SetPtr      ;Point at first set in group.
		      mov 	dx, seg CharStr ;Pointer to string of
		      lea 	si, CharStr     ; chars to remove from set.
		      rmvstr

		      les 	di, SetPtr      ;Point at first set in group.
		      rmvstrl
		      db      	"ABCDEFG",0


Description:  Rmvstr removes each character of a zero-terminated string
	      from a set.  ES:DI points at the first byte of the set and
	      DX:SI points at the string of characters to remove.

	      For Rmvstrl, the string of characters to remove from the
	      set follows the call in the code stream.

Include:              stdlib.a or charsets.a
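
A C model of the removal, assuming the 272-byte layout described earlier
(rmv_str_model is a hypothetical name, not a library routine):  each
character's byte is ANDed with the complement of the set's mask, which clears
that set's bit and leaves the other sets in the group unchanged.

```c
#include <stdint.h>

/* Remove each character of a zero-terminated string from the set. */
void rmv_str_model(uint8_t *set, const char *str)
{
    uint8_t keep = (uint8_t)~set[0];   /* every bit except this set's */
    for (; *str != '\0'; str++)
        set[8 + (unsigned char)*str] &= keep;
}
```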

