⭐ 欢迎来到虫虫下载站! | 📦 资源下载 📁 资源专辑 ℹ️ 关于我们
⭐ 虫虫下载站

📄 7strtab.html

📁 C ++ in action
💻 HTML
字号:
<html>
<head>
	<title>String Table</title>
    <meta  name="description" content="Implementing a string table in C++">
    <meta name="keywords" content="string, table, array, symbol, buffer">
	<link rel="stylesheet" href="../../rs.css">
</head>

<body background="../../images/margin.gif" bgcolor="#FFFFDC">

<!-- Main Table -->
<table cellpadding="6">
    <tr>
    <td width="78">
	&nbsp;
    <td>

<h4>String Table</h4>

<p>A string table maps integers into strings and strings into integers. The first mapping is easily implemented using an array of strings (pointers, or even better, offsets into a buffer). The second one is more difficult to implement efficiently. Our goal is to have constant time access both ways. It means that, independently of how many strings there are in the table (within some reasonable limits), the mappings of an <var>int</var> into a string and a string into an <var>int</var> take, on average, constant time (in the programmer speak O(1)). Notice that in the simplest implementation梘oing through the list of strings and comparing each with the string to be mapped梩he execution takes longer and longer as the list grows (we say, it grows like O(N)). However, there is a data structure called the <i>hash table</i> that can accomplish constant time mapping. We will use it to implement an efficient string table.

<p>With this performance goal in mind, we are ready to design the string table. Since we want to be able to add strings to it, we will need a large buffer to store them. The <var>StringBuffer</var> object stores a string and returns an offset at which the string  can be found. This offset, in turn, is stored in another data structure, an array of offsets, that maps integer id抯 to string offsets. Each string is thus assigned an id, which is simply the index into the array of offsets. 
<p>When adding a new string, the appropriate entry is also added to the hash table. (For the moment let抯 treat the hash table as a black box that maps strings into short lists of id抯.)

<!-- Code --><table width="100%" cellspacing=10><tr>	<td class=codetable>
<pre>const int idNotFound = -1;

const int maxStrings = 100;
// String table maps strings to ints
// and ints to strings

class StringTable
{
public:
      StringTable ();
      int ForceAdd (char const * str);
      int Find (char const * str) const;
      char const * GetString (int id) const;
private:
      HTable         _htab;  // string -&gt; short list of id抯
      int            _offStr [maxStrings]; // id -&gt; offset
      int            _curId;
      StringBuffer   _strBuf; // offset -&gt; string
};</pre>
</table><!-- End Code -->

<p>The translation from an id into a string is done in the <var>GetString</var> method, the translation from the string into an id is done in <var>Find</var>, and the addition of new strings without looking for duplicates is done in <var>ForceAdd</var>.

<p><img src="images/Image11.gif" width=330 height=180>
<p class=caption>Figure 11. The hash table stores indexes into the offset array. The offset array stores indexes into the string buffer. The string buffer stores strings.</p>

<!-- Code --><table width="100%" cellspacing=10><tr>	<td class=codetable>
<pre>StringTable::StringTable ()
    : _curId (0)
{}

int StringTable::ForceAdd (char const * str)
{
      int len = strlen (str);
      // is there enough space?
      if (_curId == maxStrings || !_strBuf.WillFit (len))
      {
            return idNotFound;
      }
      // point to the place where the string will be stored
      _offStr [_curId] = _strBuf.GetOffset ();
      _strBuf.Add (str);
      // add mapping to hash table
      _htab.Add (str, _curId);
      ++_curId;
      return _curId - 1;
}</pre>
</table><!-- End Code -->

<p>The hash table (still a black box for us) finds a short list of entries, one of them containing the id of the given string. We search this list in order to find the correct id.

<!-- Code --><table width="100%" cellspacing=10><tr>	<td class=codetable>
<pre>int StringTable::Find (char const * str) const
{
      // Get a short list from hash table
      List const &amp; list = _htab.Find (str);
      // Iterate over this list
      for (Link const * pLink = list.GetHead ();
           pLink != 0; 
           pLink = pLink-&gt;Next ())
      {
            int id = pLink-&gt;Id ();
            int offStr = _offStr [id];
            if (_strBuf.IsEqual (offStr, str))
                  return id;
      }
      return idNotFound;
}

// map integer into string. Must be valid id

char const * StringTable::GetString (int id) const
{
      assert (id &gt;= 0);
      assert (id &lt; _curId);
      int offStr = _offStr [id];
      return _strBuf.GetString (offStr);
}</pre>
</table><!-- End Code -->


<h4>String Buffer</h4>

<p>Our string buffer is implemented as an array of characters. We keep copying null terminated strings into this array until we run out of space. The variable <var>_curOffset</var> is an index into the array and it always points at the next free area where a string can be copied. It is initialized to zero, thus pointing at the beginning of the buffer. Before adding a string we make sure that it will fit in the remaining space. Adding a string means copying it to the place where <var>_curOffset</var> points to, and moving <var>_curOffset</var> past the string抯 terminating null. <var>GetOffset</var> returns the current offset, which can be later used to access the string about to be copied. <var>IsEqual</var> compares the string at a given offset with a given string, and <var>GetString</var> returns a <var>const</var> string given the offset.

<!-- Code --><table width="100%" cellspacing=10><tr>	<td class=codetable>
<pre>const int maxBufSize=500;

class StringBuffer
{
public:
      StringBuffer () : _curOffset (0) {}
      bool WillFit (int len) const
      {
            return _curOffset + len + 1 &lt; maxBufSize;
      }
      void Add (char const * str)
      {
            // strcpy (_buf + _curOffset, str);
            strcpy (&amp;_buf [_curOffset], str);
            _curOffset += strlen (str) + 1;
      }
      int GetOffset () const
      {
            return _curOffset;
      }
      bool IsEqual (int offStr, char const * str) const
      {
            // char const * strStored = _buf + offStr;
            char const * strStored = &amp;_strBuf [offStr];
            // strcmp returns 0 when strings are equal
            return strcmp (str, strStored) == 0;
      }
      char const * GetString (int offStr) const
      {
            //return _buf + offStr;
            return &amp;_buf [offStr];
      }
private:
      char    _buf [maxBufSize];
      int     _curOffset;
};</pre>
</table><!-- End Code -->

<p><img src="images/Image12.gif" width=252 height=96>

<p class=caption>Figure 12. Current offset in the string buffer.</p>

<p>There are two ways of creating a pointer that points to a particular entry in an array. One can take the address of the nth element

<!-- Code --><table width="100%" cellspacing=10><tr>	<td class=codetable>
<pre>p = &amp;arr [n];  // address of the n抰h element</pre>
</table><!-- End Code -->

or one can use the fact that one can add an integer to a pointer in order to move it n entries ahead

<!-- Code --><table width="100%" cellspacing=10><tr>	<td class=codetable>
<pre>p = arr + n;  // pointer plus integer</pre>
</table><!-- End Code -->

<p>Both are correct and produce the same code-- in the example above, pointer equivalents are shown as comments. The array notation seems less confusing.
<p>I also used a generalized assignment operator += that adds the right hand side expression to the left hand side variable. For instance,

<!-- Code --><table width="100%" cellspacing=10><tr>	<td class=codetable>
<pre>n += i;</pre>
</table><!-- End Code -->

is equivalent to
<!-- Code --><table width="100%" cellspacing=10><tr>	<td class=codetable>
<pre>n = n + i;</pre>
</table><!-- End Code -->

<p>The code produced is of course the same (although originally the += operator was introduced in C as an optimization), but the shorthand remains useful, once you get used to it

<p>Finally, I used some of the string functions from the standard library. The header file that contains their declaration is called <var>cstring</var>. You can find a detailed description of these functions in your compiler抯 help.



</table>
<!-- End Main Table -->
</body>
</html>

⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -