📄 chap37.htm
字号:
<h1><a name="0a1f_1d43">37.3 The set-covering problem<a name="0a1f_1d43"></h1><P>
The set-covering problem is an optimization problem that models many resource-selection problems. It generalizes the NP-complete vertex-cover problem and is therefore also NP-hard. The approximation algorithm developed to handle the vertex-cover problem doesn't apply here, however, and so we need to try other approaches. We shall examine a simple greedy heuristic with a logarithmic ratio bound. That is, as the size of the instance gets larger, the size of the approximate solution may grow, relative to the size of an optimal solution. Because the logarithm function grows rather slowly, however, this approximation algorithm may nonetheless give useful results.<P>
<a name="0a1f_1d3f"><a name="0a1f_1d40"><a name="0a1f_1d41">An instance <img src="974_a.gif"> of the <I><B>set-covering problem</I></B> consists of a finite set <I>X </I>and a family <img src="974_b.gif"> of subsets of <I>X</I>, such that every element of <I>X</I> belongs to at least one subset in <img src="974_c.gif">:<P>
<img src="974_d.gif"><P>
<a name="0a1f_1d42">We say that a subset <img src="974_e.gif"> <I><B>covers</I></B> its elements. The problem is to find a minimum-size subset <img src="974_f.gif"> whose members cover all of <I>X</I>:<P>
<img src="974_g.gif"><P>
<h4><a name="0a1f_1d44">(37.8)<a name="0a1f_1d44"></sub></sup></h4><P>
We say that any <img src="974_h.gif"> satisfying equation (37.8) <I><B>covers</I></B> <I>X</I>. Figure 37.3 illustrates the problem.<P>
The set-covering problem is an abstraction of many commonly arising combinatorial problems. As a simple example, suppose that <I>X</I> represents a set of skills that are needed to solve a problem and that we have a given set of people available to work on the problem. We wish to form a committee, containing as few people as possible, such that for every requisite skill in <I>X</I>, there is a member of the committee having that skill. In the decision version of the set-covering problem, we ask whether or not a covering exists with size at most <I>k</I>, where <I>k</I> is an additional parameter specified in the problem instance. The decision version of the problem is NP-complete, as Exercise 37.3-2 asks you to show.<P>
<img src="975_a.gif"><P>
<h4><a name="0a1f_1d45">Figure 37.3 An instance <img src="975_b.gif"> of the set-covering problem, where X consists of the 12 black points and <img src="975_c.gif">. A minimum-size set cover is <img src="975_d.gif">. The greedy algorithm produces a cover of size 4 by selecting the sets S<SUB>1</SUB>,S<SUB>4</SUB>,S<SUB>5</SUB>, and S<SUB>3</SUB> in order.<a name="0a1f_1d45"></sub></sup></h4><P>
<h2>A greedy approximation algorithm</h2><P>
<a name="0a20_1d43">The greedy method works by picking, at each stage, the set <I>S</I> that covers the most remaining uncovered elements.<P>
<img src="975_e.gif"><P>
In the example of Figure 37.3, <FONT FACE="Courier New" SIZE=2>GREEDY-</FONT><FONT FACE="Courier New" SIZE=2>SET-COVER</FONT> adds to <img src="975_f.gif"> the sets <I>S</I><SUB>l</SUB>, <I>S</I><SUB>4</SUB>, <I>S</I><SUB>5</SUB>, <I>S</I><SUB>3</SUB> in order.<P>
The algorithm works as follows. The set <I>U</I> contains, at each stage, the set of remaining uncovered elements. The set <img src="975_g.gif"> contains the cover being constructed. Line 4 is the greedy decision-making step. A subset <I>S </I>is chosen that covers as many uncovered elements as possible (with ties broken arbitrarily). After <I>S</I> is selected, its elements are removed from <I>U</I>, and <I>S</I> is placed in <img src="975_h.gif">. When the algorithm terminates, the set <img src="975_i.gif"> contains a subfamily of <img src="975_j.gif"> that covers <I>X</I>.<P>
The algorithm <FONT FACE="Courier New" SIZE=2>GREEDY-</FONT><FONT FACE="Courier New" SIZE=2>SET-</FONT><FONT FACE="Courier New" SIZE=2>COVER</FONT> can easily be implemented to run in time polynomial in <img src="976_a.gif">. Since the number of iterations of the loop on lines 3-6 is at most min <img src="976_b.gif">, and the loop body can be implemented to run in time <img src="976_c.gif">, there is an implementation that runs in time <img src="976_d.gif">. Exercise 37.3-3 asks for a linear- time algorithm.<P>
<P>
<h2>Analysis</h2><P>
We now show that the greedy algorithm returns a set cover that is not too much larger than an optimal set cover. For convenience, in this chapter we denote the <I>d</I>th harmonic number <img src="976_e.gif"> (see Section 3.1) by <I>H</I>(<I>d</I>) .<P>
<a name="0a21_0001">Theorem 37.4<a name="0a21_0001"><P>
<FONT FACE="Courier New" SIZE=2>GREEDY-</FONT><FONT FACE="Courier New" SIZE=2>SET-</FONT><FONT FACE="Courier New" SIZE=2>COVER</FONT> has a ratio bound<P>
<img src="976_f.gif"><P>
<I><B>Proof </I></B>The proof proceeds by assigning a cost to each set selected by the algorithm, distributing this cost over the elements covered for the first time, and then using these costs to derive the desired relationship between the size of an optimal set cover <img src="976_g.gif"> and the size of the set cover <img src="976_h.gif"> returned by the algorithm. Let <I>S<SUB>i </I></SUB>denote the <I>i</I>th subset selected by <FONT FACE="Courier New" SIZE=2>GREEDY-</FONT><FONT FACE="Courier New" SIZE=2>SET-</FONT><FONT FACE="Courier New" SIZE=2>COVER</FONT>; the algorithm incurs a cost of 1 when it adds <I>S<SUB>i</I></SUB> to <img src="976_i.gif">. We spread this cost of selecting <I>S<SUB>i</I></SUB> evenly among the elements covered for the first time by <I>S<SUB>i</I></SUB>. Let <I>c<SUB>x</I></SUB> denote the cost allocated to element <I>x</I>, for each <I>x</I> <img src="../images/memof12.gif"> <I>X</I>. Each element is assigned a cost only once, when it is covered for the first time. If <I>x</I> is covered for the first time by S<I><SUB>i</I></SUB>, then<P>
<img src="976_j.gif"><P>
The algorithm finds a solution <img src="976_k.gif"> of total cost <img src="976_l.gif">, and this cost has been spread out over the elements of <I>X</I>. Therefore, since the optimal cover <img src="976_m.gif"> also covers <I>X</I>, we have<P>
<img src="976_n.gif"><P>
<h4><a name="0a21_0002">(37.9)<a name="0a21_0002"></sub></sup></h4><P>
The remainder of the proof rests on the following key inequality, which we shall prove shortly. For any set <I>S</I> belonging to the family <img src="976_o.gif">,<P>
<img src="976_p.gif"><P>
<h4><a name="0a21_0003">(37.10)<a name="0a21_0003"></sub></sup></h4><P>
From inequalities (37.9) and (37.10), it follows that<P>
<img src="976_q.gif"><P>
proving the theorem. It thus remains only to prove inequality (37.10). For any set <img src="977_a.gif">, let<P>
<img src="977_b.gif"><P>
be the number of elements in <I>S</I> remaining uncovered after <I>S</I><SUB>1</SUB>, <I>S</I><SUB>2</SUB>, . . . , <I>S<SUB>i </I></SUB>have been selected by the algorithm. We define <I>u</I><SUB>0</SUB> = |<I>S</I>| to be the number of elements of <I>S</I>, which are all initially uncovered. Let <I>k</I> be the least index such that <I>u<SUB>k</I></SUB> = 0, so that every element in <I>S</I> is covered by at least one of the sets <I>S</I><SUB>1</SUB>, <I>S</I><SUB>2</SUB>, . . . , <I>S<SUB>k</I></SUB>. Then, <I>u<SUB>i-</I>1</SUB> <img src="../images/gteq.gif"> <I>u<SUB>i</I></SUB>, and <I>u<SUB>i-</I>1</SUB> - <I>u<SUB>i</I></SUB> elements of <I>S</I> are covered for the first time by <I>S<SUB>i</I></SUB>, for <I>i</I> = 1, 2, . . . , <I>k</I>. Thus,<P>
<img src="977_c.gif"><P>
Observe that<P>
<img src="977_d.gif"><P>
because the greedy choice of <I>S<SUB>i</I></SUB> guarantees that <I>S</I> cannot cover more new elements than <I>S<SUB>i</I></SUB> does (otherwise, <I>S</I> would have been chosen instead of <I>S<SUB>i</I></SUB>). Consequently, we obtain<P>
<img src="977_e.gif"><P>
For integers <I>a</I> and <I>b</I>, where a < <I>b</I>, we have<P>
<img src="977_f.gif"><P>
Using this inequality, we obtain the telescoping sum<P>
<img src="977_g.gif"><P>
since <I>H</I>(0) = 0. This completes the proof of inequality (37.10). <P>
<a name="0a21_0004">Corollary 37.5<a name="0a21_0004"><P>
<FONT FACE="Courier New" SIZE=2>GREEDY-</FONT><FONT FACE="Courier New" SIZE=2>SET-</FONT><FONT FACE="Courier New" SIZE=2>COVER</FONT> has a ratio bound of (ln |<I>X</I>| + 1).<P>
<I><B>Proof </I></B>Use inequality (3.12) and Theorem 37.4. <P>
In some applications, max <img src="978_a.gif"> is a small constant, and so the solution returned by <FONT FACE="Courier New" SIZE=2>GREEDY-</FONT><FONT FACE="Courier New" SIZE=2>SET-COVER</FONT> is at most a small constant times larger than optimal. One such application occurs when this heuristic is used to obtain an approximate vertex cover for a graph whose vertices have degree at most 3. In this case, the solution found by <FONT FACE="Courier New" SIZE=2>GREEDY-</FONT><FONT FACE="Courier New" SIZE=2>SET-</FONT><FONT FACE="Courier New" SIZE=2>COVER </FONT>is not more than <I>H</I>(3) = 11/6 times as large as an optimal solution, a performance guarantee that is slightly better than that of <FONT FACE="Courier New" SIZE=2>APPROX-</FONT><FONT FACE="Courier New" SIZE=2>VERTEX-</FONT><FONT FACE="Courier New" SIZE=2>COVER<B>.</B></FONT><P>
<P>
<h2><a name="0a22_1d45">Exercises<a name="0a22_1d45"></h2><P>
<a name="0a22_1d46">37.3-1<a name="0a22_1d46"><P>
Consider each of the following words as a set of letters: {<FONT FACE="Courier New" SIZE=2>arid</FONT>, <FONT FACE="Courier New" SIZE=2>dash</FONT>, <FONT FACE="Courier New" SIZE=2>drain</FONT>, <FONT FACE="Courier New" SIZE=2>heard</FONT>, <FONT FACE="Courier New" SIZE=2>lost</FONT>, <FONT FACE="Courier New" SIZE=2>nose</FONT>, <FONT FACE="Courier New" SIZE=2>shun</FONT>, <FONT FACE="Courier New" SIZE=2>slate</FONT>, <FONT FACE="Courier New" SIZE=2>snare</FONT>, <FONT FACE="Courier New" SIZE=2>thread</FONT>}. Show which set cover <FONT FACE="Courier New" SIZE=2>GREEDY-</FONT><FONT FACE="Courier New" SIZE=2>SET-</FONT><FONT FACE="Courier New" SIZE=2>COVER</FONT> produces when ties are broken in favor of the word that appears first in the dictionary.<P>
<a name="0a22_1d47">37.3-2<a name="0a22_1d47"><P>
<a name="0a22_1d44">Show that the decision version of the set-covering problem is NP-complete by reduction from the vertex-cover problem.<P>
<a name="0a22_1d48">37.3-3<a name="0a22_1d48"><P>
Show how to implement <FONT FACE="Courier New" SIZE=2>GREEDY-</FONT><FONT FACE="Courier New" SIZE=2>SET-</FONT><FONT FACE="Courier New" SIZE=2>COVER</FONT> in such a way that it runs in time <img src="978_b.gif">.<P>
<a name="0a22_1d49">37.3-4<a name="0a22_1d49"><P>
Show that the following weaker form of Theorem 37.4 is trivially true:<P>
<img src="978_c.gif"><P>
<a name="0a22_1d4a">37.3-5<a name="0a22_1d4a"><P>
Create a family of set-cover instances demonstrating that <FONT FACE="Courier New" SIZE=2>GREEDY</FONT>-<FONT FACE="Courier New" SIZE=2>SET-</FONT><FONT FACE="Courier New" SIZE=2>COVER</FONT> can return a number of different solutions that is exponential in the size of the instance. (Different solutions result from ties being broken differently in the choice of <I>S</I> in line 4.)<P>
<P>
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -