📄 vfisomorphism2.aspx.htm
字号:
<h2>Introduction</h2>
<p>The attached code is an implementation of the VF graph isomorphism algorithm. This algorithm is available at the <a href="http://amalfi.dis.unina.it/graph/db/vflib-2.0/doc/vflib.html" target="_blank">VF Graph Comparing library</a>,
and there are other programs which form a wrapper to call into this
library from, for instance, Python. The same could certainly be done
for C#, but the code here implements the algorithm entirely in C#,
bypassing the C++ library altogether. There were a few reasons for
this. First, this offers the algorithm in an entirely managed
environment. Second, I found the original C++ code to be not so
efficient and not terribly well written. Third, the VF library was
actually coded to do comparisons between various algorithms so you get
those algorithms in tow whenever you use the library, and finally, it
just sounded like fun.</p>
<h2>Background</h2>
<p>The graph isomorphism problem accepts a pair of directed graphs as
an input, and returns true if one graph is just a "relabeling" of the
other. The VF algorithm will also perform a subgraph isomorphism in
which the there is a subgraph of the first graph which is isomorphic to
the second graph. It also allows for retrieving the mapping after a
match has been made. It allows for context checking in which nodes can
carry attributes, and those attributes can be tested against the
corresponding attributes for the isomorphic node, and the matching can
be rejected based on the outcome of such tests. At its simplest, this
allows the user to "color" nodes and only allow nodes of the same color
to be matched.</p>
<p>The VF algorithm essentially builds up the isomorphism one match at
a time, and does an educated backtracking search for the matches. There
are various criteria when determining if the addition of a match is
acceptable. Some of these are merely to boost performance, but the most
important criterion is that the newly added match maintains the
isomorphism of the group gathered thus far. In this way, it's pretty
obvious that when you've handled all the nodes in the second graph,
you've achieved the isomorphism you're looking for.</p>
<p>The details for all this can be found in the paper: <a href="http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1323804" target="_blank">A (sub)graph isomorphism algorithm for matching large graphs</a>.
My implementation follows this paper pretty exactly. About the only
difference is that I use a sorted list for some of the internal sets of
nodes, which means I'm more liable to come onto a viable match earlier.
It also means that I can stop searches earlier than in the original
algorithm.</p>
<h2>Using the code</h2>
<p>The code is implemented as a class library. There are two primary classes required to interface with the code. The first is the <code>Graph</code> class which is used to create/manipulate your graph. The only constructor for <code>Graph</code> is the default one which takes no parameters. Once you have a graph, you can insert nodes using <code>InsertNode()</code> and edges using <code>InsertEdge()</code>. Normally, <code>InsertNode</code>
is called with no parameters, in which case it returns the integer ID
for the inserted node. These integer IDs can then be used in the <code lang="cs">InsertEdge(<span class="code-keyword">int</span> node1, <span class="code-keyword">int</span> node2)</code> call to insert edges between nodes. An alternative form of <code>InsertNode</code> takes a single <code lang="cs"><span class="code-keyword">object</span></code>
parameter which represents the attribute for the inserted node. The
attributes can be arbitrary, or they can represent an implementation of
the <code>IContextCheck</code> interface. The <code>IContextCheck</code> is very simple, and looks like this:</p>
<div class="no-vmads"><pre lang="cs"><span class="code-keyword">interface</span> IContextCheck
{
<span class="code-keyword">bool</span> FCompatible(IContextCheck icc);
}</pre></div>
<p>As shown below, the match can be set up to use these context check
attributes, in which case any two nodes which are examined as a
possible match have the context check called on them. If the context
check returns <code lang="cs"><span class="code-keyword">false</span></code>, then no match is allowed. In this case, all attributes in the graph must implement <code>IContextCheck</code>. Any nodes with no attribute will match with any other node. Attributes can also be attached to edges via the <code lang="cs">InsertEdge(iNode1, iNode2, <span class="code-keyword">object</span> attr)</code>, but they (currently) can't be used to affect the match as the attributes attached to nodes are.</p>
<div class="no-vmads"><pre lang="cs">Graph graph = Graph();
<span class="code-comment">//</span><span class="code-comment"> The first nodes is always ID 0 and the rest</span>
<span class="code-comment">//</span><span class="code-comment"> follow so we have nodes 0, 1, 2 and 3</span>
graph.InsertNodes(<span class="code-digit">4</span>)
graph.InsertEdge(<span class="code-digit">0</span>,<span class="code-digit">1</span>);
graph.InsertEdge(<span class="code-digit">1</span>,<span class="code-digit">2</span>);
graph.InsertEdge(<span class="code-digit">2</span>,<span class="code-digit">3</span>);
graph.InsertEdge(<span class="code-digit">3</span>,<span class="code-digit">0</span>);</pre></div>
<p>There are also methods to delete nodes (<code lang="cs">DeleteNode(<span class="code-keyword">int</span> ID)</code>) and edges (<code lang="cs">DeleteEdge(<span class="code-keyword">int</span> idFrom, idTo)</code>). Please note that we are entering directed edges so that <code>InsertEdge(<span class="code-digit">1</span>,<span class="code-digit">0</span>)</code> is a different edge from the one inserted with <code>InsertEdge(<span class="code-digit">1</span>,<span class="code-digit">0</span>)</code>.</p>
<p>Once you have created a pair of graphs which you wish to check for isomorphism, you can create a <code>VfState</code> object with them using the constructor <code>VfState(Graph graph1, Graph graph2)</code>. The <code>VfState</code> can then be used to do the actual check, as follows:</p>
<div class="no-vmads"><pre lang="cs">...
VfState vfs = <span class="code-keyword">new</span> VfState(graph1, graph2);
<span class="code-keyword">bool</span> fIsomorphic = vfs.FMatch();
<span class="code-keyword">if</span> (fIsomorphic)
{
<span class="code-comment">//</span><span class="code-comment"> mapping[i] is the node in graph2 which corresponds to</span>
<span class="code-comment">//</span><span class="code-comment"> node i in graph1...</span>
<span class="code-keyword">int</span>[] mapping = vfs.Mapping1To2;
...</pre></div>
<p>There is also a <code>Mapping2To1</code> with the mapping from <code>graph2</code> to <code>graph1</code>. In subgraph isomorphism (which is used by default), the algorithm looks for a subgraph of <code>graph1</code> which is isomorphic to <code>graph2</code>. The <code>VfState</code> can take two <code lang="cs"><span class="code-SDKkeyword">boolean</span></code>s as either <code lang="cs">VfState(graph gr1, graph gr2, <span class="code-SDKkeyword">boolean</span> fIso)</code> or <code lang="cs">VfState(graph gr1, graph gr2, <span class="code-SDKkeyword">boolean</span> fIso, <span class="code-SDKkeyword">boolean</span> fContextCheck)</code>. If <code>fIso</code> is <code lang="cs"><span class="code-keyword">true</span></code>, then the algorithm searches for a strict isomorphism. This is more efficient than searching for a subgraph isomorphism. If <code>fContextCheck</code> is <code lang="cs"><span class="code-keyword">true</span></code>, then all node attributes are context checked against the counterparts in the mapping, as outlined above.</p>
<p>Update: In addition to the above two signatures for the <code>VfState</code> constructor, there is now another which takes a third <code lang="cs"><span class="code-SDKkeyword">boolean</span></code>: <code lang="cs">VfState(graph gr1, graph gr2, <span class="code-SDKkeyword">boolean</span> fIso, <span class="code-SDKkeyword">boolean</span> fContextCheck, <span class="code-SDKkeyword">boolean</span> fFindAll)</code>. <code>fFindAll</code> is <code lang="cs"><span class="code-keyword">false</span></code>, by default. If <code lang="cs"><span class="code-keyword">true</span></code>,
then after the match, you can get a list of all mappings of all
isomorphisms. Each isomorphism is represented in the list by a <code>FullMapping</code> structure, which just holds the 1 to 2 and 2 to 1 mappings in two <code lang="cs"><span class="code-keyword">int</span>[]</code> arrays, as in the following example.</p>
<div class="no-vmads"><pre lang="cs">...
VfState vfs = <span class="code-keyword">new</span> VfState(graph1, graph2, <span class="code-keyword">false</span>, <span class="code-keyword">false</span>, <span class="code-keyword">true</span>);
<span class="code-keyword">bool</span> fIsomorphic = vfs.FMatch();
<span class="code-keyword">if</span> (fIsomorphic)
{
<span class="code-comment">//</span><span class="code-comment"> mapping[i] is the node in graph2 which corresponds to</span>
<span class="code-comment">//</span><span class="code-comment"> node i in graph1...</span>
<span class="code-keyword">foreach</span>(FullMapping fm <span class="code-keyword">in</span> vfs.Mappings)
{
<span class="code-keyword">int</span>[] mapping1to2 = fm.arinodMap1To2;
<span class="code-keyword">int</span>[] mapping2to1 = fm.arinodMap2To1;
...</pre></div>
<h2>Algorithm overview</h2>
<p>As mentioned above, the algorithm is, basically, a search over the space of matches for an isomorphism between <code>graph2</code> and some subgraph of <code>graph1</code>. By "match", I mean a mapping from a single node of <code>graph1</code> to a single node of <code>graph2</code>.
Each step in the search algorithm proposes adding a new match to the
isomorphism as developed so far. If we can extend these matches to
include all of <code>graph2</code>, then we're finished. If not, we
have to backtrack and attempt different matches. The top level
algorithm of this search looks like this (since the 4/29 update, this
is really pseudo code, with the stuff for multiple isomorphisms left
out for clarity, but the basic idea is still intact):</p>
<div class="no-vmads"><pre lang="cs"><span class="code-keyword">bool</span> FMatchRecursive()
{
Match mchCur;
CandidateFinder cf;
<span class="code-keyword">if</span> (FCompleteMatch())
{
<span class="code-keyword">return</span> <span class="code-keyword">true</span>;
}
cf = <span class="code-keyword">new</span> CandidateFinder(<span class="code-keyword">this</span>);
<span class="code-keyword">while</span> ((mchCur = cf.NextCandidateMatch()) != <span class="code-keyword">null</span>)
{
<span class="code-keyword">if</span> (FFeasible(mchCur))
{
BacktrackRecord btr = <span class="code-keyword">new</span> BacktrackRecord();
AddMatchToSolution(mchCur, btr);
<span class="code-keyword">if</span> (FMatchRecursive())
{
_fSuccessfulMatch = <span class="code-keyword">true</span>;
<span class="code-keyword">return</span> <span class="code-keyword">true</span>;
}
btr.Backtrack(<span class="code-keyword">this</span>);
}
}
<span class="code-keyword">return</span> <span class="code-keyword">false</span>;
}</pre></div>
<p>I should point out that this is a recursive version which is easier to understand, but <code lang="cs">ifdef</code>ed out in the actual code in favor of a non-recursive version. You can <code lang="cs">ifdef</code>
it back in and verify that it works via the NUnit code, but it may blow
the stack with sufficiently large graphs and will be less efficient.
The method uses a <code>CandidateFinder</code> to locate potential
matches, runs through them, and uses a feasible method to do more
careful checking. This checking includes ensuring that the match
doesn't break the isomorphism we've established thus far. <code>BacktrackRecord</code> allows us to undo the effects of the call to <code>AddMatchToSolution</code>
in case this match doesn't work out. We provisionally add the match to
the current isomorphism, extending its range by one more node in each
graph, and we recursively check to see if we can extend this new
isomorphism to a complete mapping over <code>graph2</code>. If so, we're done and return <code lang="cs"><span class="code-keyword">true</span></code>. If not, we move back to the state before we added the match, and look for other potential matches in the main loop.</p>
<p>The real key functions here are <code>NextCandidateMatch</code> and <code>FFeasible</code>.
These two functions depend very much on a division of the nodes into
four sets on each graph. The first set, which we'll refer to as I1/I2
on graph1/graph2, are the nodes which are participating in the
isomorphism produced at each stage of the algorithm. The second set,
T1/T2 for graph1/graph2, is the set of nodes which have at least one
edge pointing towards a node in I1/I2. The third set, F1/F2 for
graph1/graph2, is the set of nodes which have at least one edge
pointing to them from a node in I1/I2. Note that if a node has both
predecessors and successors in I1/I2, then it is a member of both F1/F2
and T1/T2. Finally, nodes which are disconnected from any node in the
isomorphism are in sets D1/D2.</p>
<p>These groups are maintained by the use of an enum:</p>
<div class="no-vmads"><pre lang="cs">[Flags]
<span class="code-keyword">enum</span> Groups
{
ContainedInMapping = <span class="code-digit">1</span>, <span class="code-comment">//</span><span class="code-comment"> Contained in the mapping</span>
FromMapping = <span class="code-digit">2</span>, <span class="code-comment">//</span><span class="code-comment"> Outside the mapping but pointed to from the mapping</span>
ToMapping = <span class="code-digit">4</span>, <span class="code-comment">//</span><span class="code-comment"> Outside the mapping but points to a node in the mapping</span>
Disconnected = <span class="code-digit">8</span> <span class="code-comment">//</span><span class="code-comment"> Outside the mapping with no links to mapped nodes</span>
}</pre></div>
<p>Each node maintains a field with this enum in it stating which group it belongs to. In addition, the <code>VfState</code> class keeps a list of each of these groups, sorted by degree of each node.</p>
<p>Obviously, if we are to maintain the isomorphism by adding a match between a node <code>N1</code> in <code>graph1</code> and <code>N2</code> in <code>graph2</code>,
certain structural rules must hold, and this is how our search manages
to whittle down the candidates to a select few and weed out incorrect
matches fairly quickly. For instance, the <code>CandidateFinder</code>
ensures that either both nodes come from the T sets, or both from the F
sets, or both from the D sets. The case for the T sets stems from the
observation that if <code>N1</code> points into the current isomorphism, then so must <code>N2</code>.
Similarly for the F and D sets. If there are no nodes in either T1 or
T2, the candidate finder looks to the F sets, and after that, the D
sets. When <code>CandidateFinder</code> searches through a particular
group for matches, it does so by decreasing degree since that is the
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -