📄 vfisomorphism2.aspx.htm

📁 附件中的源代码实现了VF图像的同构(Isomorphism)算法。同时还有一些其他的Wrapper程序
💻 HTM
📖 第 1 页 / 共 5 页
字号:
上一页 1 2 3 45
way the lists are sorted. This has a couple of implications. First of
all, it gives us a better shot at matching the correct nodes earlier
rather than later. Secondly, since <code>N1</code> must have at least as many edges as <code>N2</code> (remember that <code>graph2</code> can map into a subgraph of <code>graph1</code>), we can give up as soon as we find a node in the T1 list with degree less than <code>N2</code>
since all nodes thereafter will have still lower degree. Actually, if a
true isomorphism is being sought, then we can stop if the first node of
T1 has a different degree than the first node of T2. There are many
situations where we have an equality failure condition in the case of a
true isomorphism or a &gt;= failure condition in the case of a subgraph
isomorphism. For that reason, we have a delegate in the <code>VfState</code> class, <code>fnComp</code>, which performs one or the other of these tests, and we set it once when the <code>VfState</code> is constructed and then use <code>fnComp</code> throughout the code thereafter for these tests.</p>

<p>Once the <code>CandidateFinder</code> has selected a viable match, we do further checking with <code>FFeasible()</code>. There are essentially four tests that a match must pass to be allowed through <code>FFeasible()</code>.
The first and foremost, simply verifies that the match, when added to
the current isomorphism, doesn't wreck that isomorphism. To be precise,
every node in the isomorphism which is connected to <code>N1</code> must map to a node which is similarly connected to <code>N2</code>.
In addition, if context checking is being done, it is also done here in
the feasibility test, and matches can be rejected if their attributes
conflict. This is really the only absolute requirement. As long as this
is true, as we add nodes, we will always maintain a valid isomorphism
of the nodes in the I sets so that when all nodes of <code>graph2</code>
are in I2, we will have our completed isomorphism. However, using only
this criteria would allow many matches which would have to be manually
backtracked over eventually, so we want to prune our choices here, and
that is what the other three requirements of <code>FFeasible</code> are all about.</p>

<p>The other three requirements are really just versions of the same
rule applied to the T, F, and D groups, so I'll just discuss it for the
T groups, but the same logic applies to the other two groups. I'm still
assuming that we are adding a match between nodes <code>N1</code> of <code>graph1</code> and <code>N2</code> of <code>graph2</code>. We require that the count of In edges from nodes in T1 to <code>N1</code> exceed the count of In edges from nodes in T2 to <code>N2</code>. Ditto for the Out edges. This is another situation where we use <code>fnComp</code>
as described above to do an equality test for true isomorphism or
inequality for subgraph isomorphism. This same test is applied to the F
sets and D sets. The code to do the first test described here (In edges
from T1 nodes compared to In edges from T2 nodes) looks like this:</p>

<div class="no-vmads"><pre lang="cs"><span class="code-keyword">if</span> (!fnCmp(
    GetGroupCountInList(lstIn1, _vfgr1, Groups.FromMapping),
    GetGroupCountInList(lstIn2, _vfgr2, Groups.FromMapping)))
{
    <span class="code-keyword">return</span> <span class="code-keyword">false</span>;
}</pre></div>

<p>Here, <code>GetGroupCountInList()</code> just counts the nodes in the list of the first argument which are in the correct group for the graph given by <code>_vfgr1</code> (<code>graph1</code>) or <code>_vfgr2</code> (<code>graph2</code>). The <code>lstIn1</code> is just the list of nodes which have an edge pointing towards N1, and similarly for <code>lstIn2</code>.</p>

<p>When we add the matching to the isomorphism, this potentially moves several other nodes into other groups. The <code>AddMatchToSolution()</code>
method keeps track of all this movement in the backtrack record so
everything can be moved back if the match fails. The backtrack records
are just linked lists of <code>BacktrackAction</code>s, each of which is either a movement into the isomorphism or a movement from one group to another.</p>

<p>This completes the overview of the algorithm. I'm not including the conversion between <code>Graph</code>s and <code>VfGraph</code>s. The main difference here is to sort the nodes in the order of increasing degree in the <code>VfGraph</code>
and the setup of various lists of nodes in preparation for the VF
algorithm. All the nodes are initially placed in the D group, and the I
group is empty. We also keep the mapping between the unsorted nodes in
the original <code>Graph</code> structure and the nodes in the <code>VfGraph</code> structure so that we can sort it all out in the end and return mappings via <code>Mapping2To1</code> and <code>Mapping1To2</code>, which refer to the node IDs in the original <code>Graph</code> objects rather than the IDs used in the <code>VfGraph</code>s. This is all really more in the matter of bookkeeping than any part of the algorithm proper.</p>

<h2>Performance</h2>

<p>When I started looking at the code, I wanted to find out how many
matches were getting through the candidate finder, how many were
getting through the feasibility checker, and how many made it through
both but were ultimately backtracked over. I got my random graph
creator to create a random graph with 1000 nodes, and found to my
surprise that no nodes were being backtracked - the combination of the
candidate finder and the feasibility checking was acting as a perfect
oracle of what would end up in the final isomorphism. That seemed
impossible, and I suspected a bug. To test this, I commented out the
call to backtrack, and ran all the NUnit code. Two tests failed, which
made me feel much better - my backtrack code was actually useful. After
I started thinking about this, it all began to make much more sense. In
the case of a random graph, there are probably very few nodes which
have maximum degree. Of those with the same maximum degree, in order to
pass the feasibility test, they must have the same number of Out edges
and In edges. More than likely, this uniquely identifies the first
match to be entered into the isomorphism. From there on, as the
isomorphism builds up, the chances that the isomorphism itself is wrong
go rapidly to essentially zero, since to assume otherwise is to assume
that there are two isomorphically identical portions in the random
graph. Ditto for adding new matches, since they must be attached to the
current isomorphism and more than likely can only fit in one place of
the isomorphism. So, we have a good chance of getting the first matches
correct, and after those, the chances go up rather than down.
Consequently, it began to dawn on me that the algorithm is going to
work very well on random graphs. In fact, if we assume no backtracking
and a constant average degree, I believe that the algorithm is O(n^2),
which is great.</p>

<p>Of course, at this point, I wanted to try to find much worse cases.
It seems obvious that it's going to have more backtracks if the graph
is "highly automorphic", by which I mean that large portions of the
graph are isomorphic to other portions. It seemed to me that a really
bad case of this was a grid - nodes at the intersections of the
gridlines, and edges between them along the grid lines. When I tried
this, sure enough, there were lots and lots of backtracks. I realized
that it ought to be worse when there was no isomorphism, so I hooked
two opposite "corners" of the grid together with an edge in one graph
which wasn't present in the second. I had to cancel out of this run
after a long wait, because I think it's pretty close to the worst case.
Since the corners had lower degrees than the middle nodes, they were
going to be examined after all the middle nodes had been checked for
isomorphism, and since they were the only non-isomorphic part, it was
going to try zillions of backups. To verify this, instead of one graph
between these two nodes, I inserted several so their higher degree
caused the algorithm to examine them first. This caused the runtime to
be almost immediate. Of course, I was trying this with a 30x30 grid,
and it probably would have run reasonably well for smaller versions.
One possibility which would have eliminated this particular problem is
to have just done a comparison of node degrees. Since I'm already
sorting nodes by degree, this should be easy. It will also work for
subgraphs where I have to establish a one to one mapping between nodes
in the second graph and nodes with a larger degree in the first graph.</p>

<p>Another potential problem area might be comparing a graph with much
higher degrees with one with much lower degrees. This could eliminate
the advantage of sorting the nodes, tending to bring isomorphic nodes
together in the candidate finder. I don't think it would be too awful a
problem since the original algorithm described in the links at the
start of this article never took advantage of graph ordering at all. Of
course, if the degrees got high enough, it might actually be easier
since nodes will match to other nodes more easily. In the extreme, any
1-1 mapping will do if the first graph is complete.</p>

<p>So, the lesson is that the worst case can be pretty awful, but
smaller graphs or more random graphs are pretty safe to expect a O(n^2)
performance.</p>

<p>I also did some very non-disciplined timing (I didn't try to
eliminate any background processing, for instance), and came up with
the following for the random graphs in my test:</p>

<p><img src="VFIsomorphism2.aspx_files/performance.jpg" height="368" width="545"></p>

<p>Note that the time/node has been multiplied by 4000 so that it
scales nicely on the graph. The edge load factor for the 1000 node
graph was 0.1 so that each node has an average of about 100 In edges
and 100 Out edges. I wanted to hold the degrees the same for all the
graphs, so for the other sized graphs, the edge load factor is
inversely proportional to the count of nodes. Thus, the load factor for
the 2000 node graph was 0.05.</p>

<h2>Points of interest</h2>

<p>A few caveats should be pointed out. First of all, when we speak of
subgraph isomorphism, the definition of subgraph here is a graph
obtained by removing nodes and their attached edges. This means that
you can't remove edges directly in the subgraph unless they are
attached to a removed node. I don't think this is any big problem to
allow for, but haven't incorporated it yet.</p>

<p>The code has NUnit testing included. I like NUnit, and recommend you
get it and the excellent TestDriven.NET Visual Studio add-in. They're
both free and very worthwhile, and I think you'll be happy you did in
the end. However, if you can't bring yourself to take these simple
steps, then you can just eliminate the <code>NUnit.FrameWork</code> reference in the project and take the <code>NUNIT</code>
definition out of the debug configuration. Even in this case, you can
probably understand the code a bit better by looking at the NUnit code.
There is no test or demo program in the code since it's been tested
entirely through NUnit.</p>

<p>One last point - I would assume that the base code runs just fine on
Mono, but I'm not set up to test it. I'm not sure whether NUnit runs
there, so it might have to be removed also.</p>

<h2>History</h2>

<ul>
<li>1-23-08 - First version.</li>

<li>1-24-08 - Algorithm overview added, and (whoops!) forgot originally to make <code lang="cs">FMatch <span class="code-keyword">public</span></code>. Added back in an <code lang="cs">ifdef</code>ed out recursive call, and made a name change or two.</li>

<li>1-30-08 - Corrected the first link, added a section on performance,
and added a quick one time degree compatibility check between graphs.</li>

<li>4-29-08 - Added in the ability to get a list of all isomorphisms rather than the first one found.</li>
</ul>



<!-- Article Ends -->


			<!-- Main Page Contents End -->
			
			</div>
			</span>
			
			<form name="aspnetForm" method="post" action="VFIsomorphism2.aspx" id="aspnetForm" style="margin: 0pt; padding: 0pt;">
		<div>
		<input name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwULLTEwMDUyNjYzMjhkZKxgImHlvj+dY9K9isBcyZ4YxfVB" type="hidden">
		</div>
		

			
			<h2>License</h2>
			<div id="ctl00_LicenseTerms"><p>This article, along with any associated source code and files, is licensed under <a href="http://www.codeproject.com/info/cpol10.aspx">The
上一页 1 2 3 45
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -