Class MinimumBase
public final class MinimumBase extends Object
compute(ImmutableGraph, NodeColouringStrategy, ArcColouringStrategy)
starts from a graph
(possibly with a colouring on its nodes and/or on its arcs) and returns an array, say
a[]
, with exactly as many elements as there are nodes in the graph, and with the
following properties:
a[x]==a[y]
iff the total graph of the universal fibration ofx
andy
is the same;- the values contained in
a[]
range from 0 tok−1
wherek
is the number of distinct total graphs; - the values in
a[]
are assigned canonically, that is, ifb[]
is the array returned by the method on an isomorphic graph (with the same colours, if the graph is coloured) and iff
represents the isomorphism, thena[x]=b[f(x)]
for every nodex
.
Algorithm implementation
The algorithm merges ideas from nauty
's partitioning algorithm (Brendan D. McKay,
Practical Graph Isomorphism, Congressus Numerantium, 30:45−87, 1981) and from
Cardon−Crochemore's partitioning algorithm (A. Cardon, Maxime Crochemore, Partitioning a
Graph in O(|A| log2 |V|). Theor. Comput. Sci.,
19:85−98, 1982). It has been described by Paolo Boldi, Violetta Lonati, Massimo Santini and
Sebastiano Vigna in Graph fibrations, graph isomorphism, and PageRank, RAIRO Inform.
Théor., 40:227−253, 2006.
The algorithm is oriented towards very large web graphs, and as such it has a very sober usage of data structures—we use just n vectors comprising m integers plus six vectors of n integers (where n is the number of nodes and m is the number of arcs) with a theoretical time bound O(n logn + m log(n + m+ c) logn) using c arc colors. In fact, this implementation uses a Quicksort in a few places where a linear radix exchange would be required to obtained the abovementioned bound.
In the following, by partition of a set we mean a subdivision of the set into a number of nonempty disjoint subsets, called parts.
The algorithm execution happens in rounds; at the end of each round, a certain partition of the nodes is established. The starting partition is the one determined by node colours, if any, or it is simply the trivial partition with just one part. At every round, the old partition is refined (i.e., some of the parts are further subdivided into subparts). The algorithm stops as soon as no part is actually subdivided at the end of a round: the final partition is the desired one (i.e., the nodes are partitioned according to their universal-fibration total graph).
Basic data structures
Current partition: The current partition is stored into three arrays: the first,
called part[]
simply contains a permutation of the nodes with the property that
nodes belonging to the same part appear consecutively; the second, called start[]
,
contains, for each index, the first index of part[]
, in the same part. More
formally, suppose that part[begin]
, part[begin+1]
, …,
part[end-1]
is one of the parts; then, start[x]=begin
for all
x
between begin
and end
. Thus, starts[]
is
made of blocks of identical integers, and the first integer of each block is equal to its index
in start[]
.
We also keep track in inv[]
of the inverse of part[]
. Unless otherwise
specified, we shall identify a part with its starting index in the array part[]
.
Finally, we keep track of the number of elements of each part. This would require an additional
array card[]
, which we actually overlap to start[]
by noting that if we
encode the cardinality of the part starting at x
as a negative number in
start[x]
, it is always possible to recover the original value in
start[]
, as it is just x
. This tricky encoding is used in the code, but
in the following for sake of simplicity we shall assume that card[]
is a separate
array.
Active parts: At the beginning of each round, there is a certain set of active
parts; their number is stored in numActiveParts
, and they are stored in the
startActivePart[]
array, in arbitrary order.
Touched nodes: During each round, some of the nodes are deemed as touched; their
number is stored in touchedListLength
, and they are stored in the
touchedList[]
array.
First phase: assigning labels to nodes
The final aim of the first phase is to assign to each node x
a label that is the
list of all nodes y
that have an arc towards x
and appear in some
active parts; such labels (whose length cannot be larger than the indegree of x
)
will be contained in the array inFrom[x][]
, its length being stored in
inFill[x]
.
To obtain this result, the algorithm scans all the nodes in all the active parts, and for each
such node y
considers all outgoing arcs, writing y
in all the labels of
the target nodes of such arcs. When the first node is ever added to inFrom[x][]
, we
also add x
to the list of touched nodes.
Second phase: refining touched parts
We consider all touched nodes. A part containing a touched node is said to be touched, too, but
some of the nodes of a touched part might not have been touched. Our purpose is to partition all
touched parts into subparts: this amounts in permuting the portion of the array
part[]
where the part is stored (and the corresponding entries in
inv[]
), and changing some of the entries of the array start[]
(those
that presently point to the first element of the part).
Note that at a certain point of this phase we have some parts that have already been subpartitioned (we call them completed), a part that is being considered (we call it current) and some other touched parts that will be considered later on.
First of all we order the touched nodes by part index. In this way, the list of touched nodes is made by segments contained in the same part. We now consider in turn each segment; the part containing the segment will be the current part.
Suppose that the current part starts at index begin
and ends at index
end
(exclusive). First of all, for each node x
in the current segment
of the touched nodes, the label inFrom[x][0..inFill[x]-1]
is sorted according to the
following lexicographic order:
- if the colour of the arc (
y
,x
) is smaller than the colour of (y'
,x
), theny
must appear beforey'
; - if the colour of the arc (
y
,x
) is the same as the colour of (y'
,x
), but the part ofy
is smaller than the part ofy'
, theny
must appear beforey'
.
After sorting, we can identify the new parts by scanning the touched nodes of the current segment and comparing their labels. We know in advance where each node belong, as they must be moved after an initial (possibly empty) part of untouched nodes in the same order in which they now appear in the touched list; since we know the size of the original part, and the number of touched nodes in the part, we can compute the number of untouched nodes in the current part and move the touched nodes after them.
Some care must be taken, though: when comparing the parts of y
and y'
we are considering the new partitioning for the completed parts, but we use the old partitioning
for the current part (i.e., nodes in the current part are considered to have partition number
begin
). This guarantees a subpartitioning process quicker than
Cardon−Crochemore's (which uses the old parts for this whole phase), but at the same time
does not incur into the asymptotic loss of nauty
's algorithm (which keeps active
parts in a queue and performs a full round for each part).
-
Method Summary
Modifier and Type Method Description static int[]
compute(ImmutableGraph g, NodeColouringStrategy nodeColouring, ArcColouringStrategy arcColouring)
Returns a labelling of an immutable graph such that two nodes have the same label iff they are in the same fibre of minimal fibrations.static boolean
equalLabellings(int[] a, int[] b)
static void
main(String[] arg)
-
Method Details
-
compute
public static int[] compute(ImmutableGraph g, NodeColouringStrategy nodeColouring, ArcColouringStrategy arcColouring)Returns a labelling of an immutable graph such that two nodes have the same label iff they are in the same fibre of minimal fibrations.Note that the labelling is surjective—if a node has label k, there are nodes with label j, for every 0≤j≤k.
- Parameters:
g
- an immutable graph.nodeColouring
- a colouring for the nodes, ornull
.arcColouring
- a colouring for the arcs, ornull
.- Returns:
- an array of integers labelling the graph so that two nodes have the same label iff they are in the same fibre of minimal fibrations.
-
equalLabellings
public static boolean equalLabellings(int[] a, int[] b) -
main
- Throws:
IOException
-