public final class MinimumBase extends Object
compute(ImmutableGraph, NodeColouringStrategy, ArcColouringStrategy)starts from a graph (possibly with a colouring on its nodes and/or on its arcs) and returns an array, say
a, with exactly as many elements as there are nodes in the graph, and with the following properties:
a[x]==a[y]iff the total graph of the universal fibration of
yis the same;
- the values contained in
arange from 0 to
kis the number of distinct total graphs;
- the values in
aare assigned canonically, that is, if
bis the array returned by the method on an isomorphic graph (with the same colours, if the graph is coloured) and if
frepresents the isomorphism, then
a[x]=b[f(x)]for every node
The algorithm merges ideas from
nauty's partitioning algorithm (Brendan D. McKay,
Practical Graph Isomorphism, Congressus Numerantium, 30:45−87, 1981) and from
Cardon−Crochemore's partitioning algorithm (A. Cardon, Maxime Crochemore, Partitioning a
Graph in O(|A| log2 |V|). Theor. Comput. Sci.,
19:85−98, 1982). It has been described by Paolo Boldi, Violetta Lonati, Massimo Santini and
Sebastiano Vigna in Graph fibrations, graph isomorphism, and PageRank, RAIRO Inform.
Théor., 40:227−253, 2006.
The algorithm is oriented towards very large web graphs, and as such it has a very sober usage of data structures—we use just n vectors comprising m integers plus six vectors of n integers (where n is the number of nodes and m is the number of arcs) with a theoretical time bound O(n logn + m log(n + m+ c) logn) using c arc colors. In fact, this implementation uses a Quicksort in a few places where a linear radix exchange would be required to obtained the abovementioned bound.
In the following, by partition of a set we mean a subdivision of the set into a number of nonempty disjoint subsets, called parts.
The algorithm execution happens in rounds; at the end of each round, a certain partition of the nodes is established. The starting partition is the one determined by node colours, if any, or it is simply the trivial partition with just one part. At every round, the old partition is refined (i.e., some of the parts are further subdivided into subparts). The algorithm stops as soon as no part is actually subdivided at the end of a round: the final partition is the desired one (i.e., the nodes are partitioned according to their universal-fibration total graph).
Basic data structures
Current partition: The current partition is stored into three arrays: the first,
part simply contains a permutation of the nodes with the property that
nodes belonging to the same part appear consecutively; the second, called
contains, for each index, the first index of
part, in the same part. More
formally, suppose that
part[end-1] is one of the parts; then,
start[x]=begin for all
made of blocks of identical integers, and the first integer of each block is equal to its index
We also keep track in
inv of the inverse of
part. Unless otherwise
specified, we shall identify a part with its starting index in the array
Finally, we keep track of the number of elements of each part. This would require an additional
card, which we actually overlap to
start by noting that if we
encode the cardinality of the part starting at
x as a negative number in
start[x], it is always possible to recover the original value in
start, as it is just
x. This tricky encoding is used in the code, but
in the following for sake of simplicity we shall assume that
card is a separate
Active parts: At the beginning of each round, there is a certain set of active
parts; their number is stored in
numActiveParts, and they are stored in the
startActivePart array, in arbitrary order.
Touched nodes: During each round, some of the nodes are deemed as touched; their
number is stored in
touchedListLength, and they are stored in the
First phase: assigning labels to nodes
The final aim of the first phase is to assign to each node
x a label that is the
list of all nodes
y that have an arc towards
x and appear in some
active parts; such labels (whose length cannot be larger than the indegree of
will be contained in the array
inFrom[x], its length being stored in
To obtain this result, the algorithm scans all the nodes in all the active parts, and for each
y considers all outgoing arcs, writing
y in all the labels of
the target nodes of such arcs. When the first node is ever added to
x to the list of touched nodes.
Second phase: refining touched parts
We consider all touched nodes. A part containing a touched node is said to be touched, too, but
some of the nodes of a touched part might not have been touched. Our purpose is to partition all
touched parts into subparts: this amounts in permuting the portion of the array
part where the part is stored (and the corresponding entries in
inv), and changing some of the entries of the array
that presently point to the first element of the part).
Note that at a certain point of this phase we have some parts that have already been subpartitioned (we call them completed), a part that is being considered (we call it current) and some other touched parts that will be considered later on.
First of all we order the touched nodes by part index. In this way, the list of touched nodes is made by segments contained in the same part. We now consider in turn each segment; the part containing the segment will be the current part.
Suppose that the current part starts at index
begin and ends at index
end (exclusive). First of all, for each node
x in the current segment
of the touched nodes, the label
inFrom[x][0..inFill[x]-1] is sorted according to the
following lexicographic order:
- if the colour of the arc (
x) is smaller than the colour of (
ymust appear before
- if the colour of the arc (
x) is the same as the colour of (
x), but the part of
yis smaller than the part of
ymust appear before
After sorting, we can identify the new parts by scanning the touched nodes of the current segment and comparing their labels. We know in advance where each node belong, as they must be moved after an initial (possibly empty) part of untouched nodes in the same order in which they now appear in the touched list; since we know the size of the original part, and the number of touched nodes in the part, we can compute the number of untouched nodes in the current part and move the touched nodes after them.
Some care must be taken, though: when comparing the parts of
we are considering the new partitioning for the completed parts, but we use the old partitioning
for the current part (i.e., nodes in the current part are considered to have partition number
begin). This guarantees a subpartitioning process quicker than
Cardon−Crochemore's (which uses the old parts for this whole phase), but at the same time
does not incur into the asymptotic loss of
nauty's algorithm (which keeps active
parts in a queue and performs a full round for each part).
Modifier and Type Method Description
compute(ImmutableGraph g, NodeColouringStrategy nodeColouring, ArcColouringStrategy arcColouring)Returns a labelling of an immutable graph such that two nodes have the same label iff they are in the same fibre of minimal fibrations.
equalLabellings(int a, int b)
computepublic static int compute(ImmutableGraph g, NodeColouringStrategy nodeColouring, ArcColouringStrategy arcColouring)Returns a labelling of an immutable graph such that two nodes have the same label iff they are in the same fibre of minimal fibrations.
Note that the labelling is surjective—if a node has label k, there are nodes with label j, for every 0≤j≤k.
g- an immutable graph.
nodeColouring- a colouring for the nodes, or
arcColouring- a colouring for the arcs, or
- an array of integers labelling the graph so that two nodes have the same label iff they are in the same fibre of minimal fibrations.
equalLabellingspublic static boolean equalLabellings(int a, int b)
mainpublic static void main(String arg) throws IOException