it.unimi.dsi.law.fibrations.MinimumBase

public final class MinimumBase
extends Object

Static methods to compute the minimum fibration base of a given graph. More precisely, the method compute(ImmutableGraph, NodeColouringStrategy, ArcColouringStrategy) starts from a graph (possibly with a colouring on its nodes and/or on its arcs) and returns an array, say a[], with exactly as many elements as there are nodes in the graph, and with the following properties:

a[x]==a[y] iff the total graph of the universal fibration of x and y is the same;
the values contained in a[] range from 0 to k−1 where k is the number of distinct total graphs;
the values in a[] are assigned canonically, that is, if b[] is the array returned by the method on an isomorphic graph (with the same colours, if the graph is coloured) and if f represents the isomorphism, then a[x]=b[f(x)] for every node x.

Algorithm implementation

The algorithm merges ideas from nauty's partitioning algorithm (Brendan D. McKay, Practical Graph Isomorphism, Congressus Numerantium, 30:45−87, 1981) and from Cardon−Crochemore's partitioning algorithm (A. Cardon, Maxime Crochemore, Partitioning a Graph in O(|A| log₂ |V|). Theor. Comput. Sci., 19:85−98, 1982). It has been described by Paolo Boldi, Violetta Lonati, Massimo Santini and Sebastiano Vigna in Graph fibrations, graph isomorphism, and PageRank, RAIRO Inform. Théor., 40:227−253, 2006.

The algorithm is oriented towards very large web graphs, and as such it has a very sober usage of data structures—we use just n vectors comprising m integers plus six vectors of n integers (where n is the number of nodes and m is the number of arcs) with a theoretical time bound O(n logn + m log(n + m+ c) logn) using c arc colors. In fact, this implementation uses a Quicksort in a few places where a linear radix exchange would be required to obtained the abovementioned bound.

In the following, by partition of a set we mean a subdivision of the set into a number of nonempty disjoint subsets, called parts.

The algorithm execution happens in rounds; at the end of each round, a certain partition of the nodes is established. The starting partition is the one determined by node colours, if any, or it is simply the trivial partition with just one part. At every round, the old partition is refined (i.e., some of the parts are further subdivided into subparts). The algorithm stops as soon as no part is actually subdivided at the end of a round: the final partition is the desired one (i.e., the nodes are partitioned according to their universal-fibration total graph).

Basic data structures

Current partition: The current partition is stored into three arrays: the first, called part[] simply contains a permutation of the nodes with the property that nodes belonging to the same part appear consecutively; the second, called start[], contains, for each index, the first index of part[], in the same part. More formally, suppose that part[begin], part[begin+1], …, part[end-1] is one of the parts; then, start[x]=begin for all x between begin and end. Thus, starts[] is made of blocks of identical integers, and the first integer of each block is equal to its index in start[].

We also keep track in inv[] of the inverse of part[]. Unless otherwise specified, we shall identify a part with its starting index in the array part[].

Finally, we keep track of the number of elements of each part. This would require an additional array card[], which we actually overlap to start[] by noting that if we encode the cardinality of the part starting at x as a negative number in start[x], it is always possible to recover the original value in start[], as it is just x. This tricky encoding is used in the code, but in the following for sake of simplicity we shall assume that card[] is a separate array.

Active parts: At the beginning of each round, there is a certain set of active parts; their number is stored in numActiveParts, and they are stored in the startActivePart[] array, in arbitrary order.

Touched nodes: During each round, some of the nodes are deemed as touched; their number is stored in touchedListLength, and they are stored in the touchedList[] array.

First phase: assigning labels to nodes

The final aim of the first phase is to assign to each node x a label that is the list of all nodes y that have an arc towards x and appear in some active parts; such labels (whose length cannot be larger than the indegree of x) will be contained in the array inFrom[x][], its length being stored in inFill[x].

To obtain this result, the algorithm scans all the nodes in all the active parts, and for each such node y considers all outgoing arcs, writing y in all the labels of the target nodes of such arcs. When the first node is ever added to inFrom[x][], we also add x to the list of touched nodes.

Second phase: refining touched parts

We consider all touched nodes. A part containing a touched node is said to be touched, too, but some of the nodes of a touched part might not have been touched. Our purpose is to partition all touched parts into subparts: this amounts in permuting the portion of the array part[] where the part is stored (and the corresponding entries in inv[]), and changing some of the entries of the array start[] (those that presently point to the first element of the part).

Note that at a certain point of this phase we have some parts that have already been subpartitioned (we call them completed), a part that is being considered (we call it current) and some other touched parts that will be considered later on.

First of all we order the touched nodes by part index. In this way, the list of touched nodes is made by segments contained in the same part. We now consider in turn each segment; the part containing the segment will be the current part.

Suppose that the current part starts at index begin and ends at index end (exclusive). First of all, for each node x in the current segment of the touched nodes, the label inFrom[x][0..inFill[x]-1] is sorted according to the following lexicographic order:

if the colour of the arc (y,x) is smaller than the colour of (y',x), then y must appear before y';
if the colour of the arc (y,x) is the same as the colour of (y',x), but the part of y is smaller than the part of y', then y must appear before y'.

After sorting, we can identify the new parts by scanning the touched nodes of the current segment and comparing their labels. We know in advance where each node belong, as they must be moved after an initial (possibly empty) part of untouched nodes in the same order in which they now appear in the touched list; since we know the size of the original part, and the number of touched nodes in the part, we can compute the number of untouched nodes in the current part and move the touched nodes after them.

Some care must be taken, though: when comparing the parts of y and y' we are considering the new partitioning for the completed parts, but we use the old partitioning for the current part (i.e., nodes in the current part are considered to have partition number begin). This guarantees a subpartitioning process quicker than Cardon−Crochemore's (which uses the old parts for this whole phase), but at the same time does not incur into the asymptotic loss of nauty's algorithm (which keeps active parts in a queue and performs a full round for each part).

Method Summary

Modifier and Type	Method	Description
`static int[]`	`compute(ImmutableGraph g, NodeColouringStrategy nodeColouring, ArcColouringStrategy arcColouring)`	Returns a labelling of an immutable graph such that two nodes have the same label iff they are in the same fibre of minimal fibrations.
`static boolean`	`equalLabellings(int[] a, int[] b)`
`static void`	`main(String[] arg)`

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Method Details
- compute
  
  public static int[] compute(ImmutableGraph g, NodeColouringStrategy nodeColouring, ArcColouringStrategy arcColouring)
  
  Returns a labelling of an immutable graph such that two nodes have the same label iff they are in the same fibre of minimal fibrations.
  Note that the labelling is surjective—if a node has label k, there are nodes with label j, for every 0≤j≤k.
  
  Parameters:
  
  g - an immutable graph.
  
  nodeColouring - a colouring for the nodes, or null.
  
  arcColouring - a colouring for the arcs, or null.
  
  Returns:
  
  an array of integers labelling the graph so that two nodes have the same label iff they are in the same fibre of minimal fibrations.
- equalLabellings
  
  public static boolean equalLabellings(int[] a, int[] b)
- main
  
  public static void main(String[] arg) throws IOException
  
  Throws:
  
  IOException

Class MinimumBase

Algorithm implementation

Basic data structures

First phase: assigning labels to nodes

Second phase: refining touched parts

Method Summary

Methods inherited from class java.lang.Object

Method Details

compute

equalLabellings

main