This collection contains Java (≥8) software distributed by the Laboratory for Web Algorithmics (LAW); each component is usually linked to a publication. If you find our software useful while working on a scientific publication, please cite us properly, either by using the publications quoted in the documentation or by contacting us for suggestions.

We try to distribute everything under the GNU General Public License or the GNU Lesser General Public License.

Highlights

  • Statistical tools to efficiently compute Kendall's τ and the weighted τ. They include tools to accurately limit the precision of the ranks involved, as the noise caused by approximation can significantly alter the computation of τ (see “Traps and pitfalls of topic-biased PageRank”, by Paolo Boldi, Massimo Santini, Roberto Posenato, and Sebastiano Vigna, in WAW 2006. Fourth Workshop on Algorithms and Models for the Web-Graph, volume 4936 of Lecture Notes in Computer Science, pages 107–116, Springer-Verlag, 2008).
  • The largest publicly available set of classes and documentation related to spectral ranking. It includes a detailed explanation of theoretical formulations and of the algorithms actually implementing them. In particular, PageRankParallelGaussSeidel is our best-of-breed implementation of PageRank, whereas PageRankFromCoefficients makes it possible to compute PageRank and its derivatives for every value of the damping factor using the precomputed coefficients of PageRank's power series (using the results described in “PageRank: Functional dependencies”, by Paolo Boldi, Massimo Santini, and Sebastiano Vigna, ACM Trans. Inf. Sys., 27(4):1–23, 2009). You can also compute, for instance, the dominant eigenvector and Katz's index.
  • A highly scalable implementation of the Layered Label-Propagation algorithm.
  • ConsistentHashFunction implements the consistent hash function used by UbiCrawler.
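
To see why the precision of the ranks matters for Kendall's τ, consider the following minimal sketch. It is not the implementation found in it.unimi.dsi.law.stat: the class KendallTauSketch, its quadratic-time tau() method (the τ-b variant, which corrects for ties) and the decimal truncate() helper are assumptions made purely for illustration. Two score vectors that differ only by noise in the fifth decimal digit contain a discordant pair; once both are truncated to three digits, that pair becomes a tie and no longer counts against the correlation.

```java
import java.util.Arrays;

/** Illustrative only: a naive Kendall's tau-b and a decimal truncation helper (hypothetical names). */
final class KendallTauSketch {
    private KendallTauSketch() {}

    /** Returns a copy of v with every entry truncated (toward negative infinity) to the given decimal digits. */
    static double[] truncate(double[] v, int digits) {
        double scale = Math.pow(10, digits);
        double[] result = new double[v.length];
        for (int i = 0; i < v.length; i++) result[i] = Math.floor(v[i] * scale) / scale;
        return result;
    }

    /** Naive O(n^2) Kendall's tau-b; returns NaN if all pairs are tied in x or in y. */
    static double tau(double[] x, double[] y) {
        if (x.length != y.length) throw new IllegalArgumentException("Vectors must have the same length");
        long concordant = 0, discordant = 0, untiedX = 0, untiedY = 0;
        for (int i = 0; i < x.length; i++) {
            for (int j = i + 1; j < x.length; j++) {
                int sx = Integer.signum(Double.compare(x[i], x[j]));
                int sy = Integer.signum(Double.compare(y[i], y[j]));
                if (sx != 0) untiedX++;      // pair not tied in x
                if (sy != 0) untiedY++;      // pair not tied in y
                if (sx * sy > 0) concordant++;
                else if (sx * sy < 0) discordant++;
            }
        }
        // tau-b: (C - D) / sqrt((pairs untied in x) * (pairs untied in y))
        return (concordant - discordant) / Math.sqrt((double) untiedX * untiedY);
    }

    public static void main(String[] args) {
        // The first two entries differ only by noise in the fifth decimal digit.
        double[] x = { 0.50001, 0.50002, 0.3 };
        double[] y = { 0.50002, 0.50001, 0.3 };
        System.out.println("tau on raw scores:       " + tau(x, y));
        double[] tx = truncate(x, 3), ty = truncate(y, 3);
        System.out.println("truncated scores:        " + Arrays.toString(tx));
        System.out.println("tau on truncated scores: " + tau(tx, ty));
    }
}
```

On the raw scores the sketch reports a τ of about one third, whereas on the truncated scores it reports perfect agreement. How many digits to keep depends on the accuracy with which the ranks were actually computed.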

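The idea of computing PageRank for every damping factor from precomputed coefficients can be sketched as follows. The sketch relies on the power-series form r(α) = (1 − α) Σ_k α^k v P^k, where P is the row-normalized adjacency matrix and v the preference vector. It assumes a tiny graph without dangling nodes and a truncated series; the class PageRankSeriesSketch and its methods coefficients() and evaluate() are hypothetical names, not the API or storage format of PageRankFromCoefficients.

```java
import java.util.Arrays;

/** Illustrative only: PageRank evaluated from the coefficients of its power series (hypothetical names). */
final class PageRankSeriesSketch {
    private PageRankSeriesSketch() {}

    /**
     * Returns c[k] = v P^k for k = 0..terms-1, where P is the row-normalized
     * adjacency matrix of the graph. Assumes every node has at least one successor.
     */
    static double[][] coefficients(int[][] successors, double[] v, int terms) {
        int n = successors.length;
        double[][] c = new double[terms][];
        c[0] = v.clone();
        for (int k = 1; k < terms; k++) {
            c[k] = new double[n];
            // One vector-matrix product: spread the mass of each node evenly over its successors.
            for (int i = 0; i < n; i++)
                for (int j : successors[i]) c[k][j] += c[k - 1][i] / successors[i].length;
        }
        return c;
    }

    /** Evaluates the truncated series (1 - alpha) * sum_k alpha^k c[k] for a damping factor alpha in [0..1). */
    static double[] evaluate(double[][] c, double alpha) {
        double[] r = new double[c[0].length];
        double weight = 1 - alpha; // (1 - alpha) * alpha^k, starting at k = 0
        for (double[] ck : c) {
            for (int i = 0; i < r.length; i++) r[i] += weight * ck[i];
            weight *= alpha;
        }
        return r;
    }

    public static void main(String[] args) {
        int[][] graph = { { 1, 2 }, { 2 }, { 0 } };  // node -> successors
        double[] uniform = { 1.0 / 3, 1.0 / 3, 1.0 / 3 };
        // The coefficients are computed once...
        double[][] c = coefficients(graph, uniform, 300);
        // ...and then reused for any damping factor without touching the graph again.
        for (double alpha : new double[] { 0.5, 0.85 })
            System.out.println("alpha = " + alpha + ": " + Arrays.toString(evaluate(c, alpha)));
    }
}
```

The expensive part, the repeated products with P, does not depend on α, which is why the coefficients can be computed once and reused. The number of terms needed grows as α approaches 1, since the truncation error is of the order of α raised to the number of terms.
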
Packages
Package                                 Description
it.unimi.dsi.law                        Basic classes.
it.unimi.dsi.law.big.graph
it.unimi.dsi.law.big.rank
it.unimi.dsi.law.big.stat
it.unimi.dsi.law.big.util
it.unimi.dsi.law.bubing.util
it.unimi.dsi.law.fibrations
it.unimi.dsi.law.graph                  Graph-related classes.
it.unimi.dsi.law.io.tool                Tools manipulating and converting files.
it.unimi.dsi.law.rank                   Computation of spectral rankings and associated utilities.
it.unimi.dsi.law.stat                   Statistical tools (in particular, Kendall's τ) for large-size data.
it.unimi.dsi.law.util                   Utility classes.
it.unimi.dsi.law.vector
it.unimi.dsi.law.warc.filters           A comprehensive filtering system.
it.unimi.dsi.law.warc.filters.parser
it.unimi.dsi.law.warc.io                Provides classes performing low and high level WARC I/O (for format details, please see the ISO draft).
it.unimi.dsi.law.warc.io.examples
it.unimi.dsi.law.warc.parser            Extensions of the BulletParser.
it.unimi.dsi.law.warc.tool              Command-line tools that manipulate WARC files.
it.unimi.dsi.law.warc.util
it.unimi.dsi.law.webgraph