This collection contains software distributed by the Laboratory for Web Algorithmics (LAW); most of it is linked to a publication. If you find our software useful while working on a scientific publication, please cite us properly, either by using the publications quoted in the documentation or by contacting us for suggestions.
- Statistical tools to efficiently compute Kendall's τ and the weighted τ. They include tools to accurately limit the precision of the ranks involved, as the noise caused by approximation can significantly alter the computation of τ (see “Traps and pitfalls of topic-biased PageRank”, by Paolo Boldi, Massimo Santini, Roberto Posenato, and Sebastiano Vigna, in WAW 2006. Fourth Workshop on Algorithms and Models for the Web-Graph, volume 4936 of Lecture Notes in Computer Science, pages 107–116, Springer–Verlag, 2008).
- The largest publicly available set of classes and documentation related to spectral ranking. It includes a detailed explanation of theoretical formulations and of the algorithms actually implementing them. In particular, PageRankParallelGaussSeidel is our best-of-breed implementation of PageRank, whereas PageRankFromCoefficients makes it possible to compute PageRank and its derivatives for every value of the damping factor using the precomputed coefficients of PageRank's power series (using the results described in “PageRank: Functional dependencies”, by Paolo Boldi, Massimo Santini, and Sebastiano Vigna, ACM Trans. Inf. Sys., 27(4):1–23, 2009). You can also compute, for instance, the dominant eigenvector and Katz's index.
- A highly scalable implementation of the Layered Label-Propagation algorithm.
- ConsistentHashFunction implements the consistent hash function used by UbiCrawler.
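The idea of computing PageRank for every damping factor from precomputed power-series coefficients can be illustrated with a small, self-contained sketch. It does not use the LAW classes: the class name PageRankSeriesSketch, its two methods, and the dense-matrix representation are assumptions made for illustration only, and the graph is assumed to have no dangling nodes. The sketch relies on the standard power-series form r(α) = (1 − α) Σ_k α^k v P^k, where P is the row-normalized adjacency matrix and v is the preference vector, truncated after a fixed number of terms.

```java
import java.util.Arrays;

/** Illustrative sketch only; this is not the LAW implementation. */
final class PageRankSeriesSketch {

    /**
     * Precomputes the coefficients c[k] = v P^k for k = 0..terms-1.
     * p must be row-stochastic (no dangling nodes) and terms must be at least 1.
     */
    static double[][] coefficients(double[][] p, double[] v, int terms) {
        int n = v.length;
        double[][] c = new double[terms][];
        c[0] = v.clone();
        for (int k = 1; k < terms; k++) {
            c[k] = new double[n];
            // Row vector times matrix: c[k] = c[k-1] P.
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    c[k][j] += c[k - 1][i] * p[i][j];
        }
        return c;
    }

    /** Evaluates the truncated series (1 - alpha) * sum_k alpha^k c[k]. */
    static double[] rank(double[][] c, double alpha) {
        int n = c[0].length;
        double[] r = new double[n];
        double power = 1.0; // holds alpha^k
        for (double[] ck : c) {
            for (int j = 0; j < n; j++) r[j] += power * ck[j];
            power *= alpha;
        }
        for (int j = 0; j < n; j++) r[j] *= 1 - alpha;
        return r;
    }

    public static void main(String[] args) {
        // Hypothetical 3-node graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
        double[][] p = { { 0, 0.5, 0.5 }, { 0, 0, 1 }, { 1, 0, 0 } };
        double[] v = { 1.0 / 3, 1.0 / 3, 1.0 / 3 };
        double[][] c = coefficients(p, v, 200);
        // The coefficients are computed once and reused for each damping factor.
        for (double alpha : new double[] { 0.5, 0.85, 0.99 })
            System.out.println(alpha + " -> " + Arrays.toString(rank(c, alpha)));
    }
}
```

The truncation error shrinks like α raised to the number of terms, so damping factors close to 1 need many more coefficients than the 200 used here.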
The LAW software requires Java ≥6; it uses the DSI utilities, WebGraph, MG4J, and three packages providing high-performance containers and algorithms, that is, fastutil 6.4 or greater, the COLT distribution, and Sux4J. Moreover, it uses JSAP for command-line parsing. The LAW software also uses a number of useful libraries from the Jakarta Commons project, including collections, lang, configuration and io. All logging is performed using log4j. Compiling the LAW software requires javacc.
Tools for manipulating and converting files.
Computation of spectral rankings and associated utilities.
Statistical tools (in particular, Kendall's τ) for large-size data.
A comprehensive filtering system.
Provides classes performing low- and high-level WARC I/O (for format details, please see the ISO draft).
Extensions of the
Command-line tools that manipulate WARC files.
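As a rough illustration of why the statistical tools limit the precision of scores before computing Kendall's τ, the following self-contained sketch compares two score vectors that differ only by numerical noise. It does not use the LAW classes: the class name KendallTauSketch, the quadratic-time computation of τ (the tie-corrected variant usually called τ-b), and rounding to a fixed number of decimal digits are all assumptions chosen for brevity. The LAW tools are described as efficient, and this naive version is not.

```java
/** Illustrative sketch only; this is not the LAW implementation. */
final class KendallTauSketch {

    /**
     * Naive O(n^2) Kendall's tau with tie correction (tau-b).
     * Returns NaN when either vector is constant.
     */
    static double tau(double[] x, double[] y) {
        if (x.length != y.length) throw new IllegalArgumentException("length mismatch");
        int n = x.length;
        long concordant = 0, discordant = 0, tiedX = 0, tiedY = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++) {
                int cx = Double.compare(x[i], x[j]);
                int cy = Double.compare(y[i], y[j]);
                if (cx == 0) tiedX++;
                if (cy == 0) tiedY++;
                if (cx != 0 && cy != 0) {
                    // Same ordering in both vectors: concordant; otherwise discordant.
                    if ((cx > 0) == (cy > 0)) concordant++;
                    else discordant++;
                }
            }
        long pairs = (long) n * (n - 1) / 2;
        return (concordant - discordant)
                / Math.sqrt((double) (pairs - tiedX) * (pairs - tiedY));
    }

    /** Rounds every score to the given number of decimal digits (an assumed policy). */
    static double[] limitPrecision(double[] scores, int digits) {
        double scale = Math.pow(10, digits);
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++)
            out[i] = Math.round(scores[i] * scale) / scale;
        return out;
    }

    public static void main(String[] args) {
        // The last two items should be tied; noise of 1e-12 breaks the tie
        // in opposite directions in the two vectors.
        double[] x = { 0.125, 0.25, 0.5, 0.5 + 1e-12 };
        double[] y = { 0.125, 0.25, 0.5 + 1e-12, 0.5 };
        System.out.println("raw scores:     " + tau(x, y));
        System.out.println("rounded scores: "
                + tau(limitPrecision(x, 6), limitPrecision(y, 6)));
    }
}
```

With the raw scores the noisy pair counts as a discordance, whereas after rounding it becomes a tie in both vectors and no longer lowers τ.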