Tools for manipulating and converting files.
Computation of spectral rankings and associated utilities.
Statistical tools (in particular, Kendall's τ) for large data sets.
A comprehensive filtering system.
Classes performing low- and high-level WARC I/O (for format details, see the ISO draft).
Extensions of the
Command-line tools that manipulate WARC files.
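Computing Kendall's τ by enumerating all n(n − 1)/2 pairs is quadratic, which quickly becomes impractical for large score vectors. A well-known alternative is Knight's O(n log n) approach: sort the elements by the first score, then count discordant pairs as inversions of the second score with a merge sort. The sketch below assumes there are no ties (so it computes τ_a; handling ties, as τ_b does, needs extra counts of tied pairs). The class and method names are hypothetical and are not the API of the LAW statistics package.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch of Knight's O(n log n) Kendall's tau, assuming no ties. */
public class KendallTauSketch {

    /** Returns tau-a = (concordant - discordant) / (n(n-1)/2) for tie-free score vectors. */
    public static double tau(double[] x, double[] y) {
        int n = x.length;
        if (n != y.length || n < 2) throw new IllegalArgumentException("need two vectors of equal length >= 2");
        // Step 1: sort element indices by their first score.
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble(i -> x[i]));
        // Step 2: read the second score in that order; every inversion is a discordant pair.
        double[] a = new double[n];
        for (int i = 0; i < n; i++) a[i] = y[order[i]];
        long discordant = countInversions(a, new double[n], 0, n);
        long pairs = (long) n * (n - 1) / 2;
        return (double) (pairs - 2 * discordant) / pairs;
    }

    /** Merge-sorts a[lo, hi) in place and returns the number of inversions it removed. */
    private static long countInversions(double[] a, double[] tmp, int lo, int hi) {
        if (hi - lo < 2) return 0;
        int mid = (lo + hi) >>> 1;
        long inv = countInversions(a, tmp, lo, mid) + countInversions(a, tmp, mid, hi);
        int i = lo, j = mid, k = lo;
        while (i < mid && j < hi) {
            if (a[i] <= a[j]) {
                tmp[k++] = a[i++];
            } else {
                inv += mid - i; // a[j] is smaller than every remaining element of the left half
                tmp[k++] = a[j++];
            }
        }
        while (i < mid) tmp[k++] = a[i++];
        while (j < hi) tmp[k++] = a[j++];
        System.arraycopy(tmp, lo, a, lo, hi - lo);
        return inv;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5};
        double[] y = {3, 1, 2, 5, 4};
        System.out.println(tau(x, y)); // 3 discordant pairs out of 10: (10 - 6) / 10 = 0.4
    }
}
```

The sort dominates the cost, so the whole computation stays O(n log n) and uses O(n) extra memory, which is what makes this kind of approach viable on very large rankings.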
This collection contains software distributed by the Laboratory for Web Algorithmics (LAW), and each piece of software is usually associated with a publication. If you find our software useful while working on a scientific publication, please cite us properly, either using the publications quoted in the documentation or contacting us for suggestions.
PageRankParallelGaussSeidel is our best-of-breed implementation of PageRank, whereas
PageRankFromCoefficients makes it possible to compute PageRank and its derivatives for every value of the damping factor using the precomputed coefficients of PageRank's power series (using the results described in “PageRank: Functional dependencies”, by Paolo Boldi, Massimo Santini, and Sebastiano Vigna, ACM Trans. Inf. Sys., 27(4):1–23, 2009). You can also compute, for instance, the dominant eigenvector and Katz's index.
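To see why precomputed coefficients are enough, recall that PageRank with damping factor α and preference vector v can be written as the power series r(α) = (1 − α) Σ_{k≥0} α^k v P^k, where P is the row-stochastic transition matrix; regrouping terms gives the equivalent form v + Σ_{k≥1} α^k (vP^k − vP^{k−1}). Once the vectors vP^k are computed, PageRank and its derivative for any α reduce to evaluating a polynomial. The minimal sketch below assumes a toy graph with no dangling nodes and truncates the series after K terms (dropping mass α^{K+1}, so K must grow as α approaches 1). All names are hypothetical; this is not the PageRankFromCoefficients API, and it does not reproduce the Gauss–Seidel solver.

```java
import java.util.Arrays;

/** Hypothetical sketch: PageRank as a polynomial in the damping factor alpha. */
public class PageRankSeriesSketch {

    /**
     * Returns c[k] = v P^k for k = 0..maxK, where P is the row-normalized adjacency
     * matrix of the graph given as successor lists (no dangling nodes assumed).
     */
    static double[][] coefficients(int[][] succ, double[] v, int maxK) {
        int n = succ.length;
        double[][] c = new double[maxK + 1][];
        c[0] = v.clone();
        for (int k = 1; k <= maxK; k++) {
            double[] next = new double[n];
            for (int i = 0; i < n; i++) {
                double share = c[k - 1][i] / succ[i].length; // mass node i passes to each successor
                for (int j : succ[i]) next[j] += share;
            }
            c[k] = next;
        }
        return c;
    }

    /** Truncated series r(alpha) = (1 - alpha) * sum_{k=0..maxK} alpha^k c[k]. */
    static double[] pageRank(double[][] c, double alpha) {
        double[] r = new double[c[0].length];
        double pow = 1.0; // alpha^k
        for (double[] ck : c) {
            for (int i = 0; i < r.length; i++) r[i] += (1 - alpha) * pow * ck[i];
            pow *= alpha;
        }
        return r;
    }

    /** Term-by-term derivative: d/dalpha [(1 - alpha) alpha^k] = k alpha^(k-1) (1 - alpha) - alpha^k. */
    static double[] derivative(double[][] c, double alpha) {
        double[] d = new double[c[0].length];
        double powPrev = 0.0; // alpha^(k-1); multiplied by k = 0 on the first term
        double pow = 1.0;     // alpha^k
        for (int k = 0; k < c.length; k++) {
            double w = k * powPrev * (1 - alpha) - pow;
            for (int i = 0; i < d.length; i++) d[i] += w * c[k][i];
            powPrev = pow;
            pow *= alpha;
        }
        return d;
    }

    public static void main(String[] args) {
        int[][] succ = { {1, 2}, {2}, {0} }; // toy graph: 0->1, 0->2, 1->2, 2->0
        double[] v = new double[succ.length];
        Arrays.fill(v, 1.0 / succ.length); // uniform preference vector
        double[][] c = coefficients(succ, v, 300); // computed once, reused for every alpha
        for (double alpha : new double[] {0.5, 0.85}) {
            System.out.println("alpha = " + alpha
                + "  r = " + Arrays.toString(pageRank(c, alpha))
                + "  dr/dalpha = " + Arrays.toString(derivative(c, alpha)));
        }
    }
}
```

The expensive part (the repeated products with P) is done once; every further value of α, or of the derivative, costs only a linear pass over the stored coefficient vectors.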
ConsistentHashFunction implements the consistent hash function used by UbiCrawler.
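Consistent hashing maps keys to buckets so that adding or removing a bucket only moves the keys that belonged to it; in a distributed crawler, for instance, keys could be host names and buckets crawling agents. The minimal sketch below uses the classic construction: each bucket is hashed to several points ("replicas") on a ring kept in a TreeMap, and a key goes to the first point at or after its own hash, wrapping around. The class name, the replica count, and the FNV-1a hash followed by a SplitMix64-style finalizer are illustrative assumptions; this is not the API or the exact construction of LAW's ConsistentHashFunction.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a consistent hash ring with replicated bucket points. */
public class ConsistentHashSketch {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int replicas; // points per bucket; more points give a more even load

    public ConsistentHashSketch(int replicas) { this.replicas = replicas; }

    /** 64-bit FNV-1a over UTF-8 bytes, then a SplitMix64-style finalizer to spread the bits. */
    static long hash(String s) {
        long h = 0xcbf29ce484222325L; // FNV-1a offset basis
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L; // FNV-1a prime
        }
        h ^= h >>> 30; h *= 0xbf58476d1ce4e5b9L;
        h ^= h >>> 27; h *= 0x94d049bb133111ebL;
        h ^= h >>> 31;
        return h;
    }

    public void add(String bucket) {
        for (int r = 0; r < replicas; r++) ring.put(hash(bucket + "#" + r), bucket);
    }

    public void remove(String bucket) {
        // Remove a point only if it still belongs to this bucket.
        for (int r = 0; r < replicas; r++) ring.remove(hash(bucket + "#" + r), bucket);
    }

    /** Returns the bucket owning the first ring point at or after hash(key), wrapping around. */
    public String get(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("no buckets");
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        ConsistentHashSketch ch = new ConsistentHashSketch(64);
        for (String agent : new String[] {"agent-0", "agent-1", "agent-2"}) ch.add(agent);
        String[] hosts = {"www.example.org", "www.example.com", "www.example.net"};
        for (String h : hosts) System.out.println(h + " -> " + ch.get(h));
        ch.remove("agent-1"); // simulate an agent leaving
        for (String h : hosts) System.out.println(h + " -> " + ch.get(h) + " (after removing agent-1)");
    }
}
```

Because removing a bucket deletes only its own ring points, every key that was not assigned to the removed bucket keeps its previous assignment; only the removed bucket's keys are redistributed among the survivors.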
The LAW software requires Java 6 or later; it uses the DSI utilities, WebGraph, MG4J, and three packages providing high-performance containers and algorithms, namely fastutil 6.4 or later, the COLT distribution, and Sux4J. Moreover, it uses JSAP for command-line parsing. The LAW software also uses a number of libraries from the Jakarta Commons project, including Collections, Lang, Configuration and IO. All logging is performed using log4j. Compiling the LAW software requires JavaCC.