webbase-2001

If you publish results based on this graph, please cite the references suggested on the dataset page.

This graph was obtained from the 2001 crawl performed by the WebBase crawler. The data provided by WebBase has been filtered to eliminate invalid links and to normalise URLs. The experiments reported in "The WebGraph Framework I: Compression Techniques" and "Codes for the World-Wide Web" are based on this graph and on uk-2002. Note that for historical reasons the URLs of this graph are encoded in ISO-8859-1.
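
Because the URLs are in ISO-8859-1 rather than UTF-8, the encoding has to be given explicitly when reading the URL list. The following is a minimal sketch, assuming that webbase-2001.urls.gz is a gzip-compressed text file with one URL per line; the function name `read_urls` and its `limit` parameter are illustrative and not part of the dataset or of any library.

```python
import gzip
from itertools import islice


def read_urls(path, limit=None):
    """Yield URLs from a gzip-compressed, ISO-8859-1 encoded list.

    Assumption: one URL per line. `limit` caps the number of lines read;
    None reads the whole file.
    """
    # newline="\n" keeps any stray carriage returns inside a URL untouched.
    with gzip.open(path, mode="rt", encoding="iso-8859-1", newline="\n") as handle:
        for line in islice(handle, limit):
            yield line.rstrip("\n")


# Example use (requires the downloaded file):
# for url in read_urls("webbase-2001.urls.gz", limit=5):
#     print(url)
```

Reading lazily through a generator avoids holding the whole list in memory, which matters for a file of this size.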

Basic data

nodes                   118 142 155
arcs                    1 019 903 190
bits/link               2.784 (11.07%)
bits/link (transpose)   2.561 (10.18%)
average degree          8.633
maximum indegree        816 127
maximum outdegree       3 841
dangling nodes          23.41%
buckets                 6.95%
largest component       53 891 939 (45.62%)
spid                    1.72 (± 0.011)
average distance        17.19 (± 0.030)
reachable pairs         43.90% (± 0.418)
median distance
harmonic diameter       37.08 (± 0.324)
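
Two of the figures above can be rederived from the node and arc counts, which makes a quick sanity check after transcribing them. The sketch below assumes that the average degree is arcs divided by nodes and that the percentage next to the largest component is its share of all nodes; the variable names are illustrative.

```python
# Values copied from the basic data listed above.
nodes = 118_142_155
arcs = 1_019_903_190
largest_component = 53_891_939

# Assumed definitions: arcs per node, and component size as a share of all nodes.
average_degree = arcs / nodes
largest_component_share = 100 * largest_component / nodes

print(f"average degree: {average_degree:.3f}")               # average degree: 8.633
print(f"largest component: {largest_component_share:.2f}%")  # largest component: 45.62%
```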

Random access (recommended)

Filename                      Size
webbase-2001.graph            399M
webbase-2001.properties       4.0K
webbase-2001-t.graph          338M
webbase-2001-t.properties     4.0K
webbase-2001.map              404M
webbase-2001.smap             1.3G
webbase-2001.md5sums          4.0K
webbase-2001.lmap             3.0G
webbase-2001.fcl              2.6G
webbase-2001.urls.gz          689M
webbase-2001.stats            4.0K
webbase-2001.indegree         1.6M
webbase-2001.outdegree        12K
webbase-2001.scc              451M
webbase-2001.sccsizes         157M
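
Given the size of these files, it is worth checking downloads against webbase-2001.md5sums. The following is a minimal sketch, assuming the file uses the usual `md5sum` layout (a hexadecimal digest, whitespace, then a filename, optionally prefixed by `*`) and that the listed files sit in the same directory; the layout has not been confirmed against the actual file, and the helper names `md5_of` and `verify` are illustrative.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(md5sums_path):
    """Map each listed filename to True (match), False (mismatch) or None (file absent).

    Assumption: each non-empty line is '<hex digest> <filename>'.
    """
    md5sums_path = Path(md5sums_path)
    results = {}
    for line in md5sums_path.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # md5sum marks binary mode with a leading '*'
        target = md5sums_path.parent / name
        results[name] = (md5_of(target) == expected.lower()) if target.exists() else None
    return results


# Example use (requires the downloaded files):
# for name, ok in verify("webbase-2001.md5sums").items():
#     print(name, ok)
```

Reporting absent files as None rather than as failures lets the check run when only some of the files have been downloaded.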

Sequential access (high compression)

Filename                      Size
webbase-2001-hc.graph         339M
webbase-2001-hc.properties    4.0K
webbase-2001-hc-t.graph       312M
webbase-2001-hc-t.properties  4.0K

Natural order (random access)

Filename                      Size
webbase-2001-nat.graph        455M
webbase-2001-nat.properties   4.0K
webbase-2001-nat.fcl          2.1G
webbase-2001-nat.urls.gz      640M

Indegree-frequency plot (with Fibonacci binning)
Outdegree-frequency plot (with Fibonacci binning)
Indegree-rank plot (cumulative)
Outdegree-rank plot (cumulative)
Distance probability mass function
Connected-components size distribution
Large connected components
Distribution of the logarithm of the successor gaps
Distribution of successor gaps