altavista-2002-nd

If you publish results based on this graph, please cite the references suggested on the dataset page.

This graph is the AltaVista webpage connectivity dataset, version 1.0, from which dangling nodes were removed (not recursively, just one level). The goal was to remove, in a sensible way, the isolated nodes and the frontier of the crawl, which was almost surely included in the graph; please read our comments on the dataset.
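As an illustration of what "one level" of dangling-node removal could mean, here is a minimal sketch on a toy adjacency list. It assumes that a dangling node is a node with no outgoing arcs and that the survivors are renumbered contiguously; the function name `remove_dangling_once` and the toy graph are hypothetical, and this is not the procedure actually used to build the dataset.

```python
def remove_dangling_once(succ):
    """Remove nodes with no outgoing arcs, in a single pass.

    succ is a list of successor lists; node ids are 0..n-1.
    (Hypothetical helper, for illustration only.)
    """
    # Nodes with at least one outgoing arc in the ORIGINAL graph survive.
    keep = [v for v, out in enumerate(succ) if out]
    # Renumber the survivors contiguously.
    new_id = {old: new for new, old in enumerate(keep)}
    # Drop arcs that point to removed nodes.
    return [[new_id[w] for w in succ[v] if w in new_id] for v in keep]


# Toy chain 0 -> 1 -> 2, where node 2 is dangling.
graph = [[1], [2], []]
pruned = remove_dangling_once(graph)
print(pruned)
```

In this toy chain, node 1 loses its only successor and becomes dangling after the pass. Because the removal is not repeated, such nodes stay in the graph, which is consistent with the nonzero share of dangling nodes reported in the basic data.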

Basic data
nodes                    653 912 338
arcs                     4 226 882 364
bits/link                4.024 (14.35%)
bits/link (transpose)    2.917 (10.40%)
average degree           6.464
maximum indegree         7 608 477
maximum outdegree        2 064
dangling nodes           18.60%
buckets                  1.42%
largest component        50 092 626 (7.66%)
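Two of the figures above can be cross-checked from the node and arc counts. The short sketch below assumes that the average degree is arcs divided by nodes and that the percentage next to the largest component is its share of all nodes; both readings are assumptions, although they agree with the values listed.

```python
# Values copied from the basic data above.
nodes = 653_912_338
arcs = 4_226_882_364
largest_component = 50_092_626

# Assumed definitions: arcs per node, and component size as a share of nodes.
avg_degree = arcs / nodes
share = 100 * largest_component / nodes

print(f"average degree: {avg_degree:.3f}")
print(f"largest component share: {share:.2f}%")
```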
Random access (recommended)
Filename                              Size
altavista-2002-nd.properties          4.0K
altavista-2002-nd-t.properties        4.0K
altavista-2002-nd.stats               4.0K
altavista-2002-nd.indegree            15M
altavista-2002-nd.outdegree           8.0K
Sequential access (high compression)
Filename                              Size
altavista-2002-nd-hc.properties       4.0K
altavista-2002-nd-hc-t.properties     4.0K
Natural order (random access)
Filename                              Size
altavista-2002-nd-nat.properties      4.0K
Indegree-frequency plot (with Fibonacci binning)
Outdegree-frequency plot (with Fibonacci binning)
Indegree-rank plot (cumulative)
Outdegree-rank plot (cumulative)
Connected-components size distribution
Large connected components
Distribution of the logarithm of successor gaps
Distribution of successor gaps