gsh-2015

If you publish results based on this graph, please cite the references suggested on the dataset page.

This graph is a large snapshot of the web gathered in 2015 by BUbiNG, starting from http://europa.eu/ with no domain restriction. To discover as many hosts as possible, at most 100 pages were crawled per host (hence the name: gsh stands for “general shallow”).

The corresponding graphs of hosts and of top private domains are also available.

Basic data
nodes                   988 490 691
arcs                    33 877 399 152
bits/link               1.606 (6.12%)
bits/link (transpose)   1.427 (5.44%)
average degree          34.272
maximum indegree        58 860 299
maximum outdegree       32 114
dangling nodes          10.29%
buckets                 2.31%
largest component       792 304 915 (80.15%)
spid                    0.34 (± 0.016)
average distance        12.32 (± 0.017)
reachable pairs         80.29% (± 1.086)
median distance         13 (59.98%)
harmonic diameter       14.91 (± 0.194)
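Several of the figures above are derived from one another. The Python sketch below (helper names are ours) recomputes the average degree as arcs divided by nodes. It also computes distance statistics from a distance distribution, assuming the usual definitions. spid is the variance-to-mean ratio of the distances. The harmonic diameter is the harmonic mean of the distances over all ordered pairs of distinct nodes, with unreachable pairs contributing 1/∞ = 0. We also assume the average distance is taken over reachable pairs only. The distribution in the example is a toy, not data from this graph.

```python
import math

# Values copied from the "Basic data" table.
NODES = 988_490_691
ARCS = 33_877_399_152
LARGEST_COMPONENT = 792_304_915


def average_degree(nodes: int, arcs: int) -> float:
    """Each arc adds one to an outdegree and one to an indegree, so both averages are arcs/nodes."""
    return arcs / nodes


def distance_statistics(pmf: dict[int, float], reachable_fraction: float) -> dict[str, float]:
    """Summary statistics of a distance distribution (assumed definitions, see text above).

    pmf: maps each finite distance d >= 1 to the fraction of reachable ordered
         pairs of distinct nodes at that distance (the fractions must sum to 1).
    reachable_fraction: fraction of all ordered pairs of distinct nodes that are reachable.
    """
    if not math.isclose(sum(pmf.values()), 1.0):
        raise ValueError("pmf must sum to 1")
    mean = sum(d * p for d, p in pmf.items())
    variance = sum((d - mean) ** 2 * p for d, p in pmf.items())
    # Unreachable pairs have infinite distance, so they contribute 0 to the reciprocal sum.
    mean_reciprocal = reachable_fraction * sum(p / d for d, p in pmf.items())
    return {
        "average_distance": mean,  # over reachable pairs only
        "spid": variance / mean,  # shortest-paths index of dispersion
        "harmonic_diameter": 1 / mean_reciprocal,
    }


if __name__ == "__main__":
    print(round(average_degree(NODES, ARCS), 3))  # 34.272
    print(round(100 * LARGEST_COMPONENT / NODES, 2))  # 80.15
    print(distance_statistics({1: 0.5, 2: 0.5}, 1.0))  # toy distribution, not gsh-2015 data
```

With the table's values, the first two lines reproduce the listed average degree and largest-component share. Under the same definition of spid, a spid of 0.34 at an average distance of 12.32 corresponds to a variance of about 0.34 × 12.32 ≈ 4.19. That is a standard deviation of roughly two hops.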
Random access (recommended)
Filename                    Size
gsh-2015.graph              8.1G
gsh-2015.properties         4.0K
gsh-2015-t.graph            6.2G
gsh-2015-t.properties       4.0K
gsh-2015.map                12G
gsh-2015.smap               12G
gsh-2015.md5sums            4.0K
gsh-2015.lmap               30G
gsh-2015.fcl                26G
gsh-2015.urls.gz            8.5G
gsh-2015.stats              4.0K
gsh-2015.indegree           113M
gsh-2015.outdegree          72K
gsh-2015.scc                3.7G
gsh-2015.sccsizes           501M
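Given the sizes involved, it is worth checking downloads against gsh-2015.md5sums. We assume that file follows the output format of GNU md5sum, with one `<hex digest>  <filename>` entry per line. Under that assumption, running `md5sum -c --ignore-missing gsh-2015.md5sums` in the download directory should be enough. The Python sketch below does the same check portably. It hashes in 1 MiB chunks, so multi-gigabyte files never need to fit in memory. The function names are ours.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks (1 MiB by default) to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(md5sums: Path) -> dict[str, bool]:
    """Map each listed file that exists next to `md5sums` to whether its digest matches.

    ASSUMPTION: entries use GNU md5sum's format, '<digest>  <name>' or '<digest> *<name>'.
    Files that have not been downloaded are skipped.
    """
    results: dict[str, bool] = {}
    for line in md5sums.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # '*' marks binary mode in md5sum output
        target = md5sums.parent / name
        if target.exists():
            results[name] = md5_of(target) == expected.lower()
    return results


if __name__ == "__main__":
    sums = Path("gsh-2015.md5sums")  # run from the download directory
    if sums.exists():
        for name, ok in sorted(verify(sums).items()):
            print(f"{name}: {'OK' if ok else 'MISMATCH'}")
```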
Sequential access (high compression)
Filename                    Size
gsh-2015-hc.graph           6.4G
gsh-2015-hc.properties      4.0K
gsh-2015-hc-t.graph         5.7G
gsh-2015-hc-t.properties    4.0K
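The bits/link figures in the Basic data table relate to file sizes. Multiplying bits/link by the number of arcs and dividing by 8 estimates a file's size in bytes. The resulting estimates, about 6.33 GiB and 5.63 GiB, match the two high-compression graph files listed just above. This suggests, though the page does not say so, that the bits/link figures describe the -hc versions. The percentages next to them also appear consistent with a particular baseline: the information-theoretic cost per arc of a random graph with the same numbers of nodes and arcs. That cost is the base-2 logarithm of the number of such graphs divided by m, approximately log2(n²/m) + log2(e). This reading is our assumption, checked only numerically in the sketch below. The helper names are ours.

```python
import math

# Values copied from the "Basic data" table.
NODES = 988_490_691
ARCS = 33_877_399_152


def estimated_size_gib(bits_per_link: float, arcs: int = ARCS) -> float:
    """Estimated size of a compressed graph file: bits/link * arcs bits, converted to GiB."""
    return bits_per_link * arcs / 8 / 2**30


def random_graph_bits_per_link(nodes: int = NODES, arcs: int = ARCS) -> float:
    """Approximate log2(binom(n*n, m)) / m, valid when m is much smaller than n*n.

    ASSUMPTION: this is the baseline the percentages refer to; the page does not say so.
    """
    return math.log2(nodes * nodes / arcs) + math.log2(math.e)


if __name__ == "__main__":
    for label, bpl in (("gsh-2015-hc", 1.606), ("gsh-2015-hc-t", 1.427)):
        share = 100 * bpl / random_graph_bits_per_link()
        print(f"{label}: ~{estimated_size_gib(bpl):.2f} GiB, {share:.2f}% of baseline")
```

The listed sizes appear to be rounded up to one decimal, as `ls -h` does. On that reading, estimates of 6.33 GiB and 5.63 GiB agree with the listed 6.4G and 5.7G.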
Natural order (random access)
Filename                    Size
gsh-2015-nat.graph          15G
gsh-2015-nat.properties     4.0K
gsh-2015-nat.fcl            23G
gsh-2015-nat.urls.gz        8.7G
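Each node ordering comes with its own URL list. We assume the .urls.gz files contain one URL per line, with line i (counting from 0) holding the URL of node i in the corresponding graph (gsh-2015 or gsh-2015-nat). Under that assumption, a URL can be looked up by streaming the file, as in the sketch below. This is a linear scan over several gigabytes of compressed text, so it only suits occasional lookups. For heavy use, the .fcl files are the better fit; we take them to be front-coded string lists meant for random access from Java. The function name is ours.

```python
import gzip
import os
from itertools import islice


def url_of_node(urls_gz: str, node: int) -> str:
    """URL of `node`, ASSUMING line i (0-based) of the gzipped file is the URL of node i.

    Streams the file from the start, so the cost grows with the node id.
    """
    if node < 0:
        raise ValueError("node ids are non-negative")
    with gzip.open(urls_gz, "rt", encoding="utf-8", errors="replace") as f:
        line = next(islice(f, node, None), None)
    if line is None:
        raise IndexError(f"{urls_gz} has fewer than {node + 1} lines")
    return line.rstrip("\n")


if __name__ == "__main__":
    path = "gsh-2015.urls.gz"  # pick the list matching the graph ordering you use
    if os.path.exists(path):
        print(url_of_node(path, 0))
```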
Plots (not reproduced here): indegree-frequency and outdegree-frequency plots (with Fibonacci binning); cumulative indegree-rank and outdegree-rank plots; distance probability mass function; connected-components size distribution; large connected components; distribution of successor gaps and of the logarithm of successor gaps.
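Several of the plots use Fibonacci binning. The sketch below shows the idea under two assumptions. First, bins are bounded by consecutive Fibonacci numbers (1, 2, 3, 5, 8, ...). Second, each bin's count is divided by the number of integers the bin spans. The page does not state where each binned point is drawn, so the sketch just returns the bin bounds. The successor-gap helper assumes gaps are differences between consecutive successors in a sorted adjacency list, the quantity gap-based compression encodes. The exact definition behind the plots (for instance, how the first successor is treated) is not given on this page. All names are ours.

```python
from collections import Counter


def fibonacci_bins(limit: int) -> list[tuple[int, int]]:
    """Half-open bins [lo, hi) with Fibonacci bounds 1, 2, 3, 5, 8, ... covering 1..limit."""
    bins = []
    lo, hi = 1, 2
    while lo <= limit:
        bins.append((lo, hi))
        lo, hi = hi, lo + hi
    return bins


def fibonacci_binned_density(values: list[int]) -> list[tuple[int, int, float]]:
    """(lo, hi, count of values in [lo, hi) divided by the bin width hi - lo).

    Values below 1 (e.g. degree 0) are dropped, as they cannot appear on log-log axes.
    """
    counts = Counter(v for v in values if v >= 1)
    if not counts:
        return []
    return [
        (lo, hi, sum(c for v, c in counts.items() if lo <= v < hi) / (hi - lo))
        for lo, hi in fibonacci_bins(max(counts))
    ]


def successor_gaps(successors: list[int]) -> list[int]:
    """Differences between consecutive successors once the adjacency list is sorted."""
    s = sorted(successors)
    return [b - a for a, b in zip(s, s[1:])]


if __name__ == "__main__":
    print(fibonacci_binned_density([1, 1, 2, 3, 4, 4, 6]))
    print(successor_gaps([10, 3, 4, 9]))  # [1, 5, 1]
```

Dividing by bin width keeps the binned values comparable to per-degree frequencies. Meanwhile, the growing bins reduce the noise in the sparse tail of the distribution.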