twitter-2010
If you publish results based on this graph, please cite the references suggested on the dataset page.
Twitter is a website, owned and operated by Twitter Inc., which offers a social networking and microblogging service, enabling its users to send and read messages called tweets. Tweets are text-based posts of up to 140 characters displayed on the user's profile page.
This is a crawl presented by Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon in “What is Twitter, a Social Network or a News Media?”, Proceedings of the 19th International World Wide Web (WWW) Conference, pages 591–600, 2010, ACM Press. Nodes are users, and there is an arc from x to y if y is a follower of x. In other words, arcs follow the direction of tweet transmission.
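To make the arc direction concrete, here is a minimal Python sketch. It assumes the follower relation is given as (follower, followee) pairs; that input format and the function names `follows_to_arcs` and `outdegrees` are hypothetical and not part of the dataset. Under the convention above, each pair becomes an arc from the followee to the follower, so the outdegree of a node is its number of followers.

```python
from collections import Counter


def follows_to_arcs(follows):
    """Turn (follower, followee) pairs into arcs (x, y) where y follows x.

    The arc goes from the followed user to the follower, i.e. in the
    direction in which tweets are transmitted.
    """
    return [(followee, follower) for follower, followee in follows]


def outdegrees(arcs):
    """Count outgoing arcs per node; with this convention, the follower count."""
    return Counter(x for x, _ in arcs)


# Toy input (hypothetical): b and c follow a, and a follows b.
follows = [("b", "a"), ("c", "a"), ("a", "b")]
arcs = follows_to_arcs(follows)
degrees = outdegrees(arcs)
```

In this toy example, user a has two followers and therefore outdegree 2.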
Note that the distance distribution reported in the paper (Figure 4) is quite different from the one we computed (with relative standard error ≤0.4% on all points of the cumulative distribution function). Correspondingly, the average distance reported in the paper (4.12) is quite different from our estimate. We have verified that we obtain the distribution reported here even when using breadth-first sampling.
The node numbering of the original graph was not compact, as the node identifiers were Twitter's actual internal identifiers. We therefore renumbered the nodes contiguously. The original identifiers can be found in the file with extension .ids, so you can access the Twitter page associated with a node by looking up the corresponding identifier and then using the Twitter API. For instance, the node of maximum outdegree is node 2997469, corresponding to identifier 19058681, whose owner can be found at http://api.twitter.com/1/users/show.xml?user_id=19058681.
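The lookup from a node number to its original identifier could be sketched as follows. This is only an illustration under an assumption not stated here: that the .ids file is plain text with one identifier per line, where line k (counting from zero) holds the identifier of node k. The function name `original_id` is hypothetical.

```python
from itertools import islice


def original_id(ids_path, node):
    """Return the original identifier of `node`.

    ASSUMPTION: ids_path is a text file with one integer identifier per
    line, and line k (0-based) corresponds to node k.
    """
    if node < 0:
        raise ValueError("node must be non-negative")
    with open(ids_path, "r", encoding="ascii") as f:
        # Skip ahead to the requested line without loading the whole file.
        line = next(islice(f, node, node + 1), None)
    if line is None:
        raise IndexError(f"node {node} is beyond the end of {ids_path}")
    return int(line.strip())
```

If the layout assumption holds and the file is named twitter-2010.ids (also an assumption), `original_id("twitter-2010.ids", 2997469)` should yield the identifier 19058681 mentioned above.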