An Analysis of the Digg Twitter network using NodeXL

From CMSC734_09
Revision as of 08:12, 4 November 2009 by Mkegan (talk | contribs) ('''Digg Twitter user does not follow its CEO, and vice versa''')


By John Locke and Melissa Egan, graduate students at University of Maryland, College Park


To experiment with the NodeXL network analysis tool, we used its built-in Twitter import functionality to analyze an interesting "following" network on Twitter. In most cases, importing a network proved impossible because of the rate limiting imposed by Twitter. Most sizable and interesting networks contained at least one user who followed tens of thousands of users, resulting in a swift ban from Twitter. Eventually, we discovered that the network starting with the digg user was sizable, yet manageable enough to import in its entirety at a level of 2. The digg user is the Twitter face of the social news website of the same name.


Digg Twitter user does not follow its CEO, and vice versa


Figure 1: A Twitter network consisting of the digg user, the users it follows, and the single user followed by the greatest number of these users. This single user turns out to be the Digg CEO.

An interesting and immediately apparent observation in NodeXL is that two nodes are very central to the network but are not directly connected to each other. We were able to see this because we used a level of 2 during the Twitter import. We first identified the "core" digg users, that is, the users followed by the digg user, and then isolated the central users followed by these core users. The most central of these is jayadelson, whom we later identified as the CEO of Digg: he does not follow the digg user, and digg does not follow him.
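The two steps described above (find the core users, then find who the core users follow most) can be sketched in a few lines of Python. This is only an illustration of the idea, not what NodeXL does internally: the edge list and the names `hub`, `core_a`, `outsider`, and so on are made-up placeholders, not our imported data, and the helper `most_followed_by_core` is hypothetical.

```python
from collections import Counter

# Hypothetical directed "follows" edges as (follower, followed) pairs.
# Placeholder names only; this is not the imported digg network.
edges = [
    ("hub", "core_a"), ("hub", "core_b"), ("hub", "core_c"),
    ("core_a", "hub"), ("core_b", "hub"), ("core_c", "hub"),
    ("core_a", "outsider"), ("core_b", "outsider"), ("core_c", "outsider"),
    ("core_a", "other"),
]

def most_followed_by_core(edges, root):
    """Return (user, count, root_follows_user, user_follows_root) for the
    user followed by the most core users, where core = users root follows."""
    edge_set = set(edges)
    core = {dst for src, dst in edge_set if src == root}
    # Count follows coming from core users; skip the root itself, since
    # most core users follow it back and it would trivially win.
    counts = Counter(
        dst for src, dst in edge_set if src in core and dst != root
    )
    if not counts:
        return None
    user, count = counts.most_common(1)[0]
    return user, count, (root, user) in edge_set, (user, root) in edge_set

result = most_followed_by_core(edges, "hub")
print(result)
```

In this toy network the most-followed user has no edge to or from the starting user in either direction, which is the same pattern we saw between digg and jayadelson.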

Two users followed by Digg do not follow Digg back


Figure 2: A Twitter network consisting of the digg user and the users it follows. All but two of these users follow digg in return. The two who do not are pictured.

As noted in our first headline, there appears to be a core group of users followed by digg. However, two core users, jeffrey and marianopeterson, do not follow the digg user in return. Given the relatively small and organized nature of the network shown in Figure 2, these outliers are surprising. The jeffrey user turned out to be Jeffrey Kalmikoff, the "Chief Creative Officer of the Chicago-based, community-business-centric skinnyCorp", while marianopeterson appears to be a typical Twitter user (who is also registered on Digg). In any event, the ability of NodeXL to identify these individuals was enlightening.
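Finding the users who do not follow back amounts to a reciprocity check on each edge leaving the starting user. A minimal sketch, assuming the network is available as a list of (follower, followed) pairs; the names below are placeholders rather than our data, and `unreciprocated` is a hypothetical helper:

```python
# Hypothetical directed "follows" edges as (follower, followed) pairs.
edges = [
    ("hub", "core_a"), ("hub", "core_b"), ("hub", "core_c"), ("hub", "core_d"),
    ("core_a", "hub"), ("core_b", "hub"),
]

def unreciprocated(edges, root):
    """Users that root follows but that do not follow root back."""
    edge_set = set(edges)
    followed = {dst for src, dst in edge_set if src == root}
    return sorted(user for user in followed if (user, root) not in edge_set)

missing = unreciprocated(edges, "hub")
print(missing)
```

On the real network, the same check would be expected to return the two users pictured in Figure 2.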

Core Digg users play favorites


Figure 3: A Twitter network consisting of the core digg users (those users followed by digg), and the users that they, in turn, follow (color-coded by size of following from core digg users).

Using NodeXL's "Graph Metrics" feature, we were able to identify users who are followed by many members of the digg core user group. While browsing these users, we discovered several additional Digg employees (dtrinh, ryan000, nicolegregory, phatduckk, ...) as well as popular figures who tend to have many followers in general (algore, BarackObama, Schwarzenegger, ...). We also observed that many of the most-followed users, even if not Digg employees, were from the San Francisco area, the location of the Digg headquarters.
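The metric behind this ranking is essentially an in-degree count restricted to edges that originate from core users. The sketch below shows that count on a made-up edge list; it is our own illustration rather than NodeXL's implementation, and the names and the helper `core_following` are hypothetical.

```python
from collections import Counter

# Hypothetical directed "follows" edges as (follower, followed) pairs.
edges = [
    ("hub", "core_a"), ("hub", "core_b"), ("hub", "core_c"),
    ("core_a", "user_x"), ("core_b", "user_x"), ("core_c", "user_x"),
    ("core_a", "user_y"), ("core_b", "user_y"),
    ("core_c", "user_z"),
    ("stranger", "user_z"),  # not a core user, so this follow is ignored
]

def core_following(edges, root):
    """Rank users by how many core users (those root follows) follow them."""
    edge_set = set(edges)
    core = {dst for src, dst in edge_set if src == root}
    counts = Counter(
        dst for src, dst in edge_set if src in core and dst != root
    )
    # Highest count first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

ranking = core_following(edges, "hub")
for user, count in ranking:
    print(user, count)
```

A count like this could then be mapped to a color scale, as in the color coding of Figure 3.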

Other Thoughts & Critique

While experimenting with the Digg network in NodeXL, we also noted that this kind of analysis could be useful in a wide range of situations. For example, in our first headline, we managed to single out the CEO of Digg merely by examining the network of peers around the site's Twitter account. Perhaps this functionality of NodeXL could be used in, say, a missing persons case or other law enforcement situation, where persons of interest might be isolated and investigated based on such social network findings.

In general, it was enlightening to use the NodeXL tool and to be able to make findings from an otherwise overwhelming "hairball" of thousands of nodes. Most of the difficulty in this assignment stemmed from the setup. The Office 2007 requirement turned out to be quite a burden for both of us, in terms of both vendor lock-in and platform dependence. The tool itself proved to be incredibly powerful, and we believe it could be just as powerful as a standalone application with minimal spreadsheet functionality, as opposed to requiring the entire Office 2007 package.

Further, we were impressed by the import options of NodeXL, which we used for our Twitter import. We had many issues importing networks, but these were largely due to Twitter's restrictions, not those of NodeXL. However, since many more people network with others on Facebook than on Twitter, it might be useful to provide a means to analyze a Facebook network.

We did have some initial difficulty running NodeXL from Parallels on Mac OS X, but we found a solution in the NodeXL discussion forums.