Category Archives: social networks
3 Comments Posted by Bernhard on September 16th 2013 @ 11:58 am
This should probably go into a funstuff section somewhere, but I used some moments of free time today to upload a script I wrote some time ago to github. It’s a very simple piece of code that grabs images tagged with a specified word and, by looking at which tags appear together, creates a co-tag graph file in .gdf format. You can get it from here or run it here. To test how it scales – and to finally know what teens (apparently tumblr’s main audience) dream of – I tried it with 500 sets of 20 images for the tag “dream”. This yields some 7K distinct tags; after some filtering, this is what comes out:
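The co-tag logic can be sketched in a few lines of Python. This is not the script on github, only a minimal illustration that assumes the tag lists have already been fetched (one list per image); the function names and the `occurrences` node column are hypothetical, and tags are assumed to contain no commas, since the GDF format that Gephi reads is comma-separated.

```python
from collections import Counter
from itertools import combinations


def cotag_counts(posts):
    """Count tag occurrences and pairwise co-occurrences.

    `posts` is an iterable of tag lists, one list per image (hypothetical input).
    """
    occurrences = Counter()
    cooccurrences = Counter()
    for tags in posts:
        # normalize case and count each tag at most once per image
        unique = sorted({tag.lower() for tag in tags})
        occurrences.update(unique)
        # every unordered pair of tags on the same image adds 1 to that edge
        cooccurrences.update(combinations(unique, 2))
    return occurrences, cooccurrences


def to_gdf(occurrences, cooccurrences):
    """Serialize the co-tag graph as GDF text (assumes comma-free tags)."""
    lines = ["nodedef>name VARCHAR,label VARCHAR,occurrences INT"]
    for tag, count in sorted(occurrences.items()):
        lines.append(f"{tag},{tag},{count}")
    lines.append("edgedef>node1 VARCHAR,node2 VARCHAR,weight DOUBLE")
    for (a, b), weight in sorted(cooccurrences.items()):
        lines.append(f"{a},{b},{weight}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    images = [["dream", "love", "night"], ["dream", "Love"], ["dream", "sky"]]
    occ, co = cotag_counts(images)
    print(to_gdf(occ, co))
```

Filtering (for example, dropping edges below a minimum weight) would be applied to the counters before serialization.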
Node size is occurrence count and color (blue => yellow => red) is betweenness centrality. Apparently, love is still a thing out there. Nice.
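The blue => yellow => red coloring can be approximated by linear interpolation between three color stops. This is a sketch assuming RGB interpolation and min/max normalization; Gephi's own ranking palettes may blend differently, and `heat_color` is a hypothetical helper.

```python
def heat_color(value, vmin, vmax):
    """Map value onto a blue -> yellow -> red ramp and return a '#rrggbb' string."""
    t = 0.0 if vmax == vmin else (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)  # clamp values outside [vmin, vmax]
    blue, yellow, red = (0, 0, 255), (255, 255, 0), (255, 0, 0)
    # lower half of the range blends blue into yellow, upper half yellow into red
    if t <= 0.5:
        start, end, f = blue, yellow, t / 0.5
    else:
        start, end, f = yellow, red, (t - 0.5) / 0.5
    r, g, b = (round(s + (e - s) * f) for s, e in zip(start, end))
    return f"#{r:02x}{g:02x}{b:02x}"


# e.g. color nodes by a hypothetical betweenness score between 0 and 40
for score in (0, 10, 20, 40):
    print(score, heat_color(score, 0, 40))
```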
This may become an actual tool further down the road, but maybe it’s already useful to somebody as is.
EDIT: Try it out here: https://lab.digitalmethods.net/~brieder/tumblr/tagnet/
scrutinizing a network of likes on Facebook (and some thoughts on network analysis and visualization)
7 Comments Posted by Bernhard on July 10th 2013 @ 11:08 am
I have recently added a new feature to the netvizz application: page like networks. This is basically a simple “like crawler” for like relationships between pages on Facebook. It starts with a seed page, gets all the pages liked by it, then gets their likes and so forth. Well, because the feature is new, I’m limiting crawl depth to two, in order to see how many resources are needed. In this post, I’ll quickly go over an example to show what one can do with this, but also to discuss a number of questions related to network analysis and visualization as such.
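A like crawl of this kind amounts to a breadth-first search with a depth cap. In this sketch, `get_liked_pages` is a hypothetical callback standing in for whatever API call returns the pages a given page likes; netvizz's actual implementation is not shown. Pages discovered at the maximum depth are recorded as link targets, but their own likes are not fetched.

```python
from collections import deque


def crawl_likes(seed, get_liked_pages, max_depth=2):
    """Breadth-first like crawl from `seed`; returns directed (liker, liked) edges.

    get_liked_pages(page) is a hypothetical callback returning the pages `page` likes.
    """
    edges = set()
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        if depth[page] >= max_depth:
            continue  # crawl depth reached: do not expand this page's likes
        for liked in get_liked_pages(page):
            edges.add((page, liked))
            if liked not in depth:
                depth[liked] = depth[page] + 1
                queue.append(liked)
    return edges


# toy like relationships standing in for API responses
likes = {"seed": ["a", "b"], "a": ["b", "c"], "b": ["seed"], "c": ["d"]}
print(sorted(crawl_likes("seed", lambda p: likes.get(p, []))))
```

With a depth of two, the seed's likes and their likes are fetched; the link from "c" to "d" is never seen because "c" sits at depth two.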
Network analysis and visualization (NAV) has made quite an entry into social science and humanities research circles over the last couple of years and the hype has contributed to the dominance of the network concept in new media studies and beyond. This dominance has been rightfully criticized and the pretty pictures of points and lines have received their fair share of disparaging commentary. While there are many questions and problems related to NAV, a lot of the criticism I have read or heard is superficial and lacks both understanding of the analytical gestures put forward by NAV and literacy in the diagrams one encounters so frequently now. Concerning the latter point, the main error is to consider the output of network visualization first and foremost as an image; following Barthes, I would suggest looking at these diagrams as denotative rather than connotative, as language or code more than image. This means that successful use of a network diagram requires reading skills and knowledge of the production apparatus. In their absence, well, every diagram looks pretty much the same.
To tease out something truly interesting from a graph – the mathematical representation of a network – a lot is needed and many, many mistakes can be made. But much like statistics, NAV is a powerful tool if handled with care. Let’s consider the following gephi diagram (data available as a .gdf file here):
This is the visualization of a network of 370 pages on Facebook with every node a page and every link an act of “liking”. In keeping with the topic of a recent data-sprint we had with our New Media and Digital Culture MA students about Anti-Islamism, I took the “Stop Islamization of the World” page as starting point and crawled two steps into the network. The result is a quite striking web of pages that clusters – at least according to gephi’s modularity algorithm – quite neatly into four groups. In purple, we find a group of pages (122 nodes) that are explicitly focused on countering Islam; in green – and very well connected to the first group – there is a “defence league” cluster (79 nodes), basically a network of strongly islamophobic street protest groups; in red, we see a group of sites associated with Israel (145 nodes); finally, in turquoise, a much smaller and eccentric group (24 nodes) that could be called the “tattoo cluster”, dedicated to getting ink done. Because pages do not necessarily reciprocate liking, this is a directed graph, i.e. every link has a source and a target. The curve of the links encodes this direction: a link that bends clockwise in relation to a node is an outgoing link, counter-clockwise is incoming. In this diagram – and in all that follow – node size is a simple count of inlinks.
How does one read something like this? What does it mean? At first glance, a like crawl starting with an islamophobic page results in a large number of pages related to Israel. But what kind of entanglement is this? I think that this question cannot be answered intelligently simply by looking at a single projection of the graph as a diagram. Besides a healthy distrust of the data (why this seed? why not others? how does crawl depth affect the result? are there privacy settings in place? etc.), any non-trivial network needs to be investigated from different angles to even begin understanding its structure. As I have tried to show elsewhere, different layout algorithms flatten the n-dimensional adjacency matrix into two-dimensional diagrams in quite different ways, each bringing particular aspects of the graph structure to the foreground. But there is much more to take into account. In the above diagram, we can easily spot nodes that are bigger than others, meaning that they receive more likes. (Side note: it really helps to download all images and flip through them with a decent image viewer – all networks have exactly the same size and layout, only the color changes.) Can we conclude that “United with Israel” and the “Israeli Defense Forces” (both 55 inlinks) are the most important actors in this network? And what would “important” then mean? Let’s start with Google’s definition and apply PageRank to our network using a heat scale (blue => yellow => red):
This is quite striking. We start with an Anti-Islam page and end up with the Israeli Defense Forces as the node with the most authority. Now, as I have tried to show recently, PageRank is a complicated beast and far from a simple measure of popularity. Rather, one can think about it as a complex flow of status along links that is highly dependent on topological positioning. Who links is at least as important as the number of links – and because status is passed along, the question of who does not link is crucial. Non-random networks are generally strongly hierarchical and PageRank exploits these asymmetries to the fullest. Let’s investigate further by looking at our network in aggregate form:
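To see how a page that receives many likes but gives none back can end up on top, here is a plain power-iteration PageRank. It is a sketch under stated assumptions: damping factor 0.85, a fixed number of iterations, and dangling nodes (pages without outlinks) redistributing their score uniformly, which is one common convention; Gephi's implementation may differ in such details, and the toy graph is hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank over directed (source, target) edges."""
    edges = set(edges)  # ignore duplicate links
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out_links = {v: [] for v in nodes}
    for source, target in edges:
        out_links[source].append(target)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # score held by dangling nodes is spread evenly over all nodes
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping + damping * dangling) / n for v in nodes}
        for v in nodes:
            if out_links[v]:
                share = damping * rank[v] / len(out_links[v])
                for target in out_links[v]:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: "sink" is liked by every other page but likes nobody back.
toy = [("a", "b"), ("b", "a"), ("a", "sink"), ("b", "sink"), ("c", "sink"), ("c", "a")]
scores = pagerank(toy)
print(max(scores, key=scores.get))
```

Because "sink" collects status from every other page without passing any of it on along links, it outranks "a" and "b", which trade likes with each other.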
Already, a certain disequilibrium becomes visible here: while the Anti-Islam and Defence League clusters are liking back and forth in roughly equal measure, both like pages in the Israel cluster a lot more than they are liked back. But the disequilibrium is certainly not strong enough to simply diagnose a case of non-reciprocated affection. This would have been too easy. To further qualify the graph structure, we need to be able to say more about who links and who does not link. Let’s leave the force-based layout for a moment and look at the network in yet another way:
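The aggregate view boils down to counting links between cluster labels rather than between pages. A minimal sketch, assuming each page has already been assigned a cluster (for example by a modularity algorithm) in a hypothetical `cluster_of` mapping; comparing the two directions for each pair of clusters exposes the kind of imbalance described above.

```python
from collections import Counter


def cluster_links(edges, cluster_of):
    """Count directed links between clusters.

    `edges` holds (source, target) page pairs; `cluster_of` maps page -> cluster label.
    Links inside a cluster are counted too, as (label, label).
    """
    return Counter((cluster_of[s], cluster_of[t]) for s, t in edges)


def imbalance(counts, a, b):
    """Links from cluster a to b, links back from b to a, and their difference."""
    forward, backward = counts[(a, b)], counts[(b, a)]
    return forward, backward, forward - backward


# hypothetical toy data with two clusters, "A" and "B"
cluster_of = {"p1": "A", "p2": "A", "p3": "B", "p4": "B"}
edges = [("p1", "p3"), ("p2", "p3"), ("p2", "p4"), ("p3", "p4"), ("p4", "p1")]
counts = cluster_links(edges, cluster_of)
print(imbalance(counts, "A", "B"))
```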
Here, I have not only arranged nodes on a line, grouped by clusters and ordered by inlink count, but I have also colored links according to their target. This means that we can clearly see (on the hi-res image at least) into which cluster individual nodes are linking and even get an aggregate picture of relationships between groups. A nuanced account begins to emerge by looking at the linking practices of the top 10 pages: in the purple Anti-Islam cluster, pages 1, 2, 4, 6, 7, and 9 link to the red Israel cluster; in the green Defence League cluster, pages 5 and 8 do so as well. But in the Israel cluster, only pages 8 and 10 link to the former two. We can thus further qualify the disequilibrium mentioned above: in addition to a mere imbalance in numbers, we can observe a disequilibrium in status; high-status nodes from the extremist clusters link to the Israel group, but the latter’s top pages do not like back. This explains why PageRank concentrates on the IDF page: it receives a lot of status, but does not feed it back into the network. If Facebook can stand in for the mapping of complex socio-political relationships – which it probably cannot – we could argue that the “official” Israel is clearly reluctant to associate with islamophobic extremism. But then, why is there a network in the first place? What holds it together?
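The per-rank reading of the top pages can also be computed rather than eyeballed. This sketch, again with a hypothetical `cluster_of` mapping and toy data, ranks each cluster's pages by inlink count and lists the clusters each of the top k pages links into; ties are broken alphabetically, which is an arbitrary choice.

```python
from collections import Counter, defaultdict


def top_pages_and_targets(edges, cluster_of, k=10):
    """Per cluster: [(rank, page, sorted target clusters), ...] for the top k pages by inlinks."""
    inlinks = Counter(target for _, target in edges)
    target_clusters = defaultdict(set)
    for source, target in edges:
        target_clusters[source].add(cluster_of[target])
    members = defaultdict(list)
    for page, cluster in cluster_of.items():
        members[cluster].append(page)
    report = {}
    for cluster, pages in members.items():
        # most inlinks first; equal counts ordered by page name
        ranked = sorted(pages, key=lambda p: (-inlinks[p], p))[:k]
        report[cluster] = [
            (rank, page, sorted(target_clusters[page]))
            for rank, page in enumerate(ranked, start=1)
        ]
    return report


cluster_of = {"p1": "A", "p2": "A", "p3": "B", "p4": "B"}
edges = [("p1", "p3"), ("p2", "p3"), ("p2", "p4"), ("p3", "p4"), ("p4", "p1")]
for cluster, rows in sorted(top_pages_and_targets(edges, cluster_of, k=2).items()):
    print(cluster, rows)
```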
Let’s start by looking at the most prolific likers in our network. The next diagram shows the nodes with the highest outlink count:
Here, we see the most active likers, but we also notice that the page that gives the most likes (“We Stand With Israel – Siotw”) is quite small, which means that other pages do not like it very much. A better way to look at network cohesion in terms of structural positioning is thus to use a measure called betweenness centrality:
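Both counts used so far are simple tallies over the directed edge list: node size counts inlinks, while the most prolific likers are the pages with the most outlinks. A minimal sketch with hypothetical toy data shows how a page can top the outlink list while receiving no likes itself.

```python
from collections import Counter


def link_counts(edges):
    """Return (inlinks, outlinks) Counters for directed (liker, liked) edges."""
    inlinks, outlinks = Counter(), Counter()
    for liker, liked in set(edges):  # ignore duplicate likes
        outlinks[liker] += 1
        inlinks[liked] += 1
    return inlinks, outlinks


edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "b"), ("c", "b")]
inlinks, outlinks = link_counts(edges)
top_liker = max(outlinks, key=outlinks.get)
print(top_liker, outlinks[top_liker], inlinks[top_liker])
```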
Betweenness centrality is often interpreted as close to the notion of bridging capital, i.e. the capacity of an actor to connect different groups. Because betweenness centrality is calculated by looking at the placement of nodes on the shortest paths in a network, it does not simply bring the heaviest linkers to the front. Some of the heavy linkers do remain important, however: if we take away “We Stand With Israel – Siotw”, a large number of the likes from the Israel cluster to the other two evaporate. The heavy linkers thus play a real part in holding the network together.
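Betweenness centrality sums, for each node, the fraction of shortest paths between other pairs of nodes that pass through it. Below is a compact version of Brandes's algorithm for unweighted, directed graphs, without normalization; it is a textbook sketch and not necessarily identical to what Gephi reports (normalization, for instance, may differ).

```python
from collections import deque


def betweenness(edges):
    """Unnormalized betweenness centrality (Brandes) for an unweighted, directed graph."""
    nodes = {node for edge in edges for node in edge}
    successors = {v: set() for v in nodes}
    for source, target in edges:
        successors[source].add(target)
    centrality = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # 1. BFS from s: count shortest paths (sigma) and remember predecessors
        order, preds = [], {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        sigma[s] = 1
        dist = dict.fromkeys(nodes, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in successors[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # 2. accumulate dependencies, farthest nodes first
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                centrality[w] += delta[w]
    return centrality


print(betweenness([("a", "b"), ("b", "c")]))
```

Deleting a heavy linker's edges and recomputing is a quick way to gauge how much of the network's cohesion depends on that page.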
But we also see the rise of a very interesting node, “Stand for Israel”. While it receives likes from apparently neutral pages such as “Visit Israel”, it is the top Israel cluster page to link into the Defence League cluster, to the “United States Defense League” page to be precise. While “Stand for Israel” announces on their page that “Violent, obscene, profane, hateful, or racist content will be deleted and offenders blocked from the page without notice” (and this indeed seems to be the case), they do like a page that is full of exactly that. In doing so, the page plays the role of a broker. In a sense, we can look at like patterns to produce actor descriptions.
What emerges through this still very superficial exploration – I made a point of not looking at the pages themselves as much as possible to focus on a pure NAV approach (which would be quite absurd in an actual research project) – is a set of rather complex relationships between pages that need to be examined in different ways before we can even begin to make sense of them. The diagrams, here, are not means to communicate findings, but artifacts that become truly salient only when they are combined, juxtaposed, and narrated together. They are somehow less explanatory than in need of explanation. Let’s look at a final diagram to add yet another perspective:
Here, the heat scale encodes “like_count”, i.e. the number of times a page has been liked by Facebook users, not other pages. Suddenly, the picture flips completely. Albert Einstein and Tattoos lead the pack, but in the middle of the network, two nodes stand out, giving us further clues about how our clusters connect to larger political elements: “Tea Party Patriots” and “Being Conservative”.
Again, I would be very hesitant to make any claims based on the NAV of a set of Facebook pages and how they like each other, in particular in a context as sensitive as this one. Nonetheless, I hope that it becomes clear from this quick example that NAV provides means to investigate a network through multilayered and nuanced explorations of structural patterns that are simply not visible to the naked eye. And this is only a small subset of the many analytical gestures afforded by NAV. In my view, there certainly is an inflation of network diagrams and there are many limits to analyzing phenomena through formalization as points and lines. But much like the case of statistics, the often problematic use of formal techniques should not mean that we have to throw out the baby with the bathwater.
While I am still somewhat of a beginner in NAV, if there is one thing I have learned, it is that we should see network diagrams as specific projections or interpretations of the graph, as slices that interrogate data in particular ways, and that multiple such perspectives are needed to actually produce a picture.
5 Comments Posted by Bernhard on January 3rd 2013 @ 10:01 am
One of the reasons I started to develop the netvizz application was to get better insights into how Facebook envisions the exchange of data and functionality with third-party developers. From the beginning, I was quite amazed how much data a third-party app could actually get from the platform – not only about the users that actually install an app, but also about their friends and the groups they are members of. I hope to provide a systematic account of what I’ve learned at some point in the future. But today, I want to discuss a particular element in some more detail: the “read_stream” permission.
To introduce the matter, a couple of points concerning the Facebook APIs as such: every application written by a third-party developer requires a logged-in user, and this user defines the “scope” of data access the running instance of the application can get – remember that applications are generally used by many users, so the data gleaned from individual scopes can be combined. Applications have to explicitly ask for permission to access certain items and Facebook provides extensive documentation on the permission system, the profile properties, and a set of extended permissions. Users are asked to grant these permissions when they first start an app. This is the permission dialogue for netvizz:
Netvizz currently asks for the following permissions: user_status, user_groups, friends_likes, user_likes, and read_stream. When installing, you cannot refuse individual permissions unless they are “extended permissions”; you can only decide not to use the app at all. The user_status permission is actually superfluous and will be removed in the next iteration. The user_groups permission is needed to access group data and both _likes permissions are used for netvizz’ like network functionality.
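On the wire, "asking for permissions" means passing them as a comma-separated `scope` parameter when the app sends the user to Facebook's login dialog. A sketch, assuming the unversioned dialog endpoint used at the time of writing; the app id and redirect URI are placeholders, and several of these permissions (read_stream, friends_likes) were later removed by Facebook.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# The permissions netvizz requested at the time of writing.
NETVIZZ_SCOPE = ["user_status", "user_groups", "friends_likes", "user_likes", "read_stream"]


def login_dialog_url(app_id, redirect_uri, scope):
    """Build an OAuth login dialog URL requesting the given permissions.

    The endpoint reflects the older, unversioned dialog; app_id and
    redirect_uri are placeholders.
    """
    params = {
        "client_id": app_id,
        "redirect_uri": redirect_uri,
        "scope": ",".join(scope),  # permissions are comma-separated
    }
    return "https://www.facebook.com/dialog/oauth?" + urlencode(params)


url = login_dialog_url("123456", "https://example.org/callback", NETVIZZ_SCOPE)
print(url)
# Decoding the query string recovers the requested permissions.
print(parse_qs(urlparse(url).query)["scope"][0].split(","))
```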
Now, working on a couple of new features over the last months, I started to get more interested in posts because they have probably become the closest thing to a “carrier of publicness” on the Facebook platform. I was quite amazed how easy it was to extract large numbers of users and (some of) their data from pages – both likes and comments users make on posts on or by pages are in principle up for grabs. When doing some housekeeping recently, I noticed that some of the “engagement” metrics netvizz had provided for users’ friends in earlier versions were either broken or outdated, and I decided to simply count the number of likes and posts friends make to replace the older metrics. I expected to only be able to read likes – through the friends_likes permission – and public posts. This was indeed true: in the beginning, all I got were public posts. Because I could get much more data through the Graph API Explorer, a developer sandbox that asks for all permissions by default (which can be changed, a great way to explore the permission structure), I discovered the read_stream permission.
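Once the data has been fetched, counting likes and posts per friend is straightforward. The input shapes here are assumptions for illustration: a mapping from friend id to the list of liked pages, and post records with a `from` field holding the author's id, loosely modeled on Graph API post objects of the time; `friend_activity` is a hypothetical name.

```python
from collections import Counter


def friend_activity(friend_likes, posts):
    """Return {friend_id: (like_count, post_count)}.

    friend_likes: dict friend_id -> list of liked pages (assumed shape).
    posts: list of dicts whose "from" field holds {"id": author_id} (assumed shape).
    """
    post_counts = Counter(post["from"]["id"] for post in posts if "from" in post)
    return {
        friend: (len(likes), post_counts[friend])
        for friend, likes in friend_likes.items()
    }


friend_likes = {"f1": ["page1", "page2"], "f2": []}
posts = [{"from": {"id": "f1"}}, {"from": {"id": "f1"}}, {"from": {"id": "someone_else"}}]
print(friend_activity(friend_likes, posts))
```

Posts by people outside the friend list are simply ignored, since only friends appear as keys in the result.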
The read_stream permission is presented by Facebook in the following way: “Provides access to all the posts in the user’s News Feed and enables your application to perform searches against the user’s News Feed.” It is a so-called “extended permission”, the developer doc noting that “Extended Permissions give access to more sensitive info and the ability to publish and delete data”. And, indeed, when asking for read_stream in netvizz, I suddenly got access to many more posts made by my friends, mostly going from “none” to “a lot”. From what I could gather after some random testing, I basically got access to all of the activities from my friends that would show up in my newsfeed, without the “top stories” filter. Because many things have the status of “post”, I could get a rather detailed (and timestamped) account of what my friends are doing on the platform. You can check out your own “posts” feed by following this link into the Graph API Explorer. Because comments and likes by users who you are not friends with on posts by somebody you are friends with also show up in your news feed, the read_stream permission allows an app to capture their activity as well. Facebook seems to be aware of this: because read_stream is an extended permission, it gets its own permission dialogue and can actually be skipped:
This is a good thing, but the wording seems a bit sparse: “Posts in your newsfeed” actually translates to “a minute account of your friends’ activities”. Granted, buried in the privacy settings is an option that allows us to modify more generally what information we share with the apps other people use, and these are the default settings:
It’s the “Activities, interests, things I like” option that allows the read_stream permission to work its magic. The people I am friends with on the platform are generally a rather privacy conscious bunch, but I could get the posts from most of them.
This is not a privacy scandal of any sort, since measures are in place, but one can still make a couple of points:
- Apps as means for data capture are clearly not discussed enough. For serious data collection, however, going through the API is the way to go, and we need to pay more attention to this.
- Again and again: defaults matter. As seen above, the data available to apps used by friends is quite extensive with default settings.
- Again and again: language matters. The read_stream permission dialogue is certainly not explicit enough. Also: why is “app privacy” not in the privacy tab here?
- When we log into a third-party site with our Facebook login, we are basically running an app. It may be worth pondering what data we are shipping over.
Exploring APIs as important actors in the privacy debate and beyond is crucial. It’s often complicated work, though, and I hope that the developer community can help a bit. It would be highly useful, I think.
12 Comments Posted by Bernhard on October 23rd 2012 @ 9:05 am
Netvizz, a Facebook research app for extracting data from the dominant social networking service, has gained a new feature: page exploration. While the app has been able to get ego-networks and group networks from the start, this is the first time that data for pages can be extracted as well. The Social Network Importer for NodeXL already allows for extracting both co-engagement (users that comment or like the same post are connected) and bipartite networks (both posts and users are in the graph) from Facebook pages but requires you to use NodeXL and Microsoft Office on Windows.
The first implementation of page exploration on netvizz provides bipartite network files only and yields less data on users, but adds information on the page posts themselves and outputs them both as a graph file and as a simple tab-separated text file. For the moment, the app captures a user-specified number of posts from the page and loads up to 1000 comments and 1000 likes. It also specifies the type of post in both of the files it generates. This is the (edgeless) network created from the last 100 posts of the New York Times Facebook page:
Users are gray, videos are blue, links are red, photos are yellow and status updates are green. Size is engagement. Because distance from the center indicates stronger engagement from non-regular users, one can easily see that both photos and status updates are engaging a different audience than the links and videos.
Visualizing the data from the tsv file, we can explore this kind of relation further. Here, I used Mondrian’s capacity to show highlights in one chart on all other open charts:
Selecting photos in the barchart shows in the scatterplot (x: likes, y: comments) that photos not only produce much higher engagement scores (the engagement value in both the tsv and gdf files combines numbers of likes, comments, shares, and likes for comments into a single metric) – the median for links is 453, but 1724 for photos – but that there is also a tendency for photos to provoke a comment/like ratio that trends toward the former. This is data from about 10 days of activity, so it is not suited to making any larger claims – interesting nonetheless.
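The post states only that the engagement value combines likes, comments, shares, and likes on comments; the plain sum below is an assumption, as are the field names. The sketch computes per-type medians and the comment/like ratio on hypothetical toy posts, not on the New York Times data.

```python
from collections import defaultdict
from statistics import median


def engagement(post):
    """Assumed engagement score: plain sum of the four interaction counts."""
    return post["likes"] + post["comments"] + post["shares"] + post["comment_likes"]


def median_engagement_by_type(posts):
    """Median engagement per post type (link, photo, video, status, ...)."""
    by_type = defaultdict(list)
    for post in posts:
        by_type[post["type"]].append(engagement(post))
    return {post_type: median(scores) for post_type, scores in by_type.items()}


def comment_like_ratio(post):
    """Comments per like; higher values lean toward comments."""
    return post["comments"] / post["likes"] if post["likes"] else float("inf")


toy_posts = [
    {"type": "link", "likes": 40, "comments": 5, "shares": 3, "comment_likes": 2},
    {"type": "link", "likes": 60, "comments": 8, "shares": 6, "comment_likes": 6},
    {"type": "photo", "likes": 200, "comments": 90, "shares": 20, "comment_likes": 40},
]
print(median_engagement_by_type(toy_posts))
print([round(comment_like_ratio(p), 3) for p in toy_posts])
```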
As already mentioned here, the next step is to produce network files for multiple pages.
2 Comments Posted by Bernhard on October 16th 2012 @ 10:32 am
In my last post, I previewed a feature that I am currently building into netvizz: posts and users that comment on and like them are thrown together into a bipartite graph. In this approach, it is easy to combine data from different pages, here from the 30 latest posts of the New York Times and the Wall Street Journal, plotting 27K users:
The app will start spitting out more metrics in the next version, but it’s easy to see from the gephi graph that the NY Times (red) has somewhat more users (grey) than the WSJ (blue). There is a bit of overlap in terms of (active) audience, but in general, there seem to be quite distinct populations over the short span the data covers. Interestingly, one post – talking about the space shuttle Endeavour – is a true outlier: it has succeeded in capturing a less “specific” audience.
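The degree of overlap between two audiences can be quantified directly from the sets of user ids that engaged with each page's posts. A sketch with hypothetical ids; the Jaccard index (shared users divided by all users) is one common choice, not a metric netvizz outputs.

```python
def audience_overlap(users_a, users_b):
    """Compare two audiences given as iterables of user ids."""
    a, b = set(users_a), set(users_b)
    union = a | b
    return {
        "only_a": len(a - b),
        "only_b": len(b - a),
        "shared": len(a & b),
        "jaccard": len(a & b) / len(union) if union else 0.0,
    }


print(audience_overlap(["u1", "u2", "u3", "u4"], ["u4", "u5"]))
```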
As this method could be applied to a potentially infinite number of pages, this is really becoming quite problematic in terms of privacy. I have cut the labels for users, but they are in the data. I am unsure about this for the moment, but this feature may not make it in full into the next version.
10 Comments Posted by Bernhard on September 8th 2012 @ 12:36 pm
I am sick this weekend and that’s a justification to stay in bed and play around with the computer a bit. Over these last weeks, I was thinking that it may be interesting to get back to the aging netvizz application and make some direly needed revisions and updates, especially regarding some of the quantitative measures concerning individual users’ activity. Maintenance work is not fun, however, so I decided to add a new feature instead: the bipartite like network.
The idea is pretty simple: instead of graphing friend relationships between users, the new output basically just throws users and likes (that is, liked pages; external objects are not available through the API) into the same graph. If a user likes something, a link is created. That’s also how Facebook’s opengraph architecture works on the inside. The result – done with gephi – is pretty interesting though:
The small turquoise dots are users and the bigger red ones liked objects. I eliminated users that did not like anything (or have strong privacy settings), as well as all things liked by a single person only. The data field “likesize” in the output file indicates how often an object has been liked and makes it possible to size likes separately from users (the “type” field distinguishes the two). It is not surprising that, at least in my case, the network of friendship connections and the like network are quite similar. People from Austria do not like the same things as my French friends – although there is a cluster of international stuff in the middle: television shows, music, wikileaks, and so on; these things cannot be clearly attributed to a user group.
One can actually use the same output file for something quite different. The next image shows the same graph but with nodes sized for number of connections (degree). This basically shows the biggest “likers” (anonymized for the purpose of this post) in the network and still keeps them grouped by similar like patterns.
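The filtering and the size fields described above can be sketched as follows. The input (a mapping from user to liked page ids) and the output layout are hypothetical; the code assumes user ids and page ids never collide, drops users with no likes and pages liked by only one user, records `type` and `likesize` as in the netvizz output, and adds each node's degree for the second sizing.

```python
from collections import Counter


def like_network(user_likes):
    """Build a bipartite user/like graph.

    user_likes: dict user_id -> iterable of liked page ids (hypothetical input).
    Returns (nodes, edges); nodes map id -> {"type", "likesize", "degree"}.
    """
    user_likes = {u: set(pages) for u, pages in user_likes.items()}
    likesize = Counter(p for pages in user_likes.values() for p in pages)
    nodes, edges = {}, []
    for user, pages in user_likes.items():
        if not pages:
            continue  # user liked nothing (or hides likes): drop
        nodes[user] = {"type": "user", "likesize": 0, "degree": 0}
        for page in sorted(pages):
            if likesize[page] < 2:
                continue  # liked by a single person only: drop
            nodes.setdefault(page, {"type": "like", "likesize": likesize[page], "degree": 0})
            edges.append((user, page))
    for user, page in edges:
        nodes[user]["degree"] += 1
        nodes[page]["degree"] += 1
    return nodes, edges


nodes, edges = like_network({"u1": ["tv", "band", "solo"], "u2": ["tv", "band"], "u3": ["tv"], "u4": []})
print(edges)
```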
The new feature is already live and can be tried out. If you want to do more than make pretty pictures, I highly recommend checking out the work by my colleagues Carolin Gerlitz and Anne Helmond on what they call the “like economy”.
And now back to bed.
I am currently writing a paper to submit to the new and very exciting journal Computational Culture on the use of graph theory to produce “evaluative metrics” in contexts like Web search or social networking. One of my core arguments is going to be that the network as descriptive (mathematical) model has never stood in opposition to the notion of hierarchy but should rather be seen as a conceptual tool that was used in different fields (e.g. sociometry, psychometry, citation analysis, etc.) over the 20th century to investigate structure and, in particular, to both investigate and establish hierarchy. This finally gave me an excuse to dive into Jacob L. Moreno’s opus magnum Who Shall Survive? from 1934, which not only founded sociometry but also laid the groundwork for social network analysis. This is one of the strangest books I have ever read, not only because the edition from 1978 reveals the author as a deeply Nietzschean character (“Actually, I have written two bibles, an old testament and a new testament.”), but also because the sociogenic therapy Moreno proposes as an approach to the “German-Jewish conflict” puts the whole text in a deeply saddening light. But these aspects only deepen the impression that this is a fascinating book, really one of its kind.
Interestingly, Moreno also discovered what we would now call “power-law dynamics in social networks”. One of the applications of his “sociometric test” – basically a “who do you like” type of questionnaire – in a small American town named Hudson came to the following result:
After the first phase of the sociometric test was given the analysis of the choices revealed that among a population of 435 persons, 204, or 46.5%, remained unchosen after the 1st choice; 139, or 30%, after the 2d choice; 87, or 20%, after the 3rd choice; 74, or 17%, after the 4th choice; and 66, or 15%, after the 5th choice. (Moreno 1934, p. 249)
This means that 15% of the population was not mentioned when the interviewees were asked which five people in the community they liked best. While this does not make for a particularly skewed distribution, Moreno transposes the result to the population of New York City and adds a quite tantalizing interpretation:
There is no question but that this phenomenon repeats itself throughout the nation, however widely the number of unchosen may vary from 1st to 5th or more choices due to the incalculable influence of sexual, racial, and other psychological currents. For New York, with a population of 7,000,000, the above percentages would be after the 1st choice, 3,200,000 individuals unchosen; after the 2nd choice, 2,100,000 unchosen; after the 3rd choice, 1,400,000 unchosen; after the 4th choice, 1,200,000 unchosen; and after the 5th choice, 1,050,000 unchosen. These calculations suggest that mankind is divided not only into races and nations, religions and states, but into socionomic divisions. There is produced a socionomic hierarchy due to the differences in attraction of particular individuals and groups for other particular individuals and groups. (Moreno 1934, p. 250f)
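Moreno's figures can be rechecked with a few lines of arithmetic. Recomputing the Hudson shares from the raw counts gives slightly different values for the first two choices (about 46.9% and 32% rather than 46.5% and 30%), while his New York estimates follow from applying his stated percentages to 7,000,000 and rounding rather loosely (3,255,000 becomes 3,200,000; 1,190,000 becomes 1,200,000). The constant and function names are illustrative.

```python
HUDSON_POPULATION = 435
UNCHOSEN = [204, 139, 87, 74, 66]          # unchosen after the 1st..5th choice
STATED_PERCENT = [46.5, 30, 20, 17, 15]    # percentages as printed by Moreno
NEW_YORK_POPULATION = 7_000_000


def recomputed_percent(counts, population):
    """Share of the population still unchosen, in percent, to one decimal."""
    return [round(100 * c / population, 1) for c in counts]


def extrapolate(percents, population):
    """Apply percentages to a larger population, as whole head counts."""
    return [round(population * p / 100) for p in percents]


print(recomputed_percent(UNCHOSEN, HUDSON_POPULATION))
print(extrapolate(STATED_PERCENT, NEW_YORK_POPULATION))
```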
By looking into the history of the field, I hope to show that the observation of uneven distributions of connectivity in real-world networks, e.g. the work by Hindman and others concerning the Web, is certainly not a discovery of the “new science of networks” of recent years but a virtual constant in mathematical approaches to networks: whenever somebody starts counting, the result is an ordered list, normally with a considerable difference in value between the first and the last element. When it comes to applications of sociometry to sociology or anthropology, the question of leadership, status, influence, etc. is permanently in the forefront, especially from the 1950s onward when matrix algebra starts to allow for quick calculations of different forms of centrality. Contrary to popular myth, when Page and Brin came up with PageRank, they had a very wide variety of inspirational sources to draw from. Networks and ranking had been an old couple for quite a while already.
2 Comments Posted by Bernhard on September 3rd 2011 @ 8:46 am
German publisher Heise Verlag is an international curiosity. It publishes a small number of highly influential computer-related magazines that give a voice to a tech ethos that is at the same time extremely competent in the subject matter (I’ve been a steady subscriber to c’t magazin for over 15 years now, and I am still baffled sometimes just how good it is) and very much aware of the social and political implications of computing (their online magazine Telepolis testifies to that).
Data protection and privacy are long-standing concerns of the heise editors and, true to a spirit of society-oriented design, they have introduced a concept as well as a technical implementation of a two-step “like” button. Such buttons, by Facebook or other companies, have of course become a major vector of user tracking on the Web. By using an iframe, every button loads some code from Facebook’s server and sends the referring url (e.g. http://nytimes.com/articlename/blabla) as information. Because the iframe is hosted on the facebook.com domain, cross-site privacy protections can be circumvented and the url information connected to an identifier cookie and, consequently, to a user account. Plugins like the Priv3 project block these mechanisms, but a) users have to have a heightened level of awareness to even consider installing something like this and b) the plugin interferes with convenient functions like Google search preferences.
Heise’s suggestion, which they already implemented on their own sites, is simple: websites can download a small bit of code that implements a two-step procedure: the “like” button is greyed out after the page first loads and there is no tracking happening. A first click on the button loads the “real” Facebook code, and the second click provides the usual functionality. The solution is very simple to implement and really a very minor inconvenience. Independently from the debate whether “like” buttons and such add any real value to the Web, this example shows that “social” features like these can be designed in a way that does not necessarily lead to pervasive user tracking.
The response to this initiative has been very strong (check the Slashdot discussion here), especially in Germany, where privacy (or rather Datenschutz, a concept centered less on the individual than on the role of data in society) is an intensely debated issue, for obvious historical reasons. Facebook apparently threatened to blacklist heise.de at one point, but has since backpedaled. After all, c’t magazin prints around 600,000 copies of every issue and is extremely influential in the German (and Dutch!) computer landscape. I am very curious to see how this story unfolds, because let’s be clear: Facebook’s earning potential is closely tied to its capacity to capture, enrich, and analyze user data.
This initiative – and the Heise ethos in general – underscores that a “respectable” and sober engineering culture does not exclude an explicit normative stance on social and political issues. And it shows that this stance can be translated into technical models, implemented, and shared, both as an idea and as code.
When it comes to analyzing and visualizing data as a graph, we most often select only one unit to represent nodes. When working with social networks, nodes commonly represent user accounts. In a recent post, I used Twitter hashtags instead and established links by looking at which hashtags occurred in the same tweets. But it is very much possible to use different “ontological” units in the same graph. Consider this example from the IPRI project:
Here, I decided to mix Twitter user accounts with hashtags. To keep things manageable, I took, from the 25K accounts we follow, only those we identified as journalists and that posted at least 300 tweets between February 15 and April 15. For every one of those accounts, I queried our database for the 10 hashtags most often tweeted by the user. I then filtered the graph to show only hashtags used by at least two users. I was finally left with 512 user accounts (the turquoise nodes, size is number of tweets) and 535 hashtags (the red nodes, size is frequency of use). Link strength represents the frequency with which a user tweeted a hashtag. What we get is still a thematic map (libya, the regional elections, and japan being the main topics), but this time we also see which users were most strongly attached to these topics.
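The filtering steps above can be sketched in a few lines of Python. This is an illustration, not the query actually run against our database: the function `user_hashtag_graph` and its input layout (a per-user `Counter` of hashtag frequencies plus a per-user tweet count) are assumptions, the journalist selection is taken as already done, and the toy accounts and numbers are invented.

```python
from collections import Counter, defaultdict


def user_hashtag_graph(hashtag_counts, tweet_counts, min_tweets=300,
                       top_k=10, min_users=2):
    """Build a bipartite user-hashtag edge list (hypothetical helper).

    hashtag_counts: {user: Counter({hashtag: times the user tweeted it})}
    tweet_counts:   {user: total number of tweets in the period}
    Returns {(user, hashtag): weight}, weight = usage frequency.
    """
    edges = {}
    for user, counts in hashtag_counts.items():
        if tweet_counts.get(user, 0) < min_tweets:
            continue  # skip accounts below the activity threshold
        for tag, freq in counts.most_common(top_k):
            edges[(user, tag)] = freq  # link strength = frequency of use

    # Keep only hashtags attached to at least `min_users` accounts.
    users_per_tag = defaultdict(set)
    for user, tag in edges:
        users_per_tag[tag].add(user)
    return {(u, t): w for (u, t), w in edges.items()
            if len(users_per_tag[t]) >= min_users}


# Invented toy data: carol falls below the 300-tweet threshold.
counts = {
    "alice": Counter({"libya": 40, "cantonales": 12, "japan": 5}),
    "bob": Counter({"libya": 25, "japan": 9, "football": 3}),
    "carol": Counter({"libya": 2}),
}
tweets = {"alice": 450, "bob": 320, "carol": 120}
print(sorted(user_hashtag_graph(counts, tweets)))
# [('alice', 'japan'), ('alice', 'libya'), ('bob', 'japan'), ('bob', 'libya')]
```

Users whose hashtags are all filtered out simply disappear from the edge list, which is one way the account count can shrink after the hashtag filter. When exporting to Gephi, user and hashtag node IDs should be kept distinct (for instance by prefixing hashtags with `#`) in case an account name equals a tag.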
Mapping heterogeneous units opens up many new ways to explore data. The next step I will try to work out is using mentions and retweets to identify not only the level of interest that certain accounts accord to certain topics (which you can see in the map above), but the level of echo that an account produces in relation to a certain topic. We’ll see how that goes.
In completely unrelated news, I read an interesting piece by Rocky Agrawal on why he blocked tech blogger Robert Scoble from his Google+ account. At the very end, he mentions a little experiment that delicious.com founder Joshua Schachter did a couple of days ago: he asked his 14K followers on Twitter and 1.5K followers on Google+ to respond to a post, getting 30 answers from the former and 42 from the latter. Sitting on a still largely unexplored set of bit.ly click data for millions of URLs posted on Twitter, I can only confirm that Twitter’s impact may be overstated by an order of magnitude…
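The "order of magnitude" is easy to check with the figures Schachter reported. The follower numbers are the rounded values quoted above (14K and 1.5K), so the result is approximate:

```python
# Figures as reported: (followers, replies) per platform, rounded.
platforms = {
    "Twitter": (14_000, 30),
    "Google+": (1_500, 42),
}

rates = {name: replies / followers
         for name, (followers, replies) in platforms.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2%} of followers replied")
# Twitter: 0.21% of followers replied
# Google+: 2.80% of followers replied

print(f"ratio: {rates['Google+'] / rates['Twitter']:.1f}x")  # ratio: 13.1x
```

Per follower, Google+ produced roughly thirteen times as many responses, which is where the order-of-magnitude gap comes from.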
There are many different ways of making sense of large datasets. Using network visualization is one of them. But what is a network? Or rather, which aspects of a dataset do we want to explore as a network? Even social services like Twitter can be graphed in many different ways. Friend/follower connections are an obvious choice, but retweets and mentions can be used as well to establish links between accounts. Hashtag similarity (two users are connected if they share a tag; the more tags they share, the closer they are) is yet another method. In fact, when we shift from interactions to co-occurrences, many different things become possible. Instead of mapping user accounts, we can, for example, map hashtags: two tags are connected if they appear in the same tweet, and the number of co-occurrences defines link strength (or “edge weight”). The Mapping Online Publics project has more ideas on this question, including mapping over time.
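As a minimal sketch of the co-occurrence idea, the following Python counts, for every pair of hashtags, the number of tweets in which both appear; that count is the edge weight. The function name `cooccurrence_edges` is hypothetical, the lowercasing step is an assumption about how tags are normalized, and the toy tweets are invented.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(tweets_hashtags):
    """Count how often each pair of hashtags appears in the same tweet.

    tweets_hashtags: iterable of hashtag lists, one list per tweet.
    Returns a Counter mapping (tag_a, tag_b) -> edge weight, with each
    pair stored in alphabetical order so (a, b) and (b, a) are one edge.
    """
    edges = Counter()
    for tags in tweets_hashtags:
        # Normalize case and drop duplicate tags within a single tweet.
        unique = sorted({t.lower() for t in tags})
        for a, b in combinations(unique, 2):
            edges[(a, b)] += 1
    return edges


# Invented toy tweets, one hashtag list per tweet.
tweets = [
    ["libya", "gaddafi"],
    ["Libya", "gaddafi", "un"],
    ["japan", "fukushima"],
    ["cantonales", "fn"],
]
weights = cooccurrence_edges(tweets)
print(weights[("gaddafi", "libya")])  # 2
```

Sorting each tweet's tags before pairing keeps the graph undirected without storing every edge twice.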
In the context of the IPRI research project we have been following 25K Twitter accounts from the French twittersphere. Here is a map (size: occurrence count / color: degree / layout: Gephi with OpenOrd) of the hashtag co-occurrences for the 10,000 hashtags used most often between February 15, 2011 and April 15, 2011 (clicking on the image gets you the full map, 5MB):
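Restricting the graph to the most frequent hashtags and handing it to Gephi could look roughly like this. The helper `top_n_gdf` is hypothetical; it keeps the n most frequent tags, drops edges to anything else, computes each node's degree in the filtered graph, and writes GDF, a plain-text format Gephi can import. The attribute names (`occurrences`, `degree`) are my own choice, and sizing, coloring, and the OpenOrd layout are then done in Gephi itself. The toy counts are invented.

```python
from collections import Counter


def top_n_gdf(occurrences, edges, n=10_000):
    """Keep the n most frequent hashtags and serialize the graph as GDF.

    occurrences: Counter {hashtag: number of tweets containing it}
    edges:       Counter {(tag_a, tag_b): co-occurrence count}, each
                 undirected pair stored only once.
    """
    keep = {tag for tag, _ in occurrences.most_common(n)}
    kept_edges = {(a, b): w for (a, b), w in edges.items()
                  if a in keep and b in keep}

    # Degree = number of neighbours within the filtered graph.
    degree = Counter()
    for a, b in kept_edges:
        degree[a] += 1
        degree[b] += 1

    lines = ["nodedef>name VARCHAR,occurrences INTEGER,degree INTEGER"]
    for tag in sorted(keep):
        lines.append(f"{tag},{occurrences[tag]},{degree[tag]}")
    lines.append("edgedef>node1 VARCHAR,node2 VARCHAR,weight DOUBLE")
    for (a, b), w in sorted(kept_edges.items()):
        lines.append(f"{a},{b},{w}")
    return "\n".join(lines) + "\n"


# Invented toy counts; with n=3, "rare" and its edge are dropped.
occ = Counter({"libya": 120, "gaddafi": 80, "japan": 60, "rare": 1})
pairs = Counter({("gaddafi", "libya"): 50, ("japan", "libya"): 3,
                 ("libya", "rare"): 1})
print(top_n_gdf(occ, pairs, n=3), end="")
# nodedef>name VARCHAR,occurrences INTEGER,degree INTEGER
# gaddafi,80,1
# japan,60,1
# libya,120,2
# edgedef>node1 VARCHAR,node2 VARCHAR,weight DOUBLE
# gaddafi,libya,50
# japan,libya,3
```

Computing degree after the cut matters: a tag's degree in the full dataset can be much higher than in the 10,000-node map that is actually drawn.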
The main topics over this period were the regional elections (“cantonales”) and the Arab spring, particularly the events in Libya. The Japan earthquake is also very prominent. But you’ll also find smaller events leaving their traces, e.g. star designer Galliano’s antisemitic remarks in a Paris restaurant. Large parts of the map show ongoing topics such as cinema, sports, general geekery, and so forth. While not exhaustive, this map is currently helping us understand which topics are actually “inside” our dataset. This is exploratory data analysis at work: rather than confirming a hypothesis, maps like this help us get a general understanding of what we’re dealing with and then formulate more precise questions from there.