When it comes to social media, YouTube is perhaps the most understudied platform, considering its enormous popularity in the context of popular culture, politics, and commerce. As part of a long-term project on APIs from a software/platform studies perspective, but also in relation to the technical fieldwork required for data-driven empirical work, I have been testing the interfaces of quite a number of services now. To make this investigation productive beyond conceptual reflection, I’ve been building digital methods research tools for every system I look at. Nothing beats getting your hands dirty.

Since Google closed its search API some years ago, I haven’t really had a look at their services, but when a student of mine, Anouk Brouwer, started a thesis project on the booktube community on YouTube, I was not only fascinated by the booktube phenomenon and similar practices, but eager to revisit some older scripts and the new Data API v3 to see what kind of analyses would be possible. Google now has a centralized credential system for most of their APIs and a new quota framework where different calls cost different amounts of points. This sounds complicated, but since the quotas are extremely high (50M points/day, 3K calls/second), this is basically API dream land. After banging my head against Facebook’s technical and legal bureaucracy, it’s been extremely rewarding to work with a system that can take much, much more than I’m able to throw at it.

The outcome of this is a new set of scripts, called YouTube Data Tools (YTDT). You can try them out directly online or get the source code. For the moment, there are five modules that focus on different sections of the platform. The different features are explained in the tool interface, but I wanted to share a small experiment made with the Channel Network module. This module starts from a set of channel IDs and then crawls into the network constituted by YouTube’s “featured channels” feature (channels can “feature” other channels, basically just linking to them from their “channels” tab). The following image, made with Gephi, shows a network of nearly 40k channels retrieved by starting with a single seed (the Vsauce channel) and crawling 7 steps into the network (click on the image for a much larger version; a PDF file is also available, as is the data):

[Image: YouTube channel map]

Since a number of channels do not make their view count available, node size and color encode the number of subscribers. I’ve deleted the labels for channels with fewer than 100k subscribers for better readability and used OpenOrd for spatialization. The network is strongly clustered, in particular around practices (gaming, fashion & makeup, etc.), languages, and corporate affiliations (e.g. the Vevo and Disney empires). I wasn’t entirely aware just how many people like to watch other people play games. YouTube is obviously much bigger than this, but the map should show a sizeable portion of the upper echelons of the YouTube hierarchy.
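For readers who want a feel for how such a crawl works, here is a minimal sketch in Python. It is not the actual YTDT code: it only illustrates a depth-limited, breadth-first crawl from a set of seed channels. The function name `crawl_featured_network` and the `get_featured` callback are hypothetical; in a real run the callback would ask the API for the channels a given channel features, while here a small made-up dictionary stands in for it.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Set, Tuple


def crawl_featured_network(
    seeds: Iterable[str],
    get_featured: Callable[[str], List[str]],
    max_depth: int,
) -> Tuple[Dict[str, int], Set[Tuple[str, str]]]:
    """Breadth-first crawl of "featured channel" links, up to max_depth steps.

    get_featured is a hypothetical callback returning the IDs of the
    channels that a given channel features. Returns the crawl depth at
    which each channel was first seen and the set of directed edges.
    """
    depth_of: Dict[str, int] = {}
    edges: Set[Tuple[str, str]] = set()
    queue = deque()

    for seed in seeds:
        if seed not in depth_of:
            depth_of[seed] = 0
            queue.append(seed)

    while queue:
        channel = queue.popleft()
        depth = depth_of[channel]
        if depth >= max_depth:
            continue  # channels at the depth limit are kept but not expanded
        for target in get_featured(channel):
            edges.add((channel, target))
            if target not in depth_of:
                depth_of[target] = depth + 1
                queue.append(target)

    return depth_of, edges


if __name__ == "__main__":
    # Made-up toy network standing in for real API responses.
    toy = {"A": ["B", "C"], "B": ["C", "D"], "C": ["A"], "D": ["E"], "E": []}
    depths, links = crawl_featured_network(["A"], lambda c: toy.get(c, []), 2)
    print(depths)
    print(sorted(links))
```

The resulting edge list can be written to a graph file and opened in Gephi. Keeping a record of the depth at which each channel was first seen makes it easy to see how far a channel sits from the seed.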

YTDT allows for many other kinds of analysis, and I am planning to introduce them in an overview video in the hopefully not too distant future. This is still an early version, but maybe already useful to some people out there.

EDIT (13/05/2015): I made an introductory video:

It’s just a quick overview, but hopefully useful as a starting point.

Post filed under method, social networks, softwareproject, visualization, youtube.

5 Comments

  2. I was wondering how YouTube creates the connections between different genres of videos. I have been playing around with Gephi and your YouTube tools with music videos. Each video creates a different-looking graph, and the nodes that connect all of the videos together come from different categories. For example, GRiZ connects with a lot of Minecraft, Future, dog, and high school videos, etc., whereas Martin Garrix’s “Animals” connects with nothing that GRiZ connected with. What makes certain videos connect with other videos? It definitely seems to correlate with the view count of a video.

    • Yeah, that’s the question! Google used to have a page explaining the basics, but it disappeared last year. Here’s a copy: https://web.archive.org/web/20150329041618/https://support.google.com/youtube/answer/6060859?hl=en&ref_topic=6046759 The idea is pretty interesting: “watched together in similar sessions”, which is pretty straightforward collaborative filtering. What I have observed is that this can lead to a sort of hijacking: if youtuber A says nasty things about youtuber B in a video, the people who watch youtuber A’s video may also go watch the videos of youtuber B, and the video from A may then appear in the related videos for B. Look at the recommendations for explicitly feminist videos: they mostly contain links to antifeminist videos. Because the antifeminists rant about feminist youtubers like it’s their job (some actually do make a living from it), the people who watch their clips also go watch the feminist clips. And that’s how the antifeminist videos end up in the recommended lists on feminist ones. It’s quite interesting and weird.
