Friends and Hypergraphs: The One With All The Networks

Undoubtedly, Friends is one of my favorite TV series. I guess I am not the only one, since the arrival of all episodes on Netflix created quite a stir in the online community. The story of the show is quite easy to follow: Ross loves Rachel, Ross dates Rachel, Rachel breaks up with Ross, Rachel loves Ross, Ross marries Rachel, Ross divorces Rachel, Ross loves Rachel, Ross and Rachel have a happy end. Oh yeah, between all the Ross and Rachel dilemmas, Monica marries Chandler, Phoebe sings "Smelly Cat" and Joey gets up to all kinds of shenanigans. So the question is: is the "Ross and Rachel" story the most central element of the show?
To answer this question, I am going to look at a dataset of shared plotlines throughout the whole show, that is, which subsets of the six characters appeared together in a plot during an episode. These plots can range from simply hanging out together in Central Perk to some hanky-panky in the bedroom. We can see a shared plotline as some form of interaction and therefore analyse the show from a network perspective. Great! That is my area of expertise! However, what makes the analysis a bit more complicated is the fact that plotlines can involve more than two characters, creating a link with more than two endpoints (a hyperedge). So we are not just dealing with a regular network, but with a hypergraph.

Network Visualizations of all Episodes

I spent a lot of time trying to come up with a visualization that shows hyperedges in a pretty way. But I failed horribly (that's why I am doing network analysis and not network drawing, I guess). So I decided to split the hyperedges into regular edges. That means, if there was a plotline consisting of Monica, Chandler and Joey, I created the edges (Monica, Chandler), (Monica, Joey) and (Chandler, Joey). I did that for all the plotlines of each episode, counted how often each pair of characters shared a plotline and aggregated these counts for each season. So in the end, I got 10 different networks, one per season. The edge width indicates how often two characters shared a plotline during the respective season.
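
The splitting step can be sketched in a few lines of Python. This is only a minimal illustration, not the code behind the figures: the plotlines below are made up, and the name `pairwise_edge_counts` is a hypothetical helper.

```python
from collections import Counter
from itertools import combinations

# Made-up plotlines for one season (an assumption, not the real dataset).
# Each set of characters is one hyperedge.
plotlines = [
    {"Monica", "Chandler", "Joey"},
    {"Ross", "Rachel"},
    {"Monica", "Chandler"},
]

def pairwise_edge_counts(plotlines):
    """Split every hyperedge into its pairs and count how often each pair occurs."""
    counts = Counter()
    for plot in plotlines:
        # Sorting the names makes (a, b) and (b, a) end up under the same key.
        for pair in combinations(sorted(plot), 2):
            counts[pair] += 1
    return counts

edge_counts = pairwise_edge_counts(plotlines)
for (a, b), weight in sorted(edge_counts.items()):
    print(f"{a} - {b}: {weight}")
```

The counts play the role of the edge widths: a pair that shares two plotlines gets an edge twice as thick as a pair that shares one.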

Clicking through these figures, I always start thinking about all the funny scenes of the respective seasons. I think it is time to rewatch it for the 10th time!

Who is the most central character?

Visualizations are fun and stuff, but they do not really help us determine the most central character of the show. Since I deal with network centrality day in, day out, I have a lot of methods up my sleeve for this problem. However, most of them are not really applicable to hypergraphs. The only measure that can be used in a fairly straightforward way is eigenvector centrality. So let's do some mild math:

A simple network is usually represented by an adjacency matrix $A$, where $A_{ij}=1$ if there is a link between actor $i$ and actor $j$ and $A_{ij}=0$ otherwise. Since in our case edges have no direction, $A_{ij}=A_{ji}$ and therefore $A$ is a symmetric matrix. Thanks to Perron-Frobenius, we know that, for a connected network, there is a real eigenvalue $\lambda$ which is bigger than every other eigenvalue of $A$ and whose eigenvector $v$ can be chosen to have only positive entries. For this eigenvalue, the following equation holds
$$ Av=\lambda v$$
The entries of the vector $v$ are then used to rank the actors of the network. But how can we interpret $v$? The short and simple (and slightly wrong) explanation is that actors are considered important if they are connected to other important actors. So it is not just the number of connections that counts, but also the quality of these connections.
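
To make the eigenvector equation a bit more tangible, here is a minimal sketch that approximates $v$ by power iteration, i.e. by multiplying a start vector with $A$ over and over again. The function name `eigenvector_centrality` and the toy network are assumptions for illustration only. Plain power iteration needs a connected network that is not bipartite, which is why the toy network contains a triangle.

```python
def eigenvector_centrality(matrix, max_iter=1000, tol=1e-12):
    """Approximate the leading eigenvector of a non-negative matrix by power iteration.

    The result is scaled so that its entries sum to 1.
    """
    n = len(matrix)
    v = [1.0 / n] * n  # start from a uniform guess
    for _ in range(max_iter):
        # One multiplication step: w = matrix * v
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(w)
        w = [x / total for x in w]  # rescale so the entries sum to 1
        if max(abs(a - b) for a, b in zip(w, v)) < tol:
            return w
        v = w
    return v

# Toy network (an assumption): actors 0, 1 and 2 form a triangle,
# actor 3 is linked only to actor 0.
A = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]

centrality = eigenvector_centrality(A)
for actor, score in enumerate(centrality):
    print(actor, round(score, 3))
```

Actor 0 has the most links and ends up on top, while actor 3, whose only link goes to actor 0, ends up at the bottom.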
When we deal with hypergraphs, we are faced with the problem that we can no longer represent our network with an adjacency matrix, since links can have more than two endpoints. Instead, I will use the so-called incidence matrix $E$. The incidence matrix has as many rows as the network has links and as many columns as there are actors. So $E_{ij}=1$ if actor $j$ takes part in edge $i$, and $E_{ij}=0$ otherwise.
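
As a small sketch of what such an incidence matrix looks like, the snippet below builds $E$ as a list of rows. The plotlines are invented for illustration and `incidence_matrix` is a hypothetical helper name.

```python
# Column order of the incidence matrix: one column per character.
characters = ["Chandler", "Joey", "Monica", "Phoebe", "Rachel", "Ross"]

# Invented plotlines (an assumption); each one becomes a row of E.
plotlines = [
    {"Monica", "Chandler", "Joey"},
    {"Ross", "Rachel"},
    {"Rachel", "Joey", "Phoebe"},
]

def incidence_matrix(plotlines, characters):
    """Return E with E[i][j] = 1 if character j takes part in plotline i, else 0."""
    return [[1 if name in plot else 0 for name in characters] for plot in plotlines]

E = incidence_matrix(plotlines, characters)
for row in E:
    print(row)
```

A row with three ones is a hyperedge with three endpoints, which is exactly what an adjacency matrix cannot express.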

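In order to use the eigenvector centrality concept on $E$, it first has to be projected to a square matrix in the actor space. This is done by multiplying $E$ from the left with its transpose $E^T$. The entry $(E^TE)_{jk}$ then counts the links that actors $j$ and $k$ share, and the diagonal entry $(E^TE)_{jj}$ counts the links actor $j$ takes part in. So we have the equation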
$$E^TEv=\lambda v.$$
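
Put together, the whole pipeline can be sketched as follows. This is a hedged illustration on invented plotlines, not the computation behind the rankings in this post. The names `project_to_actors` and `leading_eigenvector` are hypothetical, and the eigenvector is again approximated by power iteration.

```python
characters = ["Chandler", "Joey", "Monica", "Phoebe", "Rachel", "Ross"]

# Invented incidence matrix (an assumption): one row per plotline,
# one column per character, in the order given above.
E = [
    [1, 1, 1, 0, 0, 0],  # Chandler, Joey, Monica
    [0, 0, 0, 0, 1, 1],  # Rachel, Ross
    [1, 0, 1, 0, 0, 0],  # Chandler, Monica
    [0, 1, 0, 1, 1, 0],  # Joey, Phoebe, Rachel
    [1, 0, 0, 0, 0, 1],  # Chandler, Ross
]

def project_to_actors(E):
    """Return M = E^T E, a square matrix with one row and column per actor."""
    n_actors = len(E[0])
    return [
        [sum(row[j] * row[k] for row in E) for k in range(n_actors)]
        for j in range(n_actors)
    ]

def leading_eigenvector(M, max_iter=1000, tol=1e-12):
    """Power iteration; the returned vector is scaled to sum to 1."""
    n = len(M)
    v = [1.0 / n] * n
    for _ in range(max_iter):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(w)
        w = [x / total for x in w]
        if max(abs(a - b) for a, b in zip(w, v)) < tol:
            return w
        v = w
    return v

M = project_to_actors(E)
v = leading_eigenvector(M)

# Sort characters by their entry in v, most central first.
ranking = sorted(zip(characters, v), key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy example, the entry of `M` for Chandler and Monica is 2 because they share two plotlines, and the diagonal entry for Chandler is 3 because he appears in three.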

The interpretation of $v$ is the same as before and so it should reflect the importance of the characters. But before looking at the show as a whole, I will show the importance rankings of the characters in each season. Or in other words: Who were the most central characters in seasons 1 to 10? Let's take a look at the season-wise entries of the vector $v$ and its induced ranking.

Original size can be found here

I think the values and rankings reflect the storyline of the seasons quite well. For example, season 1 mostly deals with Rachel becoming more independent and the whole Ross and Rachel thing. Seasons 4 to 6 mainly deal with the relationship of Monica and Chandler, so they should be the most central characters during this period. Also notable is the position of Phoebe. During the whole show, the story never really focuses on her, so her position in each season ranking is always quite low.

Now let's consider all interactions of all episodes at once. That is, we want to know who is the most central character in the show. And it is...

...CHANDLER! That was kind of surprising to me! But even more surprising is the low overall rank of Ross. Shouldn't he be at least as central as Rachel, since the whole show is about the relationship of Ross and Rachel?

Of course one could question my relatively simple approach to finding the central characters, and of course one could question the dataset. But then again, this is a blog about mildly scientific topics, so...yeah... take the results as they are, but do not overinterpret them. Also, because I am going to show in an upcoming post why the results are as they are.

Acknowledgement

A big thank you goes to Alex Albright who not only provided the dataset but also some valuable discussions which actually motivated me to write this blog entry. Please check out her blog too!
