# Calculating Network Homophily – Part 1

A student in the Networks course recently asked for help calculating homophily scores for the network data she had collected, and I was surprised to find no ready-made command in R for calculating network homophily, that is, the proportion of ties that connect nodes sharing an attribute. After a little background digging, I now suspect this is partly because there is some disagreement about what constitutes homophily in a social network at all!

Broadly, there are two basic approaches to measuring homophily in a network. The first is the proportion of all edges in a network (ties between actors) that connect two actors with matching characteristics on some dimension. For example, gender homophily could be calculated separately for males and females. Male homophily would be the number of edges between two males, expressed as a proportion of all edges involving at least one male. The second approach starts by calculating same-gender alters as a proportion of all alters in each ego network, and then averages across egos with a given characteristic (e.g., male or female) to get average ego-network homophily.
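In symbols, the two approaches look like this. Write $a(v)$ for the attribute value of vertex $v$, $E$ for the edge set, and $N(i)$ for the alters of ego $i$. This notation is introduced here for clarity and is not taken from a particular source. The first approach, for a single value $k$ and for the network as a whole:

$$
H_k = \frac{\left|\{(u,v) \in E : a(u) = a(v) = k\}\right|}{\left|\{(u,v) \in E : a(u) = k \text{ or } a(v) = k\}\right|},
\qquad
H = \frac{\left|\{(u,v) \in E : a(u) = a(v)\}\right|}{|E|}
$$

The second approach averages ego-level scores over $V_k$, the set of egos with value $k$ that have at least one alter:

$$
h_i = \frac{\left|\{j \in N(i) : a(j) = a(i)\}\right|}{|N(i)|},
\qquad
\bar{h}_k = \frac{1}{|V_k|} \sum_{i \in V_k} h_i
$$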

Today, we’re going to focus on how to implement the first approach in R. First, we need to load the igraph package, which provides the network functions we’ll use, and install it if it isn’t already on our computer:
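A minimal version of that step follows. It assumes the package is igraph, which the post names below, and that installing from your default CRAN mirror is acceptable:

```r
# Install igraph only if it is not already available, then attach it
if (!requireNamespace("igraph", quietly = TRUE)) {
  install.packages("igraph")
}
library(igraph)
```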

Now, we’re going to actually build the function and walk through it step by step. See the comments between lines for what each one does, and notice that the body refers to the arguments you will specify when you actually call the function:
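Below is a minimal sketch of such a function, written against igraph. It is not the author’s original code. The function name `homophily` and the argument names `vertex.attr` and `attr.val` are assumptions. The `prop` argument mirrors the `prop=F` option used later in the post.

```r
library(igraph)

# Sketch of a network-level homophily score (hypothetical, not the post's code).
#   graph       an igraph object
#   vertex.attr name of a vertex attribute, as a character string
#   attr.val    optional attribute value; if given, only edges touching a
#               vertex with this value enter the calculation
#   prop        TRUE returns a proportion, FALSE the raw count of matching edges
homophily <- function(graph, vertex.attr, attr.val = NULL, prop = TRUE) {
  # Attribute value of every vertex, in vertex-id order
  attr <- vertex_attr(graph, vertex.attr)
  if (is.null(attr)) stop("no vertex attribute named '", vertex.attr, "'")

  # Two-column matrix of edge endpoints, as numeric vertex ids
  el <- as_edgelist(graph, names = FALSE)
  from <- attr[el[, 1]]
  to   <- attr[el[, 2]]

  if (is.null(attr.val)) {
    # Every edge counts; matches are edges whose endpoints share a value
    involved <- rep(TRUE, nrow(el))
    matched  <- from == to
  } else {
    # Only edges with at least one endpoint equal to attr.val count;
    # matches are edges with both endpoints equal to attr.val
    involved <- from == attr.val | to == attr.val
    matched  <- from == attr.val & to == attr.val
  }

  n_matched <- sum(matched)
  if (prop) n_matched / sum(involved) else n_matched
}
```

Passing `attr.val` changes the denominator to edges with at least one endpoint carrying that value. This matches the male-homophily definition given earlier. Without `attr.val`, every edge is in the denominator.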

As we specify in the first line, the function requires two things: the graph object and the name of the vertex attribute you want to use to calculate homophily. The first should be a network object created by igraph, and the second should be the name of a vertex attribute in that network, given as a character string.
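Because the attribute is passed by name, the function has to look it up inside the graph. In igraph, `vertex_attr()` does this lookup. Here is a small illustration on a made-up four-vertex ring:

```r
library(igraph)

g <- make_ring(4)                     # toy graph: 1-2-3-4-1
V(g)$group <- c(1, 1, 2, 2)           # attach a vertex attribute called "group"

attr_name <- "group"                  # the name is a plain character string
vals <- vertex_attr(g, attr_name)     # looks the attribute up by name
print(vals)                           # same values as V(g)$group
```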

Let’s test this function out on a random network built in R; set the seed and use the syntax as provided to follow along! First, let’s build the network and randomly generate node attributes:
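The post’s exact seed, network size and number of groups aren’t shown, so the values below are placeholders. A network built this way will not reproduce the proportions quoted next. `sample_gnp()` draws an Erdős–Rényi random graph, and each vertex is then assigned one of five groups uniformly at random. The choice of five groups is also an assumption.

```r
library(igraph)

set.seed(42)                                   # placeholder seed (assumption)
g <- sample_gnp(n = 100, p = 0.05)             # random undirected graph (size and density assumed)

# Randomly assign each vertex to one of five groups (group count assumed)
V(g)$group <- sample(1:5, vcount(g), replace = TRUE)

table(V(g)$group)                              # how many vertices landed in each group
```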

Next, let’s calculate basic homophily of this network based on that group membership:
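To make the arithmetic behind the overall proportion concrete, here it is done step by step on a small, made-up graph rather than the random one. The idea is to count the edges whose two endpoints share a group, then divide by the total number of edges:

```r
library(igraph)

# Toy graph: vertices 1-3 in group 1, vertices 4-6 in group 2
g <- make_graph(c(1,2, 1,3, 2,3, 3,4, 4,5, 5,6, 2,5, 1,6), directed = FALSE)
V(g)$group <- c(1, 1, 1, 2, 2, 2)

el <- as_edgelist(g, names = FALSE)           # one row per edge
same <- V(g)$group[el[, 1]] == V(g)$group[el[, 2]]

overall <- sum(same) / ecount(g)
overall                                       # 5 of 8 edges are within-group: 0.625
```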

So, about 20% of the ties in the network are between actors in the same group. What is the exact count of these edges, though, rather than the proportion? Let’s see by specifying prop=F:
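The count is simply the numerator of that proportion. One way to cross-check it is to add up the edges inside each group’s induced subgraph. A within-group edge is exactly an edge with both endpoints in the same group. This sketch uses a made-up six-vertex graph:

```r
library(igraph)

# Toy graph: vertices 1-3 in group 1, vertices 4-6 in group 2
g <- make_graph(c(1,2, 1,3, 2,3, 3,4, 4,5, 5,6, 2,5, 1,6), directed = FALSE)
V(g)$group <- c(1, 1, 1, 2, 2, 2)

# Edges remaining when the graph is restricted to one group's vertices
within <- sapply(sort(unique(V(g)$group)), function(k) {
  ecount(induced_subgraph(g, which(V(g)$group == k)))
})
within          # 3 edges inside group 1, 2 inside group 2
sum(within)     # 5 within-group edges in total
```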

But homophily is usually calculated with respect to a particular group, for example, the tendency of minority (or majority) race students to choose students with the same racial background as friends. So what if we’re just curious about group 2? We can get the homophily score for edges involving members of group 2 by specifying that attribute value:
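Restricting the score to one group changes both parts of the fraction. The denominator counts only edges with at least one endpoint in that group, and the numerator keeps only edges with both endpoints in it. Here is a hand calculation on a made-up graph in which group 2 is vertices 4 to 6:

```r
library(igraph)

g <- make_graph(c(1,2, 1,3, 2,3, 3,4, 4,5, 5,6, 2,5, 1,6), directed = FALSE)
V(g)$group <- c(1, 1, 1, 2, 2, 2)

el <- as_edgelist(g, names = FALSE)
a <- V(g)$group[el[, 1]]                        # group at one end of each edge
b <- V(g)$group[el[, 2]]                        # group at the other end

touches_2 <- a == 2 | b == 2                    # edges involving group 2 at all
both_2    <- a == 2 & b == 2                    # edges inside group 2

group2 <- sum(both_2) / sum(touches_2)
group2                                          # 2 of 5 edges: 0.4
```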

So group 2 has lower homophily than the network as a whole. Interesting! Only 12% of the ties involving members of group 2 are with other members of that group, meaning their ties are primarily to members of other groups (not that surprising, since this is a random network!).
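Why is that unsurprising? Here is a rough check. Assume ties form independently of group and a fraction $f_k$ of vertices belongs to group $k$. Also ignore the fact that an edge’s two endpoints are distinct vertices. Then each endpoint of an edge is in group $k$ with probability about $f_k$. Taking ratios of expected edge counts gives:

$$
H \approx \sum_k f_k^2,
\qquad
H_k \approx \frac{f_k^2}{1 - (1 - f_k)^2} = \frac{f_k}{2 - f_k}
$$

With $K$ equally sized groups ($f_k = 1/K$), these give $H \approx 1/K$ and $H_k \approx 1/(2K - 1)$. The second value is smaller whenever $K > 1$. For five groups, for example, that is about 0.20 versus about 0.11. Under random mixing with equal group sizes, a single group’s score is therefore expected to fall below the network-level score.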

We can do this with character values, too. Let’s create an attribute named ‘animal’ and see the proportion of ties that flow between members of the same species:
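Character comparisons work the same way as numeric ones, so only the attribute values change. Here is a small made-up example on a six-vertex ring. The species labels are assigned by hand rather than at random:

```r
library(igraph)

g <- make_ring(6)                                  # edges 1-2, 2-3, 3-4, 4-5, 5-6, 6-1
V(g)$animal <- c("cat", "cat", "dog", "dog", "fish", "fish")

el <- as_edgelist(g, names = FALSE)
a <- V(g)$animal[el[, 1]]
b <- V(g)$animal[el[, 2]]

same_species <- sum(a == b) / ecount(g)            # 1-2, 3-4 and 5-6 match: 3 of 6
cat_only <- sum(a == "cat" & b == "cat") /
  sum(a == "cat" | b == "cat")                     # 1 of the 3 edges touching a cat
c(same_species = same_species, cat = cat_only)
```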

Now we’ve got a simple way to calculate network-level homophily for groups on specified attributes. This works for both numeric and character attributes, and can be applied to any igraph network object! The next post will have a command for the second conception of homophily, the average proportion of same-type alters within ego networks.

This entry was posted in networks, R code on . 