Ok, so we can now access information inside nodes and edges, as well as create subgraphs. What else can we do with these objects?
Other than querying the nodes and edges directly, igraph provides a number of functions to explore graphs. igraph is a very large package that offers a lot of functionality, but here we will focus on
library(igraph)
library(igraphdata)
data("UKfaculty")
UKfaculty
## IGRAPH 6f42903 D-W- 81 817 --
## + attr: Type (g/c), Date (g/c), Citation (g/c), Author (g/c),
## | Group (v/n), weight (e/n)
## + edges from 6f42903:
## [1] 57->52 76->42 12->69 43->34 28->47 58->51 7->29 40->71 5->37 48->55
## [11] 6->58 21-> 8 28->69 43->21 67->58 65->42 5->67 52->75 37->64 4->36
## [21] 12->49 19->46 37-> 9 74->36 62-> 1 15-> 2 72->49 46->62 2->29 40->12
## [31] 22->29 71->69 4-> 3 37->69 5-> 6 77->13 23->49 52->35 20->14 62->70
## [41] 34->35 76->72 7->42 37->42 51->80 38->45 62->64 36->53 62->77 17->61
## [51] 7->68 46->29 44->53 18->58 12->16 72->42 52->32 58->21 38->17 15->51
## [61] 22-> 7 22->69 5->13 29-> 2 77->12 37->35 18->46 10->71 22->47 20->19
## + ... omitted several edges
If we examine the graph, we know that this is both a directed and weighted graph with the vertex attribute of Group and the edge attribute of weight. Let’s take a look at these attributes.
UKfaculty %>%
V() %>%
.$Group %>%
summary()
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 1.000 2.000 1.877 3.000 4.000
UKfaculty %>%
E() %>%
.$weight %>%
summary()
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 2.000 2.000 4.565 6.000 16.000
UKfaculty %>%
V() %>%
.$Group %>%
unique() %>%
sort()
## [1] 1 2 3 4
UKfaculty %>%
E() %>%
.$weight %>%
unique() %>%
sort()
## [1] 1 2 3 4 5 6 7 8 10 12 14 16
finally, let’s make a quick visualization of this graph
set.seed(4321)
plot(UKfaculty,
vertex.label = V(UKfaculty)$Group,
edge.curved = T,
edge.arrow.size = .3)
We will be filtering this graph down into a subgraph to show the power of some of these functions. In order to keep things consistent between the graph and the subgraph, we will need to name the vertices.
V(UKfaculty)$name <- paste('person', V(UKfaculty), sep = '_')
Sometimes we know the edges we want, but don’t know the nodes they are connected to. This often happens when we query the edges for a specific attribute. We can get get this information.
Let’s look at the relationships that have strong weights. Let’s look for the number of quality connections. The above analysis showed the weight’s 3rd quartile is 6. So we will use this as the criteria to define a relationship as strong.
For directed edges, we can find the node that is at the head (what the edge is pointing to) by using the head_of() function. For undirected graphs, the head is simply the second node mentioned in the edge description.
topLiked <- UKfaculty %>%
E() %>%
.[weight >= 6] %>%
head_of(UKfaculty, .)
topLiked %>% sort
## + 265/81 vertices, named, from 6f42903:
## [1] person_1 person_2 person_2 person_2 person_2 person_2 person_2
## [8] person_4 person_4 person_4 person_5 person_5 person_5 person_5
## [15] person_6 person_6 person_7 person_7 person_7 person_7 person_7
## [22] person_7 person_8 person_8 person_9 person_10 person_10 person_10
## [29] person_10 person_10 person_10 person_10 person_10 person_12 person_12
## [36] person_12 person_12 person_12 person_13 person_13 person_13 person_13
## [43] person_13 person_13 person_14 person_14 person_14 person_14 person_16
## [50] person_16 person_16 person_16 person_17 person_18 person_18 person_18
## [57] person_18 person_19 person_19 person_19 person_19 person_20 person_20
## [64] person_20 person_20 person_20 person_21 person_21 person_21 person_21
## + ... omitted several vertices
hist(topLiked,
vcount(UKfaculty),
col = 'lightblue')
topLiked$name %>%
table %>%
sort(T) %>%
head()
## .
## person_31 person_77 person_21 person_29 person_10 person_27
## 12 11 9 9 8 8
Alternatively, we can get the nodes at the tail of a directed edge (where the edge is coming from) by using the tail_of() function. For undirected graphs, the tail is simply the first node mentioned in the edge description.
topLikers <- UKfaculty %>%
E() %>%
.[weight >= 6] %>%
tail_of(UKfaculty, .)
hist(topLikers,
vcount(UKfaculty),
col = 'lightblue')
topLikers$name %>%
table() %>%
sort(T) %>%
head()
## .
## person_29 person_37 person_7 person_21 person_13 person_46
## 19 11 9 8 7 7
You can also get a matrix of vertex ids. The first column represents the tail of the edges and the second column represents the head of the edges.
UKfaculty %>%
E() %>%
.[weight >= 6] %>%
ends(UKfaculty, .) %>%
head()
## [,1] [,2]
## [1,] "person_76" "person_42"
## [2,] "person_28" "person_47"
## [3,] "person_7" "person_29"
## [4,] "person_5" "person_67"
## [5,] "person_12" "person_49"
## [6,] "person_19" "person_46"
Let’s go ahead and identify who the most liked person is and who likes the most people. Remember, we only care about edges that have a weight of greater than or equal to 6.The results would be different if we included all edges.
mostLikedPerson <- topLiked$name %>%
table() %>%
sort(T) %>%
.[1] %>%
names()
mostLikedPerson
## [1] "person_31"
likesTheMost <- topLikers$name %>%
table() %>%
sort(T) %>%
.[1] %>%
names()
likesTheMost
## [1] "person_29"
The ego() function provides all the nodes connected to a node or nodes of interest. The connected nodes can either be directly connected or connected within a certain order of edges. Let’s find all the people that liked the mostLikedPerson and let’s also find all the people the person who likesTheMost likes.
mlp_likedBy <- ego(UKfaculty, 1, mostLikedPerson, 'in')
mlp_likedBy
## [[1]]
## + 20/81 vertices, named, from 6f42903:
## [1] person_31 person_2 person_8 person_15 person_19 person_20 person_21
## [8] person_26 person_29 person_34 person_35 person_37 person_39 person_41
## [15] person_43 person_46 person_51 person_57 person_62 person_79
ltm_likes <- ego(UKfaculty, 1, likesTheMost, 'out')
ltm_likes
## [[1]]
## + 42/81 vertices, named, from 6f42903:
## [1] person_29 person_2 person_4 person_6 person_7 person_8 person_14
## [8] person_15 person_18 person_19 person_20 person_21 person_22 person_23
## [15] person_24 person_26 person_27 person_31 person_32 person_34 person_35
## [22] person_37 person_39 person_41 person_43 person_46 person_50 person_51
## [29] person_52 person_54 person_55 person_57 person_58 person_62 person_64
## [36] person_66 person_69 person_70 person_71 person_77 person_79 person_80
You can enter any number of vertices in the ego() function and the order of the list output will correspond to the order the vertices were entered. Here we will look for all connections (both incomming and outgoing) for each node.
ego(UKfaculty, 1, c(mostLikedPerson, likesTheMost), 'all')
## [[1]]
## + 22/81 vertices, named, from 6f42903:
## [1] person_31 person_2 person_8 person_15 person_19 person_20 person_21
## [8] person_25 person_26 person_29 person_32 person_34 person_35 person_37
## [15] person_39 person_41 person_43 person_46 person_51 person_57 person_62
## [22] person_79
##
## [[2]]
## + 42/81 vertices, named, from 6f42903:
## [1] person_29 person_2 person_4 person_6 person_7 person_8 person_14
## [8] person_15 person_18 person_19 person_20 person_21 person_22 person_23
## [15] person_24 person_26 person_27 person_31 person_32 person_34 person_35
## [22] person_37 person_39 person_41 person_43 person_46 person_50 person_51
## [29] person_52 person_54 person_55 person_57 person_58 person_62 person_64
## [36] person_66 person_69 person_70 person_71 person_77 person_79 person_80
let’s combine these two people together and see their subnetwork
mlp_ltm <- c(mlp_likedBy[[1]], ltm_likes[[1]]) %>%
induced_subgraph(UKfaculty, .)
mlp_ltm
## IGRAPH b9217ff DNW- 42 410 --
## + attr: Type (g/c), Date (g/c), Citation (g/c), Author (g/c),
## | Group (v/n), name (v/c), weight (e/n)
## + edges from b9217ff (vertex names):
## [1] person_57->person_52 person_43->person_34 person_58->person_51
## [4] person_7 ->person_29 person_6 ->person_58 person_21->person_8
## [7] person_43->person_21 person_37->person_64 person_19->person_46
## [10] person_15->person_2 person_46->person_62 person_2 ->person_29
## [13] person_22->person_29 person_71->person_69 person_37->person_69
## [16] person_52->person_35 person_20->person_14 person_62->person_70
## [19] person_34->person_35 person_51->person_80 person_62->person_64
## + ... omitted several edges
set.seed(1234)
plot(mlp_ltm,
vertex.color = sapply(V(mlp_ltm)$name, function(x){
if(x %in% c(mostLikedPerson, likesTheMost)){
'lightblue'
} else {
'lightgrey'
}
}),
edge.curved = T,
edge.arrow.size = .4,
vertex.label = V(mlp_ltm)$Group)
induced.subgraphs include all edges between the included nodes, but we’re just interested in the edges of interest. Particularly, we only want to see incoming edges for the mostLikedPerson and we only want to see outgoing edges for the person who likesTheMost. Let’s add a filtering step.
mlp_ltm <- mlp_ltm %>%
E() %>%
.[likesTheMost %->% V(mlp_ltm) | mostLikedPerson %<-% V(mlp_ltm)] %>%
.[weight >= 6] %>%
subgraph.edges(mlp_ltm, .)
mlp_ltm
## IGRAPH 8929de7 DNW- 24 30 --
## + attr: Type (g/c), Date (g/c), Citation (g/c), Author (g/c),
## | Group (v/n), name (v/c), weight (e/n)
## + edges from 8929de7 (vertex names):
## [1] person_29->person_2 person_19->person_31 person_29->person_69
## [4] person_29->person_35 person_29->person_54 person_21->person_31
## [7] person_29->person_18 person_29->person_7 person_29->person_21
## [10] person_29->person_51 person_8 ->person_31 person_43->person_31
## [13] person_39->person_31 person_29->person_43 person_29->person_27
## [16] person_41->person_31 person_29->person_34 person_29->person_79
## [19] person_29->person_46 person_79->person_31 person_29->person_31
## + ... omitted several edges
set.seed(1234)
plot(mlp_ltm,
vertex.color = sapply(V(mlp_ltm)$name, function(x){
if(x %in% c(mostLikedPerson, likesTheMost)){
'lightblue'
} else {
'lightgrey'
}
}),
edge.curved = T,
edge.arrow.size = .4,
vertex.label = V(mlp_ltm)$Group)
A degree of a node is the simply the raw number of edges incident to it. You can generate this number by using the degree() function and can distinguish what direction of edge you want to count.
degree(UKfaculty, mode = 'out') %>%
sort(T) %>%
head()
## person_29 person_37 person_62 person_5 person_52 person_43
## 41 36 34 28 27 23
degree(UKfaculty, mode = 'in') %>%
sort(T) %>%
head()
## person_69 person_77 person_54 person_21 person_29 person_2
## 24 24 23 22 21 19
degree(UKfaculty, c(likesTheMost, mostLikedPerson)) %>%
sort(T)
## person_29 person_31
## 62 35
Rather than getting the raw count of incident edges, we can also add the edge attributes stored in them. This is done by using the strength() function. This is valuable, because as we seen earlier, the raw number of edges doesn’t necessarily show the strength of those relationships. Remember that the node with the most incoming edges doesn’t match the person we identified as the person with the most quality connections.
strength(UKfaculty, mode = 'out', weights = E(UKfaculty)$weight) %>%
sort(T) %>%
head()
## person_29 person_37 person_31 person_21 person_5 person_62
## 243 131 112 100 96 96
strength(UKfaculty, mode = 'in', weights = E(UKfaculty)$weight) %>%
sort(T) %>%
head()
## person_29 person_31 person_21 person_77 person_69 person_10
## 136 133 119 119 105 100
strength(UKfaculty, c('person_31', 'person_29', 'person_69'), weights = E(UKfaculty)$weight)
## person_31 person_29 person_69
## 245 379 168
We can get the average weight of edges incident to a vertex by combining the information strength() and degree() functions together.
strength(UKfaculty, mode = 'out') %>%
{ ./degree(UKfaculty, mode = 'out')} %>%
sort(T) %>%
head
## person_18 person_14 person_32 person_76 person_23 person_8
## 14.666667 13.200000 12.000000 9.500000 8.857143 8.000000
strength(UKfaculty, mode = 'in') %>%
{ ./degree(UKfaculty, mode = 'in')} %>%
sort(T) %>%
head
## person_20 person_26 person_56 person_75 person_31 person_73
## 9.428571 9.428571 9.333333 8.142857 7.000000 7.000000
The shortest path between two vertices is the path that uses the least edges. This can be caluculated by using the shortest_paths() function
sp <- shortest_paths(UKfaculty,
likesTheMost,
mostLikedPerson,
'out',
output = 'both')
sp
## $vpath
## $vpath[[1]]
## + 3/81 vertices, named, from 6f42903:
## [1] person_29 person_20 person_31
##
##
## $epath
## $epath[[1]]
## + 2/817 edges from 6f42903 (vertex names):
## [1] person_29->person_20 person_20->person_31
##
##
## $predecessors
## NULL
##
## $inbound_edges
## NULL
UKfaculty %>%
E() %>%
.[[. %in% sp$epath[[1]]]]
## + 2/817 edges from 6f42903 (vertex names):
## tail head tid hid weight
## 370 person_29 person_20 29 20 1
## 816 person_20 person_31 20 31 1
It’s important to note, however, that shortest paths use the weight attribute of edges. It also interpret the weights as euclidian distances - that is, paths with less accumulated weight are prioritized over paths with more accumulated weight. Sometimes, we want to ignore the weight attribute because it doesn’t necessarily make sense in the context of our data.
shortest_paths(UKfaculty,
likesTheMost,
mostLikedPerson,
'out',
output = 'both',
weights = NA) %>%
.$epath %>%
.[[1]] %>%
{E(UKfaculty)[[E(UKfaculty) %in% .]]}
## + 1/817 edge from 6f42903 (vertex names):
## tail head tid hid weight
## 604 person_29 person_31 29 31 16
If you don’t care so much about the actual path and only want to know know the magnitude of the walks, then you can use the distance() function. Again, weight matters.
distances(UKfaculty, c(likesTheMost , mostLikedPerson), c(likesTheMost, mostLikedPerson), 'out')
## person_29 person_31
## person_29 0 2
## person_31 4 0
distances(UKfaculty, c(likesTheMost , mostLikedPerson), c(likesTheMost, mostLikedPerson), 'in')
## person_29 person_31
## person_29 0 4
## person_31 2 0
distances(UKfaculty, c(likesTheMost , mostLikedPerson), c(likesTheMost, mostLikedPerson), 'all', weights = NA)
## person_29 person_31
## person_29 0 1
## person_31 1 0
From the given start vertex, take the given number of steps, choosing an edge from the actual vertex uniformly randomly. If the graph is directed, then the random walk only uses edges that traverse outwards. Random walks are fundamental in both community detection and centrality measurements.
set.seed(1234)
random_walk(UKfaculty, likesTheMost, 3)
## + 3/81 vertices, named, from 6f42903:
## [1] person_29 person_8 person_41
sapply(1:100, function(i){
random_walk(UKfaculty, likesTheMost, 3)[[2]]
}) %>%
table() %>%
sort(T)
## .
## 23 14 15 32 7 22 24 80 8 18 19 20 27 37 46 52 55 66 71 2 21 26 34 35 57
## 7 6 5 5 4 4 4 4 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2
## 69 77 79 4 6 39 41 50 51 54 58 62 64
## 2 2 2 1 1 1 1 1 1 1 1 1 1
The betweenness value is roughly defined as the number of shortest paths going through a vertex or an edge. A betweenness value of 0 means that no shortest paths go through that vertex - that vertex is not essential for connecting any two vertices. Weights are interpreted as distances and a path with a lower weight will be chosen over a path with a higher weight. The betweenness value of a vertex can be calculated with the betweenness() function.
betweenness(UKfaculty, likesTheMost)
## person_29
## 684.1951
V(UKfaculty)$betweenness <- betweenness(UKfaculty)
V(UKfaculty)[[betweenness == max(betweenness)]]
## + 1/81 vertex, named, from 6f42903:
## Group name betweenness
## 37 1 person_37 1223.084
V(UKfaculty)$unweightedBetweenness <- betweenness(UKfaculty, weights = NA)
V(UKfaculty)[[unweightedBetweenness == max(unweightedBetweenness)]]
## + 1/81 vertex, named, from 6f42903:
## Group name betweenness unweightedBetweenness
## 62 3 person_62 965.596 1098.767
Betweenness can also be calculated for edges by using the edge_betweeness() function:
E(UKfaculty)$betweenness <- edge_betweenness(UKfaculty)
E(UKfaculty)[[betweenness == max(betweenness)]]
## + 1/817 edge from 6f42903 (vertex names):
## tail head tid hid weight betweenness
## 520 person_17 person_62 17 62 1 561.5816
E(UKfaculty)$unweightedBetweenness <- edge_betweenness(UKfaculty, weights = NA)
E(UKfaculty)[[unweightedBetweenness == max(unweightedBetweenness)]]
## + 1/817 edge from 6f42903 (vertex names):
## tail head tid hid weight betweenness unweightedBetweenness
## 520 person_17 person_62 17 62 1 561.5816 342.03
Closeness measures how many steps is required to access every other vertex from a given vertex.
closeness(UKfaculty, mode = 'out') %>%
sort(T) %>%
head
## person_62 person_29 person_37 person_5 person_52 person_34
## 0.004651163 0.004347826 0.004255319 0.004219409 0.004016064 0.003831418
closeness(UKfaculty, mode = 'in') %>%
sort(T) %>%
head
## person_11 person_18 person_10 person_54 person_58 person_7
## 0.002283105 0.002277904 0.002262443 0.002262443 0.002257336 0.002247191
In general, vertices with high eigenvector centralities are those which are connected to many other vertices which are, in turn, connected to many others (and so on).
eigen_centrality(UKfaculty, directed = F)$vector %>%
sort(T) %>%
head()
## person_29 person_31 person_21 person_79 person_35 person_19
## 1.0000000 0.8221039 0.7400463 0.5895395 0.5872218 0.5145590