This document focuses on understanding an igraph graph and its two main components: nodes and edges. We can also use these components to filter the graph down to a subgraph.
First, we need to load the igraph package:
library(igraph)
We now need to create a graph to explore. There are many different ways we can create a new graph, but here we will only cover three of those ways:
- make_empty_graph(), combined with vertices() and edges()
- graph_from_adjacency_matrix()
- graph_from_data_frame()
When we build a graph up from an empty graph, we first need to add the vertices so that we can refer to them when we build the edges. edges() takes a flat vector that is read as consecutive pairs of vertices. It is important to note:
- If directed = FALSE, then the order within each pair doesn’t matter.
- If directed = TRUE, then the first vertex in each pair is the source and the second is the target.
g <- make_empty_graph(directed = TRUE) +
  vertices(c('a', 'b', 'c', 'd', 'e'),
           type = "letter",
           order = c(1, 2, 3, 4, 5)) +
  edges(c('a', 'b',
          'b', 'c',
          'b', 'd',
          'c', 'b',
          'c', 'e'),
        type = "is connected to",
        weight = c(1, 1, 2, 1, 2))
Let’s take a look at how igraph represents a graph in R:
g
## IGRAPH 8bbdecc DNWB 5 5 --
## + attr: name (v/c), type (v/c), order (v/n), type (e/c), weight
## | (e/n)
## + edges from 8bbdecc (vertex names):
## [1] a->b b->c b->d c->b c->e
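In the printed header, the letters D, N, W and B flag the graph as directed, named, weighted and bipartite (igraph shows B whenever a vertex attribute called type exists), and the two numbers are the vertex and edge counts. igraph also provides a plot() method to explore the graph visually: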
set.seed(1234)
plot(g)
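The note on edge direction can be checked with a two-vertex graph. This is a minimal sketch: g_dir and g_undir are names made up for the illustration, and are_adjacent() is igraph's adjacency test, which respects edge direction in directed graphs.

```r
library(igraph)

# g_dir and g_undir are hypothetical names used only for this illustration
g_dir <- make_empty_graph(directed = TRUE) +
  vertices(c('a', 'b')) +
  edges(c('a', 'b'))

g_undir <- make_empty_graph(directed = FALSE) +
  vertices(c('a', 'b')) +
  edges(c('b', 'a'))   # pair given in the opposite order

# Directed: only the source -> target direction exists
a_to_b <- are_adjacent(g_dir, 'a', 'b')
b_to_a <- are_adjacent(g_dir, 'b', 'a')

# Undirected: the order within the pair makes no difference
undir_ab <- are_adjacent(g_undir, 'a', 'b')
undir_ba <- are_adjacent(g_undir, 'b', 'a')
```

In the directed graph only a_to_b should be TRUE, while both undirected checks should be TRUE.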
When we build a graph from an adjacency matrix, the rows and columns of the matrix represent the vertices of the graph. A non-zero element represents a link between two vertices. It is important to note:
- If weighted = NULL (the default), then the element represents the number of links between the two vertices.
- If weighted = TRUE, then the element represents the weight of the edge, stored in an edge attribute named weight.
- If weighted is a character constant, then the element becomes an edge attribute named after the constant.
- If mode = "directed", then the rows represent the source nodes and the columns represent the target nodes.
- If mode = "undirected", then only one element is chosen from either matrix[i, j] or matrix[j, i].
src <- c('a', 'b', 'b', 'c', 'c')
tgt <- c('b', 'c', 'd', 'b', 'e')
wgt <- c(1, 1, 2, 1, 2)
matData <- matrix(0, nrow = 5, ncol = 5, dimnames = list(letters[1:5], letters[1:5]))
# Fill each (source, target) cell with its weight by indexing with a
# two-column character matrix, which R matches against the dimnames
matData[cbind(src, tgt)] <- wgt
matData
## a b c d e
## a 0 1 0 0 0
## b 0 0 1 2 0
## c 0 1 0 0 2
## d 0 0 0 0 0
## e 0 0 0 0 0
g <- graph_from_adjacency_matrix(matData, mode = 'directed', weighted = 'weight')
g
## IGRAPH 07152a3 DNW- 5 5 --
## + attr: name (v/c), weight (e/n)
## + edges from 07152a3 (vertex names):
## [1] a->b b->c b->d c->b c->e
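The difference between counting links and storing weights shows up as soon as a matrix entry is larger than 1. The sketch below assumes a made-up 2 x 2 matrix m and compares leaving weighted at its default with setting weighted = TRUE.

```r
library(igraph)

# m is a hypothetical 2 x 2 matrix with a single non-zero entry of 2
m <- matrix(c(0, 2,
              0, 0),
            nrow = 2, byrow = TRUE,
            dimnames = list(c('a', 'b'), c('a', 'b')))

# weighted left at its default: the 2 is read as two parallel a->b edges
g_count <- graph_from_adjacency_matrix(m, mode = 'directed')

# weighted = TRUE: the 2 is read as the weight of a single a->b edge
g_weight <- graph_from_adjacency_matrix(m, mode = 'directed', weighted = TRUE)

n_count  <- ecount(g_count)     # number of edges when entries are counts
n_weight <- ecount(g_weight)    # number of edges when entries are weights
w        <- E(g_weight)$weight  # weight stored on the single edge
```

Here n_count should be 2, n_weight should be 1, and w should be 2.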
If we want to convert a graph back to a matrix, we can do that as well with as_adjacency_matrix():
- If no attr is provided, then a plain adjacency matrix is returned.
- If attr is provided, then the matrix is filled with that particular edge attribute. The attribute must be either numeric or logical.
as_adjacency_matrix(g)
## 5 x 5 sparse Matrix of class "dgCMatrix"
## a b c d e
## a . 1 . . .
## b . . 1 1 .
## c . 1 . . 1
## d . . . . .
## e . . . . .
as_adjacency_matrix(g, attr = 'weight')
## 5 x 5 sparse Matrix of class "dgCMatrix"
## a b c d e
## a . 1 . . .
## b . . 1 2 .
## c . 1 . . 2
## d . . . . .
## e . . . . .
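If an ordinary base R matrix is more convenient than a sparse one, the sparse argument of as_adjacency_matrix() can be switched off. The sketch below rebuilds the matrix from this section and checks that the round trip gives the same values back; mat_back and round_trip_ok are names chosen for the illustration.

```r
library(igraph)

# Rebuild the weighted adjacency matrix used in this section
src <- c('a', 'b', 'b', 'c', 'c')
tgt <- c('b', 'c', 'd', 'b', 'e')
wgt <- c(1, 1, 2, 1, 2)
matData <- matrix(0, nrow = 5, ncol = 5,
                  dimnames = list(letters[1:5], letters[1:5]))
matData[cbind(src, tgt)] <- wgt

g <- graph_from_adjacency_matrix(matData, mode = 'directed', weighted = 'weight')

# sparse = FALSE asks for a base R matrix instead of a dgCMatrix
mat_back <- as_adjacency_matrix(g, attr = 'weight', sparse = FALSE)

# Compare the recovered matrix with the one we started from
round_trip_ok <- all(mat_back == matData)
```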
The only data frame that is required to create a graph with graph_from_data_frame() is one representing an edge list. However, if we want the nodes to contain information other than their name, then we also need to provide a data frame representing a node list. It is important to note:
- If directed = TRUE, then the first column of the edge list is the source and the second is the target.
- Any additional columns are stored as edge or vertex attributes.
node_list <- data.frame(
  name = c('a', 'b', 'c', 'd', 'e'),
  type = "letter",
  order = c(1, 2, 3, 4, 5)
)
edge_list <- data.frame(
  source = c('a', 'b', 'b', 'c', 'c'),
  target = c('b', 'c', 'd', 'b', 'e'),
  type = 'distance',
  weight = c(1, 1, 2, 1, 2)
)
g <- graph_from_data_frame(d = edge_list,
                           directed = TRUE,
                           vertices = node_list)
The resulting graph is, for all intents and purposes, identical to the one we built from an empty graph.
g
## IGRAPH b2edb04 DNWB 5 5 --
## + attr: name (v/c), type (v/c), order (v/n), type (e/c), weight
## | (e/n)
## + edges from b2edb04 (vertex names):
## [1] a->b b->c b->d c->b c->e
We can also convert the graph back to a data frame:
The what argument accepts the strings "edges", "vertices", or "both" and returns the appropriate data frame. If "both" is entered, then a list with both data frames is returned.
as_data_frame(g, 'both')
## $vertices
## name type order
## a a letter 1
## b b letter 2
## c c letter 3
## d d letter 4
## e e letter 5
##
## $edges
## from to type weight
## 1 a b distance 1
## 2 b c distance 1
## 3 b d distance 2
## 4 c b distance 1
## 5 c e distance 2
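The two data frames returned by as_data_frame() have the shape that graph_from_data_frame() expects, so a graph can be exported and rebuilt. This is a sketch under that assumption; dfs and g2 are illustrative names, and the igraph:: prefix is only a precaution in case another loaded package exports a function with the same name.

```r
library(igraph)

node_list <- data.frame(
  name = c('a', 'b', 'c', 'd', 'e'),
  type = "letter",
  order = c(1, 2, 3, 4, 5)
)
edge_list <- data.frame(
  source = c('a', 'b', 'b', 'c', 'c'),
  target = c('b', 'c', 'd', 'b', 'e'),
  type = 'distance',
  weight = c(1, 1, 2, 1, 2)
)
g <- graph_from_data_frame(d = edge_list, directed = TRUE, vertices = node_list)

# Export both tables at once; the result is a list with $vertices and $edges
dfs <- igraph::as_data_frame(g, what = 'both')

# Rebuild a second graph from the exported tables
g2 <- graph_from_data_frame(d = dfs$edges, directed = TRUE, vertices = dfs$vertices)

# The rebuilt graph should carry the same edges and attributes
same_edges   <- identical(as_ids(E(g)), as_ids(E(g2)))
same_weights <- identical(E(g)$weight, E(g2)$weight)
same_order   <- identical(V(g)$order, V(g2)$order)
```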
Graphs are built with nodes and edges. These components have data stored in them, and it is important to understand how this data can be accessed. The two workhorses of graph exploration in igraph are:
- V() to access and manipulate data stored in vertices
- E() to access and manipulate data stored in edges
Once we are comfortable with those two functions, we can use them to make a subgraph with either:
- induced_subgraph() if we have a vector of vertices
- subgraph.edges() if we have a vector of edges
V() only accepts an igraph graph object and returns a vector of vertices. This vector is what we use to explore the vertices of the graph, and it acts like a normal c() vector in R with some syntactic sugar.
V(g)
## + 5/5 vertices, named, from b2edb04:
## [1] a b c d e
You can access information of a specific vertex by inputting its index number.
V(g)[3]
## + 1/5 vertex, named, from b2edb04:
## [1] c
If the vertex has a name, then you can use its name instead of the index number.
V(g)['c']
## + 1/5 vertex, named, from b2edb04:
## [1] c
Accessing a vertex using double square bracket notation gives you all the information stored in it.
V(g)[['c']]
## + 1/5 vertex, named, from b2edb04:
## name type order
## 3 c letter 3
A specific attribute can be accessed using dollar sign notation. You should use single square bracket notation when accessing specific attributes.
V(g)['c']$order
## [1] 3
You can use the assignment arrow (<-) to change the information of a particular vertex:
V(g)['c']$order <- 1
# vcount() provides the number of vertices in a graph
V(g)[[1:vcount(g)]]
## + 5/5 vertices, named, from b2edb04:
## name type order
## 1 a letter 1
## 2 b letter 2
## 3 c letter 1
## 4 d letter 4
## 5 e letter 5
You can access multiple vertices by providing a vector of names:
V(g)[[c('c', 'e')]]
## + 2/5 vertices, named, from b2edb04:
## name type order
## 3 c letter 1
## 5 e letter 5
You can also query the attributes of the vertex vector.
V(g)[[order == 1]]
## + 2/5 vertices, named, from b2edb04:
## name type order
## 1 a letter 1
## 3 c letter 1
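Attribute access and attribute queries also work on the whole vertex vector at once, and conditions can be combined with the usual logical operators. The sketch below starts from a fresh copy of the graph (so c still has order 3); the variable names all_orders, late and middle are illustrative only.

```r
library(igraph)

node_list <- data.frame(
  name = c('a', 'b', 'c', 'd', 'e'),
  type = "letter",
  order = c(1, 2, 3, 4, 5)
)
edge_list <- data.frame(
  source = c('a', 'b', 'b', 'c', 'c'),
  target = c('b', 'c', 'd', 'b', 'e'),
  type = 'distance',
  weight = c(1, 1, 2, 1, 2)
)
g <- graph_from_data_frame(d = edge_list, directed = TRUE, vertices = node_list)

# $ on the full vertex vector returns one value per vertex
all_orders <- V(g)$order

# A query inside [ ] can refer to any vertex attribute by name
late <- V(g)[order > 3]$name

# Conditions can be combined with & and |
middle <- V(g)[order > 1 & name != 'e']$name

# Assignment through a query changes only the matching vertices
V(g)[order > 3]$type <- "late letter"
```

With the data above, late should hold d and e, and middle should hold b, c and d.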
Vertex vectors can be combined just like any other vector in R
ab <- V(g)[[c('a', 'b')]]
de <- V(g)[[c('d', 'e')]]
c(ab, de)
## + 4/5 vertices, named, from b2edb04:
## [1] a b d e
E() behaves in much the same way as V(). It only accepts an igraph graph object and returns a vector of edges. A big difference, however, is how we approach accessing data stored in edges. While edges can have names and can be accessed directly using either a name or an id, this is not the best approach: as graphs get larger, the sheer number and variety of edges make it impractical.
Instead, edges should be accessed by querying their attributes:
E(g)[weight == 2]
## + 2/5 edges from b2edb04 (vertex names):
## [1] b->d c->e
Using double square bracket notation gives you more information about the queried edges.
E(g)[[weight == 2]]
## + 2/5 edges from b2edb04 (vertex names):
## tail head tid hid type weight
## 3 b d 2 4 distance 2
## 5 c e 3 5 distance 2
You can also change the attributes of the selected edges. You should use single square bracket notation:
E(g)[weight == 2]$weight <- -20000
# ecount() provides the number of edges in a graph
E(g)[[1:ecount(g)]]
## + 5/5 edges from b2edb04 (vertex names):
## tail head tid hid type weight
## 1 a b 1 2 distance 1
## 2 b c 2 3 distance 1
## 3 b d 2 4 distance -20000
## 4 c b 3 2 distance 1
## 5 c e 3 5 distance -20000
We can also query edges by the nodes they connect.
LHS %--% RHS represents any edge between LHS and RHS
E(g)['a' %--% 'b']
## + 1/5 edge from b2edb04 (vertex names):
## [1] a->b
LHS %<-% RHS represents edges going to LHS from RHS
E(g)['c' %<-% 'b']
## + 1/5 edge from b2edb04 (vertex names):
## [1] b->c
LHS %->% RHS represents edges leaving from LHS to RHS
# you don't need to know the specific names of the vertices
E(g)['b' %->% V(g)]
## + 2/5 edges from b2edb04 (vertex names):
## [1] b->c b->d
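The three operators differ only in how they treat direction, which is easiest to see on the pair b and c, since the graph contains both b->c and c->b. This is a small sketch on a fresh copy of the graph; either, b_to_c and c_to_b are illustrative names.

```r
library(igraph)

edge_list <- data.frame(
  source = c('a', 'b', 'b', 'c', 'c'),
  target = c('b', 'c', 'd', 'b', 'e'),
  type = 'distance',
  weight = c(1, 1, 2, 1, 2)
)
g <- graph_from_data_frame(d = edge_list, directed = TRUE)

# %--% ignores direction: both b->c and c->b match
either <- E(g)['b' %--% 'c']

# %->% keeps only edges leaving the left-hand side
b_to_c <- E(g)['b' %->% 'c']

# %<-% keeps only edges arriving at the left-hand side
c_to_b <- E(g)['b' %<-% 'c']

# as_ids() turns an edge vector into "tail|head" strings for named graphs
ids_b_to_c <- as_ids(b_to_c)
ids_c_to_b <- as_ids(c_to_b)
```

Here either should contain two edges, while b_to_c and c_to_b should contain one edge each.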
Edge vectors can also be combined using c(). The variable names below avoid reusing c, which would shadow the function name:
b_out <- E(g)['b' %->% V(g)]
c_out <- E(g)['c' %->% V(g)]
c(b_out, c_out)
## + 4/5 edges from b2edb04 (vertex names):
## [1] b->c b->d c->b c->e
induced_subgraph() filters the graph to include only the vertices within the provided vertex vector or vector of vertex names. Any edges between the provided nodes are preserved.
abc_sub <- induced_subgraph(g, c('a', 'b', 'c'))
abc_sub
## IGRAPH c246328 DNWB 3 3 --
## + attr: name (v/c), type (v/c), order (v/n), type (e/c), weight
## | (e/n)
## + edges from c246328 (vertex names):
## [1] a->b b->c c->b
subgraph.edges() filters the graph to include only the edges within the provided edge vector. By default, only vertices that are connected to those edges are kept. If you want to keep all the vertices, you can set the parameter delete.vertices = FALSE.
b_sub <- subgraph.edges(g, E(g)["b" %--% V(g)])
b_sub
## IGRAPH 31e899b DNWB 4 4 --
## + attr: name (v/c), type (v/c), order (v/n), type (e/c), weight
## | (e/n)
## + edges from 31e899b (vertex names):
## [1] a->b b->c b->d c->b
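The effect of delete.vertices can be seen by counting vertices in both variants. This is a sketch on a fresh copy of the graph; b_sub_trim and b_sub_all are illustrative names, and depending on the installed igraph version subgraph.edges() may emit a deprecation warning.

```r
library(igraph)

edge_list <- data.frame(
  source = c('a', 'b', 'b', 'c', 'c'),
  target = c('b', 'c', 'd', 'b', 'e'),
  type = 'distance',
  weight = c(1, 1, 2, 1, 2)
)
g <- graph_from_data_frame(d = edge_list, directed = TRUE)

# All edges that touch b, in either direction
b_edges <- E(g)['b' %--% V(g)]

# Default: vertices with no remaining edge (here e) are dropped
b_sub_trim <- subgraph.edges(g, b_edges)

# delete.vertices = FALSE: every vertex of g is kept, even if isolated
b_sub_all <- subgraph.edges(g, b_edges, delete.vertices = FALSE)

n_trim <- vcount(b_sub_trim)
n_all  <- vcount(b_sub_all)
```

With this graph, n_trim should be 4 and n_all should be 5, while both subgraphs keep the same four edges.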
In igraph, subgraphs are completely new graphs, and their vertex and edge vectors are distinct from those of the graph they were created from. The following won’t work, because vertices taken from b_sub and abc_sub cannot be used to index the edges of g:
E(g)[V(b_sub)['b'] %--% V(abc_sub)]
Instead, you will need to use attributes that you know are in both graphs, like names:
E(g)[V(b_sub)['b']$name %--% V(abc_sub)$name]
## + 3/5 edges from b2edb04 (vertex names):
## [1] a->b b->c c->b
Also, ids don’t stay consistent when a subgraph is created. That is, V(g)[3] won’t necessarily represent the same vertex as V(subg)[3] in the subgraph. It is good practice to name the vertices and use the vertex names to keep things consistent: V(g)["person_1"] will refer to the same vertex as V(subg)["person_1"] in the subgraph.
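The shift in ids can be reproduced with the example graph by keeping only the last three vertices. This is a minimal sketch; cde_sub is an illustrative name.

```r
library(igraph)

node_list <- data.frame(
  name = c('a', 'b', 'c', 'd', 'e'),
  type = "letter",
  order = c(1, 2, 3, 4, 5)
)
edge_list <- data.frame(
  source = c('a', 'b', 'b', 'c', 'c'),
  target = c('b', 'c', 'd', 'b', 'e'),
  type = 'distance',
  weight = c(1, 1, 2, 1, 2)
)
g <- graph_from_data_frame(d = edge_list, directed = TRUE, vertices = node_list)

# Keep only c, d and e; the subgraph renumbers its vertices from 1
cde_sub <- induced_subgraph(g, c('c', 'd', 'e'))

# The same id now points at different vertices in the two graphs
third_in_g   <- V(g)[3]$name
third_in_sub <- V(cde_sub)[3]$name

# The name still identifies the same vertex and its attributes
order_in_g   <- V(g)['c']$order
order_in_sub <- V(cde_sub)['c']$order
```

Here id 3 should be c in g but e in the subgraph, whereas looking up 'c' by name should return the same order value in both.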