This document will focus on understanding an igraph graph and it’s two main components - nodes and edges. We can also use these components to filter the graph down to a subgraph.

First, we need to load the igraph package

library(igraph)

Building a graph

We now need to create a graph to explore. There are many different ways we can create a new graph, but here we will only cover three of those ways:

Adding vertices and edges to an empty graph by using
- make_empty_graph()
- vertices()
- edges()
Converting an existing matrix into a graph by
- creating a sparse square matrix with the rows and columns representing the nodes
- Filling in elements with values to represent the edges
- using graph_from_adjacency_matrix()
Converting existing data frames into a graph by
- creating a data frame that represents an edge list
- (optional) creating a data frame that represents a node list
- using graph_from_data_frame()

Building up an empty graph

When we build a graph up from an empty graph we need to first initiate the vertices so we can refer to them when we build the edges. It is important to note:

Vertex names don’t have to be unique, but it is considered good practice
Edges are defined in pairwise order
- If directed = F then the order within the pair doesn’t matter
- If directed = T then the first vertex in the pair is the source and the second is the target
Attributes can be explicitly added to both vertices and edges

g <- make_empty_graph(directed = T) +
  vertices(c('a', 'b', 'c', 'd', 'e'),
           type = "letter",
           order = c(1, 2, 3, 4, 5)) +
  edges(c('a', 'b',
          'b', 'c',
          'b', 'd',
          'c', 'b',
          'c', 'e'),
        type = "is connected to",
        weight = c(1, 1, 2, 1, 2))

Let’s take a look at how igraph represents a graph in R

## IGRAPH 8bbdecc DNWB 5 5 -- 
## + attr: name (v/c), type (v/c), order (v/n), type (e/c), weight
## | (e/n)
## + edges from 8bbdecc (vertex names):
## [1] a->b b->c b->d c->b c->e

igraph also provides a plot() to explore the graph visually

set.seed(1234)
plot(g)

Creating a graph from a matrix

The rows and columns of a matrix represent the vertices of a graph. If the element is not 0, then it represents a link between two vertices. It is important to note:

The non-zero number matters regardless if the graph is weighted or not.
- if weighted = F then it represents the number of links between the two vertices
- if weighted = T then it represents the weight of the edge
- if weighted = a character constant, the the number becomes an edge attribute named after the constant
A symetric matrix is very different if the output graph is directed or not.
- if mode = "directed" then the rows represent the source nodes and the columns represent the target nodes
- if mode = "undirected" then only one element is chosen from either matrix[i][j] or matrix[j][i]

src <- c('a', 'b', 'b', 'c', 'c')
tgt <- c('b', 'c', 'd', 'b', 'e')
wgt <- c(1, 1, 2, 1, 2)
matData <- matrix(0, nrow = 5, ncol = 5, dimnames = list(letters[1:5], letters[1:5]))

sapply(seq_along(src), function(i){
  matData[src[i], tgt[i]] <<- wgt[i]
})

## [1] 1 1 2 1 2

matData

##   a b c d e
## a 0 1 0 0 0
## b 0 0 1 2 0
## c 0 1 0 0 2
## d 0 0 0 0 0
## e 0 0 0 0 0

g <- graph_from_adjacency_matrix(matData, mode = 'directed', weighted = 'weight') 

g

## IGRAPH 07152a3 DNW- 5 5 -- 
## + attr: name (v/c), weight (e/n)
## + edges from 07152a3 (vertex names):
## [1] a->b b->c b->d c->b c->e

If we wanted to convert a graph back to a matrix, then we can do that as well:

If no attr is provided, then an adjacency matrix is provided
If attr is provided, the matrix is filled with that particular edge attribute. The attribute must be either numeric or logical

as_adjacency_matrix(g)

## 5 x 5 sparse Matrix of class "dgCMatrix"
##   a b c d e
## a . 1 . . .
## b . . 1 1 .
## c . 1 . . 1
## d . . . . .
## e . . . . .

as_adjacency_matrix(g, attr = 'weight')

## 5 x 5 sparse Matrix of class "dgCMatrix"
##   a b c d e
## a . 1 . . .
## b . . 1 2 .
## c . 1 . . 2
## d . . . . .
## e . . . . .

Creating a graph from a data frame

The only data frame that is required to create a graph with graph_from_data_frame() is one representing an edges list. However, if we want the nodes to contain information other than their name, then we will need to also provide a data frame representing a node list. It is important to note:

The first two columns of the edge list must represent the nodes the edges connect
- If directed = T then the first column is the source and the second is the target
If a node list is provided, then its first column must represent the name of the nodes
- The names of the nodes must all be unique
- Any nodes referred to in the edge list must exist in the node list

node_list <- data.frame(
  name = c('a', 'b', 'c', 'd', 'e'),
  type = "letter",
  order = c(1, 2, 3, 4, 5)
)

edge_list <- data.frame(
  source = c('a', 'b', 'b', 'c', 'c'),
  target = c('b', 'c', 'd', 'b', 'e'),
  type = 'distance',
  weight = c(1, 1, 2, 1, 2)
)
g <- graph_from_data_frame(d = edge_list, 
                           directed = T,
                           vertices = node_list)

The resulting graph is, for all intents and purposes, identical to the one we built from an empty graph.

## IGRAPH b2edb04 DNWB 5 5 -- 
## + attr: name (v/c), type (v/c), order (v/n), type (e/c), weight
## | (e/n)
## + edges from b2edb04 (vertex names):
## [1] a->b b->c b->d c->b c->e

We can also convert the graph back to a data frame:

The what argument accepts the strings “edges”, “vertices”, or “both” and will return the appropriate data frame. If ‘both’ is entered, then a list with bothe data frames is returned.

as_data_frame(g, 'both')

## $vertices
##   name   type order
## a    a letter     1
## b    b letter     2
## c    c letter     3
## d    d letter     4
## e    e letter     5
## 
## $edges
##   from to     type weight
## 1    a  b distance      1
## 2    b  c distance      1
## 3    b  d distance      2
## 4    c  b distance      1
## 5    c  e distance      2

Accessing & Manipulating graph components

Graphs are built with nodes and edges. These components have data stored in them and it is important to understand how this data can be accessed. The two work hourses of graph exploration in igraph are:

V() to access and manipulate data stored in vertices
E() to access and manipulate data stored in edges

Once we are comfortable with those two functions, we can use them to make a subgraph with either:

induced_subgraph() if we have vector of vertices
subgraph.edges() if we have a vector of edges

V()

V() only accepts an igraph graph object and returns a vector of vertices. This vector is what we use to explore the vertices of the graph and it acts like a normal c() vector in R with some syntactic sugar

V(g)

## + 5/5 vertices, named, from b2edb04:
## [1] a b c d e

You can access information of a specific vertex by inputting its index number.

V(g)[3]

## + 1/5 vertex, named, from b2edb04:
## [1] c

If the vertex has a name, then you can use its name instead of the index number.

V(g)['c']

## + 1/5 vertex, named, from b2edb04:
## [1] c

Accessing a vertex using double square bracket notation gives you all the information stored in it.

V(g)[['c']]

## + 1/5 vertex, named, from b2edb04:
##   name   type order
## 3    c letter     3

A specific attribute can be accessed using dollar sign notation. You should use single square bracket notation when accesing specific attributes.

V(g)['c']$order

## [1] 3

You can use the assignment arrow (<-) to change information of a particular vertex

V(g)['c']$order <- 1

#vcount provides the number of vertices in a graph
V(g)[[1:vcount(g)]]

## + 5/5 vertices, named, from b2edb04:
##   name   type order
## 1    a letter     1
## 2    b letter     2
## 3    c letter     1
## 4    d letter     4
## 5    e letter     5

You can access multiple vertices either by providing a vector of names

V(g)[[c('c', 'e')]]

## + 2/5 vertices, named, from b2edb04:
##   name   type order
## 3    c letter     1
## 5    e letter     5

You can also query the attributes of the vertex vector.

V(g)[[order == 1]]

## + 2/5 vertices, named, from b2edb04:
##   name   type order
## 1    a letter     1
## 3    c letter     1

Vertex vectors can be combined just like any other vector in R

ab <- V(g)[[c('a', 'b')]]
de <- V(g)[[c('d', 'e')]]
c(ab, de)

## + 4/5 vertices, named, from b2edb04:
## [1] a b d e

E()

E() behves much in the same way as V(). It only accepts an igraph graph object and returns a vector of edges. A big difference, however, is how we approach accessing data stroed in edges. While edges can have names and can be accessed directly using either a name or an id, this is not the best approach since as graphs get larger, the sheer number and variety of edges make it very impractical to do so.

Instead edges should be accessed by querying their attributes

E(g)[weight == 2]

## + 2/5 edges from b2edb04 (vertex names):
## [1] b->d c->e

Using double square notation gives you more information about the queried edges.

E(g)[[weight == 2]]

## + 2/5 edges from b2edb04 (vertex names):
##   tail head tid hid     type weight
## 3    b    d   2   4 distance      2
## 5    c    e   3   5 distance      2

You can also change the attributes of the selected edges. You should use single square notation

E(g)[weight == 2]$weight <- -20000

#ecount() provides the number of edges in a graph
E(g)[[1:ecount(g)]]

## + 5/5 edges from b2edb04 (vertex names):
##   tail head tid hid     type weight
## 1    a    b   1   2 distance      1
## 2    b    c   2   3 distance      1
## 3    b    d   2   4 distance -20000
## 4    c    b   3   2 distance      1
## 5    c    e   3   5 distance -20000

We can also query edges for the nodes they connect.

LHS %--% RHS represents any edge between LHS and RHS

E(g)['a' %--% 'b']

## + 1/5 edge from b2edb04 (vertex names):
## [1] a->b

LHS %<-% RHS represents edges going to LHS from RHS

E(g)['c' %<-% 'b']

## + 1/5 edge from b2edb04 (vertex names):
## [1] b->c

LHS %->% RHS represents edges leaving from LHS to RHS

#you don't need to now the specific name of the vertices
E(g)['b' %->% V(g)]

## + 2/5 edges from b2edb04 (vertex names):
## [1] b->c b->d

Edge vectors can also be combined together using c()

b <- E(g)['b' %->% V(g)]
c <- E(g)['c' %->% V(g)]
c(b, c)

## + 4/5 edges from b2edb04 (vertex names):
## [1] b->c b->d c->b c->e

Subgraphs

induced_subgraph() filters the graph to include only the vertices within the provided vertex vector or vector of vertex names. Any edges between the provided nodes are preserved.

abc_sub <- induced_subgraph(g, c('a', 'b', 'c'))
abc_sub

## IGRAPH c246328 DNWB 3 3 -- 
## + attr: name (v/c), type (v/c), order (v/n), type (e/c), weight
## | (e/n)
## + edges from c246328 (vertex names):
## [1] a->b b->c c->b

subgraph.edges() filters the graph to include only the edges within the provided edge vector. By default, only vertices that are connected to the edges are kept. If you want to keep all the vertices, then you can set the parameter delete.vertices = F

b_sub <- subgraph.edges(g, E(g)["b" %--% V(g)])
b_sub

## IGRAPH 31e899b DNWB 4 4 -- 
## + attr: name (v/c), type (v/c), order (v/n), type (e/c), weight
## | (e/n)
## + edges from 31e899b (vertex names):
## [1] a->b b->c b->d c->b

Subgraphs Gotchas

In igraph, subgraphs are completely new graphs and their components are completely different from those of the graphs they were created from. The below won’t work:

E(g)[V(b_sub)['b'] %--% V(abc_sub)]

Instead, you will need to use attributes that you know are in both graphs, like names:

E(g)[V(b_sub)['b']$name %--% V(abc_sub)$name]

## + 3/5 edges from b2edb04 (vertex names):
## [1] a->b b->c c->b

Also, ids don’t stay consistent when a subgraph is created. That is, V(g)[3] won’t necessarily represent the same vertex in the subgraph V(subg)[3]. It is good practice to name the vertices and use the vertex names to keep things consistent. V(g)["person_1"] will be the same in the subgraph V(subg)["person_1"]

Part 1: Elements of igraph