Reactome is another popular pathway database. It organises pathways hierarchically: a pathway can contain sub-pathways or pathway components. The up-to-date pathway data can be downloaded directly from https://reactome.org/download-data.

reactome.db

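Bioconductor provides the Reactome data as the annotation package reactome.db.

Loading the package and printing the reactome.db object shows the version of the data: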

library(reactome.db)
## Loading required package: AnnotationDbi
## Loading required package: stats4
## Loading required package: BiocGenerics
## 
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
## 
##     Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
##     as.data.frame, basename, cbind, colnames, dirname, do.call,
##     duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
##     lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
##     pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table,
##     tapply, union, unique, unsplit, which.max, which.min
## Loading required package: Biobase
## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.
## Loading required package: IRanges
## Loading required package: S4Vectors
## Warning: package 'S4Vectors' was built under R version 4.3.2
## 
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:utils':
## 
##     findMatches
## The following objects are masked from 'package:base':
## 
##     I, expand.grid, unname
reactome.db
## ReactomeDb object:
## | DBSCHEMA: REACTOME_DB
## | DBSCHEMAVERSION: 84
## | SOURCENAME: Reactome
## | SOURCEURL: http://www.reactome.org/download/current/
## | SOURCEDATE: 2023-04-05
## | Supporting package: AnnotationDbi
## | Db type: ReactomeDb
## 
## Please see: help('select') for usage information

The two most important objects in the package are:

  • reactomePATHID2EXTID contains the mappings between Reactome pathway IDs and Entrez gene IDs
  • reactomePATHID2NAME contains the mappings between Reactome pathway IDs and pathway names
library(reactome.db)
tb = toTable(reactomePATHID2EXTID)
head(tb)
##           DB_ID gene_id
## 1  R-HSA-109582       1
## 2  R-HSA-114608       1
## 3  R-HSA-168249       1
## 4  R-HSA-168256       1
## 5 R-HSA-6798695       1
## 6   R-HSA-76002       1
p2n = toTable(reactomePATHID2NAME)
head(p2n)
##           DB_ID
## 1   R-BTA-73843
## 2 R-BTA-1971475
## 3 R-BTA-1369062
## 4  R-BTA-382556
## 5 R-BTA-9033807
## 6  R-BTA-418592
##                                                                     path_name
## 1                                              1-diphosphate: 5-Phosphoribose
## 2 Bos taurus: A tetrasaccharide linker sequence is required for GAG synthesis
## 3                           Bos taurus: ABC transporters in lipid homeostasis
## 4                          Bos taurus: ABC-family proteins mediated transport
## 5                                    Bos taurus: ABO blood group biosynthesis
## 6                       Bos taurus: ADP signalling through P2Y purinoceptor 1

The code above uses the function toTable() to retrieve each mapping as a data frame. You can also call as.list() on the two objects and compare the output.
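
To get a feel for how the two representations differ without loading the package, here is a minimal sketch on a toy table. The table tb_toy and its IDs are made up for illustration; only the column names follow the toTable() output above. It assumes that as.list() on the mapping returns a named list with one vector of gene IDs per pathway, which split() mimics here.

```r
# Toy stand-in for toTable(reactomePATHID2EXTID); all IDs are hypothetical.
tb_toy = data.frame(
    DB_ID   = c("R-HSA-1", "R-HSA-1", "R-HSA-2", "R-HSA-2", "R-HSA-2"),
    gene_id = c("10", "20", "10", "30", "40"),
    stringsAsFactors = FALSE
)

# Long format -> named list with one character vector of genes per pathway
lt_toy = split(tb_toy$gene_id, tb_toy$DB_ID)
str(lt_toy)

# The number of genes per pathway can be read from either representation
lengths(lt_toy)
table(tb_toy$DB_ID)
```

The data frame form is convenient for filtering and merging, while the list form is the shape that many gene set analysis functions expect.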

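Reactome contains pathways for multiple organisms. The second field of a Reactome ID encodes the organism, e.g. HSA (Homo sapiens) in the reactomePATHID2EXTID output above and BTA (Bos taurus) in the reactomePATHID2NAME output. The following code counts the pathways per organism: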

sort(table( gsub("^R-(\\w+)-\\d+$", "\\1", p2n[, 1]) ))
## 
##  MTU  PFA  SCE  SPO  DDI  CEL  DME  XTR  CFA  SSC  DRE  BTA  RNO  GGA  MMU  HSA 
##   13  602  817  822  990 1310 1480 1582 1660 1663 1679 1699 1705 1708 1718 2615
barplot(sort(table( gsub("^R-(\\w+)-\\d+$", "\\1", p2n[, 1]) )))
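
The regular expression above can be checked on a handful of IDs. The sketch below uses three IDs taken from the earlier output and shows two ways to pull out the organism field that should give the same result for IDs of the form R-<organism>-<number>; the variable names are chosen for illustration only.

```r
# Three IDs copied from the toTable() output shown earlier
ids = c("R-HSA-109582", "R-BTA-73843", "R-HSA-168256")

# Option 1: capture the middle field with a back-reference
org = sub("^R-(\\w+)-\\d+$", "\\1", ids)

# Option 2: split on "-" and take the second field
org2 = vapply(strsplit(ids, "-", fixed = TRUE), function(x) x[2], character(1))

identical(org, org2)
table(org)
```

Note that sub() returns the input unchanged when the pattern does not match, so an ID in an unexpected format would show up as its own category in the table rather than raise an error.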

Practice

Practice 1

Plot the distribution of the number of genes per Reactome pathway (use human pathways only).

Print the names of the pathways with more than 2000 genes.

Solution

tb = toTable(reactomePATHID2EXTID)
tb = tb[grep("-HSA-", tb[, 1]), ]
n_gene = table(tb[, 1])
hist(n_gene)

Using narrower bins shows the distribution in more detail:

hist(n_gene, breaks = 100)

Pathways with more than 2000 genes:

n_gene[n_gene > 2000]
## 
## R-HSA-1430728  R-HSA-162582 R-HSA-1643685  R-HSA-168256 
##          2121          2594          2018          2038

Their names:

p2n = toTable(reactomePATHID2NAME)
p2n[p2n[, 1] %in% names(n_gene[n_gene > 2000]), ]
##               DB_ID                         path_name
## 11237 R-HSA-1643685             Homo sapiens: Disease
## 11639  R-HSA-168256       Homo sapiens: Immune System
## 11870 R-HSA-1430728          Homo sapiens: Metabolism
## 12619  R-HSA-162582 Homo sapiens: Signal Transduction
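
The counts and the names can also be combined into a single table. The sketch below is one possible way to do this, using only the four pathways and values printed above, re-typed by hand so that it runs without the package; n_big and p2n_big are hypothetical names. With the real objects, the same lookup by ID would be applied to n_gene and p2n.

```r
# Gene counts and names copied from the output above
n_big = c("R-HSA-1430728" = 2121, "R-HSA-162582" = 2594,
          "R-HSA-1643685" = 2018, "R-HSA-168256" = 2038)
p2n_big = data.frame(
    DB_ID     = c("R-HSA-1643685", "R-HSA-168256", "R-HSA-1430728", "R-HSA-162582"),
    path_name = c("Homo sapiens: Disease", "Homo sapiens: Immune System",
                  "Homo sapiens: Metabolism", "Homo sapiens: Signal Transduction"),
    stringsAsFactors = FALSE
)

# Look up each count by pathway ID, so the order of the two inputs does not matter
p2n_big$n_gene = unname(n_big[p2n_big$DB_ID])

# Drop the species prefix and sort from largest to smallest
p2n_big$path_name = sub("^Homo sapiens: ", "", p2n_big$path_name)
p2n_big = p2n_big[order(p2n_big$n_gene, decreasing = TRUE), ]
p2n_big
```

Indexing by name avoids relying on the two objects being sorted the same way, which they are not in the output above.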

In the pathway browser at https://reactome.org/PathwayBrowser/, these large pathways correspond to the top-level pathway clusters.