vignettes/topic1_04_Reactome.Rmd
Reactome is another popular pathway database. It organises pathways hierarchically: a pathway can contain sub-pathways or pathway components. The up-to-date pathway data can be downloaded directly from https://reactome.org/download-data.
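Bioconductor also distributes Reactome data in the annotation package reactome.db.
Loading the package and printing the reactome.db object shows the version of the data: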
library(reactome.db)
## Loading required package: AnnotationDbi
## Loading required package: stats4
## Loading required package: BiocGenerics
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
## as.data.frame, basename, cbind, colnames, dirname, do.call,
## duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
## lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
## pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table,
## tapply, union, unique, unsplit, which.max, which.min
## Loading required package: Biobase
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
## Loading required package: IRanges
## Loading required package: S4Vectors
## Warning: package 'S4Vectors' was built under R version 4.3.2
##
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:utils':
##
## findMatches
## The following objects are masked from 'package:base':
##
## I, expand.grid, unname
reactome.db
## ReactomeDb object:
## | DBSCHEMA: REACTOME_DB
## | DBSCHEMAVERSION: 84
## | SOURCENAME: Reactome
## | SOURCEURL: http://www.reactome.org/download/current/
## | SOURCEDATE: 2023-04-05
## | Supporting package: AnnotationDbi
## | Db type: ReactomeDb
##
## Please see: help('select') for usage information
In it, the important objects are:
reactomePATHID2EXTID: mappings between Reactome pathway IDs and Entrez gene IDs
reactomePATHID2NAME: pathway names
## DB_ID gene_id
## 1 R-HSA-109582 1
## 2 R-HSA-114608 1
## 3 R-HSA-168249 1
## 4 R-HSA-168256 1
## 5 R-HSA-6798695 1
## 6 R-HSA-76002 1
## DB_ID
## 1 R-BTA-73843
## 2 R-BTA-1971475
## 3 R-BTA-1369062
## 4 R-BTA-382556
## 5 R-BTA-9033807
## 6 R-BTA-418592
## path_name
## 1 1-diphosphate: 5-Phosphoribose
## 2 Bos taurus: A tetrasaccharide linker sequence is required for GAG synthesis
## 3 Bos taurus: ABC transporters in lipid homeostasis
## 4 Bos taurus: ABC-family proteins mediated transport
## 5 Bos taurus: ABO blood group biosynthesis
## 6 Bos taurus: ADP signalling through P2Y purinoceptor 1
In the previous code, we used the function toTable() to retrieve the data as a data frame. You can also try as.list() on the two objects and compare the output.
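To make the comparison concrete, here is a minimal sketch of how the two representations relate. It does not call reactome.db: `tb_toy` is a small hand-made stand-in for the output of toTable(reactomePATHID2EXTID), the second gene ID is invented for illustration, and split() is only an assumed approximation of what as.list() returns for the real object (a named list with one vector of gene IDs per pathway).

```r
# Toy stand-in for toTable(reactomePATHID2EXTID): one row per pathway-gene mapping.
# The pathway IDs appear in the output above; gene "2" is invented for illustration.
tb_toy = data.frame(
    DB_ID   = c("R-HSA-109582", "R-HSA-109582", "R-HSA-114608", "R-HSA-168256"),
    gene_id = c("1", "2", "1", "1"),
    stringsAsFactors = FALSE
)

# List form: one character vector of gene IDs per pathway ID.
lt_toy = split(tb_toy$gene_id, tb_toy$DB_ID)

# The number of genes per pathway can be computed from either form.
n_from_table = table(tb_toy$DB_ID)
n_from_list  = lengths(lt_toy)
```

The long data frame is convenient for filtering and joining, while the list form is convenient when a tool expects gene sets as a named list.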
Reactome contains pathways for multiple organisms. The second field of a Reactome ID encodes the organism, e.g. HSA (Homo sapiens) in the previous output.
##
## MTU PFA SCE SPO DDI CEL DME XTR CFA SSC DRE BTA RNO GGA MMU HSA
## 13 602 817 822 990 1310 1480 1582 1660 1663 1679 1699 1705 1708 1718 2615
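The code that produced this table is not shown here. One possible way to obtain such counts, sketched on a handful of IDs copied from the outputs above, is to extract the organism field with a regular expression and tabulate it. The variable names are illustrative, and the pattern assumes that every ID has the form R-<organism>-<number>.

```r
# Example IDs copied from the outputs above.
ids = c("R-HSA-109582", "R-HSA-114608", "R-HSA-168249",
        "R-BTA-73843", "R-BTA-1971475")

# Assumed ID layout: R-<organism>-<number>; keep only the middle field.
org = sub("^R-([A-Z]+)-[0-9]+$", "\\1", ids)

# Count IDs per organism, sorted by increasing count.
org_count = sort(table(org))

# An anchored pattern for selecting human pathways only.
is_human = grepl("^R-HSA-", ids)
```

On the real data, `ids` would be the DB_ID column of a table returned by toTable().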
Plot the distribution of the number of genes per Reactome pathway (human only).
Then print the names of the pathways that contain more than 2000 genes.
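# Pathway-to-gene table: one row per pathway-gene mapping
tb = toTable(reactomePATHID2EXTID)
# Keep human pathways only (IDs of the form R-HSA-...)
tb = tb[grep("-HSA-", tb[, 1]), ]
# Number of genes annotated to each pathway
n_gene = table(tb[, 1])
hist(n_gene)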
Narrower bins show the distribution more clearly:
hist(n_gene, breaks = 100)
Pathways with more than 2000 genes:
n_gene[n_gene > 2000]
##
## R-HSA-1430728 R-HSA-162582 R-HSA-1643685 R-HSA-168256
## 2121 2594 2018 2038
Their names:
## DB_ID path_name
## 11237 R-HSA-1643685 Homo sapiens: Disease
## 11639 R-HSA-168256 Homo sapiens: Immune System
## 11870 R-HSA-1430728 Homo sapiens: Metabolism
## 12619 R-HSA-162582 Homo sapiens: Signal Transduction
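The lookup that produced these names is not shown here. A minimal sketch of one way to do it follows, using toy objects built from the values printed above rather than the real reactome.db tables: `n_gene_toy` and `p2n_toy` are hypothetical stand-ins for `n_gene` and toTable(reactomePATHID2NAME), and one Bos taurus row is included to show that it is filtered out.

```r
# Gene counts per pathway, copied from the output above.
n_gene_toy = c("R-HSA-1430728" = 2121, "R-HSA-162582" = 2594,
               "R-HSA-1643685" = 2018, "R-HSA-168256" = 2038)

# Toy stand-in for toTable(reactomePATHID2NAME).
p2n_toy = data.frame(
    DB_ID = c("R-HSA-1643685", "R-HSA-168256", "R-HSA-1430728",
              "R-HSA-162582", "R-BTA-382556"),
    path_name = c("Homo sapiens: Disease",
                  "Homo sapiens: Immune System",
                  "Homo sapiens: Metabolism",
                  "Homo sapiens: Signal Transduction",
                  "Bos taurus: ABC-family proteins mediated transport"),
    stringsAsFactors = FALSE
)

# IDs of the pathways with more than 2000 genes.
big_ids = names(n_gene_toy[n_gene_toy > 2000])

# Look the IDs up in the name table.
big = p2n_toy[p2n_toy$DB_ID %in% big_ids, ]

# Optionally strip the "<organism>: " prefix from the names.
big$short_name = sub("^[^:]+: ", "", big$path_name)
```

Matching with %in% keeps the row order of the name table, which is why the names need not appear in the same order as the counts.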
In the Reactome pathway browser (https://reactome.org/PathwayBrowser/), these large pathways correspond to the pathway clusters at the top level of the hierarchy.