Over-representation analysis

Usage

ora(genes, gs, universe = NULL, min_hits = 3, min_size = 5, max_size = 2000)

ora_go(genes, org_db = org.Hs.eg.db::org.Hs.eg.db, ontology = "BP", ...)

ora_kegg(genes, organism = "hsa", db = "pathway", ...)

ora_msigdb(genes, collection = "h.all", version = "2024.1.Hs", ...)

ora_reactome(genes, organism = "HSA", ...)

ora_keywords(genes, organism = "human", ...)

ora_phenotype(genes, organism = "human", ...)

ora_disease(genes, organism = "human", ...)

Arguments

genes: A vector of genes.
gs: A list of gene sets. Genes should have the same ID type as in genes.
universe: A vector of background genes.
min_hits: Minimum number of genes shared between genes and a gene set.
min_size: Minimum number of genes in a gene set.
max_size: Maximum number of genes in a gene set.
org_db: An OrgDb object for the organism. It can be from org.*.db packages or downloaded by the AnnotationHub package.
ontology: Namespace of GO. Value should be one of "BP", "CC" or "MF".
...: Passed to ora().
organism: See Details.
db: A KEGG database. The value can be one of "pathway", "module", "ko", "network", "disease" and "drug".
collection: Collection of the MSigDB gene sets. All possible values can be found via list_msigdb_collections().
version: Version of the MSigDB database. All possible values can be found via list_msigdb_versions().
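
The interplay of universe, min_size, max_size and min_hits can be sketched in base R. This is only an illustration: filter_gene_sets() is a hypothetical helper that is not part of the package, and the order in which ora() applies the background restriction and the size filters is an assumption.

```r
# Hypothetical helper (not part of the package): one plausible reading of
# how universe, min_size, max_size and min_hits select gene sets.
filter_gene_sets = function(genes, gs, universe = NULL,
                            min_hits = 3, min_size = 5, max_size = 2000) {
    if (!is.null(universe)) {
        # Assumption: genes and gene sets are first restricted to the background
        genes = intersect(genes, universe)
        gs = lapply(gs, intersect, universe)
    }
    # Size of each gene set and its overlap with the input genes
    size = lengths(gs)
    hits = vapply(gs, function(x) length(intersect(x, genes)), integer(1))
    gs[size >= min_size & size <= max_size & hits >= min_hits]
}

# Toy gene sets: a named list of character vectors, same ID type as genes
gs = list(
    small = c("A", "B"),                 # dropped: fewer than min_size genes
    ok    = c("A", "B", "C", "D", "E"),  # kept: 5 genes, 3 hits
    miss  = c("F", "G", "H", "I", "J")   # dropped: fewer than min_hits hits
)
kept = filter_gene_sets(c("A", "B", "C"), gs)
names(kept)
```

With the default thresholds, only the gene set named "ok" passes all three filters in this toy input.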

Details

Except for ora(), gene IDs in genes must be Entrez IDs for all ora_*() functions.

The value of organism should be set differently for specific ora_*() functions:

For ora_kegg(), the value should be a KEGG organism code, such as "hsa" or "mmu".
For ora_reactome(), the value should be the prefix of the Reactome pathway ID that represents the organism, e.g. "HSA" for human.
For ora_keywords(), the value can be an organism name (e.g. "human"), the Latin name or the taxon ID.
For ora_phenotype() and ora_disease(), the value can only be one of "human", "mouse" and "rat".

All valid values for ora_reactome() are:

c("BTA", "CEL", "CFA", "DRE", "DDI", "DME", "GGA", "HSA", "MMU",
  "MTU", "PFA", "RNO", "SCE", "SPO", "SSC", "XTR")

Examples

data(p53_dataset)
s = p53_dataset$s2n
gs = p53_dataset$gs
diff = names(s)[abs(s) > 0.3]

ora(diff, gs) |> head()
#>               gene_set n_hits n_genes n_gs n_total   log2fe  z_score
#> 347 SA_G1_AND_S_PHASES      6     307   15    5602 2.867703 5.881640
#> 466             P53_UP      9     307   40    5602 2.037628 4.746167
#> 160       hsp27Pathway      5     307   16    5602 2.511560 4.535180
#> 302         p53Pathway      5     307   16    5602 2.511560 4.535180
#> 154          gsPathway      3     307    6    5602 3.189631 4.793623
#> 66       chrebpPathway      5     307   20    5602 2.189631 3.842107
#>          p_value   p_adjust
#> 347 8.475401e-05 0.04237701
#> 466 2.397667e-04 0.05994168
#> 160 1.265963e-03 0.15824542
#> 302 1.265963e-03 0.15824542
#> 154 2.879892e-03 0.23433797
#> 66  3.749408e-03 0.23433797
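
The count columns in the output above (n_hits, n_genes, n_gs, n_total) are enough to recompute log2fe, z_score and p_value under the usual over-representation model, a one-sided hypergeometric test. The sketch below uses only base R; ora_stats() is a hypothetical helper that is not part of the package, and whether ora() computes these values the same way internally is an assumption. For the counts of the first row, it gives values that agree with the printed ones (log2fe about 2.868, z_score about 5.88, p_value about 8.5e-05).

```r
# Hypothetical helper (not part of the package): statistics for one gene set
# under a one-sided hypergeometric model.
ora_stats = function(n_hits, n_genes, n_gs, n_total) {
    # Expected overlap if the n_genes input genes were drawn at random
    expected = n_genes * n_gs / n_total
    # Variance of the hypergeometric distribution
    v = expected * (1 - n_gs / n_total) * (n_total - n_genes) / (n_total - 1)
    data.frame(
        log2fe  = log2(n_hits / expected),
        z_score = (n_hits - expected) / sqrt(v),
        # P(X >= n_hits): upper tail, hence n_hits - 1 with lower.tail = FALSE
        p_value = phyper(n_hits - 1, n_gs, n_total - n_gs, n_genes,
                         lower.tail = FALSE)
    )
}

# Counts taken from the SA_G1_AND_S_PHASES row
res = ora_stats(n_hits = 6, n_genes = 307, n_gs = 15, n_total = 5602)
res
```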

diff2 = convert_to_entrez(diff)
#>   gene id might be SYMBOL (p =  0.660 )
#> 'select()' returned 1:many mapping between keys and columns
ora_go(diff2) |> head()
#>        gene_set n_hits n_genes n_gs n_total    log2fe  z_score      p_value
#> 8046 GO:1901700     93     497 1704   18986 1.0599955 7.695987 5.433047e-12
#> 1280 GO:0007155     86     497 1553   18986 1.0809690 7.521060 1.916316e-11
#> 4185 GO:0042592     90     497 1767   18986 0.9603131 6.843988 4.892569e-10
#> 5355 GO:0048871     54     497  832   18986 1.3099941 7.154667 7.558000e-10
#> 5620 GO:0051240     87     497 1724   18986 0.9469458 6.623663 1.598871e-09
#> 1697 GO:0009607     86     497 1704   18986 0.9471015 6.582795 2.015310e-09
#>          p_adjust                                             description
#> 8046 4.989710e-08                  response to oxygen-containing compound
#> 1280 8.799723e-08                                           cell adhesion
#> 4185 1.497778e-06                                     homeostatic process
#> 5355 1.735317e-06              multicellular organismal-level homeostasis
#> 5620 2.936806e-06 positive regulation of multicellular organismal process
#> 1697 3.084768e-06                             response to biotic stimulus
ora_msigdb(diff2) |> head()
#>                            gene_set n_hits n_genes n_gs n_total    log2fe
#> 37             HALLMARK_P53_PATHWAY     21     207  200    4384 1.1530064
#> 34              HALLMARK_MYOGENESIS     20     207  200    4384 1.0826170
#> 7                HALLMARK_APOPTOSIS     14     207  161    4384 0.8809832
#> 2      HALLMARK_ALLOGRAFT_REJECTION     16     207  200    4384 0.7606889
#> 25   HALLMARK_INFLAMMATORY_RESPONSE     16     207  200    4384 0.7606889
#> 45 HALLMARK_TNFA_SIGNALING_VIA_NFKB     15     207  200    4384 0.6675795
#>     z_score      p_value   p_adjust
#> 37 3.943274 0.0003993388 0.01996694
#> 34 3.602059 0.0010293788 0.02573447
#> 7  2.421934 0.0188585130 0.25586664
#> 2  2.237199 0.0255866637 0.25586664
#> 25 2.237199 0.0255866637 0.25586664
#> 45 1.895984 0.0490656010 0.40888001