GSVA

We need an expression matrix as well as a collection of gene sets.

lt = readRDS(system.file("extdata", "p53_expr.rds", package = "GSEAtraining"))
expr = lt$expr
condition = lt$condition

ln = strsplit(readLines(system.file("extdata", "c2.symbols.gmt", package = "GSEAtraining")), "\t")
gs = lapply(ln, function(x) x[-(1:2)])
names(gs) = sapply(ln, function(x) x[1])

Running GSVA analysis is simple. Just note the gene IDs in expr should match the gene IDs in gs.

The returned value gs_mat is a set-sample matrix that contains “single sample-based gene set variation scores”.

library(GSVA)
gs_mat = gsva(expr, gs, verbose = FALSE)
dim(gs_mat)
## [1] 522  50

gs_mat can be used for downstream analysis. E.g. make heatmaps:

library(ComplexHeatmap)
Heatmap(gs_mat, top_annotation = HeatmapAnnotation(cond = condition),
    column_split = condition)

Apply t-test on each row of gs_mat to test whether the gene set-level profile has difference between the two conditions.

library(genefilter)
tdf = rowttests(gs_mat, factor(condition))
tdf$fdr = p.adjust(tdf$p.value, "BH")

How many gene sets are significant?

tdf[tdf$fdr < 0.05, ]
##            statistic        dm      p.value         fdr
## ngfPathway  4.793368 0.2555689 1.623384e-05 0.008474063
## rasPathway  4.178302 0.2625379 1.234110e-04 0.032210282

Compare to normal GSEA

As a comparison, we also perform a normal GSEA analysis. We use the t-value as the gene-level score:

s = rowttests(expr, factor(condition))[, "statistic"]
names(s) = rownames(expr)

Here we also test to the c2 gene set collection, so we need to convert gs to the format clusterProfiler accepts:

map = data.frame(
    gene_set = rep(names(gs), times = sapply(gs, length)),
    gene = unlist(gs)
)
    
library(clusterProfiler)
tb = GSEA(geneList = sort(s, decreasing = TRUE), TERM2GENE = map, pvalueCutoff = 1)
head(tb)
##                                                        ID
## hsp27Pathway                                 hsp27Pathway
## P53_UP                                             P53_UP
## HTERT_UP                                         HTERT_UP
## GPCRs_Class_A_Rhodopsin-like GPCRs_Class_A_Rhodopsin-like
## p53Pathway                                     p53Pathway
## p53hypoxiaPathway                       p53hypoxiaPathway
##                                               Description setSize
## hsp27Pathway                                 hsp27Pathway      15
## P53_UP                                             P53_UP      40
## HTERT_UP                                         HTERT_UP     109
## GPCRs_Class_A_Rhodopsin-like GPCRs_Class_A_Rhodopsin-like     111
## p53Pathway                                     p53Pathway      16
## p53hypoxiaPathway                       p53hypoxiaPathway      20
##                              enrichmentScore       NES       pvalue    p.adjust
## hsp27Pathway                      -0.7841899 -2.165979 6.587867e-06 0.001655016
## P53_UP                            -0.6132568 -2.137330 1.104503e-05 0.001655016
## HTERT_UP                           0.3612304  1.897270 1.394074e-05 0.001655016
## GPCRs_Class_A_Rhodopsin-like      -0.4503613 -1.817010 1.675966e-05 0.001655016
## p53Pathway                        -0.7524677 -2.129118 2.686248e-05 0.002122136
## p53hypoxiaPathway                 -0.6940780 -2.069870 5.878618e-05 0.003870090
##                                   qvalue rank                   leading_edge
## hsp27Pathway                 0.001477496  882  tags=53%, list=9%, signal=49%
## P53_UP                       0.001477496  464  tags=22%, list=5%, signal=22%
## HTERT_UP                     0.001477496 1523 tags=33%, list=15%, signal=28%
## GPCRs_Class_A_Rhodopsin-like 0.001477496 2120 tags=35%, list=21%, signal=28%
## p53Pathway                   0.001894511  252  tags=31%, list=2%, signal=31%
## p53hypoxiaPathway            0.003454977  741  tags=30%, list=7%, signal=28%
##                                                                                                                                                                                                                                                  core_enrichment
## hsp27Pathway                                                                                                                                                                                                      HSPB1/ACTA1/MAPKAPK2/TNF/IL1A/BCL2/FAS/TNFRSF6
## P53_UP                                                                                                                                                                                                         NINJ1/PLK3/BBC3/TNFRSF6/BTG2/DDB2/MDM2/BAX/CDKN1A
## HTERT_UP                                  DR1/MRPL49/DAP/AHR/EPHA2/LMO4/CDKN2A/KLF5/KIAA0063/WASF1/TSG101/ELF4/CCNH/KIAA0092/HYOU1/CDC6/DYRK1A/SFRS11/ADAM8/LHFPL2/ZNF165/RAE1/SH3BGR/TCTEL1/DDX10/HSF2/MAPK9/SLC1A5/EGFR/MCFD2/CBX5/RDBP/CDKN3/SNRPA1/CTH/GFPT1
## GPCRs_Class_A_Rhodopsin-like ADRA2C/CNR2/FPRL1/EDNRB/FPRL2/RGR/GALR3/OPRL1/NMBR/ADRB1/BDKRB1/RRH/PPYR1/CXCR3/HTR2B/CMKOR1/HTR7/OPRD1/GPR23/AGTR2/EDNRA/ADORA1/ADRA1D/DRD3/CCKBR/GPR44/DRD1/MC1R/HTR2C/CCR2/CCR8/HTR4/HTR1B/MTNR1B/CCBP2/F2RL2/GPR50/ADRA2A/NTSR2
## p53Pathway                                                                                                                                                                                                                             CDK2/BCL2/MDM2/BAX/CDKN1A
## p53hypoxiaPathway                                                                                                                                                                                                               CSNK1D/HIC1/CPB2/MDM2/BAX/CDKN1A

Let’s check the expression of genes in the p53Pathway:

g = intersect(gs[["p53Pathway"]], rownames(expr))
mm = expr[g, ]
mm = t(scale(t(mm)))  # z-score transformation

Heatmap(mm, name = "z-score", top_annotation = HeatmapAnnotation(cond = condition),
    column_title = "p53Pathway",
    right_annotation = rowAnnotation(bar = anno_barplot(s[g], axis_param = list(direction = "reverse"), width = unit(2, "cm"))),
    column_split = condition) %v%
Heatmap(gs_mat["hivnefPathway", , drop = FALSE], name = "set variation")

Next we make pairwise scatterplot of p-values and statistics for both analysis.

cn = intersect(rownames(tdf), tb$ID)
par(mfrow = c(1, 2))
plot(tdf[cn, "p.value"], tb[cn, "pvalue"],
    xlab = "GSVA", ylab = "normal GSEA", main = "p-values")
plot(tdf[cn, "statistic"], tb[cn, "NES"],
    xlab = "GSVA / t-values", ylab = "normal GSEA / NES")
abline(h = 0, lty = 2, col = "grey")
abline(v = 0, lty = 2, col = "grey")

This makes a problem here that one method generates a significant gene set which can be completely insignificant under another method (left plot). The direction of the general differentiation of a gene set can be reversed in different methods.

g = intersect(gs[["hivnefPathway"]], rownames(expr))
mm = expr[g, ]
mm = t(scale(t(mm)))  # z-score transformation

Heatmap(mm, name = "z-score", top_annotation = HeatmapAnnotation(cond = condition),
    right_annotation = rowAnnotation(bar = anno_barplot(s[g], axis_param = list(direction = "reverse"), width = unit(2, "cm"))),
    column_split = condition) %v%
Heatmap(gs_mat["hivnefPathway", , drop = FALSE], name = "set variation")

Conclusion: It does not seem ssGSEA is better than GSEA. Use with caution.