great.Rd
Perform GREAT analysis
great(gr, gene_sets, tss_source, biomart_dataset = NULL,
min_gene_set_size = 5, mode = "basalPlusExt", extend_from = c("TSS", "gene"),
basal_upstream = 5000, basal_downstream = 1000, extension = 1000000,
extended_tss = NULL, background = NULL, exclude = "gap",
cores = 1, verbose = great_opt$verbose)
gr: A GRanges object with the input regions. The chromosome names of the input regions must be consistent with those of the internal TSS regions. Use getTSS to see the format of the internal TSS regions.
gene_sets: A single string naming one of the gene set collections supported by default (see the full list in the "Genesets" section), or a named list of vectors where each vector corresponds to a gene set.
tss_source: Source of TSS. See the "TSS" section.
biomart_dataset: The value should be in BioMartGOGeneSets::supportedOrganisms.
min_gene_set_size: Minimal size of gene sets.
mode: The mode to extend genes. The value should be one of 'basalPlusExt', 'twoClosest' and 'oneClosest'. See extendTSS for details.
extend_from: Should the gene be extended only from its TSS or from the complete gene?
basal_upstream: In 'basalPlusExt' mode, number of base pairs extending upstream of the TSS to form the basal domain.
basal_downstream: In 'basalPlusExt' mode, number of base pairs extending downstream of the TSS to form the basal domain.
extension: Extensions from the basal domains.
extended_tss: If your organism is not supported by default, you can first prepare an extended TSS object with extendTSSFromDataFrame or extendTSS and pass it to this argument. See more examples in the vignette.
background: Background regions. The value can also be a vector of chromosome names.
exclude: Regions that are excluded from the analysis, such as gap regions (which can be obtained with getGapFromUCSC). The value can also be a vector of chromosome names. The special character value "gap" is also allowed, in which case the gap regions of the corresponding organism are removed from the analysis.
cores: Number of cores to use.
verbose: Whether to print messages.
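The roles of basal_upstream, basal_downstream and extension can be illustrated with plain arithmetic. The sketch below uses base R only and is not rGREAT's implementation (see extendTSS for that). It assumes the usual 'basalPlusExt' rule, in which each gene's basal domain is extended in both directions until it meets a neighboring gene's basal domain, by at most extension base pairs. The toy gene table and the helper name basal_plus_ext are hypothetical, and chromosome lengths are ignored.

```r
# Toy genes on one chromosome (hypothetical coordinates), sorted by TSS.
genes = data.frame(
    gene   = c("A", "B", "C"),
    tss    = c(100000, 150000, 2000000),
    strand = c("+", "-", "+")
)

# Hypothetical helper, for illustration only.
basal_plus_ext = function(genes, basal_upstream = 5000, basal_downstream = 1000,
                          extension = 1000000) {
    plus = genes$strand == "+"
    # Basal domain: upstream and downstream are relative to the gene's strand.
    basal_start = ifelse(plus, genes$tss - basal_upstream, genes$tss - basal_downstream)
    basal_end   = ifelse(plus, genes$tss + basal_downstream, genes$tss + basal_upstream)
    n = nrow(genes)
    # Neighboring basal domains bound the extension (no neighbor at either end).
    prev_end   = c(-Inf, basal_end[-n])
    next_start = c(basal_start[-1], Inf)
    # Extend by at most `extension` bp, never past a neighbor's basal domain,
    # and never shrinking the basal domain itself.
    start = pmin(basal_start, pmax(basal_start - extension, prev_end))
    end   = pmax(basal_end, pmin(basal_end + extension, next_start))
    data.frame(gene = genes$gene, start = pmax(start, 0), end = end)
}

domains = basal_plus_ext(genes)
print(domains)
```

In this toy case the downstream extension of gene A stops where the basal domain of gene B begins, while gene C, which has no close neighbor on its left, is extended by the full extension distance on that side.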
When background or exclude is set, the analysis is restricted to the background regions and still uses the binomial method. Note that this differs from the original GREAT method, which uses Fisher's exact test if background regions are set. See submitGreatJob for explanations.
By default, gap regions are excluded from the analysis.
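As a rough illustration of the binomial method mentioned above, the following base R sketch computes the test for a single gene set. It is not rGREAT's code, and every number and variable name is made up. It assumes the success probability is the fraction of the analyzed genome covered by the gene set's regulatory domains, and that restricting to background regions changes that fraction and the region counts but not the test itself.

```r
# Made-up numbers for a single gene set.
genome_bp = 3.0e9   # analyzed genome size after excluding gaps
domain_bp = 6.0e7   # bp covered by the gene set's regulatory domains
n_regions = 1000    # number of input regions
n_hits    = 40      # input regions that fall in those domains

p_genome = domain_bp / genome_bp
# P(X >= n_hits) for X ~ Binomial(n_regions, p_genome)
p_value = pbinom(n_hits - 1, size = n_regions, prob = p_genome, lower.tail = FALSE)
fold_enrichment = n_hits / (n_regions * p_genome)

# With background regions, the same test is applied within the background only.
background_bp   = 5.0e8   # total bp of the background regions
domain_in_bg_bp = 2.0e7   # domain bp that overlap the background
n_regions_in_bg = 800     # input regions inside the background
n_hits_in_bg    = 40      # of those, regions that fall in the domains
p_bg = domain_in_bg_bp / background_bp
p_value_bg = pbinom(n_hits_in_bg - 1, size = n_regions_in_bg, prob = p_bg,
                    lower.tail = FALSE)

print(c(genome = p_value, background = p_value_bg))
```

The same observed count can be much less surprising against a background in which the gene set's domains are over-represented, which is why the choice of background matters.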
rGREAT supports TSS from many organisms. The value of tss_source should be encoded in a special format:
Name of a TxDb.* package. Supported packages are in rGREAT:::BIOC_ANNO_PKGS$txdb.
Genome version of the organism, e.g. "hg19". Then the corresponding TxDb will be used.
In the format RefSeqCurated:$genome, where $genome is the genome version of an organism, such as hg19. The RefSeqCurated subset will be used.
In the format RefSeqSelect:$genome, where $genome is the genome version of an organism, such as hg19. The RefSeqSelect subset will be used.
In the format Gencode_v$version, where $version is the Gencode version, such as 19 (for human) or M21 (for mouse). Gencode protein-coding genes will be used.
In the format GREAT:$genome, where $genome can only be mm9, mm10, hg19 or hg38. The TSS from GREAT will be used.
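To make the formats listed above concrete, here is a small base R sketch that labels a tss_source string by its pattern. The helper classify_tss_source is hypothetical and is not the parser used inside rGREAT. It only mirrors the naming rules described in this section and treats anything it does not recognize as a genome version.

```r
# Hypothetical helper: label a tss_source string by the formats listed above.
# Illustration only, not rGREAT's own parsing logic.
classify_tss_source = function(x) {
    if (grepl("^TxDb\\.", x)) {
        "TxDb package"
    } else if (grepl("^RefSeqCurated:", x)) {
        "RefSeqCurated subset"
    } else if (grepl("^RefSeqSelect:", x)) {
        "RefSeqSelect subset"
    } else if (grepl("^Gencode_v", x)) {
        "Gencode protein-coding genes"
    } else if (grepl("^GREAT:(mm9|mm10|hg19|hg38)$", x)) {
        "TSS from GREAT"
    } else {
        # Fallback in this sketch: assume a plain genome version such as "hg19".
        "genome version (corresponding TxDb)"
    }
}

examples = c("TxDb.Hsapiens.UCSC.hg19.knownGene", "hg19", "RefSeqCurated:hg19",
             "RefSeqSelect:hg19", "Gencode_v19", "Gencode_vM21", "GREAT:hg19")
labels = vapply(examples, classify_tss_source, character(1))
print(labels)
```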
rGREAT supports the following built-in GO gene sets for all organisms (note "GO:" can be omitted):
Biological Process, from GO.db package.
Cellular Component, from GO.db package.
Molecular Function, from GO.db package.
rGREAT also supports built-in gene set collections from MSigDB (note that these are only for human; "msigdb:" can be omitted):
Hallmark gene sets.
Positional gene sets.
Curated gene sets.
C2 subcategory: chemical and genetic perturbations gene sets.
C2 subcategory: canonical pathways gene sets.
C2 subcategory: BioCarta subset of CP.
C2 subcategory: KEGG subset of CP.
C2 subcategory: PID subset of CP.
C2 subcategory: REACTOME subset of CP.
C2 subcategory: WIKIPATHWAYS subset of CP.
Regulatory target gene sets.
miRDB of microRNA targets gene sets.
MIR_Legacy of MIRDB.
GTRD transcription factor targets gene sets.
TFT_Legacy.
Computational gene sets.
C4 subcategory: cancer gene neighborhoods gene sets.
C4 subcategory: cancer modules gene sets.
Ontology gene sets.
C5 subcategory: BP subset.
C5 subcategory: CC subset.
C5 subcategory: MF subset.
C5 subcategory: human phenotype ontology gene sets.
Oncogenic signature gene sets.
Immunologic signature gene sets.
ImmuneSigDB subset of C7.
C7 subcategory: vaccine response gene sets.
Cell type signature gene sets.
If a TxDb supported by default is used, Entrez gene IDs are always used as the main gene IDs. If you provide a self-defined gene_sets or extended_tss, make sure the two use the same gene ID type.
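A quick way to catch mismatched gene ID types before running the analysis is to check how many genes of each self-defined set can be found among the TSS gene IDs. The following base R sketch uses toy data. The vector tss_gene_ids stands in for the gene IDs of a self-prepared TSS table, and treating a set as usable only if enough of its genes match is an assumption of this sketch, not a statement about how rGREAT filters gene sets.

```r
# Self-defined gene sets: a named list of character vectors.
gene_sets = list(
    set_a = c("1", "2", "9", "10", "12"),            # Entrez-style IDs
    set_b = c("2", "13", "14"),                      # Entrez-style IDs, small set
    set_c = c("ENSG00000121410", "ENSG00000175899")  # Ensembl-style IDs, wrong type on purpose
)
# Gene IDs of a hypothetical TSS table that would be used to build extended_tss.
tss_gene_ids = c("1", "2", "9", "10", "12", "13", "14", "15")

# Fraction of each gene set that can be matched to the TSS gene IDs.
matched = vapply(gene_sets, function(g) mean(g %in% tss_gene_ids), numeric(1))
# In this sketch, a set counts as usable if at least min_gene_set_size genes match.
min_gene_set_size = 5
usable = vapply(gene_sets, function(g) sum(g %in% tss_gene_ids) >= min_gene_set_size,
                logical(1))
print(data.frame(matched = matched, usable = usable))
```

A matched fraction near zero, as for set_c here, usually points to a gene ID type mismatch rather than to missing genes.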
rGREAT supports a large number of organisms, for which the information is retrieved from Ensembl BioMart. The name of a BioMart dataset can be assigned to the argument biomart_dataset. All supported organisms can be found with BioMartGOGeneSets::supportedOrganisms.
A GreatObject-class object. The following methods can be applied to it:
getEnrichmentTable,GreatObject-method to retrieve the result table.
getRegionGeneAssociations,GreatObject-method to get the associations between input regions and genes.
plotRegionGeneAssociations,GreatObject-method to plot the associations between input regions and genes.
shinyReport,GreatObject-method to view the results in a shiny application.
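if(FALSE) {
# random input regions on hg19
gr = randomRegions(genome = "hg19")
# MSigDB hallmark gene sets, with the TSS taken from different sources
res = great(gr, "MSigDB:H", "txdb:hg19")
res = great(gr, "MSigDB:H", "TxDb.Hsapiens.UCSC.hg19.knownGene")
res = great(gr, "MSigDB:H", "RefSeq:hg19")
res = great(gr, "MSigDB:H", "GREAT:hg19")
res = great(gr, "MSigDB:H", "Gencode_v19")
# GO biological process gene sets, with a BioMart dataset name as the third argument
res = great(gr, "GO:BP", "hsapiens_gene_ensembl")
}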