Perform GREAT analysis

great(gr, gene_sets, tss_source, biomart_dataset = NULL,
    min_gene_set_size = 5, mode = "basalPlusExt", extend_from = c("TSS", "gene"),
    basal_upstream = 5000, basal_downstream = 1000, extension = 1000000,
    extended_tss = NULL, background = NULL, exclude = "gap",
    cores = 1, verbose = great_opt$verbose)

Arguments

gr

A GRanges object of input regions. The chromosome names of the input regions must follow the same naming style as the internal TSS regions. For example, use "chr1" rather than "1" if the TSS regions use UCSC-style names. Use getTSS to see the format of the internal TSS regions.
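A common mismatch is Ensembl-style names such as "1" or "MT" against UCSC-style names such as "chr1" or "chrM". When the regions are already a GRanges object, `seqlevelsStyle(gr) = "UCSC"` from the GenomeInfoDb package performs this conversion. The base R sketch below only illustrates the idea on a character vector. `to_ucsc_chr` is a hypothetical helper that covers the primary chromosomes and the mitochondrion, not unplaced scaffolds.

```r
# Hypothetical helper: convert Ensembl-style chromosome names ("1", "X", "MT")
# to UCSC style ("chr1", "chrX", "chrM"). Names that already start with "chr"
# are left unchanged; unplaced scaffolds are not handled.
to_ucsc_chr = function(chr) {
    chr = as.character(chr)
    out = ifelse(startsWith(chr, "chr"), chr, paste0("chr", chr))
    out[out == "chrMT"] = "chrM"  # Ensembl mitochondrion -> UCSC name
    out
}

to_ucsc_chr(c("1", "X", "MT", "chr2"))
# [1] "chr1" "chrX" "chrM" "chr2"
```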

gene_sets

A single string naming one of the supported gene set collections (see the full list in the "Genesets" section), or a named list of vectors where each vector corresponds to a gene set.
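A self-defined collection is a named list of character vectors. The sketch below builds a small one. The set names are hypothetical, and the members are Entrez gene IDs chosen only for illustration. Entrez IDs are used because they match the default TxDb-based TSS, as noted at the end of the "Genesets" section.

```r
# A self-defined gene set collection: a named list with one character vector
# of gene IDs per gene set. Names and members are illustrative only.
gene_sets = list(
    set_A = c("7157", "1956", "5290", "207", "4609"),
    set_B = c("672", "675", "7157"),
    set_C = c("3845", "4893", "5728", "2064", "1029", "4193")
)

# Basic sanity checks: every element is named and holds character IDs
stopifnot(is.list(gene_sets),
          !is.null(names(gene_sets)), all(nzchar(names(gene_sets))),
          all(vapply(gene_sets, is.character, logical(1))))

lengths(gene_sets)  # gene set sizes: set_A 5, set_B 3, set_C 6
```

Such a list would then be passed as, for example, `great(gr, gene_sets, "hg19")`. With the default min_gene_set_size = 5, set_B would be too small to be kept.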

tss_source

Source of TSS. See "TSS" section.

biomart_dataset

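Name of an Ensembl BioMart dataset, such as "hsapiens_gene_ensembl". The value should be one of BioMartGOGeneSets::supportedOrganisms. See the "Biomart" section.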

min_gene_set_size

Minimal size of gene sets. Gene sets with fewer genes are removed from the analysis.
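The sketch below illustrates the effect of this threshold. It assumes that genes absent from the TSS annotation are dropped before the size is compared. This is an illustration, not a description of the internals of great(). `filter_gene_sets` is a hypothetical helper.

```r
# Hypothetical illustration of min_gene_set_size: genes not in the annotation
# are dropped first (an assumption), then gene sets that are too small are removed.
filter_gene_sets = function(gene_sets, annotated_genes, min_gene_set_size = 5) {
    gene_sets = lapply(gene_sets, function(g) intersect(unique(g), annotated_genes))
    gene_sets[lengths(gene_sets) >= min_gene_set_size]
}

annotated = as.character(1:20)
sets = list(s1 = as.character(1:8),            # 8 annotated genes -> kept
            s2 = c(as.character(1:4), "99"),   # 4 annotated genes -> removed
            s3 = as.character(15:25))          # 6 annotated genes -> kept
names(filter_gene_sets(sets, annotated))
# [1] "s1" "s3"
```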

mode

The mode used to extend genes into regulatory domains. The value should be one of 'basalPlusExt', 'twoClosest' and 'oneClosest'. See extendTSS for details.
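The base R sketch below makes the two "closest" modes concrete. It computes regulatory domains for TSS positions on a single chromosome, following the rules described for the original GREAT tool. In 'twoClosest' mode, a domain extends in both directions to the neighbouring TSS. In 'oneClosest' mode, it extends only to the midpoint between the two TSSs. In both modes, it extends by at most extension bases in each direction. This is not the code used by extendTSS. The function name is hypothetical, strand is ignored, and midpoints are kept as plain, possibly non-integer, numbers.

```r
# Sketch of the 'twoClosest' and 'oneClosest' rules for TSSs on one chromosome.
# Not rGREAT's implementation; coordinates are plain numbers.
closest_domains = function(tss, mode = c("twoClosest", "oneClosest"),
                           extension = 1e6, chr_len = Inf) {
    mode = match.arg(mode)
    o = order(tss)
    s = tss[o]                           # TSSs sorted by position
    prev_tss = c(-Inf, s[-length(s)])    # neighbouring TSS on the left
    next_tss = c(s[-1], Inf)             # neighbouring TSS on the right
    if (mode == "twoClosest") {
        left = prev_tss                  # reach the neighbouring TSS
        right = next_tss
    } else {
        left = (s + prev_tss) / 2        # stop at the midpoint
        right = (s + next_tss) / 2
    }
    res = data.frame(tss = s,
                     start = pmax(left, s - extension, 1),
                     end = pmin(right, s + extension, chr_len))
    res = res[order(o), ]                # back to the input order
    rownames(res) = NULL
    res
}

closest_domains(c(10000, 50000, 2000000), "oneClosest")
```

For TSSs at 10000, 50000 and 2000000, 'oneClosest' mode gives the domains 1-30000, 30000-1025000 and 1025000-3000000. The third domain is capped by the 1 Mb extension on its right side only because chr_len is left unbounded here.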

extend_from

Whether genes are extended only from their TSS ("TSS") or from the complete gene body ("gene").

basal_upstream

In 'basalPlusExt' mode, the number of base pairs upstream of the TSS that form the basal domain.

basal_downstream

In 'basalPlusExt' mode, the number of base pairs downstream of the TSS that form the basal domain.

extension

Maximal extension, in base pairs, from the basal domains.
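In the default 'basalPlusExt' mode, the original GREAT tool works in two steps. First, every gene gets a basal domain that spans basal_upstream bases upstream and basal_downstream bases downstream of the TSS, regardless of other genes. Second, that domain is extended in both directions up to the nearest basal domain of another gene, by at most extension bases. The sketch below follows that rule for one chromosome with extend_from = "TSS". It makes two assumptions: the maximal extension is measured from the TSS, and neighbours are found by TSS position. extendTSS may differ in such details, and `basal_plus_ext` is a hypothetical name.

```r
# Sketch of the 'basalPlusExt' rule for genes on one chromosome
# (extend_from = "TSS"). Assumption: the extension limit is measured from the TSS.
basal_plus_ext = function(tss, strand, basal_upstream = 5000,
                          basal_downstream = 1000, extension = 1e6,
                          chr_len = Inf) {
    # Basal domain; "upstream" is to the left on "+" and to the right on "-"
    b_start = ifelse(strand == "+", tss - basal_upstream, tss - basal_downstream)
    b_end = ifelse(strand == "+", tss + basal_downstream, tss + basal_upstream)
    b_start = pmax(b_start, 1)
    b_end = pmin(b_end, chr_len)

    # Extension stops at the basal domains of genes whose TSS lies to the
    # left (furthest-reaching end) or to the right (earliest start)
    left_limit = vapply(tss, function(t) max(c(1, b_end[tss < t])), numeric(1))
    right_limit = vapply(tss, function(t) min(c(chr_len, b_start[tss > t])), numeric(1))

    # The basal domain is always kept, even if it overlaps a neighbour
    start = pmin(b_start, pmax(left_limit, tss - extension))
    end = pmax(b_end, pmin(right_limit, tss + extension))
    data.frame(tss, strand, start, end)
}

basal_plus_ext(c(100000, 150000, 3000000), c("+", "-", "+"))
```

In this example, the first two genes stop at each other's basal domains, giving 1-149000 and 101000-1150000. The isolated third gene reaches the full extension, giving 2000000-4000000.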

extended_tss

If your organism is not supported by default, you can first prepare the extended TSS regions with extendTSSFromDataFrame or extendTSS and assign the resulting object to this argument. See the vignette for more examples.

background

Background regions. The value can also be a vector of chromosome names.

exclude

Regions to exclude from the analysis, such as gap regions (which can be obtained with getGapFromUCSC). The value can also be a vector of chromosome names. The special value "gap" removes the gap regions of the corresponding organism from the analysis.

cores

Number of cores to use.

verbose

Whether to print messages.

Details

When background or exclude is set, the analysis is restricted to the background regions and still uses the binomial test. Note that this differs from the original GREAT method, which uses Fisher's exact test when background regions are set. See submitGreatJob for explanations.
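The region-based binomial test of GREAT fits in a few lines. For a gene set, define three quantities:

- n is the number of input regions.
- k is the number of those regions that fall in the union of the gene set's regulatory domains.
- p is the fraction of the genome (here, of the background) covered by those domains.

The p-value is then P(X >= k) for X ~ Binomial(n, p). The sketch below illustrates that formula with hypothetical function names. It is not the code of great(), and it leaves out computing the overlaps and the domain coverage.

```r
# Region-based binomial test as described for GREAT.
#   n_hits        : input regions overlapping the gene set's regulatory domains
#   n_regions     : all input regions inside the background
#   domain_bp     : background bases covered by those domains (their union)
#   background_bp : total bases of the background
great_binom_p = function(n_hits, n_regions, domain_bp, background_bp) {
    p = domain_bp / background_bp
    # P(X >= n_hits) with X ~ Binomial(n_regions, p)
    pbinom(n_hits - 1, n_regions, p, lower.tail = FALSE)
}

# Observed hits relative to the n_regions * p hits expected by chance
fold_enrichment = function(n_hits, n_regions, domain_bp, background_bp) {
    n_hits / (n_regions * domain_bp / background_bp)
}

# 50 of 1000 regions hit domains covering 2% of a 1 Gb background
great_binom_p(50, 1000, 2e7, 1e9)
fold_enrichment(50, 1000, 2e7, 1e9)  # 2.5
```

The same p-value is returned by `binom.test(50, 1000, 0.02, alternative = "greater")`.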

By default, gap regions are excluded from the analysis.
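Excluding regions shrinks the part of the genome over which the binomial test measures how much of it each gene set's regulatory domains cover. As a rough illustration, the sketch below computes how many bases of one chromosome remain after removing a set of possibly overlapping excluded intervals, such as gaps. `remaining_length` is a hypothetical helper. With Bioconductor objects, the same quantity would normally come from GenomicRanges functions such as setdiff() and width().

```r
# Bases of a chromosome left after removing excluded intervals.
# Intervals are 1-based and closed, [start, end]; overlaps are merged.
remaining_length = function(chr_len, ex_start, ex_end) {
    if (length(ex_start) == 0) return(chr_len)
    s = pmax(ex_start, 1)
    e = pmin(ex_end, chr_len)
    keep = s <= e                     # drop intervals outside the chromosome
    s = s[keep]; e = e[keep]
    if (length(s) == 0) return(chr_len)
    o = order(s)
    s = s[o]; e = e[o]
    covered = 0
    cur_s = s[1]; cur_e = e[1]
    for (i in seq_along(s)[-1]) {
        if (s[i] <= cur_e + 1) {
            cur_e = max(cur_e, e[i])  # overlapping or adjacent: merge
        } else {
            covered = covered + (cur_e - cur_s + 1)
            cur_s = s[i]; cur_e = e[i]
        }
    }
    covered = covered + (cur_e - cur_s + 1)
    chr_len - covered
}

# Gaps [1, 100], [50, 150] and [500, 599] on a 1000 bp chromosome
remaining_length(1000, c(500, 1, 50), c(599, 100, 150))  # 750
```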

TSS

rGREAT supports TSS annotations for many organisms. The value of tss_source should take one of the following formats:

  • Name of a TxDb.* package. Supported packages are listed in rGREAT:::BIOC_ANNO_PKGS$txdb.

  • Genome version of the organism, e.g. "hg19". Then the corresponding TxDb will be used.

  • In a format of RefSeqCurated:$genome where $genome is the genome version of an organism, such as hg19. RefSeqCurated subset will be used.

  • In a format of RefSeqSelect:$genome where $genome is the genome version of an organism, such as hg19. RefSeqSelect subset will be used.

  • In a format of Gencode_v$version, where $version is the GENCODE version, such as 19 for human or M21 for mouse. GENCODE protein-coding genes will be used.

  • In a format of GREAT:$genome, where $genome can only be mm9, mm10, hg19, hg38. The TSS from GREAT will be used.
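The formats above can be told apart by their prefixes. The hypothetical helper below sketches that dispatch in base R, including the restriction of GREAT:$genome to four genomes. It is not how great() parses the argument internally, and it treats any other string as a genome version.

```r
# Hypothetical helper: report which tss_source format a string follows.
tss_source_type = function(x) {
    if (startsWith(x, "TxDb.")) return("TxDb package")
    if (startsWith(x, "RefSeqCurated:")) return("RefSeqCurated")
    if (startsWith(x, "RefSeqSelect:")) return("RefSeqSelect")
    if (startsWith(x, "Gencode_v")) return("GENCODE")
    if (startsWith(x, "GREAT:")) {
        genome = sub("^GREAT:", "", x)
        if (!genome %in% c("mm9", "mm10", "hg19", "hg38")) {
            stop("GREAT TSS are only available for mm9, mm10, hg19 and hg38")
        }
        return("GREAT")
    }
    "genome version"  # e.g. "hg19": the corresponding TxDb is used
}

vapply(c("hg19", "RefSeqSelect:hg38", "Gencode_vM21", "GREAT:mm10"),
       tss_source_type, character(1))
```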

Genesets

rGREAT supports the following built-in GO gene sets for all organisms (note "GO:" can be omitted):

"GO:BP":

Biological Process, from GO.db package.

"GO:CC":

Cellular Component, from GO.db package.

"GO:MF":

Molecular Function, from GO.db package.

rGREAT also supports built-in gene set collections from MSigDB (these are only available for human; "msigdb:" can be omitted):

"msigdb:H"

Hallmark gene sets.

"msigdb:C1"

Positional gene sets.

"msigdb:C2"

Curated gene sets.

"msigdb:C2:CGP"

C2 subcategory: chemical and genetic perturbations gene sets.

"msigdb:C2:CP"

C2 subcategory: canonical pathways gene sets.

"msigdb:C2:CP:BIOCARTA"

C2 subcategory: BioCarta subset of CP.

"msigdb:C2:CP:KEGG"

C2 subcategory: KEGG subset of CP.

"msigdb:C2:CP:PID"

C2 subcategory: PID subset of CP.

"msigdb:C2:CP:REACTOME"

C2 subcategory: REACTOME subset of CP.

"msigdb:C2:CP:WIKIPATHWAYS"

C2 subcategory: WIKIPATHWAYS subset of CP.

"msigdb:C3"

Regulatory target gene sets.

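"msigdb:C3:MIR:MIRDB"

C3 subcategory: miRDB microRNA target gene sets.

"msigdb:C3:MIR:MIR_LEGACY"

C3 subcategory: legacy microRNA target gene sets (MIR_Legacy).

"msigdb:C3:TFT:GTRD"

C3 subcategory: GTRD transcription factor target gene sets.

"msigdb:C3:TFT:TFT_LEGACY"

C3 subcategory: legacy transcription factor target gene sets (TFT_Legacy).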

"msigdb:C4"

Computational gene sets.

"msigdb:C4:CGN"

C4 subcategory: cancer gene neighborhoods gene sets.

"msigdb:C4:CM"

C4 subcategory: cancer modules gene sets.

"msigdb:C5"

Ontology gene sets.

"msigdb:C5:GO:BP"

C5 subcategory: BP subset.

"msigdb:C5:GO:CC"

C5 subcategory: CC subset.

"msigdb:C5:GO:MF"

C5 subcategory: MF subset.

"msigdb:C5:HPO"

C5 subcategory: human phenotype ontology gene sets.

"msigdb:C6"

Oncogenic signature gene sets.

"msigdb:C7"

Immunologic signature gene sets.

"msigdb:C7:IMMUNESIGDB"

C7 subcategory: ImmuneSigDB subset.

"msigdb:C7:VAX"

C7 subcategory: vaccine response gene sets.

"msigdb:C8"

Cell type signature gene sets.
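Because the "GO:" and "msigdb:" prefixes are optional, a short name such as "BP" or "H" refers to the same collection as its prefixed form. The hypothetical helper below sketches that expansion. It illustrates the naming scheme and is not the lookup performed by great().

```r
# Hypothetical helper: expand a gene_sets shorthand to its prefixed name.
full_gene_set_name = function(x) {
    if (grepl("^(GO|msigdb):", x, ignore.case = TRUE)) return(x)  # already prefixed
    if (x %in% c("BP", "CC", "MF")) return(paste0("GO:", x))      # GO ontologies
    paste0("msigdb:", x)                                           # MSigDB collections
}

vapply(c("BP", "H", "C5:GO:BP", "msigdb:C2:CP:KEGG"),
       full_gene_set_name, character(1), USE.NAMES = FALSE)
```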

When one of the default TxDb-based TSS sources is used, Entrez gene IDs are always used as the main gene IDs. If you provide a self-defined gene_sets or extended_tss, make sure the two use the same gene ID type.
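A quick way to catch mismatched ID types is to check what fraction of the IDs in gene_sets also occur among the genes of the TSS annotation, for example the gene IDs stored in an extended_tss object. The sketch below uses hypothetical inputs. A fraction near zero usually means the two use different ID types, such as gene symbols versus Entrez IDs.

```r
# Fraction of gene set IDs that are found among the annotation's gene IDs.
id_overlap = function(gene_sets, annotation_genes) {
    ids = unique(unlist(gene_sets, use.names = FALSE))
    mean(ids %in% annotation_genes)
}

annotation_genes = c("7157", "672", "675", "3845")        # Entrez IDs
sets_entrez = list(s1 = c("7157", "672"), s2 = c("675", "9999"))
sets_symbol = list(s1 = c("TP53", "BRCA1"), s2 = c("BRCA2"))
id_overlap(sets_entrez, annotation_genes)  # 0.75
id_overlap(sets_symbol, annotation_genes)  # 0: the ID types do not match
```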

Biomart

rGREAT supports a large number of organisms whose gene annotations are retrieved from Ensembl BioMart. The name of a BioMart dataset can be assigned to the argument biomart_dataset. All supported organisms can be found with BioMartGOGeneSets::supportedOrganisms.

Value

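A GreatObject-class object. The following methods can be applied to it:

  • getEnrichmentTable to retrieve the enrichment table.

  • plotVolcano to draw a volcano plot.

  • plotRegionGeneAssociations to plot the associations between input regions and genes.

  • getRegionGeneAssociations to get the associations between input regions and genes.

  • shinyReport to view the results in a Shiny application.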

Examples

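if(FALSE) {
library(rGREAT)
# random input regions on hg19
gr = randomRegions(genome = "hg19")

# MSigDB hallmark gene sets with different TSS sources
res = great(gr, "msigdb:H", "txdb:hg19")
res = great(gr, "msigdb:H", "TxDb.Hsapiens.UCSC.hg19.knownGene")
res = great(gr, "msigdb:H", "RefSeqCurated:hg19")
res = great(gr, "msigdb:H", "GREAT:hg19")
res = great(gr, "msigdb:H", "Gencode_v19")

# GO gene sets with genes from Ensembl BioMart
res = great(gr, "GO:BP", biomart_dataset = "hsapiens_gene_ensembl")

# enrichment table of the last analysis
tb = getEnrichmentTable(res)
}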