Perform online GREAT analysis

submitGreatJob(gr, bg = NULL,
    gr_is_zero_based      = FALSE,
    species               = "hg19",
    genome                = species,
    includeCuratedRegDoms = TRUE,
    rule                  = c("basalPlusExt", "twoClosest", "oneClosest"),
    adv_upstream          = 5.0,
    adv_downstream        = 1.0,
    adv_span              = 1000.0,
    adv_twoDistance       = 1000.0,
    adv_oneDistance       = 1000.0,
    request_interval = 60,
    max_tries = 10,
    version = DEFAULT_VERSION,
    base_url = "http://great.stanford.edu/public/cgi-bin",
    use_name_column = FALSE,
    verbose = help, help = great_opt$verbose)

Arguments

gr

A GRanges object or a data frame which contains at least three columns (chr, start and end).

bg

Not supported any more. See explanations in section "When_background_regions_are_set".

gr_is_zero_based

Are start positions in gr zero-based?

genome

Genome. "hg38", "hg19", "mm10", "mm9" are supported in GREAT version 4.x.x, "hg19", "mm10", "mm9", "danRer7" are supported in GREAT version 3.x.x and "hg19", "hg18", "mm9", "danRer7" are supported in GREAT version 2.x.x.

species

The same as genome but it will be deprecated soon.

includeCuratedRegDoms

Whether to include curated regulatory domains, see https://great-help.atlassian.net/wiki/spaces/GREAT/pages/655443/Association+Rules#AssociationRules-CuratedRegulatoryDomains .

rule

How to associate genomic regions to genes. See 'Details' section.

adv_upstream

Extension of the basal regulatory domain upstream of the TSS. Unit: kb. Only used when rule is basalPlusExt.

adv_downstream

Extension of the basal regulatory domain downstream of the TSS. Unit: kb. Only used when rule is basalPlusExt.

adv_span

Maximum extension of the regulatory domain in one direction. Unit: kb. Only used when rule is basalPlusExt.

adv_twoDistance

Maximum extension of the regulatory domain in one direction. Unit: kb. Only used when rule is twoClosest.

adv_oneDistance

Maximum extension of the regulatory domain in one direction. Unit: kb. Only used when rule is oneClosest.

request_interval

Time interval between two requests. Default is 60 seconds.

max_tries

Maximum number of attempts to automatically reconnect to the GREAT web server.

version

Version of GREAT. The value should be "4.0.4", "3.0.0" or "2.0.2". Shortened version numbers can also be used: "4" or "4.0" is the same as "4.0.4".

base_url

The URL of the cgi-bin path. Only used when it is explicitly specified.

use_name_column

If the input is a data frame, whether to use the fourth column as the "names" of regions?

verbose

Whether to print help messages.

help

Whether to print help messages. This argument will be replaced by verbose in future versions.
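
As an illustration of the gr, gr_is_zero_based and use_name_column arguments, the following minimal sketch builds a data frame input in base R. The region coordinates and names are made up for this example, and the conversion shown is the usual BED (zero-based start) to one-based convention, which is assumed here to be what gr_is_zero_based refers to.

```r
# Hypothetical BED-like regions: start positions are zero-based,
# and the fourth column holds region names.
bed <- data.frame(
    chr   = c("chr1", "chr1", "chr2"),
    start = c(0, 999, 4999),
    end   = c(100, 1500, 6000),
    name  = c("peak_a", "peak_b", "peak_c")
)

# The same regions with one-based start positions:
# only the start is shifted, the end stays unchanged.
one_based <- transform(bed, start = start + 1)
one_based

# Not run (requires rGREAT and a network connection):
# job <- submitGreatJob(bed, gr_is_zero_based = TRUE, use_name_column = TRUE)
```

Either form could be submitted, as long as gr_is_zero_based matches the coordinate convention of the data frame that is passed.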

Details

Note: On Aug 19 2019 GREAT released version 4 (https://great-help.atlassian.net/wiki/spaces/GREAT/pages/655442/Version+History ), which supports the hg38 genome and removes some ontologies such as pathways. submitGreatJob still takes hg19 as the default; hg38 can be selected with the argument genome = "hg38". To use an older version such as 3.0.0, call submitGreatJob(..., version = "3.0.0").

Note that this function does not use the standard GREAT API. It sends data directly to the GREAT web server by HTTP POST.

The following text is copied from the GREAT web site ( http://great.stanford.edu/public/html/ ).

Explanation of rule and of the settings whose names start with 'adv_' (advanced settings):

basalPlusExt

Mode 'Basal plus extension'. Gene regulatory domain definition: Each gene is assigned a basal regulatory domain of a minimum distance upstream and downstream of the TSS (regardless of other nearby genes, controlled by the adv_upstream and adv_downstream arguments). The gene regulatory domain is extended in both directions to the nearest gene's basal domain but no more than the maximum extension in one direction (controlled by adv_span).

twoClosest

Mode 'Two nearest genes'. Gene regulatory domain definition: Each gene is assigned a regulatory domain that extends in both directions to the nearest gene's TSS but no more than the maximum extension in one direction (controlled by adv_twoDistance).

oneClosest

Mode 'Single nearest gene'. Gene regulatory domain definition: Each gene is assigned a regulatory domain that extends in both directions to the midpoint between the gene's TSS and the nearest gene's TSS but no more than the maximum extension in one direction (controlled by adv_oneDistance).
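
To make the 'Basal plus extension' rule concrete, here is a minimal base R sketch that follows the description above for genes on a single chromosome. It is an illustration under stated assumptions, not GREAT's actual implementation: the function name basal_plus_ext and the example TSS positions are hypothetical, genes are assumed to be sorted by TSS, and curated regulatory domains are ignored.

```r
# Illustrative only: follows the textual description of 'basalPlusExt',
# not the code used by the GREAT server.
basal_plus_ext <- function(tss, strand,
                           upstream_kb = 5, downstream_kb = 1, span_kb = 1000) {
    up   <- upstream_kb * 1000
    down <- downstream_kb * 1000
    span <- span_kb * 1000
    n <- length(tss)

    # Basal domain: upstream/downstream are relative to the gene's strand
    basal_start <- ifelse(strand == "+", tss - up, tss - down)
    basal_end   <- ifelse(strand == "+", tss + down, tss + up)

    # Basal domains of the neighbouring genes (genes assumed sorted by TSS)
    left_basal_end    <- c(-Inf, basal_end[-n])
    right_basal_start <- c(basal_start[-1], Inf)

    # Extend to the neighbour's basal domain, but at most `span` from the TSS,
    # and never shrink the basal domain itself
    start <- pmin(basal_start, pmax(left_basal_end, tss - span))
    end   <- pmax(basal_end, pmin(right_basal_start, tss + span))

    data.frame(tss = tss, strand = strand,
               start = pmax(start, 0),  # clamp at the chromosome start
               end = end)
}

# Three hypothetical genes on one chromosome
domains <- basal_plus_ext(tss = c(100000, 200000, 5000000),
                          strand = c("+", "-", "+"))
domains
```

With the default settings, the second gene's domain stops on the left at the first gene's basal domain, while on the right it is limited by the 1000 kb maximum extension because the third gene is too far away.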

When_background_regions_are_set

Note that when the bg argument is set to a list of background regions, GREAT uses a completely different test!

When bg is set, gr must be an exact subset of bg. For example, suppose a background region list contains five regions: [1, 10], [15, 23], [34, 38], [40, 49], [54, 63]. gr can only be a subset of these five regions, so it can contain [15, 23] and [40, 49], but not [16, 20] or [39, 51]. In this setting, regions are treated as single units and Fisher's exact test is applied to calculate the enrichment (by testing the number of regions in the 2x2 contingency table).
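
The exact-subset requirement from the example above can be checked with a few lines of base R. This is only a sketch: the helper is_exact_member is hypothetical (it is not part of rGREAT), and the chromosome name "chr1" is an assumption added so that the regions form a valid three-column data frame.

```r
# Background regions from the example above (chromosome name is assumed)
bg <- data.frame(chr = "chr1",
                 start = c(1, 15, 34, 40, 54),
                 end   = c(10, 23, 38, 49, 63))

# Input regions that are exact members of bg, and ones that are not
gr_ok  <- data.frame(chr = "chr1", start = c(15, 40), end = c(23, 49))
gr_bad <- data.frame(chr = "chr1", start = c(16, 39), end = c(20, 51))

# Hypothetical helper: TRUE for each region of `gr` that is identical
# (same chr, start and end) to some region of `bg`
is_exact_member <- function(gr, bg) {
    key <- function(df) paste(df$chr, df$start, df$end, sep = ":")
    key(gr) %in% key(bg)
}

is_exact_member(gr_ok, bg)
is_exact_member(gr_bad, bg)
```

Regions that merely overlap or fall inside a background region, such as [16, 20], do not count as members under this definition.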

Check https://great-help.atlassian.net/wiki/spaces/GREAT/pages/655452/File+Formats#FileFormats-Whatshouldmybackgroundregionsfilecontain? for more explanations.

Please note that from rGREAT 1.99.0 onward, setting bg is no longer supported and this argument will be removed in the future. You can either use the GREAT website directly or use other Bioconductor packages such as "LOLA" to perform the Fisher's exact test-based analysis.

If you want to restrict the input regions to background regions (by intersection) and still apply the binomial test, consider using the local GREAT implementation via great.

Value

A GreatJob-class object which can be used to get results from GREAT server. The following methods can be applied on it:

See also

great for the local implementation of the GREAT algorithm.

Author

Zuguang Gu <z.gu@dkfz.de>

Examples

set.seed(123)
gr = randomRegions(nr = 1000, genome = "hg19")
job = submitGreatJob(gr)
#> Note: On Aug 19 2019 GREAT released version 4 which supports hg38
#> genome and removes some ontologies such pathways. submitGreatJob()
#> still takes hg19 as default. hg38 can be specified by argument `genome
#> = "hg38"`. To use the older versions such as 3.0.0, specify as
#> submitGreatJob(..., version = "3"). Set argument `help` to `FALSE` to
#> turn off this message.
job
#> Submit time: 2024-02-27 14:19:03 
#>   Note the results may only be avaiable on GREAT server for 24 hours.
#> Version: 4.0.4 
#> Genome: hg19 
#> Inputs: 1000 regions
#> Mode: Basal plus extension 
#>   Proximal: 5 kb upstream, 1 kb downstream,
#>   plus Distal: up to 1000 kb
#> Include curated regulatory domains
#> 
#> Enrichment tables for following ontologies have been downloaded:
#>   None
#> 

# more parameters can be set for the job
if(FALSE) { # suppress running it when building the package
    # current GREAT version is 4.0.4
    job = submitGreatJob(gr, genome = "hg19")
    job = submitGreatJob(gr, adv_upstream = 10, adv_downstream = 2, adv_span = 2000)
    job = submitGreatJob(gr, rule = "twoClosest", adv_twoDistance = 2000)
    job = submitGreatJob(gr, rule = "oneClosest", adv_oneDistance = 2000)
}