Get genes from BioMart

getBioMartGenes(dataset, add_chr_prefix = FALSE)

Arguments

dataset

A BioMart dataset name or a taxon ID. For valid values, see supportedOrganisms.

add_chr_prefix

Whether to add the "chr" prefix to chromosome names. If TRUE, the function uses GenomeInfoDb::seqlevelsStyle(gr) = "UCSC" to add the prefix.

Details

Note that add_chr_prefix is only a convenience argument. The same result can be obtained with:


    library(GenomeInfoDb)  # provides seqlevelsStyle()
    gr = getBioMartGenes("hsapiens_gene_ensembl")
    seqlevelsStyle(gr) = "UCSC"
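
The effect of the style switch can also be seen without querying BioMart. The following is a minimal sketch, assuming the GenomicRanges and GenomeInfoDb packages are installed; gr_toy is a hypothetical hand-built object (not something returned by this package) with Ensembl-style chromosome names and human gene coordinates.

```r
library(GenomicRanges)
library(GenomeInfoDb)

# Hypothetical toy object with Ensembl-style chromosome names ("X", "20", "1").
gr_toy = GRanges(
    seqnames = c("X", "20", "1"),
    ranges = IRanges(
        start = c(100627108, 50934867, 169849631),
        end   = c(100639991, 50959140, 169894267)
    ),
    strand = c("-", "-", "-")
)

# Switch to UCSC-style names, the same call that add_chr_prefix = TRUE
# is documented to use.
seqlevelsStyle(gr_toy) = "UCSC"

# Every chromosome name should now carry the "chr" prefix.
seqlevels(gr_toy)
```

Only the chromosome names change; the ranges and strands are left untouched.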

Value

A GRanges object.

Examples

gr = getBioMartGenes("hsapiens_gene_ensembl")
gr
#> GRanges object with 69299 ranges and 4 metadata columns:
#>                   seqnames              ranges strand | ensembl_gene_id
#>                      <Rle>           <IRanges>  <Rle> |     <character>
#>   ENSG00000000003        X 100627108-100639991      - | ENSG00000000003
#>   ENSG00000000005        X 100584936-100599885      + | ENSG00000000005
#>   ENSG00000000419       20   50934867-50959140      - | ENSG00000000419
#>   ENSG00000000457        1 169849631-169894267      - | ENSG00000000457
#>   ENSG00000000460        1 169662007-169854080      + | ENSG00000000460
#>               ...      ...                 ...    ... .             ...
#>   ENSG00000291313       14 103334237-103335932      + | ENSG00000291313
#>   ENSG00000291314        X   10566888-10576955      - | ENSG00000291314
#>   ENSG00000291315        3   40312086-40312214      + | ENSG00000291315
#>   ENSG00000291316        8 144449582-144465430      - | ENSG00000291316
#>   ENSG00000291317        8 144463817-144465667      - | ENSG00000291317
#>                     gene_biotype   entrezgene_id external_gene_name
#>                      <character> <CharacterList>        <character>
#>   ENSG00000000003 protein_coding            7105             TSPAN6
#>   ENSG00000000005 protein_coding           64102               TNMD
#>   ENSG00000000419 protein_coding            8813               DPM1
#>   ENSG00000000457 protein_coding           57147              SCYL3
#>   ENSG00000000460 protein_coding           55732           C1orf112
#>               ...            ...             ...                ...
#>   ENSG00000291313 protein_coding            <NA>               <NA>
#>   ENSG00000291314 protein_coding            <NA>               <NA>
#>   ENSG00000291315 protein_coding            <NA>               <NA>
#>   ENSG00000291316 protein_coding          157542               <NA>
#>   ENSG00000291317 protein_coding           84773            TMEM276
#>   -------
#>   seqinfo: 445 sequences from an unspecified genome; no seqlengths
gr = getBioMartGenes("hsapiens_gene_ensembl", add_chr_prefix = TRUE)
gr
#> GRanges object with 69299 ranges and 4 metadata columns:
#>                   seqnames              ranges strand | ensembl_gene_id
#>                      <Rle>           <IRanges>  <Rle> |     <character>
#>   ENSG00000000003     chrX 100627108-100639991      - | ENSG00000000003
#>   ENSG00000000005     chrX 100584936-100599885      + | ENSG00000000005
#>   ENSG00000000419    chr20   50934867-50959140      - | ENSG00000000419
#>   ENSG00000000457     chr1 169849631-169894267      - | ENSG00000000457
#>   ENSG00000000460     chr1 169662007-169854080      + | ENSG00000000460
#>               ...      ...                 ...    ... .             ...
#>   ENSG00000291313    chr14 103334237-103335932      + | ENSG00000291313
#>   ENSG00000291314     chrX   10566888-10576955      - | ENSG00000291314
#>   ENSG00000291315     chr3   40312086-40312214      + | ENSG00000291315
#>   ENSG00000291316     chr8 144449582-144465430      - | ENSG00000291316
#>   ENSG00000291317     chr8 144463817-144465667      - | ENSG00000291317
#>                     gene_biotype   entrezgene_id external_gene_name
#>                      <character> <CharacterList>        <character>
#>   ENSG00000000003 protein_coding            7105             TSPAN6
#>   ENSG00000000005 protein_coding           64102               TNMD
#>   ENSG00000000419 protein_coding            8813               DPM1
#>   ENSG00000000457 protein_coding           57147              SCYL3
#>   ENSG00000000460 protein_coding           55732           C1orf112
#>               ...            ...             ...                ...
#>   ENSG00000291313 protein_coding            <NA>               <NA>
#>   ENSG00000291314 protein_coding            <NA>               <NA>
#>   ENSG00000291315 protein_coding            <NA>               <NA>
#>   ENSG00000291316 protein_coding          157542               <NA>
#>   ENSG00000291317 protein_coding           84773            TMEM276
#>   -------
#>   seqinfo: 445 sequences from an unspecified genome; no seqlengths