Converting between gene ID types is a very common task in gene set enrichment analysis. Two kinds of packages are available for it: biomaRt, which queries the Ensembl BioMart web service, and the org.*.db family of packages, whose annotations come from NCBI. Here we only introduce the org.*.db packages because they are sufficient for most applications.

We take org.Hs.eg.db (for human) as an example.

library(org.Hs.eg.db)
## Loading required package: AnnotationDbi
## Loading required package: stats4
## Loading required package: BiocGenerics
## 
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
## 
##     Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
##     as.data.frame, basename, cbind, colnames, dirname, do.call,
##     duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
##     lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
##     pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table,
##     tapply, union, unique, unsplit, which.max, which.min
## Loading required package: Biobase
## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.
## Loading required package: IRanges
## Loading required package: S4Vectors
## Warning: package 'S4Vectors' was built under R version 4.3.2
## 
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:utils':
## 
##     findMatches
## The following objects are masked from 'package:base':
## 
##     I, expand.grid, unname
## 

Note: the same interface applies to the OrgDb objects for other organisms on AnnotationHub.

Use the select() interface

We need the following three pieces of information:

  • keys: Gene IDs in one ID type;
  • keytype: The name of the input ID type;
  • columns: The name(s) of the output ID type(s).

To list the valid ID type names (keytypes() for the input, columns() for the output):

keytypes(org.Hs.eg.db)
##  [1] "ACCNUM"       "ALIAS"        "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS"
##  [6] "ENTREZID"     "ENZYME"       "EVIDENCE"     "EVIDENCEALL"  "GENENAME"    
## [11] "GENETYPE"     "GO"           "GOALL"        "IPI"          "MAP"         
## [16] "OMIM"         "ONTOLOGY"     "ONTOLOGYALL"  "PATH"         "PFAM"        
## [21] "PMID"         "PROSITE"      "REFSEQ"       "SYMBOL"       "UCSCKG"      
## [26] "UNIPROT"
columns(org.Hs.eg.db)
##  [1] "ACCNUM"       "ALIAS"        "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS"
##  [6] "ENTREZID"     "ENZYME"       "EVIDENCE"     "EVIDENCEALL"  "GENENAME"    
## [11] "GENETYPE"     "GO"           "GOALL"        "IPI"          "MAP"         
## [16] "OMIM"         "ONTOLOGY"     "ONTOLOGYALL"  "PATH"         "PFAM"        
## [21] "PMID"         "PROSITE"      "REFSEQ"       "SYMBOL"       "UCSCKG"      
## [26] "UNIPROT"

1:1 mapping

For example, we want to convert the following two genes into Entrez IDs.

genes = c("TP53", "MDM2")
  1. Use select(). The following function call can be read as “select ‘ENTREZID’ for the genes whose ‘SYMBOL’ is in ‘genes’”.
map = select(org.Hs.eg.db, keys = genes, keytype = "SYMBOL", columns = "ENTREZID")
## 'select()' returned 1:1 mapping between keys and columns
map
##   SYMBOL ENTREZID
## 1   TP53     7157
## 2   MDM2     4193
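
The reading above can be imitated on a plain data frame, which may help to see what keys, keytype and columns each do. This is only a base-R analogy that assumes the annotation is a single flat table; the toy table `anno` and the helper `toy_select()` are hypothetical and are not part of AnnotationDbi.

```r
# Hypothetical flat annotation table; values copied from the outputs in this section
anno = data.frame(
    SYMBOL   = c("TP53", "MDM2"),
    ENTREZID = c("7157", "4193"),
    ENSEMBL  = c("ENSG00000141510", "ENSG00000135679"),
    stringsAsFactors = FALSE
)

# Hypothetical helper: keep the rows whose `keytype` column is in `keys`,
# and return the key column together with the requested `columns`
toy_select = function(tbl, keys, keytype, columns) {
    tbl[tbl[[keytype]] %in% keys, c(keytype, columns), drop = FALSE]
}

res = toy_select(anno, keys = c("TP53", "MDM2"), keytype = "SYMBOL", columns = "ENTREZID")
res
```

The real select() queries the annotation database rather than a data frame, so this sketch only illustrates the roles of the three arguments.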

What if we convert them to Ensembl gene IDs:

map = select(org.Hs.eg.db, keys = genes, keytype = "SYMBOL", columns = "ENSEMBL")
## 'select()' returned 1:1 mapping between keys and columns
map
##   SYMBOL         ENSEMBL
## 1   TP53 ENSG00000141510
## 2   MDM2 ENSG00000135679

If you want to map to multiple ID types:

select(org.Hs.eg.db, keys = genes, keytype = "SYMBOL", columns = c("ENTREZID", "ENSEMBL"))
## 'select()' returned 1:1 mapping between keys and columns
##   SYMBOL ENTREZID         ENSEMBL
## 1   TP53     7157 ENSG00000141510
## 2   MDM2     4193 ENSG00000135679
  2. Use mapIds(), which is very similar to select().

Note that the argument is named column, not columns, so you can only map to one ID type per call.

mapIds(org.Hs.eg.db, keys = genes, keytype = "SYMBOL", column = "ENTREZID")
## 'select()' returned 1:1 mapping between keys and columns
##   TP53   MDM2 
## "7157" "4193"
mapIds(org.Hs.eg.db, keys = genes, keytype = "SYMBOL", column = "ENSEMBL")
## 'select()' returned 1:1 mapping between keys and columns
##              TP53              MDM2 
## "ENSG00000141510" "ENSG00000135679"

1:many mapping

Problems can arise when the mapping is not 1:1. In the following example, the symbol “MMD2” maps to two Entrez IDs.

genes = c("TP53", "MMD2")
map = select(org.Hs.eg.db, keys = genes, keytype = "SYMBOL", columns = "ENTREZID")
## 'select()' returned 1:many mapping between keys and columns
map
##   SYMBOL  ENTREZID
## 1   TP53      7157
## 2   MMD2    221938
## 3   MMD2 100505381

It is usually hard to pick one unique gene in such a 1:many mapping case, but we can add an additional column “GENETYPE” when querying:

map = select(org.Hs.eg.db, keys = genes, keytype = "SYMBOL", columns = c("ENTREZID", "GENETYPE"))
## 'select()' returned 1:many mapping between keys and columns
map
##   SYMBOL  ENTREZID       GENETYPE
## 1   TP53      7157 protein-coding
## 2   MMD2    221938 protein-coding
## 3   MMD2 100505381        unknown

For “MMD2”, adding the “GENETYPE” column works because its second hit is annotated as “unknown”, so we can simply remove it.

map[map$GENETYPE == "protein-coding", ]
##   SYMBOL ENTREZID       GENETYPE
## 1   TP53     7157 protein-coding
## 2   MMD2   221938 protein-coding

It is also always a good idea to include only protein-coding genes in gene set enrichment analysis.
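
The filter above assumes that every row has a GENETYPE value. As a minimal sketch of what can happen otherwise, the toy table below copies the select() output shown earlier and adds one hypothetical row ("GENE_X", with a made-up ID) whose GENETYPE is missing. Comparing with `==` propagates NA into the row index, while `%in%` treats NA as "no match".

```r
# Toy table mirroring the select() output above; the last row is hypothetical
map = data.frame(
    SYMBOL   = c("TP53", "MMD2", "MMD2", "GENE_X"),
    ENTREZID = c("7157", "221938", "100505381", "hypothetical_id"),
    GENETYPE = c("protein-coding", "protein-coding", "unknown", NA),
    stringsAsFactors = FALSE
)

# `==` yields NA for the last row, which is returned as an all-NA row
n_with_eq = nrow(map[map$GENETYPE == "protein-coding", ])

# `%in%` returns FALSE for NA, so only the true matches are kept
map_pc = map[map$GENETYPE %in% "protein-coding", ]
map_pc
```

Here `n_with_eq` is 3 (two matches plus one all-NA row), whereas `map_pc` has exactly the two protein-coding rows.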

mapIds() is not as flexible as select(). Its multiVals argument controls which gene is selected when the mapping is not 1:1.

mapIds(org.Hs.eg.db, keys = genes, keytype = "SYMBOL", column = "ENTREZID", multiVals = "first")
## 'select()' returned 1:many mapping between keys and columns
##     TP53     MMD2 
##   "7157" "221938"
mapIds(org.Hs.eg.db, keys = genes, keytype = "SYMBOL", column = "ENTREZID", multiVals = "asNA")
## 'select()' returned 1:many mapping between keys and columns
##   TP53   MMD2 
## "7157"     NA
mapIds(org.Hs.eg.db, keys = genes, keytype = "SYMBOL", column = "ENTREZID", multiVals = "list")
## 'select()' returned 1:many mapping between keys and columns
## $TP53
## [1] "7157"
## 
## $MMD2
## [1] "221938"    "100505381"
mapIds(org.Hs.eg.db, keys = genes, keytype = "SYMBOL", column = "ENTREZID", multiVals = "filter")
## 'select()' returned 1:many mapping between keys and columns
##   TP53 
## "7157"
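# Entrez ID -> gene type lookup, as a named list
gene_type = as.list(org.Hs.egGENETYPE)
# multiVals can also be a function that receives all hits of one key
# this is very slow
mapIds(org.Hs.eg.db, keys = genes, keytype = "SYMBOL", column = "ENTREZID", 
    multiVals = function(x) {
        # keep the protein-coding hits and return the first one, or NA if there is none
        x2 = x[sapply(gene_type[x], function(y) y == "protein-coding")]
        if(length(x2)) {
            x2[1]
        } else {
            NA
        }
})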
## 'select()' returned 1:many mapping between keys and columns
##     TP53     MMD2 
##   "7157" "221938"

Use the pre-generated objects

In org.*.db packages, there are also pre-generated objects that already contain the mappings between Entrez IDs and a specific gene ID type.

ls(envir = asNamespace("org.Hs.eg.db"))
##  [1] "datacache"                "org.Hs.eg"               
##  [3] "org.Hs.eg.db"             "org.Hs.egACCNUM"         
##  [5] "org.Hs.egACCNUM2EG"       "org.Hs.egALIAS2EG"       
##  [7] "org.Hs.egCHR"             "org.Hs.egCHRLENGTHS"     
##  [9] "org.Hs.egCHRLOC"          "org.Hs.egCHRLOCEND"      
## [11] "org.Hs.egENSEMBL"         "org.Hs.egENSEMBL2EG"     
## [13] "org.Hs.egENSEMBLPROT"     "org.Hs.egENSEMBLPROT2EG" 
## [15] "org.Hs.egENSEMBLTRANS"    "org.Hs.egENSEMBLTRANS2EG"
## [17] "org.Hs.egENZYME"          "org.Hs.egENZYME2EG"      
## [19] "org.Hs.egGENENAME"        "org.Hs.egGENETYPE"       
## [21] "org.Hs.egGO"              "org.Hs.egGO2ALLEGS"      
## [23] "org.Hs.egGO2EG"           "org.Hs.egMAP"            
## [25] "org.Hs.egMAP2EG"          "org.Hs.egMAPCOUNTS"      
## [27] "org.Hs.egOMIM"            "org.Hs.egOMIM2EG"        
## [29] "org.Hs.egORGANISM"        "org.Hs.egPATH"           
## [31] "org.Hs.egPATH2EG"         "org.Hs.egPFAM"           
## [33] "org.Hs.egPMID"            "org.Hs.egPMID2EG"        
## [35] "org.Hs.egPROSITE"         "org.Hs.egREFSEQ"         
## [37] "org.Hs.egREFSEQ2EG"       "org.Hs.egSYMBOL"         
## [39] "org.Hs.egSYMBOL2EG"       "org.Hs.egUCSCKG"         
## [41] "org.Hs.egUNIPROT"         "org.Hs.eg_dbInfo"        
## [43] "org.Hs.eg_dbconn"         "org.Hs.eg_dbfile"        
## [45] "org.Hs.eg_dbschema"

The following six objects can be used to convert between major gene ID types:

  • org.Hs.egENSEMBL: Entrez -> Ensembl
  • org.Hs.egENSEMBL2EG: Ensembl -> Entrez
  • org.Hs.egREFSEQ: Entrez -> RefSeq
  • org.Hs.egREFSEQ2EG: RefSeq -> Entrez
  • org.Hs.egSYMBOL: Entrez -> Symbol
  • org.Hs.egSYMBOL2EG: Symbol -> Entrez
org.Hs.egSYMBOL
## SYMBOL map for Human (object of class "AnnDbBimap")

If you have a single gene, you can use [[:

org.Hs.egSYMBOL2EG[["TP53"]]
## [1] "7157"

If you have multiple genes, use [ + as.list():

lt = as.list(org.Hs.egSYMBOL2EG)
lt[genes]
## $TP53
## [1] "7157"
## 
## $MMD2
## [1] "221938"    "100505381"
# or
as.list(org.Hs.egSYMBOL2EG[genes])
## $TP53
## [1] "7157"
## 
## $MMD2
## [1] "221938"    "100505381"

You can also use toTable(), but it is less commonly used for converting gene IDs:

tb = toTable(org.Hs.egSYMBOL2EG)

The drawback of these objects is that you cannot use additional information, such as gene types, to filter genes.
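
The list form and the table form hold the same information. As a rough base-R illustration (not the toTable() implementation), a named list like the one printed above can be flattened into a two-column table; the column names used here are chosen for this sketch only.

```r
# Toy list copying the as.list() output shown above
lt = list(TP53 = "7157", MMD2 = c("221938", "100505381"))

# Repeat each symbol once per mapped Entrez ID, then unlist the IDs
tb = data.frame(
    SYMBOL   = rep(names(lt), lengths(lt)),
    ENTREZID = unlist(lt, use.names = FALSE),
    stringsAsFactors = FALSE
)
tb
```

Each 1:many mapping becomes several rows, which is also how select() reports it.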

Conclusion

If the mapping is 1:1, all the methods mentioned are OK. When the mapping is 1:many, using select() and filtering by the “GENETYPE” column is safer.

In the GSEAtraining package, there is one such function, convert_to_entrez_id(). It also automatically detects the input gene ID type.

Practice

Practice 1

The object diff_gene from the following code is a vector of (human) gene symbols. Try to convert them to Entrez IDs (do not use convert_to_entrez_id()).

lt = readRDS(system.file("extdata", "ora.rds", package = "GSEAtraining"))
diff_gene = lt$diff_gene
head(diff_gene)
## [1] "FGR"    "NIPAL3" "LAP3"   "CASP10" "CAMKK1" "PRSS22"

Solution

map = select(org.Hs.eg.db, keys = diff_gene, keytype = "SYMBOL", columns = "ENTREZID")
unique(map$ENTREZID)

# or
map = mapIds(org.Hs.eg.db, keys = diff_gene, keytype = "SYMBOL", column = "ENTREZID")
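
If some symbols have no Entrez ID, the ENTREZID column may contain NA, and unique() would then keep NA as one of the "IDs". A minimal sketch of separating mapped from unmapped symbols follows; the table is a hypothetical stand-in for the select() result, and its symbols and IDs are placeholders rather than real annotations.

```r
# Hypothetical stand-in for the select() result; all values are placeholders
map = data.frame(
    SYMBOL   = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
    ENTREZID = c("id_1", NA, "id_3", "id_3"),
    stringsAsFactors = FALSE
)

# Symbols without any Entrez ID, worth reporting before they are dropped
unmapped = map$SYMBOL[is.na(map$ENTREZID)]

# Unique Entrez IDs with the NA entries removed
entrez = unique(map$ENTREZID[!is.na(map$ENTREZID)])

unmapped
entrez
```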

Practice 2

Now we know that gene ID mapping between different ID types is not always 1:1. Calculate the percentage of gene symbols that cannot be uniquely mapped to Entrez IDs, and the percentage of Ensembl IDs that cannot be uniquely mapped to Entrez IDs. Then recalculate these two numbers taking into account only the protein-coding genes.

Solution

all_symbols = keys(org.Hs.eg.db, keytype = "SYMBOL")

# use select
map = select(org.Hs.eg.db, keys = all_symbols, keytype = "SYMBOL", columns = "ENTREZID")
## 'select()' returned 1:many mapping between keys and columns
tb = table(map$SYMBOL)
sum(tb > 1)/length(tb)
## [1] 0.0001265396
# use mapIds
map = mapIds(org.Hs.eg.db, keys = all_symbols, keytype = "SYMBOL", column = "ENTREZID", 
    multiVals = "list")
## 'select()' returned 1:many mapping between keys and columns
n = sapply(map, length)
sum(n > 1)/length(n)
## [1] 0.0001265396
# use org.Hs.egSYMBOL2EG
lt = as.list(org.Hs.egSYMBOL2EG)
n = sapply(lt, length)
sum(n > 1)/length(n)
## [1] 0.0001265396
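
The counting step in the solutions above can be written more compactly in base R: lengths() returns the length of every list element, and the mean of a logical vector is the fraction of TRUE values. The sketch below uses a two-gene toy list, so the resulting fraction is not comparable to the numbers above.

```r
# Toy list in the same shape as as.list(org.Hs.egSYMBOL2EG)
lt = list(TP53 = "7157", MMD2 = c("221938", "100505381"))

# Number of Entrez IDs per symbol; same result as sapply(lt, length)
n = lengths(lt)

# Fraction of symbols with more than one Entrez ID;
# same result as sum(n > 1)/length(n)
frac = mean(n > 1)
frac
```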

Only protein-coding genes:

# use select
map = select(org.Hs.eg.db, keys = all_symbols, keytype = "SYMBOL", 
    columns = c("ENTREZID", "GENETYPE"))
## 'select()' returned 1:many mapping between keys and columns
map = map[map$GENETYPE == "protein-coding", ]

tb = table(map$SYMBOL)
sum(tb > 1)/length(tb)
## [1] 0

Ensembl -> Entrez

all_ids = keys(org.Hs.eg.db, keytype = "ENSEMBL")

# use select
map = select(org.Hs.eg.db, keys = all_ids, keytype = "ENSEMBL", columns = "ENTREZID")
## 'select()' returned 1:many mapping between keys and columns
tb = table(map$ENSEMBL)
sum(tb > 1)/length(tb)
## [1] 0.01674239
tail(sort(tb))
## 
## ENSG00000276700 ENSG00000277739 ENSG00000278189 ENSG00000278233 ENSG00000288326 
##             211             211             211             211             211 
## ENSG00000288387 
##             211

Only protein-coding genes:

# use select
map = select(org.Hs.eg.db, keys = all_ids, keytype = "ENSEMBL", columns = c("ENTREZID", "GENETYPE"))
## 'select()' returned 1:many mapping between keys and columns
map = map[map$GENETYPE == "protein-coding", ]

tb = table(map$ENSEMBL)
sum(tb > 1)/length(tb)
## [1] 0.01811382

Last, we add a SYMBOL column to the map table and check which Ensembl genes map to so many Entrez genes:

map = select(org.Hs.eg.db, keys = all_ids, keytype = "ENSEMBL", 
    columns = c("ENTREZID", "SYMBOL", "GENETYPE"))
## 'select()' returned 1:many mapping between keys and columns
map[map$ENSEMBL %in% names(tail(sort(tb))), ]
##               ENSEMBL  ENTREZID       SYMBOL       GENETYPE
## 6359  ENSG00000258992      7258        TSPY1 protein-coding
## 6360  ENSG00000258992 124905614 LOC124905614 protein-coding
## 6361  ENSG00000258992 124905615 LOC124905615 protein-coding
## 6362  ENSG00000258992 124905616 LOC124905616 protein-coding
## 6363  ENSG00000258992 124905617 LOC124905617 protein-coding
## 6364  ENSG00000258992 124905618 LOC124905618 protein-coding
## 6365  ENSG00000258992 124905619 LOC124905619 protein-coding
## 6366  ENSG00000258992 124905620 LOC124905620 protein-coding
## 6367  ENSG00000258992 124905621 LOC124905621 protein-coding
## 6368  ENSG00000258992 124905623 LOC124905623 protein-coding
## 6369  ENSG00000258992 124905624 LOC124905624 protein-coding
## 6370  ENSG00000258992 124905625 LOC124905625 protein-coding
## 6371  ENSG00000258992 124905626 LOC124905626 protein-coding
## 6372  ENSG00000258992 124905627 LOC124905627 protein-coding
## 6373  ENSG00000258992 124905628 LOC124905628 protein-coding
## 6374  ENSG00000258992 124905629 LOC124905629 protein-coding
## 6375  ENSG00000258992 124908978 LOC124908978 protein-coding
## 6376  ENSG00000258992 124908979 LOC124908979 protein-coding
## 6377  ENSG00000258992 124908980 LOC124908980 protein-coding
## 6378  ENSG00000258992 124908981 LOC124908981 protein-coding
## 6379  ENSG00000258992 124908988 LOC124908988 protein-coding
## 6380  ENSG00000258992 124909015 LOC124909015 protein-coding
## 6381  ENSG00000258992 124909084 LOC124909084 protein-coding
## 6382  ENSG00000258992 124909294 LOC124909294 protein-coding
## 6383  ENSG00000258992 124909306 LOC124909306 protein-coding
## 6384  ENSG00000258992 124909318 LOC124909318 protein-coding
## 6385  ENSG00000258992 124909320 LOC124909320 protein-coding
## 6386  ENSG00000258992 124909330 LOC124909330 protein-coding
## 26248 ENSG00000284234    646066      TAF11L5 protein-coding
## 26249 ENSG00000284234 112488738     TAF11L10 protein-coding
## 26250 ENSG00000284234 124906475 LOC124906475 protein-coding
## 26251 ENSG00000284234 124906476 LOC124906476 protein-coding
## 26252 ENSG00000284234 124906477 LOC124906477 protein-coding
## 26253 ENSG00000284234 124906478 LOC124906478 protein-coding
## 26254 ENSG00000284234 124906479 LOC124906479 protein-coding
## 26255 ENSG00000284234 124906480 LOC124906480 protein-coding
## 26256 ENSG00000284234 124906481 LOC124906481 protein-coding
## 26257 ENSG00000284234 124906482 LOC124906482 protein-coding
## 26258 ENSG00000284234 124906483 LOC124906483 protein-coding
## 26259 ENSG00000284234 124906484 LOC124906484 protein-coding
## 26260 ENSG00000284234 124906485 LOC124906485 protein-coding
## 26261 ENSG00000284234 124906486 LOC124906486 protein-coding
## 26262 ENSG00000284234 124906487 LOC124906487 protein-coding
## 26263 ENSG00000284234 124906488 LOC124906488 protein-coding
## 26264 ENSG00000284234 124906489 LOC124906489 protein-coding
## 26265 ENSG00000284234 124906490 LOC124906490 protein-coding
## 26266 ENSG00000284234 124906491 LOC124906491 protein-coding
## 26267 ENSG00000284356    646066      TAF11L5 protein-coding
## 26268 ENSG00000284356 112488738     TAF11L10 protein-coding
## 26269 ENSG00000284356 124906475 LOC124906475 protein-coding
## 26270 ENSG00000284356 124906476 LOC124906476 protein-coding
## 26271 ENSG00000284356 124906477 LOC124906477 protein-coding
## 26272 ENSG00000284356 124906478 LOC124906478 protein-coding
## 26273 ENSG00000284356 124906479 LOC124906479 protein-coding
## 26274 ENSG00000284356 124906480 LOC124906480 protein-coding
## 26275 ENSG00000284356 124906481 LOC124906481 protein-coding
## 26276 ENSG00000284356 124906482 LOC124906482 protein-coding
## 26277 ENSG00000284356 124906483 LOC124906483 protein-coding
## 26278 ENSG00000284356 124906484 LOC124906484 protein-coding
## 26279 ENSG00000284356 124906485 LOC124906485 protein-coding
## 26280 ENSG00000284356 124906486 LOC124906486 protein-coding
## 26281 ENSG00000284356 124906487 LOC124906487 protein-coding
## 26282 ENSG00000284356 124906488 LOC124906488 protein-coding
## 26283 ENSG00000284356 124906489 LOC124906489 protein-coding
## 26284 ENSG00000284356 124906490 LOC124906490 protein-coding
## 26285 ENSG00000284356 124906491 LOC124906491 protein-coding
## 27074 ENSG00000228927    728137        TSPY3 protein-coding
## 27075 ENSG00000228927 124905614 LOC124905614 protein-coding
## 27076 ENSG00000228927 124905615 LOC124905615 protein-coding
## 27077 ENSG00000228927 124905616 LOC124905616 protein-coding
## 27078 ENSG00000228927 124905617 LOC124905617 protein-coding
## 27079 ENSG00000228927 124905618 LOC124905618 protein-coding
## 27080 ENSG00000228927 124905619 LOC124905619 protein-coding
## 27081 ENSG00000228927 124905620 LOC124905620 protein-coding
## 27082 ENSG00000228927 124905621 LOC124905621 protein-coding
## 27083 ENSG00000228927 124905624 LOC124905624 protein-coding
## 27084 ENSG00000228927 124905625 LOC124905625 protein-coding
## 27085 ENSG00000228927 124905626 LOC124905626 protein-coding
## 27086 ENSG00000228927 124905627 LOC124905627 protein-coding
## 27087 ENSG00000228927 124905628 LOC124905628 protein-coding
## 27088 ENSG00000228927 124905629 LOC124905629 protein-coding
## 27089 ENSG00000228927 124908978 LOC124908978 protein-coding
## 27090 ENSG00000228927 124908979 LOC124908979 protein-coding
## 27091 ENSG00000228927 124908980 LOC124908980 protein-coding
## 27092 ENSG00000228927 124908981 LOC124908981 protein-coding
## 27093 ENSG00000228927 124909197 LOC124909197 protein-coding
## 27094 ENSG00000228927 124909294 LOC124909294 protein-coding
## 27095 ENSG00000228927 124909306 LOC124909306 protein-coding
## 27096 ENSG00000228927 124909318 LOC124909318 protein-coding
## 27097 ENSG00000228927 124909320 LOC124909320 protein-coding
## 27098 ENSG00000228927 124909330 LOC124909330 protein-coding
## 31999 ENSG00000283949 100288687         DUX4 protein-coding
## 32000 ENSG00000283949 124905408 LOC124905408 protein-coding
## 32001 ENSG00000283949 124905409 LOC124905409 protein-coding
## 32002 ENSG00000283949 124905410 LOC124905410 protein-coding
## 32003 ENSG00000283949 124905411 LOC124905411 protein-coding
## 32004 ENSG00000283949 124906452 LOC124906452 protein-coding
## 32005 ENSG00000283949 124906453 LOC124906453 protein-coding
## 32006 ENSG00000283949 124906454 LOC124906454 protein-coding
## 32007 ENSG00000283949 124906456 LOC124906456 protein-coding
## 32008 ENSG00000283949 124906457 LOC124906457 protein-coding
## 32009 ENSG00000283949 124906458 LOC124906458 protein-coding
## 32010 ENSG00000283949 124906459 LOC124906459 protein-coding
## 32011 ENSG00000283949 124906460 LOC124906460 protein-coding
## 32012 ENSG00000283949 124906461 LOC124906461 protein-coding
## 32013 ENSG00000283949 124906462 LOC124906462 protein-coding
## 32014 ENSG00000283949 124906463 LOC124906463 protein-coding
## 32015 ENSG00000283949 124906464 LOC124906464 protein-coding
## 32016 ENSG00000283949 124906465 LOC124906465 protein-coding
## 32017 ENSG00000283949 124906466 LOC124906466 protein-coding
## 32055 ENSG00000236424 100289087       TSPY10 protein-coding
## 32056 ENSG00000236424 124905614 LOC124905614 protein-coding
## 32057 ENSG00000236424 124905615 LOC124905615 protein-coding
## 32058 ENSG00000236424 124905616 LOC124905616 protein-coding
## 32059 ENSG00000236424 124905617 LOC124905617 protein-coding
## 32060 ENSG00000236424 124905618 LOC124905618 protein-coding
## 32061 ENSG00000236424 124905619 LOC124905619 protein-coding
## 32062 ENSG00000236424 124905620 LOC124905620 protein-coding
## 32063 ENSG00000236424 124905621 LOC124905621 protein-coding
## 32064 ENSG00000236424 124905623 LOC124905623 protein-coding
## 32065 ENSG00000236424 124905624 LOC124905624 protein-coding
## 32066 ENSG00000236424 124905625 LOC124905625 protein-coding
## 32067 ENSG00000236424 124905626 LOC124905626 protein-coding
## 32068 ENSG00000236424 124905627 LOC124905627 protein-coding
## 32069 ENSG00000236424 124905628 LOC124905628 protein-coding
## 32070 ENSG00000236424 124905629 LOC124905629 protein-coding
## 32071 ENSG00000236424 124905630 LOC124905630 protein-coding
## 32072 ENSG00000236424 124908978 LOC124908978 protein-coding
## 32073 ENSG00000236424 124908979 LOC124908979 protein-coding
## 32074 ENSG00000236424 124908980 LOC124908980 protein-coding
## 32075 ENSG00000236424 124908981 LOC124908981 protein-coding
## 32076 ENSG00000236424 124909093 LOC124909093 protein-coding
## 32077 ENSG00000236424 124909197 LOC124909197 protein-coding
## 32078 ENSG00000236424 124909294 LOC124909294 protein-coding
## 32079 ENSG00000236424 124909306 LOC124909306 protein-coding
## 32080 ENSG00000236424 124909318 LOC124909318 protein-coding