GeneSummary.Rmd
This package provides long descriptions of genes collected from the RefSeq database. The text in the “COMMENT” section that starts with “Summary:” is extracted as the description of the gene, as in the following example:
LOCUS NM_012363 936 bp mRNA linear PRI 12-FEB-2021
DEFINITION Homo sapiens olfactory receptor family 1 subfamily N member 1
(OR1N1), mRNA.
ACCESSION NM_012363 XM_071152
VERSION NM_012363.1
KEYWORDS RefSeq; MANE Select.
SOURCE Homo sapiens (human)
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 936)
AUTHORS Malnic B, Godfrey PA and Buck LB.
TITLE The human olfactory receptor gene family
JOURNAL Proc Natl Acad Sci U S A 101 (8), 2584-2589 (2004)
PUBMED 14983052
REMARK Erratum:[Proc Natl Acad Sci U S A. 2004 May 4;101(18):7205]
REFERENCE 2 (bases 1 to 936)
AUTHORS Fuchs T, Malecova B, Linhart C, Sharan R, Khen M, Herwig R,
Shmulevich D, Elkon R, Steinfath M, O'Brien JK, Radelof U, Lehrach
H, Lancet D and Shamir R.
TITLE DEFOG: a practical scheme for deciphering families of genes
JOURNAL Genomics 80 (3), 295-302 (2002)
PUBMED 12213199
REFERENCE 3 (bases 1 to 936)
AUTHORS Rouquier S, Taviaux S, Trask BJ, Brand-Arpon V, van den Engh G,
Demaille J and Giorgi D.
TITLE Distribution of olfactory receptor genes in the human genome
JOURNAL Nat Genet 18 (3), 243-250 (1998)
PUBMED 9500546
REMARK Erratum:[Nat Genet 1998 May;19(1):102]
COMMENT REVIEWED REFSEQ: This record has been curated by NCBI staff. The
reference sequence was derived from AL359636.17.
On Apr 5, 2004 this sequence version replaced XM_071152.1.
Summary: Olfactory receptors interact with odorant molecules in the
nose, to initiate a neuronal response that triggers the perception
of a smell. The olfactory receptor proteins are members of a large
family of G-protein-coupled receptors (GPCR) arising from single
coding-exon genes. Olfactory receptors share a 7-transmembrane
domain structure with many neurotransmitter and hormone receptors
and are responsible for the recognition and G protein-mediated
transduction of odorant signals. The olfactory receptor gene family
is the largest in the genome. The nomenclature assigned to the
olfactory receptor genes and proteins for this organism is
independent of other organisms. [provided by RefSeq, Jul 2008].
##RefSeq-Attributes-START##
MANE Ensembl match :: ENST00000304880.2/ ENSP00000306974.2
RefSeq Select criteria :: based on single protein-coding transcript
##RefSeq-Attributes-END##
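To make the extraction rule concrete, here is a minimal base-R sketch that pulls the “Summary:” text out of a COMMENT block. It is not the parser used to build this package: the function name extract_summary() and the regular expression are hypothetical, the input is a shortened copy of the record above, and dropping the trailing “[provided by ...]” tag is an assumption about the desired output.

```r
# A shortened COMMENT block based on the record above, one element per line.
comment_lines <- c(
  "COMMENT     REVIEWED REFSEQ: This record has been curated by NCBI staff. The",
  "            reference sequence was derived from AL359636.17.",
  "            On Apr 5, 2004 this sequence version replaced XM_071152.1.",
  "            Summary: Olfactory receptors interact with odorant molecules in the",
  "            nose, to initiate a neuronal response that triggers the perception",
  "            of a smell. [provided by RefSeq, Jul 2008].",
  "            ##RefSeq-Attributes-START##"
)

# Hypothetical helper: returns the text after "Summary:" or NA if there is none.
extract_summary <- function(lines) {
  # Drop the "COMMENT" keyword and indentation, then join the lines into one string.
  txt <- paste(trimws(sub("^COMMENT", "", lines)), collapse = " ")
  # Capture the text after "Summary:" up to the "[provided by ...]" tag,
  # the start of an attribute block ("##"), or the end of the string.
  pattern <- "Summary:\\s*(.*?)\\s*(\\[provided by[^]]*\\]|##|$)"
  m <- regmatches(txt, regexec(pattern, txt, perl = TRUE))[[1]]
  if (length(m) == 0) return(NA_character_)
  m[2]
}

extract_summary(comment_lines)
```

A record without a “Summary:” entry yields NA in this sketch, so such records could be filtered out afterwards.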
The function loadGeneSummary() extracts the gene summary table. Setting the organism argument to either the full organism name or the corresponding taxon ID returns a table of genes and their summaries:
library(GeneSummary)
## Gene summaries were retrieved from RefSeq database release 214 (Sep 30, 2022).
tb = loadGeneSummary(organism = 9606)
# or use the full organism name
# tb = loadGeneSummary(organism = "Homo sapiens")
dim(tb)
## [1] 53545 6
head(tb)
## RefSeq_accession Organism Taxon_ID Gene_ID Review_status
## 1 NR_039609.1 Homo sapiens 9606 100616498 PROVISIONAL REFSEQ
## 2 NR_030183.1 Homo sapiens 9606 574461 PROVISIONAL REFSEQ
## 3 NR_039939.1 Homo sapiens 9606 100616159 PROVISIONAL REFSEQ
## 4 NR_107042.1 Homo sapiens 9606 102465874 PROVISIONAL REFSEQ
## 5 NR_030222.1 Homo sapiens 9606 574500 PROVISIONAL REFSEQ
## 6 NR_030188.1 Homo sapiens 9606 574466 PROVISIONAL REFSEQ
## Gene_summary
## 1 microRNAs (miRNAs) are short (20-24 nt) non-coding RNAs that are involved in post-transcriptional regulation of gene expression in multicellular organisms by affecting both the stability and translation of mRNAs. miRNAs are transcribed by RNA polymerase II as part of capped and polyadenylated primary transcripts (pri-miRNAs) that can be either protein-coding or non-coding. The primary transcript is cleaved by the Drosha ribonuclease III enzyme to produce an approximately 70-nt stem-loop precursor miRNA (pre-miRNA), which is further cleaved by the cytoplasmic Dicer ribonuclease to generate the mature miRNA and antisense miRNA star (miRNA*) products. The mature miRNA is incorporated into a RNA-induced silencing complex (RISC), which recognizes target mRNAs through imperfect base pairing with the miRNA and most commonly results in translational inhibition or destabilization of the target mRNA. The RefSeq represents the predicted microRNA stem-loop.
## 2 microRNAs (miRNAs) are short (20-24 nt) non-coding RNAs that are involved in post-transcriptional regulation of gene expression in multicellular organisms by affecting both the stability and translation of mRNAs. miRNAs are transcribed by RNA polymerase II as part of capped and polyadenylated primary transcripts (pri-miRNAs) that can be either protein-coding or non-coding. The primary transcript is cleaved by the Drosha ribonuclease III enzyme to produce an approximately 70-nt stem-loop precursor miRNA (pre-miRNA), which is further cleaved by the cytoplasmic Dicer ribonuclease to generate the mature miRNA and antisense miRNA star (miRNA*) products. The mature miRNA is incorporated into a RNA-induced silencing complex (RISC), which recognizes target mRNAs through imperfect base pairing with the miRNA and most commonly results in translational inhibition or destabilization of the target mRNA. The RefSeq represents the predicted microRNA stem-loop.
## 3 microRNAs (miRNAs) are short (20-24 nt) non-coding RNAs that are involved in post-transcriptional regulation of gene expression in multicellular organisms by affecting both the stability and translation of mRNAs. miRNAs are transcribed by RNA polymerase II as part of capped and polyadenylated primary transcripts (pri-miRNAs) that can be either protein-coding or non-coding. The primary transcript is cleaved by the Drosha ribonuclease III enzyme to produce an approximately 70-nt stem-loop precursor miRNA (pre-miRNA), which is further cleaved by the cytoplasmic Dicer ribonuclease to generate the mature miRNA and antisense miRNA star (miRNA*) products. The mature miRNA is incorporated into a RNA-induced silencing complex (RISC), which recognizes target mRNAs through imperfect base pairing with the miRNA and most commonly results in translational inhibition or destabilization of the target mRNA. The RefSeq represents the predicted microRNA stem-loop.
## 4 microRNAs (miRNAs) are short (20-24 nt) non-coding RNAs that are involved in post-transcriptional regulation of gene expression in multicellular organisms by affecting both the stability and translation of mRNAs. miRNAs are transcribed by RNA polymerase II as part of capped and polyadenylated primary transcripts (pri-miRNAs) that can be either protein-coding or non-coding. The primary transcript is cleaved by the Drosha ribonuclease III enzyme to produce an approximately 70-nt stem-loop precursor miRNA (pre-miRNA), which is further cleaved by the cytoplasmic Dicer ribonuclease to generate the mature miRNA and antisense miRNA star (miRNA*) products. The mature miRNA is incorporated into a RNA-induced silencing complex (RISC), which recognizes target mRNAs through imperfect base pairing with the miRNA and most commonly results in translational inhibition or destabilization of the target mRNA. The RefSeq represents the predicted microRNA stem-loop.
## 5 microRNAs (miRNAs) are short (20-24 nt) non-coding RNAs that are involved in post-transcriptional regulation of gene expression in multicellular organisms by affecting both the stability and translation of mRNAs. miRNAs are transcribed by RNA polymerase II as part of capped and polyadenylated primary transcripts (pri-miRNAs) that can be either protein-coding or non-coding. The primary transcript is cleaved by the Drosha ribonuclease III enzyme to produce an approximately 70-nt stem-loop precursor miRNA (pre-miRNA), which is further cleaved by the cytoplasmic Dicer ribonuclease to generate the mature miRNA and antisense miRNA star (miRNA*) products. The mature miRNA is incorporated into a RNA-induced silencing complex (RISC), which recognizes target mRNAs through imperfect base pairing with the miRNA and most commonly results in translational inhibition or destabilization of the target mRNA. The RefSeq represents the predicted microRNA stem-loop.
## 6 microRNAs (miRNAs) are short (20-24 nt) non-coding RNAs that are involved in post-transcriptional regulation of gene expression in multicellular organisms by affecting both the stability and translation of mRNAs. miRNAs are transcribed by RNA polymerase II as part of capped and polyadenylated primary transcripts (pri-miRNAs) that can be either protein-coding or non-coding. The primary transcript is cleaved by the Drosha ribonuclease III enzyme to produce an approximately 70-nt stem-loop precursor miRNA (pre-miRNA), which is further cleaved by the cytoplasmic Dicer ribonuclease to generate the mature miRNA and antisense miRNA star (miRNA*) products. The mature miRNA is incorporated into a RNA-induced silencing complex (RISC), which recognizes target mRNAs through imperfect base pairing with the miRNA and most commonly results in translational inhibition or destabilization of the target mRNA. The RefSeq represents the predicted microRNA stem-loop.
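Because the Gene_summary column holds long free text, printing the table directly is hard to read. The following sketch shortens the summaries for display only. The helper preview_summaries() and its width argument are hypothetical and not part of the package, and tb_toy is a made-up stand-in that only mimics the column names shown above.

```r
# Toy stand-in for the table returned by loadGeneSummary(); the column names
# follow the output above, but all rows are made up for illustration.
tb_toy <- data.frame(
  RefSeq_accession = c("NR_000001.1", "NM_000002.1"),
  Organism = "Homo sapiens",
  Taxon_ID = 9606,
  Gene_ID = c(1, 2),
  Review_status = c("PROVISIONAL REFSEQ", "REVIEWED REFSEQ"),
  Gene_summary = c(
    paste(rep("A long placeholder summary sentence.", 10), collapse = " "),
    "A short placeholder summary."
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: truncate long summaries to `width` characters,
# marking the cut with "...". The input table itself is not modified.
preview_summaries <- function(tb, width = 60) {
  s <- tb$Gene_summary
  too_long <- nchar(s) > width
  s[too_long] <- paste0(substr(s[too_long], 1, width - 3), "...")
  tb$Gene_summary <- s
  tb
}

preview_summaries(tb_toy)
```

With the real table, the same helper could be applied to head(tb) before printing.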
Setting organism to NULL returns a table of all organisms.
tb = loadGeneSummary(organism = NULL)
sort(table(tb$Organism))
##
## Aedes aegypti Aotus nancymaae
## 1 1
## Aplysia californica Bison bison bison
## 1 1
## Callorhinchus milii Macaca nemestrina
## 1 1
## Mandrillus leucophaeus Rhinopithecus roxellana
## 1 1
## Anas platyrhynchos Cercocebus atys
## 2 2
## Chelonia mydas Colobus angolensis palliatus
## 2 2
## Crassostrea gigas Geospiza fortis
## 2 2
## Latimeria chalumnae Loxodonta africana
## 2 2
## Melopsittacus undulatus Nannospalax galili
## 2 2
## Python bivittatus Alligator sinensis
## 2 3
## Amphimedon queenslandica Chlorocebus sabaeus
## 3 3
## Columba livia Falco cherrug
## 3 3
## Falco peregrinus Oncorhynchus mykiss
## 3 3
## Orycteropus afer afer Pelodiscus sinensis
## 3 3
## Salmo salar Zonotrichia albicollis
## 3 3
## Alligator mississippiensis Bos mutus
## 4 4
## Ficedula albicollis Meleagris gallopavo
## 4 4
## Myotis brandtii Myotis davidii
## 4 4
## Pseudopodoces humilis Ailuropoda melanoleuca
## 4 5
## Astyanax mexicanus Balaenoptera acutorostrata scammoni
## 5 5
## Camelus ferus Elephantulus edwardii
## 5 5
## Panthera tigris Poecilia formosa
## 5 5
## Chrysemys picta Heterocephalus glaber
## 6 6
## Otolemur garnettii Physeter catodon
## 6 6
## Saimiri boliviensis Sorex araneus
## 6 6
## Cavia porcellus Chinchilla lanigera
## 7 7
## Dasypus novemcinctus Leptonychotes weddellii
## 7 7
## Myotis lucifugus Octodon degus
## 7 7
## Tursiops truncatus Ceratotherium simum simum
## 7 8
## Condylura cristata Echinops telfairi
## 8 8
## Erinaceus europaeus Jaculus jaculus
## 8 8
## Mesocricetus auratus Mustela putorius furo
## 8 8
## Ochotona princeps Pteropus alecto
## 8 8
## Vicugna pacos Chrysochloris asiatica
## 8 9
## Felis catus Ictidomys tridecemlineatus
## 9 9
## Lipotes vexillifer Odobenus rosmarus divergens
## 9 9
## Orcinus orca Trichechus manatus latirostris
## 9 9
## Hydra vulgaris Microtus ochrogaster
## 10 10
## Papio anubis Bubalus bubalis
## 10 11
## Macaca fascicularis Nomascus leucogenys
## 11 11
## Peromyscus maniculatus bairdii Pongo abelii
## 11 14
## Callithrix jacchus Strongylocentrotus purpuratus
## 15 64
## Sarcophilus harrisii Xenopus laevis
## 65 84
## Brassica rapa Saccoglossus kowalevskii
## 89 90
## Cucumis melo Ovis aries
## 104 115
## Acyrthosiphon pisum Malus domestica
## 125 130
## Takifugu rubripes Citrus sinensis
## 140 146
## Solanum lycopersicum Vitis vinifera
## 152 156
## Oryzias latipes Zea mays
## 161 166
## Pan paniscus Tupaia chinensis
## 179 184
## Solanum tuberosum Cricetulus griseus
## 215 236
## Xenopus tropicalis Taeniopygia guttata
## 244 248
## Apis mellifera Capra hircus
## 254 277
## Anolis carolinensis Brachypodium distachyon
## 293 312
## Oryctolagus cuniculus Ciona intestinalis
## 319 331
## Nasonia vitripennis Tribolium castaneum
## 332 333
## Gorilla gorilla Ornithorhynchus anatinus
## 374 396
## Sus scrofa Bombyx mori
## 402 423
## Danio rerio Eptesicus fuscus
## 440 494
## Glycine max Macaca mulatta
## 670 677
## Pan troglodytes Monodelphis domestica
## 680 685
## Gallus gallus Canis lupus familiaris
## 966 1085
## Equus caballus Bos taurus
## 1463 1966
## Rattus norvegicus Mus musculus
## 2059 6254
## Homo sapiens
## 53545
sort(table(tb$Review_status))
##
## PREDICTED REFSEQ INFERRED REFSEQ VALIDATED REFSEQ PROVISIONAL REFSEQ
## 9 2351 6550 17462
## REVIEWED REFSEQ
## 52208
A specific status can be selected with the status argument, e.g. only "reviewed":
tb = loadGeneSummary(organism = NULL, status = "reviewed")
sort(table(tb$Review_status))
## REVIEWED REFSEQ
## 52208
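For comparison, a similar selection can be made by hand on a table that was loaded without a status filter. This is only a sketch of the idea, not the package's implementation: keep_status() is a hypothetical helper, tb_toy is made-up data, and the mapping from "reviewed" to the label "REVIEWED REFSEQ" is an assumption based on the output above.

```r
# Toy table with made-up rows; only the Review_status labels follow the output above.
tb_toy <- data.frame(
  Gene_ID = c(1, 2, 3, 4),
  Review_status = c("REVIEWED REFSEQ", "PROVISIONAL REFSEQ",
                    "REVIEWED REFSEQ", "VALIDATED REFSEQ"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose review status matches one of `status`.
# Assumes labels have the form "<STATUS> REFSEQ", as in the table above.
keep_status <- function(tb, status) {
  wanted <- paste(toupper(status), "REFSEQ")
  tb[tb$Review_status %in% wanted, , drop = FALSE]
}

reviewed <- keep_status(tb_toy, "reviewed")
table(reviewed$Review_status)
```

Filtering after loading makes it easy to combine several statuses, for example keep_status(tb_toy, c("reviewed", "validated")).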
Version of the data:
GeneSummary
## RefSeq gene summaries
## RefSeq release: 214
## Source: https://ftp.ncbi.nih.gov/refseq/release/complete/*.rna.gbff.gz
## Number of organisms: 129
## Built date: 2022-09-30
sessionInfo()
## R version 4.2.0 (2022-04-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur/Monterey 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
##
## locale:
## [1] C/UTF-8/C/C/C/C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] GeneSummary_0.99.4
##
## loaded via a namespace (and not attached):
## [1] knitr_1.39 magrittr_2.0.3 R6_2.5.1 ragg_1.2.2
## [5] rlang_1.0.4 fastmap_1.1.0 stringr_1.4.0 tools_4.2.0
## [9] xfun_0.31 cli_3.3.0 jquerylib_0.1.4 htmltools_0.5.3
## [13] systemfonts_1.0.4 yaml_2.3.5 digest_0.6.29 rprojroot_2.0.3
## [17] pkgdown_2.0.6 textshaping_0.3.6 purrr_0.3.4 sass_0.4.2
## [21] fs_1.5.2 memoise_2.0.1 cachem_1.0.6 evaluate_0.15
## [25] rmarkdown_2.14 stringi_1.7.8 compiler_4.2.0 bslib_0.4.0
## [29] desc_1.4.1 jsonlite_1.8.0