Calculate word frequency
Usage
count_words(
term,
exclude_words = NULL,
stop_words = stopwords(),
min_word_length = 1,
tokenizer = "words",
transform_case = tolower,
remove_numbers = TRUE,
remove_punctuation = TRUE,
custom_transformer = NULL,
stemming = FALSE,
dictionary = NULL
)
Arguments
- term
A vector of description texts.
- exclude_words
The words that should be excluded.
- stop_words
The stop words that should be removed.
- min_word_length
Minimum length of the word to be counted.
- tokenizer
The tokenizer function, one of the values accepted by tm::termFreq.
- transform_case
The function normalizing lettercase of the words.
- remove_numbers
Whether to remove numbers.
- remove_punctuation
Whether to remove punctuation.
- custom_transformer
Custom function that transforms words.
- stemming
Whether to only keep the roots of inflected words.
- dictionary
A vector of words to be counted (if given, all other words are excluded).
Details
The text preprocessing follows the instructions from http://www.sthda.com/english/wiki/word-cloud-generator-in-r-one-killer-function-to-do-everything-you-need.
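To make the preprocessing order concrete, here is a minimal base-R sketch of the kind of pipeline the arguments describe: case normalization, number and punctuation removal, tokenization into words, length filtering, stop-word removal, then counting. It is an illustration under stated assumptions, not the implementation of count_words(). The helper name sketch_count_words and its short stop-word list are hypothetical, and punctuation is replaced with a space here for simplicity, which may differ from how tm handles it.

```r
# Hypothetical helper: a base-R approximation of the preprocessing steps.
# This is NOT the code used by count_words(); it only mirrors the argument names.
sketch_count_words = function(term,
                              stop_words = c("of", "the", "in", "to", "and"),
                              min_word_length = 1) {
    txt = tolower(term)                              # transform_case
    txt = gsub("[[:digit:]]+", " ", txt)             # remove_numbers
    txt = gsub("[[:punct:]]+", " ", txt)             # remove_punctuation (simplified)
    words = unlist(strsplit(txt, "[[:space:]]+"))    # split into words
    words = words[nzchar(words)]                     # drop empty tokens
    words = words[nchar(words) >= min_word_length]   # min_word_length
    words = words[!words %in% stop_words]            # stop_words
    tb = sort(table(words), decreasing = TRUE)       # count and order by frequency
    data.frame(word = as.character(names(tb)),
               freq = as.vector(tb),
               stringsAsFactors = FALSE)
}

terms = c("regulation of cell cycle",
          "positive regulation of cell growth",
          "negative regulation of signaling")
res = sketch_count_words(terms)
head(res)
```

In this toy input, "regulation" occurs three times and "cell" twice, while "of" is dropped as a stop word. The real function additionally supports exclude_words, stemming, custom_transformer and dictionary, which this sketch leaves out.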
Examples
gm = readRDS(system.file("extdata", "random_GO_BP_sim_mat.rds", package = "simplifyEnrichment"))
go_id = rownames(gm)
go_term = AnnotationDbi::select(GO.db::GO.db, keys = go_id, columns = "TERM")$TERM
#> 'select()' returned 1:1 mapping between keys and columns
count_words(go_term) |> head()
#> word freq
#> regulation regulation 179
#> process process 63
#> positive positive 61
#> cell cell 59
#> negative negative 57
#> signaling signaling 43